Announcing Amazon Cognito Streams

On January 20 we released a feature that gives developers, using their own credentials, full API access to the sync store to read and write user profile data, along with a data browser inside the Amazon Cognito console.

Today we are excited to announce a new feature that gives customers even greater control and insight into their data stored in Cognito: Amazon Cognito streams. Customers can now configure an Amazon Kinesis stream to receive events as data is updated and synchronized. In this blog post, I’ll explain how this feature works, and I’ll show an example application that consumes the events from the stream to build a view of your application data in Amazon Redshift.

Configuring Streams

Configuring Amazon Cognito streams is straightforward. From the console, select your identity pool and choose “Edit Identity Pool”.

From the edit screen, expand the “Cognito Streams” section to configure your options. You will need to supply an IAM role and a Kinesis stream, but the Cognito console can walk you through the creation of both of these resources.

After you’ve successfully configured Amazon Cognito streams, all subsequent updates to datasets in this identity pool will be sent to the stream.
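
If you prefer to script the setup, the same configuration is exposed through the Amazon Cognito Sync API. The following is a minimal sketch using boto3's set_identity_pool_configuration call; the identity pool ID, stream name, and role ARN are placeholders you would replace with your own values.

import boto3

# Amazon Cognito Sync exposes the same streams configuration as the console.
cognito_sync = boto3.client("cognito-sync")

# Placeholder identifiers -- substitute your own identity pool, stream, and role.
response = cognito_sync.set_identity_pool_configuration(
    IdentityPoolId="us-east-1:00000000-0000-0000-0000-000000000000",
    CognitoStreams={
        "StreamName": "MyCognitoStream",
        "RoleArn": "arn:aws:iam::123456789012:role/CognitoStreamsRole",
        "StreamingStatus": "ENABLED",
    },
)
print(response["CognitoStreams"]["StreamingStatus"])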

Stream Contents

Each record sent to the stream represents a single synchronization. Here is an example of a record sent to the stream:

{
  "identityPoolId": "Pool Id",
  "identityId": "Identity Id",
  "dataSetName": "Dataset Name",
  "operation": "(replace|remove)",
  "kinesisSyncRecords": [
    {
      "key": "Key",
      "value": "Value",
      "syncCount": 1,
      "lastModifiedDate": 1424801824343,
      "deviceLastModifiedDate": 1424801824343,
      "op": "(replace|remove)"
    },
    ...
  ],
  "lastModifiedDate": 1424801824343,
  "kinesisSyncRecordsURL": "S3Url",
  "payloadType": "(S3Url|Inline)",
  "syncCount": 1
}

For updates larger than the Kinesis maximum payload size of 50 KB, a presigned Amazon S3 URL that contains the full contents of the update will be included instead.
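
To make the record format concrete, here is a minimal consumer sketch that reads a batch of records from a single shard with boto3 and fetches the full update from the presigned S3 URL when payloadType is "S3Url". The stream name is a placeholder, and the assumption that the S3 object contains the JSON-encoded sync records is illustrative; a production consumer would typically use the Kinesis Client Library to handle all shards and checkpointing.

import json
import urllib.request
import boto3

STREAM_NAME = "MyCognitoStream"  # placeholder stream name
kinesis = boto3.client("kinesis")

# For brevity, read from the first shard only; the KCL would manage every shard.
shard_id = kinesis.describe_stream(StreamName=STREAM_NAME)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=STREAM_NAME, ShardId=shard_id, ShardIteratorType="TRIM_HORIZON"
)["ShardIterator"]

for record in kinesis.get_records(ShardIterator=iterator, Limit=100)["Records"]:
    event = json.loads(record["Data"])
    if event["payloadType"] == "S3Url":
        # Large updates: fetch the sync records from the presigned S3 URL
        # (assumed here to contain the JSON-encoded record list).
        with urllib.request.urlopen(event["kinesisSyncRecordsURL"]) as payload:
            sync_records = json.loads(payload.read())
    else:
        sync_records = event["kinesisSyncRecords"]
    for sync_record in sync_records:
        print(event["identityId"], event["dataSetName"], sync_record["key"], sync_record["op"])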

Now that you have updates to your data streaming, what about your existing data?

Bulk Publishing

Once you have configured Amazon Cognito streams, you will be able to execute a bulk publish operation for the existing data in your identity pool. After you initiate a bulk publish operation, either via the console or directly via the API, Cognito will start publishing this data to the same stream that is receiving your updates.
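
For example, a minimal boto3 sketch of starting a bulk publish from code and polling its status might look like this; the identity pool ID is a placeholder.

import boto3

cognito_sync = boto3.client("cognito-sync")
POOL_ID = "us-east-1:00000000-0000-0000-0000-000000000000"  # placeholder

# Start publishing all existing datasets to the configured Kinesis stream.
cognito_sync.bulk_publish(IdentityPoolId=POOL_ID)

# Poll the operation; the status moves from IN_PROGRESS to SUCCEEDED or FAILED.
details = cognito_sync.get_bulk_publish_details(IdentityPoolId=POOL_ID)
print(details["BulkPublishStatus"])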

You are limited to one ongoing bulk publish operation at any given time and to one successful bulk publish request every 24 hours.

Cognito does not guarantee uniqueness of data sent to the stream when using the bulk publish operation: you may receive the same update both as a regular sync event and as part of a bulk publish. Keep this in mind when processing the records from your stream.

Example Streams Connector for Amazon Redshift

In today’s launch, we are also including an example application that will consume records from a Kinesis stream associated with a Cognito identity pool and then store them in an Amazon Redshift cluster for querying. The source code is available in our awslabs GitHub repo, and we’ve also made an AWS CloudFormation template available that will create all the necessary assets for this sample, including:

  • Amazon Redshift cluster
  • Amazon DynamoDB table used by the Kinesis client library
  • Amazon S3 bucket for intermediate staging of data
  • IAM role for EC2
  • Elastic Beanstalk application to run the code

Launch Stack buttons are provided for the US East (Virginia), EU (Ireland), and Asia Pacific (Tokyo) regions; click the button for the region where you want to create the stack.

Once your stack has been created, the output tab in the CloudFormation console will contain a JDBC connection string you can use to connect directly to your Amazon Redshift cluster:

jdbc:postgresql://amazoncognitostreamssample-redshiftcluster-xxxxxxxx.xxxxxxxx.REGION.redshift.amazonaws.com:PORT/cognito?tcpKeepAlive=true

Schema

The example stores all event data in a table called cognito_raw_data with the following schema:

Column Name               Type
identityPoolId            varchar(1024)
identityId                varchar(1024)
datasetName               varchar(1024)
operation                 varchar(64)
key                       varchar(1024)
value                     varchar(1024)
op                        varchar(64)
syncCount                 int
deviceLastModifiedDate    timestamp
lastModifiedDate          timestamp

Extracting Data

Because every key-value update is stored as a new row in the cognito_raw_data table, getting the current state of a dataset takes some additional effort. The following query will get the state of a specific dataset for a given user:

SELECT DISTINCT temp.*, value
FROM (SELECT DISTINCT identityid,
             datasetname,
             key,
             MAX(synccount) OVER (PARTITION BY identityid, datasetname, key) AS max_synccount
      FROM cognito_raw_data) AS temp
INNER JOIN cognito_raw_data raw_data
        ON (temp.identityid = raw_data.identityid
            AND temp.datasetname = raw_data.datasetname
            AND temp.key = raw_data.key
            AND temp.max_synccount = raw_data.synccount)
WHERE raw_data.identityid = 'IDENTITY_ID'
  AND raw_data.datasetname = 'DATASET_NAME'
  AND op <> 'remove'
ORDER BY datasetname, key

You may want to set up daily extracts of the data to make your regular queries more efficient.
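
As one possible approach, the sketch below connects to the cluster with psycopg2 (using details from the CloudFormation output) and materializes the latest value of every key into a summary table. The table name, host, and credentials are illustrative assumptions, not part of the sample application.

import psycopg2

# Connection details come from the CloudFormation output tab; these are placeholders.
conn = psycopg2.connect(
    host="amazoncognitostreamssample-redshiftcluster-xxxxxxxx.xxxxxxxx.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="cognito",
    user="masteruser",
    password="YOUR_PASSWORD",
)

with conn, conn.cursor() as cursor:
    # Rebuild a daily extract holding the current state of every dataset.
    cursor.execute("DROP TABLE IF EXISTS cognito_current_state")
    cursor.execute("""
        CREATE TABLE cognito_current_state AS
        SELECT DISTINCT raw_data.identityid, raw_data.datasetname, raw_data.key, raw_data.value
        FROM cognito_raw_data raw_data
        JOIN (SELECT identityid, datasetname, key, MAX(synccount) AS max_synccount
              FROM cognito_raw_data
              GROUP BY identityid, datasetname, key) latest
          ON latest.identityid = raw_data.identityid
         AND latest.datasetname = raw_data.datasetname
         AND latest.key = raw_data.key
         AND latest.max_synccount = raw_data.synccount
        WHERE raw_data.op <> 'remove'
    """)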

Conclusions

As you can see, Amazon Cognito streams can give you both a complete export of your data and a live view of how your data changes over time. We’d love to hear how you plan to use this feature in your applications, so please feel free to leave a comment sharing other uses for it.

If you encounter any issues or have further comments or questions, please visit our forums and we’ll be happy to assist you.