Real-time Data Processing with Amazon DynamoDB Streams and AWS Lambda


Page 1

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Presenter: Vyom Nagrani, Sr. Product Manager, AWS Lambda
Q&A Moderator: Ajay Nair, Sr. Product Manager, AWS Lambda

July 30th, 2015

Best Practices: Real-time Data Processing with Amazon DynamoDB Streams and AWS Lambda

Page 2

Amazon DynamoDB Streams: a time-ordered sequence of item-level changes

• Time-ordered and partition-ordered log

• Provides a stream of inserts, deletes, and updates, including the:
  • Old item
  • New item
  • Primary key
  • Change type

• Stream records are delivered exactly once

• Streams are asynchronous

• Scales with your table

(Diagram: DynamoDB → DynamoDB Streams)
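The fields listed above (old item, new item, primary key, change type) reach a Lambda function as entries in the event's `Records` array. The following Python sketch pulls them out of one record and decodes scalar attribute values from DynamoDB's typed JSON. The sample record, its `Id`/`Status` attributes, and the helper names are hypothetical, and only the `S`, `N`, `BOOL`, and `NULL` types are handled.

```python
from decimal import Decimal
from typing import Any, Dict, Optional

def from_attr(value: Dict[str, Any]) -> Any:
    """Decode one DynamoDB-JSON attribute value such as {"S": "x"} (scalar types only)."""
    (type_tag, raw), = value.items()  # each attribute value carries exactly one type tag
    if type_tag == "S":
        return raw
    if type_tag == "N":
        return Decimal(raw)  # numbers arrive as strings; Decimal avoids float rounding
    if type_tag == "BOOL":
        return raw
    if type_tag == "NULL":
        return None
    raise ValueError(f"type {type_tag!r} is not handled in this sketch")

def from_image(image: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Decode a whole item image; returns None when the stream record omits it."""
    if image is None:
        return None
    return {name: from_attr(value) for name, value in image.items()}

def summarize_change(record: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the change type, primary key, and old/new images from one stream record."""
    ddb = record["dynamodb"]
    return {
        "change_type": record["eventName"],           # INSERT, MODIFY, or REMOVE
        "key": from_image(ddb["Keys"]),
        "old_item": from_image(ddb.get("OldImage")),  # absent for INSERT or if the view type excludes it
        "new_item": from_image(ddb.get("NewImage")),  # absent for REMOVE or if the view type excludes it
    }

# Hypothetical sample record from a stream using the NEW_AND_OLD_IMAGES view type.
sample = {
    "eventName": "MODIFY",
    "dynamodb": {
        "Keys": {"Id": {"N": "101"}},
        "OldImage": {"Id": {"N": "101"}, "Status": {"S": "pending"}},
        "NewImage": {"Id": {"N": "101"}, "Status": {"S": "shipped"}},
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
}

print(summarize_change(sample))
```

Which images are present depends on the stream view type chosen when the stream is enabled, which is why both lookups tolerate a missing image.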

Page 3

Benefits of DynamoDB Streams for real-time data processing

• Durability & high availability
  • High-throughput consensus protocol
  • Replicated across multiple Availability Zones (AZs)

• Managed streams
  • Simply enable streaming

• Performance
  • Designed for sub-second latency

• Native integration with AWS Lambda
  • DynamoDB Triggers invoke a Lambda function to run your custom code

(Diagram: DynamoDB → DynamoDB Streams → DynamoDB Triggers → Lambda function → run custom code)

Page 4

AWS Lambda: a compute service that runs your code in response to events

Lambda functions: stateless, trigger-based code execution

Triggered by events:
• Direct synchronous and asynchronous invocations
• Put to an Amazon S3 bucket
• Table update on Amazon DynamoDB
• And many more …

Makes it easy to:
• Build back-end services that perform at scale
• Perform data-driven auditing, analysis, and notification

Page 5

Benefits of AWS Lambda for building a serverless data processing engine

“Productivity-focused compute platform to build powerful, dynamic, modular applications in the cloud”

• No infrastructure to manage: focus on business logic, not infrastructure. You upload code; AWS Lambda handles everything else.

• High performance at any scale; cost-effective and efficient: pay only for what you use. Lambda automatically matches capacity to your request rate. Purchase compute in 100 ms increments.

• Bring your own code: run code in a choice of standard languages. Use threads, processes, files, and shell scripts normally.
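To make "purchase compute in 100 ms increments" concrete, this Python sketch rounds a run time up to the next increment and converts it to GB-seconds (configured memory in GB times billed seconds), the unit in which Lambda compute usage is metered. The 100 ms granularity comes from this 2015 slide; the duration and memory size in the example are hypothetical, and no prices are included.

```python
import math

def billed_duration_ms(duration_ms: float, increment_ms: int = 100) -> int:
    """Round a run time up to the next billing increment (100 ms, per this 2015 slide)."""
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    return math.ceil(duration_ms / increment_ms) * increment_ms

def gb_seconds(duration_ms: float, memory_mb: int, increment_ms: int = 100) -> float:
    """Usage in GB-seconds: configured memory in GB times billed duration in seconds."""
    return (memory_mb / 1024) * billed_duration_ms(duration_ms, increment_ms) / 1000

# A hypothetical 230 ms invocation with 512 MB of memory configured.
print(billed_duration_ms(230))  # 300
print(gb_seconds(230, 512))     # 0.15
```

Because billing rounds up, a function that usually finishes just past an increment boundary pays for nearly a full extra increment on each invocation.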

Page 6

DynamoDB Streams + Lambda = Database Triggers

• Run multiple real-time applications in parallel
  • DynamoDB Streams natively supports cross-region replication
  • Triggers enable filtering, monitoring, auditing, notifications, aggregation, and more

• No charge for the reads/polls that your AWS Lambda function makes to the DynamoDB stream associated with the table
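As an example of the filtering use case, a trigger often needs to react only when particular attributes change. This Python sketch compares the old and new images of `MODIFY` records. It assumes the stream was enabled with the `NEW_AND_OLD_IMAGES` view type (otherwise one of the images is missing); the `watch` helper and the watched attribute names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Set

Record = Dict[str, Any]

def changed_attributes(record: Record) -> Set[str]:
    """Names of attributes whose values differ between OldImage and NewImage."""
    if record.get("eventName") != "MODIFY":
        return set()
    ddb = record["dynamodb"]
    old = ddb.get("OldImage", {})
    new = ddb.get("NewImage", {})
    # An attribute counts as changed if it was added, removed, or given a new value.
    return {name for name in set(old) | set(new) if old.get(name) != new.get(name)}

def watch(records: Iterable[Record], watched: Set[str]) -> List[Record]:
    """Keep only MODIFY records that changed at least one watched attribute."""
    return [r for r in records if changed_attributes(r) & watched]

# Hypothetical usage inside a handler: only price changes are interesting.
#   interesting = watch(event["Records"], {"Price"})
```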

Page 7

Walkthrough of a simple stream logging application workflow

(Diagram: new table updates → Amazon DynamoDB → DynamoDB Streams → AWS Lambda → Amazon CloudWatch Logs)
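A minimal Python sketch of the logging function in this workflow: Lambda passes a batch of stream records in `event["Records"]`, and the function writes one log line per record. It assumes Python's standard `logging` module, whose output the Lambda Python runtime sends to the function's CloudWatch Logs log group. The handler name `lambda_handler` is a common convention set when the function is configured, not something this slide specifies.

```python
import json
import logging
from typing import Any, Dict

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, int]:
    """Write one log line per stream record; Lambda forwards function logs to CloudWatch Logs."""
    records = event.get("Records", [])
    for record in records:
        logger.info(
            "%s %s keys=%s",
            record["eventID"],
            record["eventName"],
            json.dumps(record["dynamodb"].get("Keys", {}), sort_keys=True),
        )
    return {"logged": len(records)}
```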

Pages 8-18

Walkthrough of setting up DynamoDB Triggers and Lambda functions through the AWS Console

Page 19

Today’s demo: workflow of cross-region replication and real-time data auditing

(Diagram: original table in Amazon DynamoDB → data stream → AWS Lambda → cross-region replication to a second Amazon DynamoDB table, plus audit notifications through Amazon SNS)

Page 20

Demo function logic:
• Loop through the event array
• Replicate each item to a different table
• Send a notification if the record is suspicious
• In both cases, wait for the callbacks before exiting
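The steps above can be sketched in Python as follows. The replication, notification, and "suspicious" check are passed in as functions because the slide does not show them; in a real function they might wrap a DynamoDB `PutItem` call against the replica table and an Amazon SNS `Publish` call, but those wrappers, the thread pool, and all names here are assumptions for illustration.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

def process_batch(
    records: List[Record],
    replicate: Callable[[Record], None],      # hypothetical: write the item to the replica table
    notify: Callable[[Record], None],         # hypothetical: publish an audit notification
    is_suspicious: Callable[[Record], bool],  # hypothetical auditing rule
) -> Dict[str, int]:
    """Replicate every record and notify on suspicious ones; return only when all work is done."""
    pending: List[Future] = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        for record in records:       # loop through the event array, in stream order
            replicate(record)        # sequential, so writes to the same key stay ordered
            if is_suspicious(record):
                pending.append(pool.submit(notify, record))
        for future in pending:       # wait for every notification "callback"
            future.result()          # re-raises a failure, so the handler fails and Lambda retries the batch
    return {"replicated": len(records), "notified": len(pending)}
```

Replication runs sequentially in stream order because issuing writes for the same key concurrently could apply them out of order on the replica. Notifications have no such ordering requirement, so they run in parallel and are awaited before the function returns, mirroring "wait for callbacks before exiting".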

Page 21

Demo: Cross region replication and real-time data auditing using Amazon DynamoDB and AWS Lambda

Page 22

Attaching Lambda functions to DynamoDB Streams

• Automatic shards: one concurrent Lambda function invocation per DynamoDB stream shard

• Processing within each individual shard is ordered

• A given key will be present in at most one concurrently active shard

• All changes (insert, remove, modify) are available on a rolling 24-hour basis

(Diagram: source table → DynamoDB Streams shards → Lambda pollers → Lambda functions → destination 1 and destination 2. DynamoDB Streams scales by grouping records into shards; Lambda scales automatically.)
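The ordering model described above (records processed in order within a shard, shards processed concurrently) can be simulated in a few lines of Python. This illustrates the guarantees only; it is not how Lambda's pollers are implemented. Shard IDs, payloads, and the per-record `handle` function are hypothetical, and real invocations receive batches rather than single records.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

def process_shards(
    records: List[Tuple[str, str]],  # (shard_id, payload) in arrival order
    handle: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Process shards concurrently while keeping each shard's records in order."""
    by_shard: Dict[str, List[str]] = defaultdict(list)
    for shard_id, payload in records:
        by_shard[shard_id].append(payload)  # arrival order is preserved per shard

    def drain(shard_id: str) -> List[str]:
        # One worker per shard handles that shard's records one after another.
        return [handle(payload) for payload in by_shard[shard_id]]

    shard_ids = list(by_shard)
    with ThreadPoolExecutor(max_workers=max(1, len(shard_ids))) as pool:
        results = pool.map(drain, shard_ids)  # different shards run in parallel
        return dict(zip(shard_ids, results))
```

Because a given key lives in at most one active shard, per-shard ordering is enough to keep every key's changes in order, even though shards are processed in parallel.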

Page 23

Attaching Lambda functions to DynamoDB Streams

• Reading the stream: the stream is exposed through the familiar Amazon Kinesis Client Library (KCL) interface

• Read the stream using https://github.com/awslabs/dynamodb-streams-kinesis-adapter

• Records can be retrieved at roughly twice the rate of the table’s provisioned write capacity

• Automatic scaling: both DynamoDB and Lambda scale automatically with PUT rates

• Default limit of 100 concurrent Lambda function executions, which can be increased through the AWS Support Center
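A back-of-the-envelope Python check that combines the figures on this slide with the one-invocation-per-shard rule from the previous slide. The ~2x read multiplier and the default limit of 100 concurrent executions are the 2015 values stated here; the table capacity, shard count, and concurrency used by other functions are hypothetical inputs, and the limit applies to all of an account's functions, not just this one.

```python
from typing import Dict, Union

def stream_headroom(
    provisioned_wcu: int,
    active_shards: int,
    other_concurrency: int = 0,    # executions used by other functions (hypothetical input)
    read_multiplier: float = 2.0,  # "~2x the table's provisioned write capacity"
    concurrency_limit: int = 100,  # default limit quoted on the slide
) -> Dict[str, Union[float, int, bool]]:
    """Estimate stream read headroom and Lambda concurrency for one table."""
    needed = active_shards + other_concurrency  # one concurrent invocation per shard
    return {
        "max_stream_read_rate": provisioned_wcu * read_multiplier,  # same units as write capacity
        "concurrent_executions": needed,
        "within_limit": needed <= concurrency_limit,
    }

print(stream_headroom(provisioned_wcu=500, active_shards=40, other_concurrency=70))
```

In this hypothetical case, 40 shards plus 70 executions from other functions come to 110, which exceeds the default of 100; that is the situation in which the slide suggests requesting a limit increase.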

Page 24

Performance tuning DynamoDB as an event source

• Batch size: the maximum number of records that AWS Lambda retrieves from DynamoDB each time it invokes your function
  • Increasing the batch size results in fewer Lambda function invocations, with more data processed per invocation

• Starting position: the position in the stream where Lambda starts reading
  • Set to “Trim Horizon” to start with the oldest record
  • Set to “Latest” to start with the most recent data
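In the Lambda API these two settings are the `BatchSize` and `StartingPosition` (`TRIM_HORIZON` or `LATEST`) of an event source mapping. The Python simulation below shows how they interact; it is a simplified model in which every batch is filled, whereas the batch size is only a maximum and Lambda may invoke with fewer records. The record names are hypothetical.

```python
from typing import List, Sequence

def plan_invocations(
    retained: Sequence[str],  # records already in the stream, oldest first
    arriving: Sequence[str],  # records written after the mapping is created
    batch_size: int,
    starting_position: str,   # "TRIM_HORIZON" or "LATEST"
) -> List[List[str]]:
    """Group the records Lambda would read into batches, one batch per invocation."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if starting_position == "TRIM_HORIZON":
        to_read = list(retained) + list(arriving)  # begin at the oldest retained record
    elif starting_position == "LATEST":
        to_read = list(arriving)                   # ignore what is already in the stream
    else:
        raise ValueError(f"unknown starting position: {starting_position!r}")
    return [to_read[i:i + batch_size] for i in range(0, len(to_read), batch_size)]

old = [f"old-{i}" for i in range(5)]
new = [f"new-{i}" for i in range(7)]
for size in (1, 4, 100):
    print(size, len(plan_invocations(old, new, size, "TRIM_HORIZON")))  # 12, 3, 1 invocations
```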

Page 25

Best practices for creating Lambda functions

• Memory: CPU is allocated in proportion to the memory configured
  • Increasing memory makes your code execute faster (if it is CPU-bound)

• Timeout: increasing the timeout allows longer-running functions, but means a longer wait in case of errors

• Retries: for DynamoDB Streams, Lambda retries without limit (until the data expires)

• Permission model: Lambda pulls data from DynamoDB, so no resource policy is needed, only an execution role that allows Lambda to access DynamoDB
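Because Lambda polls the stream using the function's execution role, the role needs read access to the stream and permission to write logs, plus a trust policy that lets the Lambda service assume it. The Python sketch below builds both documents as dictionaries. The action list mirrors the AWS managed policy `AWSLambdaDynamoDBExecutionRole`; the wildcard resources are kept for brevity, so check the IAM documentation before narrowing them to specific ARNs.

```python
import json
from typing import Any, Dict

def trust_policy() -> Dict[str, Any]:
    """Allow the Lambda service to assume the execution role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }

def stream_reader_policy() -> Dict[str, Any]:
    """Permissions to read a DynamoDB stream and write CloudWatch Logs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:DescribeStream",
                    "dynamodb:GetRecords",
                    "dynamodb:GetShardIterator",
                    "dynamodb:ListStreams",
                ],
                "Resource": "*",  # kept broad for brevity
            },
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "*",
            },
        ],
    }

print(json.dumps(stream_reader_policy(), indent=2))
```

No resource policy on the table or stream appears here, consistent with the pull-based model described on this slide.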

Page 26

Monitoring and Debugging Lambda functions

• Console dashboard
  • Lists all Lambda functions
  • Easy editing of resources, event sources, and other settings
  • At-a-glance metrics

• Metrics in CloudWatch
  • Requests
  • Errors
  • Latency
  • Throttles

• Logging in CloudWatch Logs

Page 27

Three Next Steps

1. Enable DynamoDB Streams for your existing DynamoDB tables. DynamoDB Streams provides a time-ordered sequence of item-level changes made to data in a table in the last 24 hours.

2. Create and test your first Lambda function. With AWS Lambda, there are no new languages, tools, or frameworks to learn. You can use any third-party library, even native ones.

3. Use AWS Lambda with DynamoDB Streams to create DynamoDB Triggers: there is no infrastructure to manage, and you get a clean, lightweight implementation of database triggers, NoSQL style!
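Steps 1 and 3 correspond to two API requests: DynamoDB `UpdateTable` with a `StreamSpecification`, and Lambda `CreateEventSourceMapping`. The Python sketch below only builds the request parameters so it runs without AWS access; the table name, function name, stream ARN, and the default batch size of 100 are assumptions for illustration.

```python
from typing import Any, Dict

VIEW_TYPES = {"KEYS_ONLY", "NEW_IMAGE", "OLD_IMAGE", "NEW_AND_OLD_IMAGES"}

def enable_stream_params(table_name: str, view_type: str = "NEW_AND_OLD_IMAGES") -> Dict[str, Any]:
    """Step 1: parameters for DynamoDB UpdateTable that turn on a stream."""
    if view_type not in VIEW_TYPES:
        raise ValueError(f"unknown stream view type: {view_type!r}")
    return {
        "TableName": table_name,
        "StreamSpecification": {"StreamEnabled": True, "StreamViewType": view_type},
    }

def trigger_params(stream_arn: str, function_name: str,
                   batch_size: int = 100, starting_position: str = "TRIM_HORIZON") -> Dict[str, Any]:
    """Step 3: parameters for Lambda CreateEventSourceMapping that attach the function."""
    if starting_position not in ("TRIM_HORIZON", "LATEST"):
        raise ValueError(f"unsupported starting position: {starting_position!r}")
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "BatchSize": batch_size,
        "StartingPosition": starting_position,
        "Enabled": True,
    }

# With boto3 installed and credentials configured, these could be passed as
#   boto3.client("dynamodb").update_table(**enable_stream_params("Orders"))
#   boto3.client("lambda").create_event_source_mapping(**trigger_params(arn, "audit-fn"))
# where "Orders", arn, and "audit-fn" are hypothetical.
print(enable_stream_params("Orders"))
```

`NEW_AND_OLD_IMAGES` is the default here because auditing and replication triggers typically need both the old and new item; `KEYS_ONLY` produces smaller records when the keys alone are enough.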

Page 28

Thank you!

Visit http://aws.amazon.com/dynamodb, the AWS blog, and the DynamoDB forum to learn more and get started using DynamoDB.

Visit http://aws.amazon.com/lambda, the AWS Compute blog, and the Lambda forum to learn more and get started using Lambda.