Massive Message Processing with Amazon SQS and Amazon DynamoDB (ARC301) | AWS re:Invent 2013


Description

Amazon Simple Queue Service (SQS) and Amazon DynamoDB together form a fast, reliable, and scalable layer for receiving and processing high volumes of messages, thanks to their distributed, highly available architectures. We propose a complete system that can handle any volume of data at any level of throughput, without losing messages and without requiring other services to be always available. It also lets applications process messages asynchronously and adds compute resources based on the number of messages enqueued. Because we can add more workers to improve overall performance, the architecture helps applications meet predefined SLAs. It also lowers total cost, because new workers run only briefly and only when they are needed.

Transcript of Massive Message Processing with Amazon SQS and Amazon DynamoDB (ARC301) | AWS re:Invent 2013

Page 1: Massive Message Processing with Amazon SQS and Amazon DynamoDB (ARC301) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

ARC301 - Controlling the Flood: Massive Message Processing with Amazon SQS and Amazon DynamoDB

Ari Dias Neto, Ecosystem Solution Architect

November 14, 2013

Page 2

Who am I?

• The Mailman from Brazil – Delivering messages around the world!

Page 3

Returning all the messages…

Page 4

How many Mailmen?

When?

How long?

Page 5

What are we going to do?

We are going to design and build an application to handle any volume of messages! Right now!

Who am I? Ari Dias Neto – Ecosystem Solutions Architect

Page 6

Scenario – Super Bowl

Promotion: who is going to win?

Page 7

Promotion

• Subscription based on SMS – Cellphone number is the key

Requirements

• We cannot lose any message

• We need to process all the valid messages

• Log all the invalid messages and errors

• Beautiful dashboard at the end

• We must process all the messages during the event!
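Because the cellphone number is the key, every incoming SMS needs a validation step before it is saved. A minimal sketch; the accepted options (`VALID_TEAMS`), the phone-number pattern, and the `Vote` type are hypothetical, since the talk does not specify the message format:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical options a voter may choose; the talk does not list them.
VALID_TEAMS = {"TEAM_A", "TEAM_B"}

# Assumed phone format: optional "+" followed by 10 to 15 digits.
PHONE_RE = re.compile(r"^\+?\d{10,15}$")


@dataclass(frozen=True)
class Vote:
    phone: str  # cellphone number, used as the record key
    team: str


def parse_vote(phone: str, body: str) -> Optional[Vote]:
    """Return a Vote for a valid SMS, or None so the caller can log it as invalid."""
    phone = phone.strip()
    team = body.strip().upper()
    if not PHONE_RE.fullmatch(phone):
        return None
    if team not in VALID_TEAMS:
        return None
    return Vote(phone=phone, team=team)
```

For example, `parse_vote("+5511999990000", "team_a")` returns `Vote(phone='+5511999990000', team='TEAM_A')`, while a malformed number or an unknown option returns `None`.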

Page 8

Who is going to be on the front line?

Fast!

Scalable!

Reliable!

Simple!

Fully managed!

Page 9

Amazon Simple Queue Service (SQS)

Fully managed queue service

Page 10

Any volume of data

At any level of throughput

Page 11

We cannot lose any message

Page 12

No up-front or fixed expenses

Page 13

Architecture – Starting with SQS

SQS

We have received all the messages. Now we need to process all of them.

Page 14

Architecture – Amazon EC2 Instances

SQS

BUT! How many instances?

Page 15

Architecture – Multithreaded application

Reduce the costs and increase performance

EC2 Instances

Page 16

Architecture

SQS

Threads Workers
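Running several worker threads per instance lets one thread's network wait (on SQS or DynamoDB) overlap with another thread's work. A minimal sketch of that pattern using `concurrent.futures.ThreadPoolExecutor`, with an in-memory `queue.Queue` standing in for SQS so it runs offline. The original application appears to be Java (processor.jar); Python is used here only for brevity, and `process_message` and the thread count are hypothetical:

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List

NUM_THREADS = 8  # hypothetical thread count; tune per instance type


def process_message(msg: str) -> None:
    """Hypothetical placeholder for the validate-and-save work on one message."""
    pass


def worker(q: "queue.Queue[str]", done: List[str], lock: threading.Lock) -> None:
    """Pull messages from the shared queue until it is empty."""
    while True:
        try:
            msg = q.get_nowait()
        except queue.Empty:
            return  # nothing left; a real SQS poller would keep long-polling instead
        process_message(msg)
        with lock:
            done.append(msg)


def run_workers(messages: List[str], num_threads: int = NUM_THREADS) -> List[str]:
    """Process all messages with a fixed pool of threads and return what was processed."""
    q: "queue.Queue[str]" = queue.Queue()
    for m in messages:
        q.put(m)
    done: List[str] = []
    lock = threading.Lock()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(worker, q, done, lock) for _ in range(num_threads)]
    for f in futures:
        f.result()  # re-raise any exception raised inside a worker thread
    return done
```

Python threads share one interpreter lock, but the real worker's time is dominated by network I/O, during which that lock is released, so the threads still overlap their waiting.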

Page 17

But how many instances do we need?

EC2 m1.xlarge

• 1 instance: 100K msgs/minute

• 10 instances: 1M msgs/minute

• 10 instances: 5M messages in 5 minutes
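The sizing above is linear arithmetic: if one m1.xlarge handles about 100K msgs/minute, draining a backlog of B messages within T minutes takes ⌈B / (100K × T)⌉ instances. A small helper, assuming throughput grows linearly with instance count (the slide's 1-instance and 10-instance figures suggest this, but real workloads can hit other limits first):

```python
import math


def instances_needed(backlog: int, msgs_per_instance_per_minute: int,
                     deadline_minutes: float) -> int:
    """Smallest instance count that drains `backlog` within the deadline,
    assuming throughput grows linearly with the number of instances."""
    per_instance_capacity = msgs_per_instance_per_minute * deadline_minutes
    return math.ceil(backlog / per_instance_capacity)


# Slide figures: ~100K msgs/minute per m1.xlarge, 5M messages in 5 minutes.
print(instances_needed(5_000_000, 100_000, 5))   # 10
print(instances_needed(5_000_000, 100_000, 10))  # 5
```

With the same 5M-message backlog and a 10-minute deadline instead of 5, the model gives 5 instances.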

Page 18

Architecture

SQS → Auto Scaling Group

Auto Scaling based on the number of msgs in the queue
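Scaling on queue depth is typically wired up as CloudWatch alarms on the queue's `ApproximateNumberOfMessagesVisible` metric that trigger scaling policies on the Auto Scaling group. The decision logic can be sketched as a step table; the thresholds, step sizes, and group bounds here are hypothetical, not the talk's settings:

```python
from typing import List, Tuple

# Hypothetical step table: (minimum visible messages, instances to add), checked top-down.
SCALE_OUT_STEPS: List[Tuple[int, int]] = [
    (1_000_000, 10),
    (100_000, 4),
    (10_000, 1),
]


def scaling_adjustment(visible_messages: int) -> int:
    """Instances to add (positive) or remove (negative) for the current queue depth,
    where the depth corresponds to SQS's ApproximateNumberOfMessagesVisible metric."""
    if visible_messages == 0:
        return -1  # queue drained: scale in one instance at a time
    for threshold, add in SCALE_OUT_STEPS:
        if visible_messages >= threshold:
            return add
    return 0  # small backlog: keep the current capacity


def next_desired_capacity(current: int, visible_messages: int,
                          min_size: int = 0, max_size: int = 20) -> int:
    """New desired capacity, clamped to the Auto Scaling group's min/max size."""
    return max(min_size, min(max_size, current + scaling_adjustment(visible_messages)))
```

Scaling in only once the queue is empty keeps workers running until the backlog is gone, then releases them, in line with using instances briefly and only when needed.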

Page 19

Architecture

SQS → Auto Scaling Group

Where should we save all the messages?

High throughput needed

Page 20

Amazon DynamoDB

Page 21

DynamoDB

valid-votes

invalid-votes

Two tables…
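Routing each processed message to one of the two tables can be sketched as building a low-level DynamoDB `PutItem` request, which a `boto3` DynamoDB client would send with `put_item(**request)`. The attribute names (`phone`, `team`, `message_id`, `raw_body`, `error`) and key choices are assumptions: `valid-votes` is keyed on the cellphone number, as the requirements suggest, and `invalid-votes` on the SQS message ID, because an invalid message may not carry a usable number.

```python
from typing import Dict, Optional


def build_put_item(message_id: str, phone: str, team: Optional[str],
                   raw_body: str, error: Optional[str]) -> Dict:
    """Build a low-level DynamoDB PutItem request for one of the two tables.

    Attribute names and keys are hypothetical: valid-votes is keyed on the
    cellphone number, invalid-votes on the SQS message ID.
    """
    if error is None and team is not None:
        return {
            "TableName": "valid-votes",
            "Item": {"phone": {"S": phone}, "team": {"S": team}},
        }
    return {
        "TableName": "invalid-votes",
        "Item": {
            "message_id": {"S": message_id},
            "raw_body": {"S": raw_body},
            "error": {"S": error or "unknown"},
        },
    }
```

Keying `valid-votes` on the phone number means a later vote from the same number overwrites the earlier one; adding `ConditionExpression="attribute_not_exists(phone)"` to the request would instead keep only the first vote.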

Page 22

Architecture

SQS → Auto Scaling Group → DynamoDB (valid-votes, invalid-votes)

Page 23

The Dashboard
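For a one-off dashboard at the end of the event, the per-team totals can be computed by scanning `valid-votes`. A sketch assuming the hypothetical `team` attribute; it consumes pages shaped like low-level Scan responses, such as those yielded by a `boto3` DynamoDB client's `get_paginator("scan").paginate(TableName="valid-votes")`:

```python
from collections import Counter
from typing import Dict, Iterable


def tally_votes(pages: Iterable[Dict]) -> Counter:
    """Count votes per team across Scan result pages.

    Each page is a dict with an "Items" list whose entries look like
    {"team": {"S": "TEAM_A"}, ...}; pages without items are skipped.
    """
    counts: Counter = Counter()
    for page in pages:
        for item in page.get("Items", []):
            counts[item["team"]["S"]] += 1
    return counts
```

A full Scan reads every item, which is acceptable once at the end of the event; a live dashboard would be cheaper to serve from counters updated as votes arrive, a design alternative the talk does not describe.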

Page 24

Final Architecture

• SQS

• Auto Scaling Group – Workers

• DynamoDB

• Web Dashboard – AWS Elastic Beanstalk Container

Page 25

Benefits

• Ready for any level of throughput – SQS

• Ready for any required SLA – Auto Scaling and EC2

• Low cost
– Fully managed queue service
– Infrastructure is sized to the required SLA
– Infrastructure is needed only for a short period of time

Page 26

The challenge!

Process all the messages from the queue in 10 minutes!

Page 27

Let’s go deep!

Let's code!

Page 28

Each thread

• Connect to the SQS queue

• Read up to 10 msgs from the queue

• Validate each message

• Save it in DynamoDB as valid or invalid

• Set “read” in the queue (delete the processed messages)
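The per-thread loop can be sketched as follows. `FakeQueue` is a hypothetical in-memory stand-in for SQS so the sketch runs offline; against the real service, the two calls map to `ReceiveMessage` (with `MaxNumberOfMessages=10`, the SQS per-call maximum) and `DeleteMessage` with each message's receipt handle. The `phone;team` message format and the `is_valid` rule are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

MAX_BATCH = 10  # SQS returns at most 10 messages per ReceiveMessage call


@dataclass
class FakeQueue:
    """Hypothetical in-memory stand-in for an SQS queue: receipt handle -> body."""
    messages: Dict[str, str] = field(default_factory=dict)

    def receive(self, max_messages: int) -> List[Tuple[str, str]]:
        return list(self.messages.items())[:max_messages]

    def delete(self, receipt_handle: str) -> None:
        self.messages.pop(receipt_handle, None)


def is_valid(body: str) -> bool:
    """Hypothetical rule: 'phone;team' with a numeric phone and a non-empty team."""
    parts = body.split(";")
    return len(parts) == 2 and parts[0].isdigit() and parts[1] != ""


def run_thread(q: FakeQueue, valid: List[str], invalid: List[str]) -> None:
    while True:
        batch = q.receive(MAX_BATCH)  # read up to 10 msgs
        if not batch:
            return  # queue drained
        for handle, body in batch:
            # Save as valid or invalid (lists stand in for the two DynamoDB tables).
            (valid if is_valid(body) else invalid).append(body)
            # Delete after saving; with real SQS, an undeleted message
            # reappears after its visibility timeout.
            q.delete(handle)
```

Deleting only after the save means a worker that crashes mid-batch leaves its messages in the queue for redelivery, so no vote is lost; the trade-off is that a message may occasionally be processed twice, which writes keyed on the phone number tolerate.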

Page 29

Page 30

Steps to deploy it on AWS

Create the queue. Queue name: votes

Upload application to S3: s3-sa-east-1.amazonaws.com/arineto/processor.jar

Create bootstrap script: userdata.txt

Create AMI with JRE. Image ID: ami-05355a6c

Create launch configuration

Create Auto Scaling Group

Create alarms

Launch it!
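The contents of userdata.txt are not shown in the talk. A hypothetical rendering of such a bootstrap script, assuming the AMI provides bash, `wget`, and a JRE, that processor.jar is downloadable from the slide's S3 location over HTTPS, and that the jar takes the queue name as its first argument (all assumptions):

```python
# Location from the slide; the https:// scheme and public readability are assumptions.
JAR_URL = "https://s3-sa-east-1.amazonaws.com/arineto/processor.jar"


def render_user_data(jar_url: str = JAR_URL, queue_name: str = "votes") -> str:
    """Render a hypothetical userdata.txt that fetches and starts the worker.

    Assumes the AMI provides bash, wget and a JRE, and that processor.jar
    accepts the queue name as its first argument (not confirmed by the talk).
    """
    lines = [
        "#!/bin/bash",
        "set -e",
        "cd /opt",
        f"wget -q -O processor.jar {jar_url}",
        f"nohup java -jar processor.jar {queue_name} > /var/log/processor.log 2>&1 &",
    ]
    return "\n".join(lines) + "\n"
```

The rendered text is what the launch configuration would carry as user data, so every instance the Auto Scaling group starts downloads and runs the worker without manual steps.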

Page 31

Page 32

The Company

• BigData Corp. was founded to help companies solve the challenges associated with big data, from collection to processing to information and knowledge extraction.

Page 33

The Challenge

• “How many e-commerce websites exist in your continent? Can we monitor them on a consistent basis?”
– Build a crawling process that can answer this question in a cost-effective and speedy manner.

Page 34

Architecture

• Spot Instances + SQS + S3 = Magic
– Spot Instances allow us to optimize processing costs
– Amazon SQS allows us to orchestrate the process in a distributed and asynchronous manner
– Amazon Simple Storage Service (S3) facilitates the storage of intermediate and final processing results

Page 35

Architecture

• Maestro (reserved instance)

• List of crawl URLs

• Main Workers (Spot Instances) – Execute crawling and process data

• Secondary work queues – processed data

• Secondary Workers (queue listeners, Spot Instances) – Reprocess data, query additional services, store data on MongoDB

• MongoDB cluster

• Command and Control Queue

Page 36

Architecture (3)

• Message Volumes
– Processing starts by uploading 10+ million messages
– Each processed message may generate up to 10 new intermediate messages
– Peak processing of 70K messages/second

• Command & Control Queue
– This queue lets us adjust processing as we go and request status checks from instances
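The volume figures give a rough sense of total work. A back-of-the-envelope sketch, assuming exactly 10M initial messages and a single level of fan-out at the maximum of 10 new messages each; the slide gives "10+ million" and "up to 10", so these are illustrative inputs, not reported figures:

```python
def total_messages(initial: int, fan_out: int) -> int:
    """Initial messages plus one level of fan-out (each spawns `fan_out` new ones).
    Deeper chains of intermediate messages would multiply this further."""
    return initial + initial * fan_out


def drain_seconds(total: int, peak_per_second: int) -> float:
    """Time to process `total` messages if the peak rate were sustained throughout."""
    return total / peak_per_second


total = total_messages(10_000_000, 10)   # 110,000,000 messages
seconds = drain_seconds(total, 70_000)   # about 1,571 s, roughly 26 minutes
```

Under these assumptions the queues carry about 110M messages, which at the 70K messages/second peak amounts to roughly 26 minutes of pure queue throughput; real runs spend most of their time crawling, so this is only a lower bound on wall-clock time.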

Page 37

Results (1)

[Chart: Estimated cost without AWS vs. Cost with AWS, in US dollars ($0 to $900,000), x-axis from 0 to 12]

Page 38

Results (2)

2+ PB of data processed

40+ billion web pages visited and parsed

500+ services and technologies mapped

A completely new view of the web market

Page 39

Please give us your feedback on this presentation

As a thank you, we will select prize winners daily for completed surveys!

ARC301