Stream processing on AWS


© 2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.

http://cdn.oreillystatic.com/en/assets/1/event/144/Building%20a%20scalable%20architecture%20for%20processing%20streaming%20data%20on%20AWS%20Presentation.pdf

Stream processing


1 KB * 10K records/s = 10 MB/s; * 86,400 s/day * 365 days ≈ 315 TB/year (roughly 300 TB per year): Big Data
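The estimate above can be checked with a few lines of Python. This is a minimal sketch assuming decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and a constant ingest rate; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def yearly_volume_tb(record_kb: float, records_per_sec: float, days: int = 365) -> float:
    """Return the data volume in TB for a stream with a constant record rate.

    Uses decimal units: 1 TB = 10**12 bytes = 10**9 KB.
    """
    total_kb = record_kb * records_per_sec * SECONDS_PER_DAY * days
    return total_kb / 10**9


if __name__ == "__main__":
    # 1 KB records at 10,000 records/s for one year
    print(f"{yearly_volume_tb(1, 10_000):.1f} TB/year")
```

With binary units (1 KiB = 1,024 bytes) the same stream comes to about 294 TiB per year; either way, "300 TB / year" is a rounded figure.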

Batch processing vs. stream processing (CVR)

Sliding window

Treat the events in a 30-second window as one unit, and slide the window every 20 seconds (consecutive windows therefore overlap by 10 seconds).

Figure: Spark Streaming Programming Guide, http://spark.apache.org/docs/latest/streaming-programming-guide.html

Window length: 3 (the duration of the window, in batch intervals)

Sliding interval: 2 (the interval at which the window operation is performed, in batch intervals)
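A sliding window can be sketched in plain Python without a streaming engine. The function below is a hypothetical illustration, not the Spark Streaming API (Spark's `window` and `reduceByKeyAndWindow` operations take the same two parameters, window length and sliding interval). It counts events in windows of `length` seconds that start every `slide` seconds, using the 30-second window and 20-second slide from the example above.

```python
from typing import List, Sequence, Tuple


def sliding_window_counts(
    timestamps: Sequence[float], length: float, slide: float
) -> List[Tuple[float, float, int]]:
    """Count events in half-open windows [start, start + length).

    A new window starts every `slide` seconds, beginning at time 0.
    """
    if length <= 0 or slide <= 0:
        raise ValueError("length and slide must be positive")
    if not timestamps:
        return []
    last = max(timestamps)
    windows = []
    start = 0.0
    while start <= last:
        end = start + length
        count = sum(1 for ts in timestamps if start <= ts < end)
        windows.append((start, end, count))
        start += slide
    return windows


if __name__ == "__main__":
    events = [1, 5, 12, 25, 31, 44, 58]  # sample event times in seconds
    for start, end, count in sliding_window_counts(events, length=30, slide=20):
        print(f"[{start:>4.0f}s, {end:>4.0f}s): {count} events")
```

Because the window is longer than the slide, an event can be counted in two windows: with the sample data, the events at 25 s and 44 s each appear in two consecutive windows.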


Amazon Glacier

Amazon S3

Amazon DynamoDB

Amazon RDS

Amazon EMR

Amazon Redshift

Amazon Kinesis

Amazon Kinesis-enabled app

AWS Lambda

Amazon ML

Amazon SQS

Amazon ElastiCache

Amazon DynamoDB Streams

Amazon Elasticsearch Service

AWS IoT

Data > Store > Process > Store > Process > Answers


Data sources

Data types: transactions, files, streams, messages

Sources: API, DB, mobile apps, web apps, data centers, logging, sensors & IoT devices

Ingestion paths: AWS Direct Connect, AWS Import/Export (disks), Snowball, AWS IoT, CloudWatch, messaging

Data temperature: hot to cold

Data source > Collect / Store

Collect / store types: KVS, streaming, messaging / queue, storage

Services: Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams, Amazon SQS, Apache Kafka, Amazon S3 (events)

Data sources: transactions, files, streams, messages, and events from mobile apps, web apps, data centers, logging, AWS IoT devices and sensors, and CloudWatch, via AWS Direct Connect, AWS Import/Export, and Snowball

Data temperature: hot, warm, cold

| | Amazon DynamoDB Streams | Amazon Kinesis Streams | Amazon Kinesis Firehose | Apache Kafka | Amazon SQS | Amazon S3 Event Notifications |
|---|---|---|---|---|---|---|
| AWS managed service | Yes | Yes | Yes | No | Yes | Yes |
| Guaranteed ordering | Yes | Yes | Yes | Yes | No | No |
| Delivery | exactly-once | at-least-once | exactly-once | at-least-once | at-least-once | at-least-once |
| Data retention period | 24 hours | 7 days | N/A | Configurable | 14 days | 24 hours retry |
| Availability | 3 AZ | 3 AZ | 3 AZ | Configurable | 3 AZ | 3 AZ |
| Scale / throughput | No limit / ~ table IOPS | No limit / ~ shards | No limit / automatic | No limit / ~ nodes | No limit / automatic | No limit |
| Parallel clients | Yes | Yes | No | Yes | No | No |
| Stream MapReduce | Yes | Yes | N/A | Yes | N/A | N/A |
| Record / object size | 400 KB | 1 MB | Amazon Redshift row size | Configurable | 256 KB | 5 TB per S3 object |
| Cost | Higher (table cost) | Low | Low | Low (+ admin) | Low-medium | Low |
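Several of the services in this table deliver at-least-once, so a consumer can see the same record more than once (for example after a retry). A common remedy is an idempotent consumer that remembers which record IDs it has already applied. The sketch below assumes each record carries an application-assigned unique ID and keeps the seen-ID set in memory; a production consumer would need durable, bounded state (for example a conditional write to a database). The class name is hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


class IdempotentCounter:
    """Sum values per key, applying each record ID at most once."""

    def __init__(self) -> None:
        self.seen_ids: Set[str] = set()  # in-memory only: neither durable nor bounded
        self.totals: Dict[str, int] = {}

    def process(self, records: Iterable[Tuple[str, str, int]]) -> None:
        """Apply a batch of (record_id, key, value) tuples."""
        for record_id, key, value in records:
            if record_id in self.seen_ids:
                continue  # redelivered record: skip so it is not counted twice
            self.seen_ids.add(record_id)
            self.totals[key] = self.totals.get(key, 0) + value


if __name__ == "__main__":
    consumer = IdempotentCounter()
    consumer.process([("r1", "a", 1), ("r2", "b", 5)])
    # "r2" is delivered again, e.g. after a consumer retry
    consumer.process([("r2", "b", 5), ("r3", "a", 2)])
    print(consumer.totals)
```

With exactly-once delivery this bookkeeping is unnecessary; with at-least-once delivery it is what keeps aggregates correct.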

Data source > Collect / Store > Process

Processing styles: serverless, stream processing, distributed computing, job chain

Process services: AWS Lambda, Amazon Kinesis apps, Amazon Kinesis Analytics, streaming on Amazon EMR, apps on Amazon EC2, Amazon SQS apps

Collect / store: Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams, Apache Kafka, Amazon SQS, Amazon S3 (events)

Processing speed: fast

| | Spark Streaming | Apache Storm | Kinesis KCL application | AWS Lambda | Amazon SQS client apps |
|---|---|---|---|---|---|
| Scale | ~ nodes | ~ nodes | ~ nodes | Automatic | ~ nodes |
| Micro-batch or real-time | Micro-batch | Real-time | Near-real-time | Near-real-time | Near-real-time |
| AWS managed service | Yes (EMR) | No (EC2) | No (KCL + EC2 + Auto Scaling) | Yes | No (EC2 + Auto Scaling) |
| Scalability | No limit / ~ nodes | No limit / ~ nodes | No limit / ~ nodes | No limit | No limit |
| Availability | Single AZ | Configurable | Multi-AZ | Multi-AZ | Multi-AZ |
| Programming languages | Java, Python, Scala | Any language via Thrift | Java; .NET, Python, Ruby, Node.js via MultiLang Daemon | Node.js, Java, Python | AWS SDK languages (Java, .NET, Python, …) |
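For the AWS Lambda option, Lambda reads from the stream and invokes the function with a batch of records; in the Kinesis event, each record's payload is base64-encoded under `Records[i].kinesis.data`. The Python handler below is a minimal sketch that assumes each payload is a JSON object with a hypothetical `type` field and counts records per type. It only returns the counts, so it can be exercised locally with a hand-built event; writing the result to a store such as DynamoDB is left out.

```python
import base64
import json
from collections import Counter


def lambda_handler(event, context):
    """Count Kinesis records per 'type' field (hypothetical payload schema)."""
    counts = Counter()
    for record in event.get("Records", []):
        # Kinesis delivers the record payload base64-encoded
        payload = base64.b64decode(record["kinesis"]["data"])
        body = json.loads(payload)
        counts[body.get("type", "unknown")] += 1
    return dict(counts)


if __name__ == "__main__":
    def fake_record(obj):
        # Build a minimal Kinesis-style record for local testing
        data = base64.b64encode(json.dumps(obj).encode("utf-8")).decode("ascii")
        return {"kinesis": {"data": data}}

    sample_event = {"Records": [fake_record({"type": "click"}),
                                fake_record({"type": "view"}),
                                fake_record({"type": "click"})]}
    print(lambda_handler(sample_event, None))
```

Because Lambda scales the number of concurrent invocations automatically, the handler itself contains no shard or worker management, which is the "Automatic" entry in the Scale row above.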

Data source > Collect / Store > Process > Store / Analyze

Store / analyze options: cache (Amazon ElastiCache), NoSQL / KVS (Amazon DynamoDB), SQL / RDB (Amazon RDS), search (Amazon Elasticsearch Service), data warehouse (Amazon Redshift), file storage (Amazon S3), Hadoop & Spark (Amazon EMR), machine learning (Amazon ML)

Process: AWS Lambda, Amazon Kinesis apps, Amazon Kinesis Analytics (new), streaming on Amazon EMR, apps on Amazon EC2, Amazon SQS apps

Data temperature: hot to warm; speed: fast to slow

| | Amazon ElastiCache | Amazon DynamoDB | Amazon RDS/Aurora | Amazon Elasticsearch | Amazon S3 | Amazon Glacier |
|---|---|---|---|---|---|---|
| Average latency | ms | ms | ms, sec | ms, sec | ms, sec, min (~ size) | hrs |
| Typical data stored | GB | GB-TBs (no limit) | GB-TB (64 TB max) | GB-TB | MB-PB (no limit) | GB-PB (no limit) |
| Typical item size | B-KB | KB (400 KB max) | KB (64 KB max) | KB (2 GB max) | KB-TB (5 TB max) | GB (40 TB max) |
| Request rate | High-very high | Very high (no limit) | High | High | Low-high (no limit) | Very low |
| Storage cost (GB/month) | $$ | ¢¢ | ¢¢ | ¢¢ | ¢ | ¢/10 |
| Durability | Low-moderate | Very high | Very high | High | Very high | Very high |
| Availability | High (2 AZ) | Very high (3 AZ) | Very high (3 AZ) | High (2 AZ) | Very high (3 AZ) | Very high (3 AZ) |

Hot data > warm data > cold data (left to right)

Reference architecture: Data source > Collect / Store > Process / Analyze / Store > Consume

Consume: Amazon QuickSight, apps & services, analysis & visualization, notebooks, IDE, API

Store / analyze: Amazon ElastiCache, Amazon DynamoDB, Amazon RDS, Amazon Elasticsearch Service, Amazon Redshift, Amazon S3, Amazon EMR (Hadoop & Spark), Amazon ML

Process: AWS Lambda, Amazon Kinesis apps, Amazon Kinesis Analytics (new), streaming on Amazon EMR, apps on Amazon EC2, Amazon SQS apps

Collect / store: Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams, Apache Kafka, Amazon SQS, Amazon S3 (events)

Data sources: mobile apps, web apps, data centers, logging, sensors & IoT devices, CloudWatch, via AWS Direct Connect, AWS Import/Export, Snowball, AWS IoT, messaging

Data temperature: hot, warm, cold; speed: fast, slow

1. Data > Store > Process > Store > Process > Answers

2. Amazon Kinesis: Data > Amazon Kinesis > AWS Lambda > Amazon DynamoDB; Amazon Kinesis > Amazon Kinesis S3 Connector > Amazon S3
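The Kinesis connector library behind the Amazon Kinesis S3 Connector buffers records and writes them to S3 in batches rather than one object per record. The class below is a hypothetical sketch of that buffering idea with record-count and byte-size limits (a time-based limit is omitted for brevity); `flush_fn` stands in for the code that would upload one S3 object.

```python
from typing import Callable, List


class FlushBuffer:
    """Accumulate records and hand them to `flush_fn` as one batch."""

    def __init__(self, max_records: int, max_bytes: int,
                 flush_fn: Callable[[List[bytes]], None]) -> None:
        self.max_records = max_records
        self.max_bytes = max_bytes
        self.flush_fn = flush_fn  # hypothetical: would write one S3 object
        self.records: List[bytes] = []
        self.size = 0

    def add(self, record: bytes) -> None:
        self.records.append(record)
        self.size += len(record)
        # Flush as soon as either limit is reached
        if len(self.records) >= self.max_records or self.size >= self.max_bytes:
            self.flush()

    def flush(self) -> None:
        if self.records:
            self.flush_fn(self.records)
            self.records = []  # start a new list; the flushed batch stays intact
            self.size = 0


if __name__ == "__main__":
    uploaded = []  # stands in for the S3 objects written
    buffer = FlushBuffer(max_records=3, max_bytes=1024, flush_fn=uploaded.append)
    for record in [b"r1", b"r2", b"r3", b"r4"]:
        buffer.add(record)
    buffer.flush()  # flush the remainder, e.g. on shutdown or a timer
    print(uploaded)
```

Larger batches mean fewer, bigger S3 objects, which suits the batch tools that later read them, at the cost of more delay before data lands in S3.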

3. Amazon EMR: Data > Amazon Kinesis > AWS Lambda > Amazon DynamoDB; Amazon Kinesis > Spark Streaming (Amazon EMR); Amazon Kinesis > Amazon Kinesis S3 Connector > Amazon S3 > Spark SQL > Answer

Data temperature (hot to cold) vs. processing speed (fast to slow) > Answers

Modes: real-time, interactive, batch

Data: Amazon Kinesis, Apache Kafka, Amazon DynamoDB, Amazon S3

Processing: Spark Streaming, Apache Storm, AWS Lambda, KCL, Amazon Redshift, Hive, Spark, Presto; also shown: Hive, native, KCL, AWS Lambda

Data stream > Amazon Kinesis, DynamoDB Streams, or Apache Kafka > KCL, AWS Lambda, Spark Streaming on Amazon EMR, or Apache Storm

Alerts and notifications: Amazon SNS

Real-time prediction: Amazon ML

App state and KPIs: Amazon ElastiCache (Redis), Amazon DynamoDB, Amazon RDS, Amazon ES
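The alert path in this pattern can be illustrated with a count-based window over recent events. The sketch below is hypothetical: it tracks the error rate over the last `window` events and returns an alert message once the window is full and the rate exceeds `threshold`. In the pattern above the message would be published through Amazon SNS; here it is simply returned.

```python
from collections import deque
from typing import Deque, Optional


class ErrorRateAlarm:
    """Flag when the error rate over the last `window` events exceeds `threshold`."""

    def __init__(self, window: int, threshold: float) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self.events: Deque[bool] = deque(maxlen=window)  # oldest events drop off automatically
        self.threshold = threshold

    def observe(self, is_error: bool) -> Optional[str]:
        """Record one event; return an alert message or None."""
        self.events.append(is_error)
        if len(self.events) < self.events.maxlen:
            return None  # wait until the window is full
        rate = sum(self.events) / len(self.events)
        if rate > self.threshold:
            return f"ALERT: error rate {rate:.0%} over last {len(self.events)} events"
        return None


if __name__ == "__main__":
    alarm = ErrorRateAlarm(window=4, threshold=0.5)
    for is_error in [False, True, True, False, True]:
        message = alarm.observe(is_error)
        if message:
            print(message)  # in the pattern above, this would go to Amazon SNS
```

The alarm fires on every event while the rate stays above the threshold; a real alerting path would usually suppress repeats.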

Batch & interactive

Data stream > Amazon Kinesis Firehose > Amazon S3 > process > consume

Batch: Amazon EMR (Hive, Pig, Spark); batch prediction with Amazon ML

Interactive: Amazon Redshift, Amazon EMR (Presto, Spark)

Real-time prediction: Amazon ML
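By default, Kinesis Firehose writes objects to S3 under a UTC time prefix in `YYYY/MM/DD/HH` format, which gives batch and interactive jobs (Hive, Spark, Presto) natural hourly partitions to scan. The sketch below groups records by such a prefix. It is an approximation: Firehose derives the prefix from the time it receives the data, whereas these hypothetical helpers use a timestamp supplied with each record.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple


def hourly_prefix(ts: datetime) -> str:
    """Return a 'YYYY/MM/DD/HH/' prefix for a timezone-aware datetime, in UTC."""
    return ts.astimezone(timezone.utc).strftime("%Y/%m/%d/%H/")


def partition_by_hour(records: Iterable[Tuple[datetime, str]]) -> Dict[str, List[str]]:
    """Group payloads under the hourly prefix of their timestamp."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for ts, payload in records:
        groups[hourly_prefix(ts)].append(payload)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        (datetime(2016, 5, 1, 9, 15, tzinfo=timezone.utc), "event-a"),
        (datetime(2016, 5, 1, 9, 59, tzinfo=timezone.utc), "event-b"),
        (datetime(2016, 5, 1, 10, 0, tzinfo=timezone.utc), "event-c"),
    ]
    for prefix, payloads in partition_by_hour(sample).items():
        print(prefix, payloads)
```

A batch job that only needs one hour of data can then read a single prefix instead of scanning the whole bucket.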

Data stream > Amazon Kinesis

Batch layer: Amazon Kinesis > Amazon Kinesis S3 Connector > Amazon S3 > Amazon Redshift, Amazon EMR (Presto, Hive, Pig, Spark) > answer

Speed layer: Amazon Kinesis > KCL, AWS Lambda, Storm, Spark Streaming on Amazon EMR > answer

Serving layer: Amazon ElastiCache, Amazon DynamoDB, Amazon RDS, Amazon ES, Amazon ML > answer to applications
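The batch, speed, and serving layers follow what is usually called the Lambda architecture: the batch layer periodically recomputes views from the complete, immutable dataset (here, the data in Amazon S3), the speed layer updates incrementally for data the last batch run has not covered yet, and the serving layer answers queries by merging the two. The in-memory class below is a hypothetical, single-threaded sketch of that merge for a simple per-key count; in the architecture above the views would live in stores such as DynamoDB or ElastiCache, and events arriving during a batch run would need extra care.

```python
from collections import Counter
from typing import List


class LambdaArchitectureCounter:
    """Per-key event counts served from a batch view plus a speed (real-time) view."""

    def __init__(self) -> None:
        self.master_dataset: List[str] = []  # append-only log (batch layer input)
        self.batch_view: Counter = Counter()  # recomputed from scratch by run_batch()
        self.speed_view: Counter = Counter()  # incremental counts since the last batch run

    def ingest(self, key: str) -> None:
        self.master_dataset.append(key)
        self.speed_view[key] += 1  # speed layer: visible immediately

    def run_batch(self) -> None:
        # Batch layer: recompute the view from the full dataset, then discard the
        # speed-layer state, because the batch view now covers those events.
        self.batch_view = Counter(self.master_dataset)
        self.speed_view = Counter()

    def query(self, key: str) -> int:
        # Serving layer: merge both views.
        return self.batch_view[key] + self.speed_view[key]


if __name__ == "__main__":
    counter = LambdaArchitectureCounter()
    for key in ["a", "a", "b"]:
        counter.ingest(key)
    counter.run_batch()  # e.g. a scheduled EMR or Redshift job
    counter.ingest("a")  # arrives after the batch run
    print(counter.query("a"))  # 2 from the batch view + 1 from the speed view
```

Recomputing the batch view from the full dataset means errors in the speed layer are corrected at the next batch run, at the cost of keeping two code paths in sync.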


Technical & business support: account management, support, professional services, solutions architects, training & certification, security & pricing reports, partner ecosystem

AWS Marketplace: backup, big data & HPC, business apps, databases, development, industry solutions, security

Application services: queuing, notifications, search, orchestration, email

Enterprise apps: virtual desktops, storage gateway, sharing & collaboration, email & calendaring, directories

Hybrid cloud management: backups, deployment, Direct Connect, identity federation, integrated management

Security & management: virtual private networks, identity & access, encryption keys, configuration, monitoring, dedicated

Infrastructure services: regions, availability zones, compute, storage, databases (SQL, NoSQL, caching), CDN, networking

Platform services:

App (mobile & web front-end): functions, identity, data store, real-time

Development: containers, source code, build tools, deployment, DevOps

Mobile: sync, identity, push notifications, mobile analytics, mobile backend

Analytics: data warehousing, Hadoop, streaming, data pipelines, machine learning