Real-Time Analytics at Scale in the AWS Cloud

Olivier Klein, Solutions Architect, AWS
23rd June 2015
Cloud & Big Data Analytics Summit 2015, Hong Kong

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Three Types of Data Analytics

• Retrospective: analysis and reporting
• Here-and-now: real-time processing and dashboards
• Predictions: to enable smart apps

How Fast is Real-Time?

“There’s no such thing as real time. There’s only near-real time. Typically when we talk about real-time, what we mean is architectures that allow you to respond to data without persisting it to a database first!”

John Akred, CTO, Silicon Valley Data Science

So what is near real-time?

• Ability to process data as it arrives
• Roughly speaking, process data in “the present” rather than “the future”
• But what is “the present”?
  • eCommerce – Attention span of a potential customer
  • Options Trader – Milliseconds
  • Guided Missile – Microseconds

Solution: Stream Processing

• Stream “storage” that lets you process events as they come in and react accordingly
• A high-throughput distributed messaging system
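As a rough illustration of processing events as they come in, the sketch below keeps a rolling count of recent events entirely in memory and reacts when a threshold is reached, without writing anything to a database first. The `SlidingWindowCounter` class, the `detect_bursts` function, the window length, and the threshold are all assumptions made for this example and do not come from the talk.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts the events seen in the last `window_seconds`, held only in memory."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def add(self, timestamp: float) -> int:
        """Record one event and return how many events are now in the window."""
        self._timestamps.append(timestamp)
        # Drop events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


def detect_bursts(timestamps, window_seconds=10.0, threshold=3):
    """Yield the timestamp of each event at which the window count reaches the threshold."""
    counter = SlidingWindowCounter(window_seconds)
    for ts in timestamps:  # each event is handled as it arrives
        if counter.add(ts) >= threshold:
            yield ts


if __name__ == "__main__":
    # Event timestamps in seconds; the values are made up for illustration.
    print(list(detect_bursts([0, 1, 2, 20, 21, 40])))
```

Because `detect_bursts` is a generator, it can be fed from any iterator of timestamps, including one that blocks while waiting for the next event.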

Real-Time Data Stream Expectations

• What do we expect from a real-time data stream?
  • Highly Available
  • Fully Scalable
  • Fault Tolerant
  • (Temporarily) Durable
• How can we achieve this?
  • Multiple Datacenter Facilities
  • Auto-Scalable Server Infrastructure
  • Global Load-Balancers
  • etc.

AWS Global Infrastructure

• 11 Regions: N. Virginia, Oregon, Northern California, GovCloud, São Paulo, Ireland, Frankfurt, Singapore, Tokyo, Sydney, Beijing
• 29 Availability Zones
• 53 Edge Locations
• Continuous Expansion

Amazon Web Services

• Infrastructure: Regions, Availability Zones, Edge Locations
• Core Services: Compute, Storage, Database, Networking
• Platform Services: Analytics, App, Deployment, Mobile
• Security: Access Control, Auditing, Monitoring, Encryption
• Applications: Virtual Desktops, Collaboration & Sharing, App Delivery, E-Mail
• API & SDKs

Let’s simplify Big Data with AWS!

Simplified Big Data Pipeline

[Diagram: Data -> Ingest -> Store -> Process -> Visualize -> Answers, along a Time axis]

[Diagram: AWS services placed along the Ingest, Store, Process, Visualize pipeline: Amazon Kinesis, AWS Import/Export, Amazon Mobile Analytics, Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Glacier, Amazon EMR, Amazon Redshift, AWS Lambda, Amazon Machine Learning, Amazon CloudSearch, AWS Data Pipeline, Amazon EC2]

Stream in Real Time: Amazon Kinesis

• Real-time data processing over large distributed streams
• Elastic capacity that scales to millions of events per second
• React in real time to incoming stream events
• Reliable stream storage replicated across 3 facilities
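A Kinesis stream scales by adding shards, and each record is routed to a shard by the MD5 hash of its partition key. The sketch below illustrates that routing under the simplifying assumption that the 128-bit hash key space is split evenly across the shards. The function names `hash_key` and `shard_for` are made up for this example and are not part of any AWS SDK.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # an MD5 digest is a 128-bit integer


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose hash key range contains the key.

    Assumption: the hash key space is divided into `shard_count` equal ranges.
    """
    range_size = HASH_KEY_SPACE // shard_count
    # The last shard absorbs any remainder left by the integer division.
    return min(hash_key(partition_key) // range_size, shard_count - 1)


if __name__ == "__main__":
    for key in ("user-1", "user-2", "user-3"):  # hypothetical partition keys
        print(key, "-> shard", shard_for(key, shard_count=4))
```

The same partition key always lands on the same shard, which is why records that must be processed in order are usually given a shared key.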

Amazon Kinesis: Produce and Consume

• Producers: HTTP POST, AWS SDKs, Log4j, Flume, Fluentd, Kinesis Producer Library (IoT)
• Consumers:
  • App.1 [Aggregate & De-Duplicate]
  • App.2 [Metric Extraction]
  • App.3 [Decision Making Tree]
  • App.4 [Machine Learning]
• Also shown in the diagram: Amazon S3, Amazon DynamoDB, Apache Storm, Amazon EMR
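As a minimal sketch of the producer side through an AWS SDK, the function below sends one JSON-encoded event with the `put_record` call of a boto3 Kinesis client. The helper name `put_event`, the stream name, and the event fields are assumptions for illustration; the client is passed in so the function can also be exercised with a stub.

```python
import json


def put_event(kinesis_client, stream_name: str, event: dict, partition_key: str) -> dict:
    """Send one JSON-encoded event to a Kinesis stream and return the response.

    `kinesis_client` is expected to behave like boto3.client("kinesis").
    """
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),  # Kinesis record data is bytes
        PartitionKey=partition_key,              # decides which shard gets the record
    )
```

With credentials configured, a call could look like `put_event(boto3.client("kinesis"), "tweets", {"text": "hello"}, "user-42")`, where `"tweets"` is a hypothetical stream name.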

React in Real-Time: AWS Lambda

• Run your code in the cloud, fully managed and highly available
• Triggered through invocation or state changes in your setup
• Scales automatically to match the incoming event rate
• Can be connected to an Amazon Kinesis stream to react to every incoming event
• Charged per 100 ms of execution time
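When Lambda is connected to a Kinesis stream, the function receives batches of records whose payloads are base64-encoded. The handler below is a minimal sketch of reacting to each incoming event. It is written in Python for illustration and assumes that producers send JSON payloads; the processing step (printing) is a placeholder.

```python
import base64
import json


def handler(event, context):
    """Decode every Kinesis record in the batch and report how many were handled."""
    processed = 0
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)  # assumption: producers send JSON
        # Placeholder reaction: a real function might update a table or send an alert.
        print(record["kinesis"]["partitionKey"], message)
        processed += 1
    return {"processed": processed}
```

The function can be tried locally by passing a hand-built event dictionary with the same `Records` layout and `None` as the context.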

Amazon DynamoDB: Fully Managed NoSQL Database Service

• Schemaless data model: a table holds items, and each item holds attributes
• Seamless scalability
• No storage or throughput limits
• Consistent low latency performance
• High durability and availability
• Replicated across 3 facilities
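To make the table, item, and attribute model concrete, the sketch below converts plain Python dictionaries into the typed attribute-value form used by DynamoDB's low-level API (`S` for strings, `N` for numbers, `BOOL` for booleans). The helper names and the example attributes are assumptions for illustration, and only these three types are handled.

```python
def to_attribute_value(value):
    """Wrap a Python value in a DynamoDB-style typed attribute value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    raise TypeError(f"unsupported attribute type: {type(value).__name__}")


def to_item(attributes: dict) -> dict:
    """Convert a flat dict of attributes into a DynamoDB-style item."""
    return {name: to_attribute_value(value) for name, value in attributes.items()}


if __name__ == "__main__":
    # Two items for the same hypothetical table: apart from the key attribute,
    # they need not share any attributes (schemaless).
    print(to_item({"tweet_id": "1", "retweets": 3}))
    print(to_item({"tweet_id": "2", "lang": "en", "verified": True}))
```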

• 500,000 writes / second to their Amazon DynamoDB tables
• 200 additional servers during the Super Bowl
• 0 additional servers right after
• 1 instance x 100 hours = 100 instances x 1 hour
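The last equation is about cost: with per-hour billing, the same number of instance-hours costs the same however it is spread across machines. The sketch below spells that out. The hourly rate is a made-up figure, and the comparison assumes the work parallelizes perfectly.

```python
def cluster_cost(instances: int, hours: float, hourly_rate: float) -> float:
    """Cost of running `instances` machines for `hours` each at a flat hourly rate."""
    return instances * hours * hourly_rate


if __name__ == "__main__":
    rate = 0.10  # hypothetical price per instance-hour, in dollars
    print("1 instance x 100 hours:", cluster_cost(1, 100, rate))
    print("100 instances x 1 hour:", cluster_cost(100, 1, rate))
```

The bill is the same in both cases, but the second run returns its answer after one hour instead of one hundred.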

Let’s put it all together: Demo Time!

Demo: Live Twitter Feed Analysis

[Diagram: Twitter Stream, Amazon Kinesis, AWS Lambda, Amazon DynamoDB, Amazon SNS, Amazon S3, Visualization with D3.js]
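The transcript does not record what the demo computed from the tweets. As one hypothetical example of the kind of analysis a stream consumer could run on a live Twitter feed, the sketch below counts hashtags across a batch of tweet texts. The function name and the sample tweets are invented for illustration.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def count_hashtags(tweets):
    """Return a case-insensitive count of hashtags across the given tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts


if __name__ == "__main__":
    sample = ["Loving #AWS and #BigData", "#aws summit in Hong Kong"]  # made-up tweets
    print(count_hashtags(sample).most_common(3))
```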

Cost of running this demo?

Kinesis Shard: $0.15/h

DynamoDB: $0.0065/h + $0.25/GB

Lambda: $0.000000208/100ms

S3: $0.03/GB

Total (one hour, 1 GB each in DynamoDB and S3, one 100 ms Lambda execution): $0.436500208 ~ $0.44

Highly available with virtually unlimited scalability.
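The total can be checked by adding the listed unit prices once each. The sketch below does this with `Decimal` to avoid floating-point noise at the ninth decimal place. The quantities (one hour, 1 GB in each store, one 100 ms Lambda execution) are an assumption inferred from the slide's arithmetic, and the function name is made up for this example.

```python
from decimal import Decimal


def demo_cost() -> Decimal:
    """Sum the slide's 2015 unit prices, each counted once."""
    line_items = {
        "Kinesis shard, 1 hour": Decimal("0.15"),
        "DynamoDB, 1 hour": Decimal("0.0065"),
        "DynamoDB storage, 1 GB": Decimal("0.25"),
        "Lambda, one 100 ms execution": Decimal("0.000000208"),
        "S3 storage, 1 GB": Decimal("0.03"),
    }
    return sum(line_items.values(), Decimal("0"))


if __name__ == "__main__":
    print(f"Total: ${demo_cost()}")
```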

What’s next?

• Many AWS services can help with your Big Data roadmap
• Talk to us at the AWS and Masterson booth to learn how to build a cost-effective data analytics platform on AWS
• US$50 in AWS credits to get you started

Thank you!
