AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

AWS Government, Education, and Nonprofits Symposium Washington, DC | June 24, 2014 - June 26, 2014

AWS as a Data Platform

Chris Keyser [email protected]

description

Come hear about the services AWS provides for managing data and when to use each one. You will learn about data movement and coordination as well as data storage and analysis, including when to use relational and NoSQL approaches, Hadoop, and data warehousing. This session will highlight how AWS data services have helped real-world customers.

Transcript of AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Page 1: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

AWS as a Data Platform

Chris Keyser

[email protected]

Page 2: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Ease of use

Lower costs

Why AWS?

Page 3: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

no capital investment

pay as you go

no subscriptions

only pay for what you use

Ease of use

Lower costs

Page 4: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

programmable

zero admin

easy to configure

integrate with existing tools

Ease of use

Lower costs

Page 5: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

One tool to rule them all

Page 6: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Use the right tools

Page 7: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Movement and Coordination

Data Pipeline

Direct Connect

Storage Gateway

Import / Export

Page 8: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Storage and Analysis Services

Instance Storage: EC2, EBS

SQL Stores: Redshift, RDS

Hadoop: EMR

NoSQL: DynamoDB

Stream: Kinesis

Search: CloudSearch

Storage Services: S3, CloudFront, Glacier

Page 9: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Movement and Coordination

Page 10: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Movement and Coordination - Plumbing

Direct Connect: Dedicated network pipes

Storage Gateway: Storage backup & archiving

Import / Export: Ship us your disks

Page 11: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

AWS Data Pipeline

Resource management

Scheduling, execution, and retry

Dependency tracking

Failure notification

Movement and Coordination - Orchestration
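
The orchestration duties listed on this slide (scheduling and execution, retry, dependency tracking, failure notification) can be illustrated with a small generic sketch. This is not the AWS Data Pipeline API: `run_pipeline`, the task names, and the `notify` callback are hypothetical, and the standard-library `graphlib` module stands in for the service's dependency tracking.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies, max_attempts=3, notify=print):
    """Run tasks in dependency order, retrying each up to max_attempts.

    tasks: dict mapping task name -> zero-argument callable
    dependencies: dict mapping task name -> set of task names it depends on
    Returns the names of the tasks that completed, in execution order.
    """
    completed = []
    # static_order() yields each task only after all of its dependencies.
    for name in TopologicalSorter(dependencies).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception as exc:
                if attempt == max_attempts:
                    # Failure notification: report the error and stop the run
                    # so that dependent tasks never execute.
                    notify(f"task {name!r} failed after {attempt} attempts: {exc}")
                    return completed
    return completed
```

A run with a hypothetical `extract`, `transform`, `load` chain would pass `{"transform": {"extract"}, "load": {"transform"}}` as `dependencies`; a task that fails once and then succeeds is retried transparently.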

Page 12: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Data Storage and Analysis

Page 13: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Storage Services – Object Store

Amazon S3

> 1.5 million peak requests/sec

Designed for 99.999999999% durability

Trillions of objects

Stores anything

Lifecycle and Versioning
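
To put the eleven-nines durability figure in perspective, the arithmetic below computes the expected object loss per year. It assumes the figure is an annual, per-object durability and that losses are independent; the 10,000,000-object count in the usage note is an illustrative input, not a number from the slide.

```python
from fractions import Fraction

# Durability from the slide: 99.999999999% (eleven nines), read here as an
# annual per-object figure (an assumption).
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_loss(object_count):
    """Expected number of objects lost per year, assuming independent losses."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count):
    """Average number of years between single-object losses."""
    return 1 / expected_annual_loss(object_count)
```

Under these assumptions, `years_per_lost_object(10_000_000)` evaluates to 10,000: storing ten million objects would lose, on average, one object every ten thousand years.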

Page 14: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Storage Services - Archive Storage

Low cost, durable archiving

“Cold Storage”

Infrequently accessed data

Integrated S3 lifecycle policies

Amazon Glacier
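
The "Integrated S3 lifecycle policies" bullet refers to rules that move objects from S3 to Glacier automatically. A minimal sketch of such a rule follows, in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The helper `glacier_archive_rule`, the rule ID, the `logs/` prefix, the bucket name, and the 30- and 365-day thresholds are all placeholder assumptions.

```python
def glacier_archive_rule(prefix, archive_after_days, expire_after_days=None):
    """Build a lifecycle rule that transitions objects under prefix to Glacier."""
    rule = {
        "ID": f"archive-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": archive_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        # Optionally delete the archived objects after a retention period.
        rule["Expiration"] = {"Days": expire_after_days}
    return rule


lifecycle_configuration = {"Rules": [glacier_archive_rule("logs/", 30, 365)]}

# With boto3 installed and credentials configured, the dict could be applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=lifecycle_configuration,
#   )
```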

Page 15: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Storage Services – Edge Caching

Simple to use with global footprint

Streaming support

Large file distribution

Private content

S3, EC2 and ELB integration

Geo restrictions

Amazon CloudFront

Page 16: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Page 17: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Instance Storage - Options

Amazon EC2

Ephemeral Storage (“local”)

– You manage backup/restore

– High Storage instances available

– i2.8xlarge – 6.4 TB SSD (350K IOPS)

– hs1.8xlarge – 48 TB Disk Storage

Elastic Block Store

– “Network Attached Storage”

– Snapshot, Encryption

– Provisioned throughput (IOPS)

Page 18: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Instance Storage - Build Your Own

Amazon EC2

NFS

MongoDB

Cassandra

GraphLab

Titan

Kafka

Lustre

Gluster

Flume

Scribe

Presto

…and more

Page 19: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

MySQL, Oracle, SQL Server, PostgreSQL

Backup/Restore, High Availability

Push Button Scalability

Up to 3 TB and 30K IOPS

Amazon RDS

SQL Stores - Managed Relational DB

Page 20: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Relational data warehouse

Massively parallel

Petabyte scale

Fully managed

$1,000/TB/Year

Amazon Redshift

SQL Stores - Petabyte Data Warehouse

Page 21: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

SQL Stores - Amazon Redshift Architecture

• Leader Node

– SQL endpoint

– Stores metadata

– Coordinates query execution

• Compute Nodes

– Local, columnar storage

– Execute queries in parallel

– Backup and restore via S3

– Parallel load from S3, EMR, or DynamoDB

• HW optimized for data processing

– DW1: 2TB – 1.6PB Magnetic

– DW2: 160GB – 256TB SSD

[Architecture diagram: SQL clients/BI tools connect to the leader node over JDBC/ODBC. The leader node and three compute nodes (each labeled 128GB RAM, 16TB disk, 16 cores) are interconnected by 10 GigE (HPC). Ingestion, backup, and restore run against Amazon S3 / DynamoDB / SSH.]
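
The leader/compute split on this slide follows a scatter-gather pattern: the leader distributes work, compute nodes aggregate their local slice in parallel, and the leader merges the partial results. The toy sketch below illustrates only that pattern. Threads stand in for compute nodes, `leader_average` and `compute_partial` are hypothetical names, and nothing here reflects Redshift's actual internals.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_partial(rows):
    """Compute-node step: aggregate only the local slice of the data."""
    return sum(rows), len(rows)


def leader_average(rows, node_count=3):
    """Leader step: distribute rows, run partial aggregates in parallel, merge."""
    # Round-robin distribution of rows across the compute nodes.
    slices = [rows[i::node_count] for i in range(node_count)]
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(compute_partial, slices))
    # Merge step: combine per-node sums and counts into the final average.
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count if count else None
```

An average is a convenient example because it cannot be merged from per-node averages alone; each node must return a sum and a count, which is why the partial result is a pair.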

Page 22: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

NoSQL Database

Seamless scalability

Zero admin

Single digit millisecond latency

Amazon DynamoDB

NoSQL – Dial Up Capacity

Page 23: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

WRITES

– Continuously replicated to 3 AZs

– Quorum acknowledgment

– Persisted to disk (custom SSD)

READS

– Strongly or eventually consistent

– No trade-off in latency

NoSQL - Durable Low Latency at Scale
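
The difference between eventually and strongly consistent reads over quorum-acknowledged writes can be shown with a generic toy model. This is a textbook majority-quorum sketch, not DynamoDB's internal protocol: the `QuorumStore` class, the 2-of-3 majority rule, and the version counter are all assumptions made for illustration.

```python
class QuorumStore:
    """Toy key-value store with N replicas and majority quorums."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.quorum = replica_count // 2 + 1
        self.version = 0

    def write(self, key, value, reachable=None):
        """Write to the reachable replicas; succeed only with a quorum."""
        if reachable is None:
            reachable = range(len(self.replicas))
        reachable = list(reachable)
        if len(reachable) < self.quorum:
            return False  # not enough acknowledgments
        self.version += 1
        for index in reachable:
            self.replicas[index][key] = (self.version, value)
        return True

    def read(self, key, consistent=False, replica=0):
        """Eventually consistent reads ask one replica; strongly consistent
        reads ask a quorum and return the value with the highest version."""
        if not consistent:
            entry = self.replicas[replica].get(key)
            return entry[1] if entry else None
        # Any read quorum shares at least one replica with every write quorum.
        entries = [r[key] for r in self.replicas[: self.quorum] if key in r]
        return max(entries, key=lambda e: e[0])[1] if entries else None
```

If a write reaches only replicas 1 and 2, an eventually consistent read from replica 0 can still return the old value, while a strongly consistent read sees the new one.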

Page 24: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Hive, Impala, Spark, Pig, MapReduce

Easy to use; fully managed

On-demand and spot pricing

Persistent and transient clusters

Deep integration with S3

Amazon Elastic MapReduce

Hadoop – On Demand

Page 25: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Master instance group

Task instance group

Core instance group

HDFS HDFS

Amazon S3

Amazon Redshift

Amazon DynamoDB

Hadoop – Tuned for AWS

Page 26: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Page 27: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Real-time data collection

Seamlessly scale to gigabytes/s

Low cost managed service

EMR integration

Streaming - at Scale

Amazon Kinesis

Page 28: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Streaming - Amazon Kinesis Architecture

Amazon Web Services

Durable, highly consistent storage replicates data across three data centers (availability zones)

Millions of sources producing 100s of terabytes per hour

Front End: Authentication, Authorization

Ordered stream of events supports multiple readers

Inexpensive: $0.028 per million puts

Aggregate analysis in Hadoop or data warehouse

Machine learning algorithms or sliding window analytics

Real-time dashboards and alarms

Aggregate and archive to S3
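
The per-put price on this slide lends itself to a quick estimate. The sketch below computes PUT charges only, using the $0.028 per million puts figure from the slide; any other charges are outside what the slide states and are ignored. The `put_cost` helper and the steady put rate in the usage note are illustrative assumptions.

```python
from decimal import Decimal

PRICE_PER_MILLION_PUTS = Decimal("0.028")  # USD, from the slide


def put_cost(puts_per_second, days):
    """PUT charges in USD for a steady put rate sustained over a number of days."""
    total_puts = puts_per_second * 60 * 60 * 24 * days
    return PRICE_PER_MILLION_PUTS * total_puts / Decimal(1_000_000)
```

At a steady 1,000 puts per second for 30 days, that is 2,592 million puts, so `put_cost(1000, 30)` comes to 72.576 USD.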

Page 29: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Fully managed search engine

Simple to operate

Highly available

User configurable scaling

Advanced feature support

Search – Made Simple

Amazon CloudSearch

34 languages

Algorithmic stemming

Geospatial search

Faceted search

Suggestions

Highlighting

Field weighting

…

Page 30: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

The right tool. At the right time. At the right scale.

Page 31: AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Thank You

Chris Keyser

[email protected]