Structured, Unstructured and Streaming Big Data on the AWS
Transcript of Structured, Unstructured and Streaming Big Data on the AWS
![Page 1: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/1.jpg)
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Markku Lepistö
Principal Technology Evangelist, APAC
Structured, Unstructured and Streaming Big Data
on Amazon Web Services
![Page 2: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/2.jpg)
Agenda
1:00pm - 2:00pm Registration – Lunch & Meet AWS SAs
2:00pm - 2:20pm Welcome & Introduction
2:20pm - 3:40pm Structured, unstructured and streaming Big Data on the AWS Platform
3:40pm - 4:00pm Break
4:00pm - 5:15pm Building an Amazon Redshift Data warehouse
5:15pm - 5:30pm Q&A
5:30pm Close
![Page 3: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/3.jpg)
![Page 4: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/4.jpg)
Big Data End to End Framework
![Page 5: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/5.jpg)
Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Apache Storm
PIG
Amazon Machine Learning
Amazon EMR
Amazon Glacier
Amazon DynamoDB
![Page 6: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/6.jpg)
“I got kicked out of the bookshop last week, because I moved all of the Big Data books into the Religion section”
![Page 7: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/7.jpg)
Ingest Store Process Analyse Data Answers
Simplify Big Data Processing
![Page 8: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/8.jpg)
Databases
Database Flat Files Database
Data
File Data
IoT Device
Android iOS
Streaming Data
Sales Data Customer Data
Web Logs Server Logs
Clickstream data Sensor data
Database
INGEST STORE
![Page 9: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/9.jpg)
Databases
Database Flat Files Database
Data
File Data
IoT Device
Android iOS
Streaming Data
Sales Data Customer Data
Web Logs Server Logs
Clickstream data Sensor data
Database
INGEST
Amazon Redshift
Amazon RDS
STORE
![Page 10: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/10.jpg)
Data Tier
Search Cache Object Store
RDBMS NoSQL Data Warehouse
logging analytics
webscale transactions
rich search hot reads complex queries and transactions
Data Tier
Amazon DynamoDB
Amazon RDS
Amazon ElastiCache
Amazon S3
Amazon Redshift
Amazon CloudSearch
Traditional Relational Database
![Page 11: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/11.jpg)
| | Amazon RDS | Amazon Redshift |
|---|---|---|
| Scaling | Vertical | Horizontal |
| Storage | Row | Column |
| Workload | Transactional | Analytical |
| Architecture | SMP | MPP |
| Type | SQL Relational | SQL Relational |
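The Storage row of this comparison (row-oriented for Amazon RDS, column-oriented for Amazon Redshift) can be illustrated with a minimal sketch. The sales records and field names below are hypothetical, and both layouts are plain in-memory Python structures, not a description of how either service stores data on disk.

```python
# Hypothetical sales records; the field names are illustrative only.
rows = [
    {"order_id": 1, "customer": "alice", "amount": 30.0},
    {"order_id": 2, "customer": "bob", "amount": 12.5},
    {"order_id": 3, "customer": "alice", "amount": 7.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Transactional access: fetch one whole record; the row layout keeps it together.
order_2 = next(r for r in rows if r["order_id"] == 2)

# Analytical access: aggregate one field; the column layout touches only that field.
total_amount = sum(columns["amount"])
```

The row layout suits lookups and updates of single records, while the column layout lets an aggregate scan one list and skip the other fields.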
![Page 12: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/12.jpg)
Databases
Database Flat Files Database
Data
File Data
Event Producer
Android iOS
Streaming Data
Sales Data Customer Data
Web Logs Server Logs
Clickstream data Sensor data
Storage
INGEST
Amazon Redshift
Amazon RDS
Application
Amazon S3
STORE
![Page 13: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/13.jpg)
Impala PIG
Amazon EMR
![Page 14: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/14.jpg)
Amazon S3
Amazon Redshift
Amazon EMR
Glacier
Amazon DynamoDB
Amazon Machine Learning
Applications
![Page 15: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/15.jpg)
| | Amazon Redshift | Amazon S3 |
|---|---|---|
| Scaling | Add nodes | Automatic |
| Speed | Fastest | Fast |
| Cost | Higher | Lower |
| Durability | Configurable | Built-in |
![Page 16: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/16.jpg)
Databases
Database Flat Files Database
Data
File Data
Event Producer
Android iOS
Streaming Data
Sales Data Customer Data
Web Logs Server Logs
Clickstream data Sensor data
Stream Processor
INGEST
Amazon Redshift
Amazon RDS
Amazon S3
Amazon Kinesis
STORE
![Page 17: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/17.jpg)
Why Stream Storage?
Sensors Amazon Kinesis
Apache Kafka
![Page 18: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/18.jpg)
Availability Zone
Availability Zone
Availability Zone
Data Sources
Data Sources
Data Sources
Data Sources
Data Sources
Logging
Metrics
Analysis
Processing
S3
DynamoDB
Redshift
Lambda
Amazon Kinesis Stream
![Page 19: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/19.jpg)
| | Amazon Kinesis | Apache Kafka |
|---|---|---|
| Ordering | Yes | Yes |
| Persistence | 24 Hours | Configurable |
| Size | 50 KB | Configurable |
| Scaling | High | High |
| Latency | Low | Low |
| Managed | Yes | No |
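The ordering, persistence and size properties listed for Amazon Kinesis can be pictured with a toy in-memory model. This is only a sketch: `ToyStream` and its methods are hypothetical names, timestamps are passed in by hand, and nothing here calls the real Kinesis API. The 24-hour retention and 50 KB record limit are the figures from the slide.

```python
from collections import deque


class ToyStream:
    """In-memory stand-in for a stream store: ordered, with time-based expiry."""

    def __init__(self, retention_seconds, max_record_bytes):
        self.retention_seconds = retention_seconds
        self.max_record_bytes = max_record_bytes
        self._records = deque()  # entries are (sequence_number, timestamp, payload)
        self._next_sequence = 0

    def put(self, payload, now):
        """Append a record and return its sequence number."""
        if len(payload) > self.max_record_bytes:
            raise ValueError("record exceeds the size limit")
        sequence = self._next_sequence
        self._next_sequence += 1
        self._records.append((sequence, now, payload))
        return sequence

    def read(self, after_sequence, now):
        """Return unexpired records newer than after_sequence, in put order."""
        # Expire records that have aged out of the retention window.
        while self._records and now - self._records[0][1] > self.retention_seconds:
            self._records.popleft()
        return [(s, p) for s, _, p in self._records if s > after_sequence]


# Figures from the slide: 24-hour persistence, 50 KB maximum record size.
stream = ToyStream(retention_seconds=24 * 3600, max_record_bytes=50 * 1024)
stream.put(b"click:/home", now=0.0)
stream.put(b"click:/cart", now=10.0)
```

A consumer that remembers the last sequence number it processed can resume with `stream.read(last_sequence, now)`, which is why ordered, persistent stream storage decouples producers from consumers.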
![Page 20: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/20.jpg)
“The world of gaming never sleeps. We owe every player a great experience, and AWS is our main tool to make that happen.” - Sami Yliharju, Services Lead
![Page 21: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/21.jpg)
INGEST STORE PROCESS
Event Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Amazon EMR
Flat Files Database
Data
Event Data
Streaming Data
Interactive
Batch
Streaming
Hadoop
![Page 22: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/22.jpg)
| | Amazon Redshift | Impala |
|---|---|---|
| Scaling | 2 PB+ | Nodes |
| Storage | Native | HDFS/S3 |
| BI Tools | High | Medium |
| Durability | High | High |
| Latency | Low | Low |
| Managed | Fully | Semi (EMR) |
![Page 23: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/23.jpg)
INGEST STORE PROCESS
Event Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Flat Files Database
Data
Event Data
Streaming Data
Interactive
Batch
PIG
Streaming
Amazon EMR
Hadoop
![Page 24: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/24.jpg)
PIG
SQL on Hadoop
Eats anything
New Processing Engine
![Page 25: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/25.jpg)
Amplab Big Data Benchmark https://amplab.cs.berkeley.edu/benchmark/
![Page 26: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/26.jpg)
INGEST STORE PROCESS
Event Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Apache Storm
Flat Files Database
Data
Event Data
Streaming Data
Interactive
Batch
Streaming
PIG
Amazon EMR
Hadoop
AWS Lambda
![Page 27: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/27.jpg)
INGEST STORE PROCESS
Event Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Apache Storm
Flat Files Database
Data
Event Data
Streaming Data
Interactive
Batch
Streaming
PIG
ANALYSE
Amazon Machine Learning
Amazon EMR
Hadoop
AWS Lambda
![Page 28: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/28.jpg)
Use Cases
![Page 29: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/29.jpg)
FOMO
![Page 30: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/30.jpg)
Amazon EMR
Hadoop
Amazon Machine Learning
Kinesis Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Apache Storm
Kinesis Consumer
Flat Files Database
Data
Event Data
Streaming Data
Databases Amazon Redshift
Amazon Redshift
Database Data
SQL Analytics
![Page 31: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/31.jpg)
Amazon Machine Learning
Kinesis Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Apache Storm
Kinesis Consumer
Amazon Elastic MapReduce
Flat Files Database
Data
Event Data
Streaming Data
Clickstream Analysis - Batch
Amazon Elastic MapReduce
Event Data
Amazon EMR
Hadoop
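As a rough picture of what a batch clickstream job computes, here is a minimal map/reduce-style page-view count in plain Python. The log format (`<timestamp> <user_id> <page>`) and the sample lines are hypothetical; on Amazon EMR the same two steps would run distributed over event data, which this sketch does not attempt.

```python
from collections import Counter

# Hypothetical clickstream log lines: "<timestamp> <user_id> <page>"
log_lines = [
    "2015-06-01T10:00:00 u1 /home",
    "2015-06-01T10:00:05 u2 /home",
    "2015-06-01T10:00:09 u1 /cart",
    "malformed line",
]


def map_line(line):
    """Map step: emit (page, 1) for each well-formed line, nothing otherwise."""
    parts = line.split()
    if len(parts) == 3:
        yield parts[2], 1


def reduce_counts(pairs):
    """Reduce step: sum the counts per page."""
    totals = Counter()
    for page, count in pairs:
        totals[page] += count
    return totals


page_views = reduce_counts(pair for line in log_lines for pair in map_line(line))
```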
![Page 32: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/32.jpg)
Amazon Machine Learning
Kinesis Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Apache Storm
Kinesis Consumer
Amazon Elastic MapReduce
Flat Files Database
Data
Event Data
Streaming Data
Clickstream Analysis – Near Real Time
Event Producer
Amazon Kinesis
Amazon S3
Amazon Redshift
Kinesis Consumers
Streaming Data
![Page 33: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/33.jpg)
Demo
Real-time Twitter analytics using Amazon Kinesis, AWS Lambda and open source software
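The demo code itself is not part of this transcript. As a hedged sketch of the general pattern, a Lambda function subscribed to a Kinesis stream receives batches of base64-encoded records and can aggregate them. The hashtag counting, the tweet `text` field and the `fake_event` helper below are assumptions made for illustration, not the presenter's implementation.

```python
import base64
import json
from collections import Counter


def handler(event, context):
    """Count hashtags in a batch of Kinesis records delivered to Lambda."""
    hashtags = Counter()
    for record in event["Records"]:
        # Kinesis record payloads arrive base64-encoded in the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"])
        tweet = json.loads(payload)  # assumes each record is one JSON tweet
        for word in tweet.get("text", "").split():
            if word.startswith("#") and len(word) > 1:
                hashtags[word.lower()] += 1
    return dict(hashtags)


def fake_event(tweets):
    """Build a local test event containing only the fields the handler reads."""
    return {
        "Records": [
            {"kinesis": {"data": base64.b64encode(json.dumps(t).encode()).decode()}}
            for t in tweets
        ]
    }


result = handler(
    fake_event([{"text": "Loving #AWS and #BigData"}, {"text": "#aws rocks"}]),
    None,
)
```

Building the event locally keeps the sketch runnable without an AWS account; a deployed function would receive the same `Records` structure from the service, along with further metadata fields omitted here.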
![Page 34: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/34.jpg)
vs
![Page 35: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/35.jpg)
Amazon Kinesis
Twitter Stream AWS Lambda
Demo: Live Twitter Feed Analysis
Twitter, on a typical day:
• More than 500 million Tweets sent*
• Average 5,700 TPS
* https://blog.twitter.com/2013/new-tweets-per-second-record-and-how
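The two figures on this slide can be cross-checked with a short calculation: 500 million Tweets spread evenly over a day gives roughly 5,787 per second, in line with the quoted average of about 5,700 TPS. The shard estimate that follows rests on an assumption not stated in the slides, namely a write limit of 1,000 records per second per Kinesis shard; check the current service quotas before relying on it.

```python
import math

TWEETS_PER_DAY = 500_000_000  # from the slide ("more than 500 million")
SECONDS_PER_DAY = 24 * 60 * 60

average_tps = TWEETS_PER_DAY / SECONDS_PER_DAY

# Assumption: each shard accepts up to 1,000 records per second on write.
ASSUMED_RECORDS_PER_SHARD_PER_SECOND = 1_000
shards_for_average_load = math.ceil(
    average_tps / ASSUMED_RECORDS_PER_SHARD_PER_SECOND
)
```

Under that assumption the average load alone needs 6 shards; peaks well above the average would need more, and record size can become the binding limit instead of record count.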
![Page 36: Structured, Unstructured and Streaming Big Data on the AWS](https://reader031.fdocuments.net/reader031/viewer/2022021815/58781e991a28aba12d8b60d7/html5/thumbnails/36.jpg)
Thank You!