(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014
![Page 1: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/1.jpg)
November 14, 2014 | Las Vegas, NV
Steve McPherson
![Page 2: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/2.jpg)
[Slide graphic: the AWS Simple Icons catalog of AWS services, leading to Amazon EMR]
![Page 3: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/3.jpg)
Big decisions need Big Data
Data sources: Server, Purchase, Social Media
![Page 4: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/4.jpg)
Extract, Transform, Load → Data Warehouse → Report Generation and Ad Hoc Analysis
![Page 5: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/5.jpg)
Hadoop can help
![Page 6: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/6.jpg)
But Hadoop needs help: it is difficult, expensive, and time-consuming to operate
![Page 7: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/7.jpg)
Amazon EMR makes Hadoop easy
![Page 8: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/8.jpg)
Extract Transform & Load → Data Warehouse (Amazon S3) → Report Generation & Ad Hoc Analysis
ETL jobs write to Amazon S3; reporting and ad hoc analysis jobs read from it.
Extract Transform & Load:
• MapReduce API
• Sqoop
• Spark
• Cascading
• Pig
Data Warehouse (file formats in Amazon S3):
• Parquet
• ORC
• SEQ (Hadoop SequenceFile)
• Text
Report Generation:
• MR
• Hive
• Spark
• Cascading
• Pig
Ad Hoc Analysis:
• Presto
• Hive
• Spark-SQL
• Lingual
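The write/read split on this slide can be illustrated with Hive-style partitioned key names (`dt=YYYY-MM-DD/`): ETL jobs write new objects under a date prefix, and a reporting job reads only the prefixes it needs instead of scanning the whole table. This is a minimal in-memory sketch under stated assumptions: the table name `logs` and the tab-delimited Text layout are hypothetical, and a Python dict stands in for the S3 bucket, so no AWS API is called.

```python
from datetime import date

TABLE = "logs"  # hypothetical table name; keys are relative to a hypothetical bucket


def partition_prefix(day: date) -> str:
    """Hive-style partition prefix, e.g. 'logs/dt=2014-11-13/'."""
    return f"{TABLE}/dt={day.isoformat()}/"


def etl_write(store: dict, day: date, rows: list) -> str:
    """ETL: write one batch of rows as a tab-delimited Text object in the day's partition."""
    prefix = partition_prefix(day)
    part = sum(1 for key in store if key.startswith(prefix))  # next part-file number
    key = f"{prefix}part-{part:05d}"
    store[key] = "\n".join("\t".join(row) for row in rows)
    return key


def report_read(store: dict, day: date) -> list:
    """Reporting: read only the requested partition (partition pruning), not the whole table."""
    prefix = partition_prefix(day)
    rows = []
    for key in sorted(k for k in store if k.startswith(prefix)):
        rows.extend(line.split("\t") for line in store[key].splitlines())
    return rows


s3 = {}  # stands in for the S3 bucket
etl_write(s3, date(2014, 11, 13), [("InstanceID123", "Cannot find resource XYZ")])
etl_write(s3, date(2014, 11, 14), [("InstanceID456", "Started")])
print(report_read(s3, date(2014, 11, 13)))  # [['InstanceID123', 'Cannot find resource XYZ']]
```

Because the partition lives in the key name, any engine that understands Hive-style layout can prune the same way without a separate index.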
![Page 9: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/9.jpg)
Amazon S3 is your Data Lake
![Page 10: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/10.jpg)
Amazon EMR with Amazon S3 is your Data Warehouse
Engines on Amazon EMR: Hive, Pig, Cascading, Spark, Presto, HBase
Storage: Amazon S3
![Page 11: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/11.jpg)
Disaster Recovery built in
Clusters 1 and 2 in one Availability Zone, Clusters 3 and 4 in another, all sharing data in Amazon S3
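Because every cluster reads the same data in Amazon S3 rather than from its own HDFS, recovering from the loss of one Availability Zone amounts to sending work to a cluster in another zone, or launching a new one against the same S3 paths. A minimal sketch of that routing decision; the cluster names, zone names, and the set of healthy zones are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Cluster:
    name: str
    availability_zone: str


# Hypothetical layout mirroring the slide: two clusters in each of two zones,
# all reading and writing the same Amazon S3 data.
CLUSTERS = [
    Cluster("cluster-1", "zone-a"),
    Cluster("cluster-2", "zone-a"),
    Cluster("cluster-3", "zone-b"),
    Cluster("cluster-4", "zone-b"),
]


def pick_cluster(clusters: Iterable[Cluster], healthy_zones: Set[str]) -> Optional[Cluster]:
    """Return the first cluster in a healthy zone. Because the data lives in S3 rather
    than in any one cluster's HDFS, any surviving cluster can run the job."""
    for cluster in clusters:
        if cluster.availability_zone in healthy_zones:
            return cluster
    # No healthy cluster: a new one could be launched and pointed at the same S3 paths.
    return None


print(pick_cluster(CLUSTERS, {"zone-b"}).name)  # cluster-3
```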
![Page 12: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/12.jpg)
Amazon EMR reads from and writes to AWS data sources
Reads from: Amazon S3 bucket, Amazon Kinesis, Amazon DynamoDB
Writes to: Amazon S3 bucket, Amazon DynamoDB, Amazon Redshift
![Page 13: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/13.jpg)
![Page 14: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/14.jpg)
![Page 15: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/15.jpg)
Client/Sensor → Recording Service → Aggregator/Sequencer → Continuous Processor → Data Warehouse → Analytics and Reporting
![Page 16: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/16.jpg)
Client/Sensor → Recording Service → Aggregator/Sequencer → Continuous Processor → Data Warehouse → Analytics and Reporting (with Kafka)
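To make the role of the middle stages concrete, here is a minimal in-process sketch of the Aggregator/Sequencer and the Continuous Processor. In the architecture on these slides they are separate services (with Kafka in the pipeline); here, time-sorted Python lists stand in for per-recorder batches, and the `Event` fields are hypothetical.

```python
import heapq
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class Event(NamedTuple):
    timestamp: int  # hypothetical: seconds since epoch, set by the client/sensor
    source: str
    payload: str


def aggregate_and_sequence(batches: Iterable[List[Event]]) -> Iterator[Tuple[int, Event]]:
    """Aggregator/Sequencer: merge time-sorted batches from several recorders into one
    time-ordered stream and stamp each event with a global sequence number."""
    merged = heapq.merge(*batches, key=lambda event: event.timestamp)
    return enumerate(merged)


def continuous_processor(stream: Iterable[Tuple[int, Event]]) -> Dict[str, int]:
    """Continuous Processor: maintain a running event count per source (e.g. for a dashboard)."""
    counts: Dict[str, int] = {}
    for _sequence, event in stream:
        counts[event.source] = counts.get(event.source, 0) + 1
    return counts


recorder_a = [Event(1, "sensor-1", "x"), Event(4, "sensor-1", "y")]
recorder_b = [Event(2, "sensor-2", "z"), Event(3, "sensor-2", "w")]
print(continuous_processor(aggregate_and_sequence([recorder_a, recorder_b])))
# {'sensor-1': 2, 'sensor-2': 2}
```

Each stage here is a separate function precisely because, in the slide's design, each is a separate component to build, deploy, and scale.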
![Page 17: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/17.jpg)
Streaming Data Repository
Amazon Kinesis
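Amazon Kinesis decides which shard stores a record by hashing its partition key with MD5 into a 128-bit integer and finding the shard whose hash-key range contains that value. The sketch below assumes the shards split the key space evenly (roughly the situation for a stream that has not been resharded) and that the digest is read as an unsigned big-endian integer; it illustrates the mapping and does not call the Kinesis API.

```python
import hashlib
from typing import List, Tuple

HASH_SPACE = 2 ** 128  # Kinesis hash keys are 128-bit integers


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as an unsigned 128-bit big-endian integer."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


def even_shard_ranges(shard_count: int) -> List[Tuple[int, int]]:
    """Assumed evenly split, inclusive [start, end] hash-key ranges, one per shard."""
    bounds = [i * HASH_SPACE // shard_count for i in range(shard_count + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(shard_count)]


def shard_for(partition_key: str, ranges: List[Tuple[int, int]]) -> int:
    """Index of the shard whose hash-key range contains the key's MD5 hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key outside all shard ranges")


ranges = even_shard_ranges(4)
print(shard_for("InstanceID123", ranges))
```

The same partition key always lands on the same shard, which is what lets a downstream reader see one key's records in order.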
![Page 18: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/18.jpg)
Amazon Kinesis + Amazon EMR = Fewer Moving Parts
Client/Sensor → Recording Service → Aggregator/Sequencer → Continuous Processor for Dashboard → Data Warehouse → Analytics and Reporting
Logging: Log4J
Streaming Data Repository: Amazon Kinesis
Data Processing: Amazon EMR
![Page 19: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/19.jpg)
Input (• User, • Dev): push to Amazon Kinesis
Processing (Hive, Pig, Cascading, Spark): pull from Amazon Kinesis
Amazon DynamoDB
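When Hive or Pig on Amazon EMR pulls from Amazon Kinesis, the connector processes the stream in batch iterations and, with checkpointing enabled, records in an Amazon DynamoDB table how far each iteration read in each shard. Rerunning an iteration then replays the same records, and the next iteration continues where the previous one stopped. The sketch below models only that idea: a dict stands in for the DynamoDB table, list positions stand in for Kinesis sequence numbers, and the function and argument names are hypothetical rather than the connector's configuration properties.

```python
from typing import Dict, List, Tuple

# Stand-in for the DynamoDB checkpoint table:
# (logical_name, iteration, shard_id) -> (start, end) positions read in that shard.
Checkpoints = Dict[Tuple[str, int, str], Tuple[int, int]]


def read_iteration(shards: Dict[str, List[str]], checkpoints: Checkpoints,
                   logical_name: str, iteration: int) -> List[str]:
    """Read one batch iteration from every shard, recording or replaying checkpoints."""
    records: List[str] = []
    for shard_id, shard_records in shards.items():
        key = (logical_name, iteration, shard_id)
        if key in checkpoints:
            start, end = checkpoints[key]  # rerun: replay exactly the same range
        else:
            previous = checkpoints.get((logical_name, iteration - 1, shard_id))
            start = previous[1] if previous else 0  # continue after the last iteration
            end = len(shard_records)  # read everything available right now
            checkpoints[key] = (start, end)
        records.extend(shard_records[start:end])
    return records


stream = {"shard-0": ["a", "b"], "shard-1": ["c"]}
table: Checkpoints = {}
print(read_iteration(stream, table, "hourly", 0))  # ['a', 'b', 'c']
stream["shard-0"].append("d")
print(read_iteration(stream, table, "hourly", 1))  # ['d']
```

Making reruns deterministic is what allows a failed batch query over a live stream to be retried without double-counting.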
![Page 20: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/20.jpg)
Processing Amazon Kinesis data from Amazon EMR using Hive
Producer (Java, Log4J): `private static final … KinesisAppender.class`
Query (Hive): `SELECT … FROM … WHERE …`
Result:

| InstanceTime | InstanceID | Message |
|---|---|---|
| 11/13/2014:07:51 | InstanceID123 | Cannot find resource XYZ |
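The slide elides the body of the Hive query. As a hedged illustration of the kind of filter it performs, the sketch below parses log lines into the three columns shown and keeps rows whose message contains a given string, roughly what a hypothetical `SELECT * FROM … WHERE message LIKE '%Cannot find resource%'` would return. The space-delimited line format, the column order, and the `needle` predicate are assumptions.

```python
from typing import List, NamedTuple


class LogRow(NamedTuple):
    instance_time: str  # e.g. "11/13/2014:07:51"
    instance_id: str
    message: str


def parse_line(line: str) -> LogRow:
    """Split a log line into (InstanceTime, InstanceID, Message); Message may contain spaces."""
    instance_time, instance_id, message = line.split(" ", 2)
    return LogRow(instance_time, instance_id, message)


def select_where(rows: List[LogRow], needle: str) -> List[LogRow]:
    """Keep rows whose message contains `needle` (a LIKE '%needle%' filter)."""
    return [row for row in rows if needle in row.message]


lines = [
    "11/13/2014:07:51 InstanceID123 Cannot find resource XYZ",
    "11/13/2014:07:52 InstanceID456 Started",  # hypothetical second line
]
for row in select_where([parse_line(l) for l in lines], "Cannot find resource"):
    print(row.instance_time, row.instance_id, row.message)
```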
![Page 21: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/21.jpg)
![Page 22: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/22.jpg)
![Page 23: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/23.jpg)
![Page 24: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/24.jpg)
Amazon S3
![Page 25: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/25.jpg)
Long-Running Clusters and Scheduled Jobs
![Page 26: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/26.jpg)
Amazon EMR integrates with your tools
![Page 27: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/27.jpg)
Recent Integrations
![Page 28: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/28.jpg)
Recent Integrations
![Page 29: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/29.jpg)
http://emr.looker.com
![Page 30: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/30.jpg)
Flexible and cost-effective: burst when you need to

| Pricing model | Hours | Hourly rate ($) | Cost ($) |
|---|---|---|---|
| On Demand | 672 | 0.113 | 75.936 |
| Reserved Instance | 672 | 0.022 | 14.784 |
| Spot Instance (bid accepted) | 672 | 0.015 | 10.08 |

Bill: $10.08
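The totals above follow from hours × hourly rate, and 672 hours is 28 days of continuous use. The sketch below reproduces the arithmetic with `Decimal` so the results match the slide exactly; it treats each rate as a flat per-hour price and does not model any upfront Reserved Instance fee or Spot price movement, which the slide does not break out.

```python
from decimal import Decimal

HOURS = 672  # 28 days x 24 hours, as on the slide

# Hourly rates as shown on the slide (USD per instance-hour, 2014 figures).
RATES = {
    "On Demand": Decimal("0.113"),
    "Reserved Instance": Decimal("0.022"),
    "Spot Instance": Decimal("0.015"),
}


def monthly_cost(rate: Decimal, hours: int = HOURS) -> Decimal:
    """Cost of running one instance for `hours` at a flat hourly `rate`."""
    return rate * hours


costs = {model: monthly_cost(rate) for model, rate in RATES.items()}
for model, cost in costs.items():
    print(f"{model}: ${cost}")
```

Using `Decimal` rather than `float` keeps currency arithmetic exact, so `0.113 * 672` comes out as 75.936 with no rounding noise.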
![Page 31: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/31.jpg)
Setup for security
![Page 32: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/32.jpg)
Flexible, Reliable, Scalable, Secure, and Low-Cost Data Warehouse
![Page 33: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/33.jpg)
AWS Big Data Blog
• R
• Amazon Kinesis
• Visualization with Tableau
• Bootstrap actions and steps
http://blogs.aws.amazon.com/bigdata/
![Page 34: (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014](https://reader034.fdocuments.net/reader034/viewer/2022042715/5594450a1a28ab06308b4865/html5/thumbnails/34.jpg)
Get started today
http://aws.amazon.com/elasticmapreduce/