Hw09 Making Hadoop Easy On Amazon Web Services
Amazon Elastic MapReduce
Peter Sirota
Amazon Elastic MapReduce
• Enables customers to easily and cost-effectively process vast amounts of data.
• Utilizes a hosted Hadoop framework running on Amazon's web-scale infrastructure.
• Launched in the US in April 2009 and in the EU in July 2009.
Amazon Elastic MapReduce
• Large-scale data processing involves a lot of MUCK, and we want to remove it for our customers:
  • Hard to manage compute clusters
  • Hard to tune Hadoop
  • Hadoop issues preventing smooth operation in the cloud
Amazon.com Confidential
Hadoop made simple and easy
[Diagram: Amazon Elastic MapReduce workflow. The input dataset goes into an input S3 bucket; the application is deployed through the web console or command-line tools; Elastic MapReduce runs Hadoop on Amazon EC2 instances and writes output results to an output S3 bucket; at the end, the user is notified and gets the results.]
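The workflow on this slide can be modeled as a toy, in-memory Python sketch. Everything in it is hypothetical: dicts stand in for the input and output S3 buckets, a list of Python functions stands in for the Hadoop steps running on EC2, and `cluster_running` and `notifications` only record what the real service would do (start instances, release them when the job flow ends, notify the user). No AWS API is called.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in: a "bucket" is just a mapping from object key to contents.
Bucket = Dict[str, str]
Step = Callable[[Bucket], Bucket]


@dataclass
class JobFlow:
    """Toy model of the workflow: S3 in -> Hadoop steps -> S3 out -> notify."""
    input_bucket: Bucket
    output_bucket: Bucket
    steps: List[Step]
    notifications: List[str] = field(default_factory=list)
    cluster_running: bool = False

    def run(self) -> None:
        self.cluster_running = True  # stands in for starting EC2 instances running Hadoop
        try:
            data: Bucket = dict(self.input_bucket)  # read the input dataset
            for step in self.steps:  # each step consumes the previous step's output
                data = step(data)
            self.output_bucket.update(data)  # write output results
            self.notifications.append("COMPLETED")
        except Exception as exc:
            self.notifications.append(f"FAILED: {exc}")
            raise
        finally:
            self.cluster_running = False  # resources are released however the flow ends


def count_lines(bucket: Bucket) -> Bucket:
    """Example step (hypothetical): count the lines in each input object."""
    return {f"{key}.count": str(len(body.splitlines())) for key, body in bucket.items()}
```

Running `JobFlow({"input/a.txt": "x\ny\n"}, out, [count_lines]).run()` fills `out` with `{"input/a.txt.count": "2"}`. The `finally` clause models the point that instances are turned off when the job flow is done, whether it succeeds or fails.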
Amazon Elastic MapReduce Benefits
• Elastic: Uses as many or as few EC2 instances as needed. Spin up large or small job flows in minutes.
• Easy to use: Get up and running quickly with an easy-to-use web console, robust command-line clients, and sample jobs. No configuration necessary.
• Reliable: Fault-tolerant service built on top of battle-tested AWS infrastructure. Automatically retries failed tasks.
• Cost effective: We monitor the progress of your jobs and turn off resources when the job flow is done.
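"Automatically retries failed tasks" refers to Hadoop re-running a failed map or reduce task attempt; in Hadoop 0.20 the per-task limits are `mapred.map.max.attempts` and `mapred.reduce.max.attempts`, both defaulting to 4. The sketch below only illustrates that retry policy in plain Python. `run_with_retries` is a hypothetical helper, not Hadoop or Elastic MapReduce code, and it ignores details such as rescheduling an attempt on a different node.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retries(task: Callable[[int], T], max_attempts: int = 4) -> T:
    """Run task(attempt), re-running it after a failure, up to max_attempts times.

    The default of 4 mirrors Hadoop's default per-task attempt limit.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[BaseException] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(attempt)
        except Exception as exc:  # this attempt failed; try again if attempts remain
            last_error = exc
    # Every attempt failed: give up and surface the last underlying error.
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error
```

A task that fails transiently (for example, on its first two attempts) still completes; only a task that fails on every attempt causes the overall failure.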
Problems customers solve with Elastic MapReduce
• Data mining (log processing, click-stream analysis, similarities, etc.)
• Bioinformatics (genome analysis)
• Financial simulation (Monte Carlo simulation)
• File processing (resizing JPEGs)
• Web indexing
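Monte Carlo simulation fits MapReduce well: each map task can simulate an independent batch of scenarios, and a single reduce only has to combine per-task sums. The sketch below is a local, single-process illustration, not Elastic MapReduce code. As an assumed example workload, it prices a European call option under geometric Brownian motion; the hypothetical `map_task` gives each task its own seeded random stream, and `reduce_task` averages the discounted payoffs.

```python
import math
import random
from typing import List, Tuple

Partial = Tuple[float, int]  # (sum of discounted payoffs, number of paths)


def map_task(seed: int, n_paths: int, s0: float, strike: float,
             rate: float, vol: float, t: float) -> Partial:
    """Simulate n_paths terminal prices under geometric Brownian motion."""
    rng = random.Random(seed)  # independent, reproducible stream per task
    drift = (rate - 0.5 * vol ** 2) * t
    diffusion = vol * math.sqrt(t)
    discount = math.exp(-rate * t)
    total = 0.0
    for _ in range(n_paths):
        s_t = s0 * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        total += discount * max(s_t - strike, 0.0)  # discounted call payoff
    return total, n_paths


def reduce_task(partials: List[Partial]) -> float:
    """Combine per-task sums into one Monte Carlo estimate."""
    total = sum(p_sum for p_sum, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


def price_call(n_tasks: int = 8, paths_per_task: int = 25_000, **params: float) -> float:
    # In a real job flow each map_task would run on a different node;
    # here they simply run one after another.
    partials = [map_task(seed, paths_per_task, **params) for seed in range(n_tasks)]
    return reduce_task(partials)
```

With `price_call(s0=100.0, strike=100.0, rate=0.05, vol=0.2, t=1.0)`, the estimate should land near the Black-Scholes value of roughly 10.45. Seeding each task separately keeps runs reproducible and avoids overlapping random streams across tasks.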
Customer Feedback
• Pros:
  • Amazon Elastic MapReduce makes it easy to run Hadoop applications.
  • Reliable platform for production data processing
• Challenges:
  • Simple tasks such as log processing require fluency in MapReduce.
  • Hadoop applications are difficult to develop.
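To make the "fluency in MapReduce" point concrete, here is roughly what a simple log-processing job (requests per URL) looks like when written against the Hadoop Streaming contract. The mapper emits tab-separated key/value lines, Hadoop sorts them by key, and the reducer sees each key's values consecutively. This is a sketch under assumptions: the log lines are assumed to be in Apache Common Log Format, and `run_locally` is a hypothetical helper that imitates the sort between the two phases in a single process.

```python
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'url<TAB>1' for each request line (Common Log Format assumed)."""
    for line in lines:
        try:
            request = line.split('"')[1]  # e.g. GET /index.html HTTP/1.0
            url = request.split()[1]
        except IndexError:
            continue  # skip malformed lines instead of failing the task
        yield f"{url}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each URL; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for url, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{url}\t{sum(int(count) for _, count in group)}"


def run_locally(log_lines: Iterable[str]) -> List[str]:
    """Imitate map -> sort by key -> reduce in a single process."""
    return list(reducer(sorted(mapper(log_lines))))
```

Under Hadoop Streaming, `mapper` and `reducer` would each be wrapped in a small script that reads `sys.stdin` and prints every yielded line. Higher-level languages such as Pig and Hive can express the same count far more compactly.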
New Features
• Support for Apache Pig (August 2009):
  • Batch and interactive modes
  • Concurrent access to multiple file systems
  • Loading resources from Amazon S3
  • Additional Piggybank functions
  • Integration with Elastic MapReduce Client and Web Console
New Features
• Support for Apache Hive 0.4 (available today):
  • Batch and interactive modes
  • Integration with Elastic MapReduce Client and Web Console
  • Additions to Hive:
    • Load table partitions automatically from Amazon S3
    • Specify an off-instance metadata store
    • Optimized data writes to Amazon S3
    • Reference resources on Amazon S3
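Hive lays out a partitioned table as one directory per partition, named `column=value` (for example `dt=2009-10-01/`). Loading partitions automatically from Amazon S3 therefore amounts to scanning the object keys under the table's location and registering each distinct partition found. The sketch below is an illustration of that idea, not Elastic MapReduce's implementation. `discover_partitions` and `add_partition_statements` are hypothetical helpers that work on plain key strings without making S3 calls, and the output uses HiveQL's `ALTER TABLE ... ADD PARTITION` syntax.

```python
import re
from typing import Iterable, List, Tuple

PartitionSpec = Tuple[Tuple[str, str], ...]  # e.g. (("dt", "2009-10-01"), ("hour", "13"))

# One directory level of a partitioned Hive table: <column>=<value>
PARTITION_DIR = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=([^/=]+)$")


def discover_partitions(table_prefix: str, object_keys: Iterable[str]) -> List[PartitionSpec]:
    """Return the distinct partition specs found among keys under table_prefix."""
    prefix = table_prefix.rstrip("/") + "/"
    found = set()
    for key in object_keys:
        if not key.startswith(prefix):
            continue  # object belongs to some other table
        dirs = key[len(prefix):].split("/")[:-1]  # drop the file name
        matches = [PARTITION_DIR.match(d) for d in dirs]
        # Keep only keys whose whole directory path is column=value segments.
        if dirs and all(matches):
            found.add(tuple((m.group(1), m.group(2)) for m in matches))
    return sorted(found)


def add_partition_statements(table: str, specs: Iterable[PartitionSpec]) -> List[str]:
    """Render one ALTER TABLE ... ADD PARTITION statement per spec.

    Values are not escaped; this is a sketch, not production code.
    """
    statements = []
    for spec in specs:
        cols = ", ".join(f"{col}='{val}'" for col, val in spec)
        statements.append(f"ALTER TABLE {table} ADD PARTITION ({cols})")
    return statements
```

Keys such as `logs/_logs/history.txt` or files directly under the table prefix are skipped, because their directory path does not consist solely of `column=value` segments.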
Amazon Elastic MapReduce Ecosystem
• Karmasphere Studio for Hadoop: a NetBeans IDE for developing, debugging, deploying, and managing Hadoop jobs
  • Deploy Hadoop jobs to Elastic MapReduce
  • Monitor progress of Elastic MapReduce job flows
  • Amazon S3 file browser
  • Elastic MapReduce HDFS browser
Amazon Elastic MapReduce Ecosystem
• Support for Cloudera's Hadoop distribution (private beta):
  • Optionally use Cloudera's Hadoop when executing Elastic MapReduce job flows
  • Get support from Cloudera for Elastic MapReduce job flows
Q&A