Introduction to Amazon Elastic MapReduce (English)

Slides from “Hadoop on the Cloud / The True Value of Amazon Elastic MapReduce” (Amazon Web Services, Jeff Barr). http://www.eventbrite.com/event/1278974447/efblike

Transcript of Introduction to Amazon Elastic MapReduce (English)

Page 1: Introduction to Amazon Elastic MapReduce (English)

Amazon Elastic MapReduce

Page 2

MY BACKGROUND

• Based in Seattle, WA

• Education:

– BS in Computer Science, The American University, 1985

– Graduate student in Digital Media, University of Washington, 2010

• Background:

– Microsoft Visual Studio team

– Consulting to startups and VCs

– Amazon employee since 2002

• Evangelist:

– Speak

– Write

– Tweet

• Author, “Host Your Web Site in the Cloud”

• Email: [email protected]

• Twitter: @jeffbarr

Page 3

AGENDA

• What is Big Data

• Elastic MapReduce Overview

• Example Use Cases

• Ecosystem and Tools

• Upcoming Features

• Discussion

Page 4

WHAT IS BIG DATA?

• Doesn’t refer just to volume

– You can benefit from Big Data infrastructure without having a ton of data

– Many existing technologies have little problem physically handling large volumes

• Challenges result from the combination of data volume, data structure, and usage demands from that data, usually tied to timeliness

• Big Data Tools are needed to provide a holistic view of enterprise data and systematically harness it for insights and trends

Page 5

WHAT IS AMAZON ELASTIC MAPREDUCE?

• Enables customers to easily, securely and cost-effectively process vast amounts of data:

– Spin up hundreds of instances

– Process hundreds of terabytes of data

• Hosted Hadoop framework running on Amazon’s web-scale infrastructure

Page 6

• Launch and monitor job flows

• AWS Management Console

• Command line interface

• REST API
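As a rough illustration of the API route listed above, the following Python sketch builds a job-flow request and checks its state. It assumes the present-day boto3 SDK, which postdates this deck. The bucket, JAR path, release label, instance types, and function names are hypothetical placeholders, not values from the slides.

```python
"""Sketch: launch and monitor an Elastic MapReduce job flow through the API.

Assumptions (not from the slides): the boto3 SDK, the bucket name, the
JAR path, the release label, and the instance types are placeholders.
"""


def build_job_flow_request(name, jar_uri, args, log_uri, node_count=3):
    """Return keyword arguments for the EMR client's run_job_flow call."""
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": "emr-6.15.0",  # assumed release label
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumed instance type
            "SlaveInstanceType": "m5.xlarge",   # assumed instance type
            "InstanceCount": node_count,
            # Shut the cluster down once every step has finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "custom-jar-step",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {"Jar": jar_uri, "Args": list(args)},
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_and_check(request, region="us-east-1"):
    """Submit the request and return (job_flow_id, current_state).

    Needs boto3 and AWS credentials, so the import is kept local and
    this function is not called in the sketch itself.
    """
    import boto3

    emr = boto3.client("emr", region_name=region)
    job_flow_id = emr.run_job_flow(**request)["JobFlowId"]
    cluster = emr.describe_cluster(ClusterId=job_flow_id)["Cluster"]
    return job_flow_id, cluster["Status"]["State"]


request = build_job_flow_request(
    name="log-analysis",
    jar_uri="s3://example-bucket/jars/wordcount.jar",
    args=["s3://example-bucket/input/", "s3://example-bucket/output/"],
    log_uri="s3://example-bucket/logs/",
)
print(request["Steps"][0]["HadoopJarStep"]["Jar"])
```

Building the request separately from submitting it keeps the parameters easy to inspect before any cluster is started.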

Page 7

WHY USE AMAZON ELASTIC MAPREDUCE

• Elastic MapReduce removes “MUCK” from Big Data processing

– Hard to manage compute clusters

– Hard to tune Hadoop

– Hard to monitor running Job Flows

– Hard to debug Hadoop jobs

– Hadoop issues prevent smooth operation in the cloud

Page 8

PROBLEMS CUSTOMERS SOLVE WITH ELASTIC MAPREDUCE

• Targeted advertising / Clickstream analysis

• Data warehousing applications

• Bio-informatics (Genome analysis)

• Financial simulation (Monte Carlo simulation)

• File processing (resizing JPEGs)

• Web indexing

• Data mining and BI

Page 9

HARDWARE REQUIREMENTS FOR USE CASES

• Data or I/O Intensive (m1/m2 instances)

– Data Warehouse

– Data Mining

• Clickstream, logs, events, etc.

• Compute or I/O Intensive (c1, cc1/HPC instances)

– Credit Ratings

– Fraud Models

– Portfolio analysis

– VaR calculation

Page 10

CLICKSTREAM ANALYSIS – RAZORFISH AND BEST BUY

• Best Buy came to Razorfish

– 3.5 billion records, 71 million unique cookies, 1.7 million targeted ads required per day

Targeted Ad (1.7 million per day): User recently purchased a home theater system and is searching for video games

• Leveraged AWS and Elastic MapReduce

– 100-node cluster on demand

– Processing time dropped from 2+ days to 8 hours

– Increased ROAS (Return on Advertising Spend) by 500%

Page 11

CLICKSTREAM ANALYSIS – ARCHITECTURE

Page 12

WHAT IS MAPREDUCE?

• Invented by Google

• New processing model

• Highly scalable

• Easy to understand

• Industry standard

• Something worth knowing

Page 13

ELASTIC MAPREDUCE MODEL – OVERVIEW

• Take input data

• Break into sub-problems

• Distribute to worker nodes

• Worker nodes process sub-problems in parallel

• Take output of worker nodes and reduce to answer
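The five steps on this slide can be sketched on a single machine. The example below is only an illustration: threads stand in for worker nodes, the sum-of-squares task is an arbitrary choice, and the function names are ours rather than part of Hadoop or Elastic MapReduce.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Break the input into roughly equal sub-problems."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def worker(chunk):
    """Process one sub-problem on its own: a partial sum of squares."""
    return sum(x * x for x in chunk)


def run(data, workers=4):
    chunks = split(data, workers)                   # break into sub-problems
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(worker, chunks))   # distribute, process in parallel
    return sum(partials)                            # reduce worker output to one answer


result = run(list(range(1, 11)))  # take input data: the integers 1..10
print(result)  # 385
```

Because each chunk is processed without reference to any other, the number of workers changes only the running time, not the answer.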

Page 14

MAPREDUCE EXAMPLE – WORD COUNT

Input → Map Phase (three Mappers) → Sort → Reduce Phase (two Reducers) → Output

Map Phase output: (“This”, Doc1), (“Word”, Doc1), (“This”, Doc2), (“This”, Doc3), (“Word”, Doc3)

Sort output (grouped by key): (“This”, Doc1), (“This”, Doc2), (“This”, Doc3), (“Word”, Doc1), (“Word”, Doc3)

Output: (“This”, 3), (“Word”, 2)
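The word-count flow on this slide can be reproduced in a few lines of Python. This is a minimal in-memory sketch, not Hadoop code: the document contents are assumed (chosen so the pairs match the slide), and `mapper`, `shuffle`, and `reducer` are illustrative names.

```python
from collections import defaultdict

# Assumed document contents, chosen to produce the pairs shown on the slide.
documents = {"Doc1": "This Word", "Doc2": "This", "Doc3": "This Word"}


def mapper(doc_id, text):
    """Map phase: emit one (word, document) pair per word."""
    for word in text.split():
        yield word, doc_id


def shuffle(pairs):
    """Sort phase: order the pairs by key and group the values per key."""
    groups = defaultdict(list)
    for key, value in sorted(pairs):
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: count the occurrences collected for one word."""
    return key, len(values)


mapped = [pair for doc_id, text in documents.items() for pair in mapper(doc_id, text)]
grouped = shuffle(mapped)
counts = dict(reducer(key, values) for key, values in grouped.items())
print(counts)  # {'This': 3, 'Word': 2}
```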

Page 15

ELASTIC MAPREDUCE MODEL – DETAILED

Page 16

ELASTIC MAPREDUCE IN ACTION – S3 LOG FILE

Page 17

ELASTIC MAPREDUCE IN ACTION – STEP 1

Page 18

ELASTIC MAPREDUCE IN ACTION – STEP 2

Page 19

ELASTIC MAPREDUCE IN ACTION – STEP 3

Page 20

ELASTIC MAPREDUCE IN ACTION – STEP 4

Page 21

ELASTIC MAPREDUCE IN ACTION – STEP 5

Page 22

ELASTIC MAPREDUCE IN ACTION – STEP 6

Page 23

ELASTIC MAPREDUCE IN ACTION – STEP 7

Page 24

ELASTIC MAPREDUCE IN ACTION – RESULTS

Page 25

NOTES / ATTRIBUTES

• Mapper and Reducer in Java JAR files

• Scale as large as needed

– Data

– Processing

– Add nodes (even while running) to speed up

• No need to manage intermediate data

• Suitable for certain types of problems

– Record-oriented input

– No dependencies between records

• No more MUCK – focus on your problem
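The "no dependencies between records" condition is what makes it possible to add nodes without changing the answer. The sketch below illustrates this under stated assumptions: the log records are made up, and each "node" is simply a slice of the input processed in turn.

```python
from collections import Counter


def parse(record):
    """Extract a key from one record, using only that record."""
    return record.split()[0]


def process_partition(records):
    """Work done by one node: count keys in its share of the records."""
    return Counter(parse(record) for record in records)


def run(records, nodes):
    """Deal the records out to `nodes` partitions and merge the partial counts."""
    partitions = [records[i::nodes] for i in range(nodes)]
    total = Counter()
    for partial in map(process_partition, partitions):
        total += partial
    return total


# Hypothetical record-oriented input: one request per line.
records = ["GET /index.html", "GET /about.html", "POST /login", "GET /index.html"]

# The result is the same however many nodes share the work.
assert run(records, 1) == run(records, 3)
print(dict(run(records, 2)))
```

If one record's result depended on another record, the partitions could not be processed separately and this equality would no longer be guaranteed.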

Page 26

HADOOP + R

Page 27

Thank You