Developing a MapReduce Application
Oguzhan Gencoglu
TIE 12206 - Apache Hadoop
Tampere University of Technology, Finland
November, 2014
Outline
1 MapReduce Paradigm
    What is MapReduce
    MapReduce Workflow
2 Job Tracker
    Hadoop Default Ports
3 Example
    Word Count
    Job Tracker
    Key Points
What is MapReduce
MapReduce is a software framework for processing (large) datasets in a distributed fashion over several machines.
Core idea
<key, value> pairs
Almost all data can be mapped into <key, value> pairs.
Keys and values may be of any type.
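As a small illustration of the core idea, the sketch below turns raw text records into <key, value> pairs. The "city,temperature" record layout and the function name `to_pair` are assumptions made for this example only; they do not come from the slides.

```python
# Hypothetical input: one "city,temperature" record per line.
records = ["Tampere,-3", "Helsinki,1", "Tampere,-5"]


def to_pair(line):
    """Turn one raw record into a (key, value) pair."""
    city, temperature = line.split(",")
    return city, int(temperature)


# The key is a string and the value is an integer here,
# but any other types would work just as well.
pairs = [to_pair(line) for line in records]
print(pairs)
```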
MapReduce Workflow
Write your map and reduce functions.
Test with a small subset of the data.
    If it fails, use your IDE’s debugger to find the problem.
Run on the full dataset.
    If it fails, use the debugging tools that Hadoop provides,
    e.g. IsolationRunner, which reruns a failed task over the same input.
Do profiling to tune the performance.
Hadoop Default Ports
Hadoop uses a handful of ports over TCP.
Some are used by Hadoop itself (to schedule jobs, replicate blocks, etc.).
Some are directly for users (either via an interposed Java client or via plain old HTTP).
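For the user-facing HTTP ports, a quick reachability check can look like the sketch below. The port numbers are the web UI defaults of Hadoop 1.x, which is assumed here because the slides refer to the Job Tracker; later Hadoop releases changed several of them, and any cluster may override them in its configuration. The helper name `is_port_open` and the host `localhost` are illustrative choices.

```python
import socket

# Default web UI ports in Hadoop 1.x (assumed version; clusters may override these).
DEFAULT_WEB_UI_PORTS = {
    "NameNode": 50070,
    "JobTracker": 50030,
    "DataNode": 50075,
    "TaskTracker": 50060,
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # "localhost" is a placeholder; use the host that runs the daemon.
    for daemon, port in DEFAULT_WEB_UI_PORTS.items():
        state = "open" if is_port_open("localhost", port) else "closed"
        print(f"{daemon:12s} {port}  {state}")
```

An open port only shows that something is listening there; opening the address in a browser confirms that it is the expected Hadoop page.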
Word Count
Task: Count the word occurrences (frequencies) in a text file (or set of files).
<word, count> serves as the <key, value> pair.
Mapper: Emits <word, 1> for each word (no counting at this stage).
Shuffle in between: pairs with the same key are grouped together and passed to a single machine.
Reducer: Sums up the values (1s) that share the same key.
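The three stages can be sketched in plain Python on a single machine. This is an in-memory illustration, not Hadoop code: the function names are chosen for this example, the shuffle is simulated with a dictionary, and lowercasing the words is an added assumption about how words should be matched.

```python
from collections import defaultdict


def mapper(line):
    """Emit (word, 1) for every word in the line; no counting happens here."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word, counts):
    """Sum the 1s emitted for a single word."""
    return word, sum(counts)


lines = ["the quick brown fox", "the lazy dog", "The fox"]
mapped = [pair for line in lines for pair in mapper(line)]
result = dict(reducer(word, counts) for word, counts in shuffle(mapped).items())
print(result)
```

On a real cluster the mapper calls run on different machines and the framework performs the grouping, but the data flow is the same.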
Job Tracker
Tasks
Name Node
Key Points
Test the mapper and reducer outside Hadoop.
Copy your MapReduce functions and files to the DFS.
Test the mapper and reducer with Hadoop using a small portion of the data.
Track the jobs, debug, and do profiling.
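The first point, testing outside Hadoop, might look like the sketch below. It assumes a line-oriented mapper and reducer in the style used by Hadoop Streaming (tab-separated key and value on each line); the slides do not say which API is used, so this format and the names `map_lines`, `reduce_lines`, and `run_locally` are assumptions. A plain `sorted` call stands in for the shuffle.

```python
from itertools import groupby


def map_lines(lines):
    """Streaming-style mapper: text lines in, 'word<TAB>1' lines out."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Streaming-style reducer: expects input sorted by key, sums per key."""
    parsed = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Mimic map -> sort -> reduce on one machine, without Hadoop."""
    return list(reduce_lines(sorted(map_lines(lines))))


sample = ["a b a", "b c"]
output = run_locally(sample)
print(output)
```

Because the functions only consume and produce lines of text, the same logic can be stepped through in an IDE debugger on a small sample before anything is copied to the cluster.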
Questions/Comments