TACKLING BIG DATA WITH HADOOP
David Howell
Sunday, September 11, 11
WHAT IS BIG DATA?
Google web crawl
stream of Twitter messages
Annoying Farmville requests on Facebook
terabyte-scale data sets
awkward to work with using traditional tools
requires distributed computing
MEDIUM DATA
dozens to hundreds of gigabytes
still awkward to work with using traditional tools
MAP-REDUCE
http://labs.google.com/papers/mapreduce.html
COUNTING AT SCALE
function map_1(t, search_phrase)
    emit(search_phrase, 1)

sort and shuffle

function reduce_1(search_phrase, counts)
    total = 0
    for count in counts
        total += count
    emit(search_phrase, total)

function map_2(search_phrase, total)
    emit(total, search_phrase)

sort and shuffle

function reduce_2(total, search_phrases)
    for search_phrase in search_phrases
        emit(search_phrase, total)
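The two stages above can be tried on one machine with a small Python sketch. It is only an illustration, not how Hadoop itself runs jobs: `run_stage` is a hypothetical helper that maps, sorts by key, and reduces in memory, and the `log` records are made-up sample data.

```python
from itertools import groupby
from operator import itemgetter


def run_stage(records, mapper, reducer):
    """Run one map-reduce stage locally: map, sort and shuffle, reduce."""
    # Map: each (key, value) input record may emit any number of pairs.
    mapped = [pair for key, value in records for pair in mapper(key, value)]
    # Sort and shuffle: bring all pairs that share a key next to each other.
    mapped.sort(key=itemgetter(0))
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        values = [value for _, value in group]
        # Reduce: one call per distinct key, with all of its values.
        output.extend(reducer(key, values))
    return output


def map_1(t, search_phrase):
    yield search_phrase, 1


def reduce_1(search_phrase, counts):
    yield search_phrase, sum(counts)


def map_2(search_phrase, total):
    yield total, search_phrase


def reduce_2(total, search_phrases):
    for search_phrase in search_phrases:
        yield search_phrase, total


# Hypothetical search log: (timestamp, search_phrase) records.
log = [(1, "hadoop"), (2, "pig"), (3, "hadoop"),
       (4, "hive"), (5, "hadoop"), (6, "pig")]

counts = run_stage(log, map_1, reduce_1)     # phrase -> number of searches
ranked = run_stage(counts, map_2, reduce_2)  # phrases ordered by that number
```

The second stage works because the sort and shuffle step orders records by key, so using the total as the key yields the phrases ranked from least to most searched.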
cat IN | sort | uniq -c > OUT
(map, shuffle, reduce)
awk '{print $2,$1}' OUT | sort > FINAL
(map, shuffle, reduce)
WHY BOTHER?
HADOOP
DISTRIBUTED COMPUTING PLATFORM
TOOLS IN THE PLATFORM
Higher Level APIs
• Hive
• Cascading
• Pig
Map-Reduce APIs
• Java
• C++
• UNIX pipes
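The UNIX pipes option means a mapper and a reducer can be ordinary programs that read lines and write lines. A minimal Python sketch of that style follows, assuming the usual streaming convention of tab-separated key/value lines and reducer input that arrives sorted by key. The function names and sample lines are hypothetical, and the sort is done locally here in place of the cluster's shuffle.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'phrase<TAB>1' line per non-empty input line."""
    for line in lines:
        phrase = line.strip()
        if phrase:
            yield f"{phrase}\t1"


def reducer(sorted_lines):
    """Sum the counts for each run of identical keys in key-sorted input."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for phrase, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{phrase}\t{total}"


# Local stand-in for the pipeline: map, sort (the shuffle), then reduce.
sample = ["hadoop\n", "pig\n", "hadoop\n"]
shuffled = sorted(mapper(sample))
result = list(reducer(shuffled))
```

Because the reducer only ever looks at adjacent lines, it never has to hold the whole data set in memory, which is the same trick `uniq -c` relies on after `sort`.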
THE ORIGIN STORY
WHO’S USING IT?
HADOOP
How does it work?
DEMO!
YOUR DATA PLATFORM
ad hoc
unstructured
prototyping
experiment
data-driven
curiosity
play
LEARN MORE
http://hadoop.apache.org/
http://www.cloudera.com/
Hadoop: The Definitive Guide
http://github.com/dehowell/hadoop-crypto-demo