Introduction to SARA's Hadoop Hackathon - dec 7th 2010

SARA Hadoop Hackathon, December 7, 2010

description

This was the first of two introductory presentations for the first Hadoop Hackathon at SARA, the Dutch center for High Performance Computing and Networking.

Transcript of Introduction to SARA's Hadoop Hackathon - dec 7th 2010

Page 1: Introduction to SARA's Hadoop Hackathon - dec 7th 2010

SARA Hadoop Hackathon, December 7, 2010

Page 2: Introduction to SARA's Hadoop Hackathon - dec 7th 2010

SARA Hadoop Hackathon, December 7, 2010

DJOERD HIEMSTRA (UTwente)

EDGAR MEIJ (UvA)

Page 3: Introduction to SARA's Hadoop Hackathon - dec 7th 2010

Timeline:

2002: Nutch*
2004: MR/GFS**
2006: Hadoop

*  http://nutch.apache.org/
** http://labs.google.com/papers/mapreduce.html
   http://labs.google.com/papers/gfs.html

Page 4: Introduction to SARA's Hadoop Hackathon - dec 7th 2010

http://wiki.apache.org/hadoop/PoweredBy

2010: A Hype in Production

Page 5: Introduction to SARA's Hadoop Hackathon - dec 7th 2010

Super computing

Cluster computing

Grid computing

Cloud computing

GPU computing

http://www.sara.nl/

Page 6: Introduction to SARA's Hadoop Hackathon - dec 7th 2010

Moving data to the computation: Expensive! :-(

Moving computation to the data: Cheaper! :-)

Ref: Luiz André Barroso and Urs Hölzle, Google Inc., The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines

Page 7: Introduction to SARA's Hadoop Hackathon - dec 7th 2010

NameNode, JobTracker

DN TT   DN TT   DN TT   DN TT

DN TT   DN TT   DN TT   DN TT

DN = DataNode

TT = TaskTracker
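Each worker in the layout above runs a DataNode and a TaskTracker side by side, so a task can often run on a machine that already stores its input block. The following is only a toy sketch of that idea, not Hadoop's actual scheduler: the function name `assign_tasks`, the node and block names, and the slot model are all hypothetical, and real Hadoop also takes rack locality and other factors into account.

```python
def assign_tasks(block_locations, free_slots):
    """Toy data-local scheduling (illustrative only, not Hadoop code).

    block_locations: dict mapping block id -> list of nodes holding a replica
    free_slots:      dict mapping node -> number of free task slots
    Returns a dict mapping block id -> (node, is_local).
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is untouched
    assignment = {}
    for block, replicas in block_locations.items():
        # Prefer a node that already stores a replica of this block.
        node = next((n for n in replicas if slots.get(n, 0) > 0), None)
        is_local = node is not None
        if node is None:
            # Fall back to any node with a free slot; data must then be copied.
            node = next((n for n, free in slots.items() if free > 0), None)
            if node is None:
                raise RuntimeError("no free task slots left")
        slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


if __name__ == "__main__":
    # Hypothetical cluster: three workers, one task slot each.
    blocks = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n3"]}
    for block, (node, is_local) in assign_tasks(blocks, {"n1": 1, "n2": 1, "n3": 1}).items():
        print(block, node, "local" if is_local else "remote")
```

In this made-up example, block `b2` ends up on a remote node because the only node holding it has no free slot left, which is the case where data has to move to the computation.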

Page 8: Introduction to SARA's Hadoop Hackathon - dec 7th 2010

File -> Map -> Shuffle -> Reduce -> Output

Map:     $ echo "${email#*@}, ${name}"
Shuffle: $ sort
Reduce:  $ wc -l

ewi.utwente.nl, 1
gmail.com,      2
nbic.nl,        1
nikhef.nl,      3
sara.nl,        1
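The shell analogy above counts participants per e-mail domain in three phases. A minimal in-memory sketch of the same three phases in plain Python might look as follows; it does not use the Hadoop API, and the function names and sample records are hypothetical.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (domain, name) pair per record, like "${email#*@}, ${name}"."""
    for name, email in records:
        domain = email.split("@", 1)[1]  # everything after the first "@"
        yield domain, name


def shuffle_phase(pairs):
    """Shuffle: group values by key and sort the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(grouped):
    """Reduce: count the names collected for each domain."""
    return [(domain, len(names)) for domain, names in grouped]


def count_domains(records):
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    # Hypothetical participants; not the hackathon's real sign-up list.
    participants = [
        ("alice", "alice@example.org"),
        ("bob", "bob@example.com"),
        ("carol", "carol@example.org"),
    ]
    for domain, count in count_domains(participants):
        print(f"{domain}, {count}")
```

In a real Hadoop job the map and reduce functions run on many TaskTrackers in parallel and the framework performs the shuffle; here everything runs in one process purely to show the data flow.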

Page 9: Introduction to SARA's Hadoop Hackathon - dec 7th 2010

From: Hadoop, The Definitive Guide (2nd Edition), Tom White

Page 10: Introduction to SARA's Hadoop Hackathon - dec 7th 2010

Today

09.30 - 09.50 Welcome & Introduction
09.50 - 10.15 Map/Reduce @ University of Twente
10.15 - 10.30 Kick-off hackathon
14.00 - 15.00 Optional: SARA tour
10.30 - 17.00 Hackathon
17.00 - 17.30 Results and closing