Hadoop and its applications at the State and University Library, SCAPE Information Day, 25 June 2014
![Page 1: Hadoop and its applications at the State and University Library, SCAPE Information Day, 25 June 2014](https://reader034.fdocuments.net/reader034/viewer/2022051323/547a039db4af9fce158b4989/html5/thumbnails/1.jpg)
Hadoop and its applications at the State and University Library
Per Møldrup-Dalum, State and University Library
SCAPE Information Day, State and University Library, Denmark, 2014-06-25
![Page 2: Hadoop and its applications at the State and University Library, SCAPE Information Day, 25 June 2014](https://reader034.fdocuments.net/reader034/viewer/2022051323/547a039db4af9fce158b4989/html5/thumbnails/2.jpg)
Agenda
• A bit on Hadoop in general
• A bit on our experience in deploying Hadoop at the library
This work was partially supported by the SCAPE Project. The SCAPE project is co‐funded by the European Union under FP7 ICT‐2009.4.1 (Grant Agreement number 270137).
![Page 3: Hadoop and its applications at the State and University Library, SCAPE Information Day, 25 June 2014](https://reader034.fdocuments.net/reader034/viewer/2022051323/547a039db4af9fce158b4989/html5/thumbnails/3.jpg)
Origins
• MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean and Sanjay Ghemawat, 2004
• In 2005 Doug Cutting and Mike Cafarella created Hadoop at Yahoo!
• Now an Apache project
• Commercial distributions, community editions, DIY
![Page 4: Hadoop and its applications at the State and University Library, SCAPE Information Day, 25 June 2014](https://reader034.fdocuments.net/reader034/viewer/2022051323/547a039db4af9fce158b4989/html5/thumbnails/4.jpg)
Map/Reduce
(Diagram: MAP and REDUCE phases)
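The MAP and REDUCE phases can be illustrated with a minimal single-process sketch in Python, assuming the classic word-count task (not from the talk). The names `map_reduce`, `word_mapper` and `sum_reducer` are hypothetical helpers, not Hadoop's Java API; in Hadoop the map calls run in parallel across the cluster and the framework performs the grouping ("shuffle") between the two phases.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical type aliases for this sketch only.
Mapper = Callable[[str], Iterator[Tuple[str, int]]]
Reducer = Callable[[str, List[int]], int]


def map_reduce(records: Iterable[str], mapper: Mapper, reducer: Reducer) -> Dict[str, int]:
    # MAP: every record is processed independently into (key, value) pairs.
    intermediate = [pair for record in records for pair in mapper(record)]

    # SHUFFLE: group all values that share a key. Hadoop does this by
    # partitioning, sorting and transferring map output to the reducers.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # REDUCE: one call per distinct key, over all values for that key.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


def word_mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in one line of text."""
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key: str, values: List[int]) -> int:
    """Add up the counts collected for one word."""
    return sum(values)


lines = ["The quick brown fox", "the lazy dog", "the fox"]
print(map_reduce(lines, word_mapper, sum_reducer))
# {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because each map call only sees its own record and each reduce call only sees one key, both phases can be spread over many machines; only the shuffle needs to move data between them.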
![Page 5: Hadoop and its applications at the State and University Library, SCAPE Information Day, 25 June 2014](https://reader034.fdocuments.net/reader034/viewer/2022051323/547a039db4af9fce158b4989/html5/thumbnails/5.jpg)
Lorem ipsum
• Count addresses that have fruits etc. in their street name, e.g.
  • Kirsebærhaven
  • Jordbærvej
  • Nødde allé
• Result
  • Kirsebær (cherry): 1203
  • Nødder (nuts): 34
  • Jordbær (strawberry): 543
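The street-name count fits the MapReduce pattern: the map step emits a (fruit, 1) pair for each matching address, and the reduce step sums them per fruit. Below is a minimal Python sketch, assuming each input record is the street name of one address and that substring matching on lower-cased Danish stems is good enough. `FRUIT_STEMS`, `street_mapper` and `count_fruit_streets` are hypothetical names, and the sample addresses are made up for illustration, not taken from the library's data.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical stems: lower-cased Danish word -> key used in the result.
FRUIT_STEMS = {
    "kirsebær": "Kirsebær",  # cherry
    "jordbær": "Jordbær",    # strawberry
    "nødde": "Nødder",       # nut, as in "Nødde allé"
}


def street_mapper(street_name: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (key, 1) for each fruit stem found in one address's street name."""
    name = street_name.lower()
    for stem, key in FRUIT_STEMS.items():
        if stem in name:
            yield key, 1


def count_fruit_streets(street_names: Iterable[str]) -> Counter:
    """Shuffle + reduce in one step: Counter groups by key and sums the 1s."""
    totals: Counter = Counter()
    for street_name in street_names:
        for key, one in street_mapper(street_name):
            totals[key] += one
    return totals


# Made-up sample: one entry per address, so a street appears once per address.
sample = ["Kirsebærhaven", "Jordbærvej", "Nødde allé", "Kirsebærhaven", "Vestergade"]
print(dict(count_fruit_streets(sample)))
# {'Kirsebær': 2, 'Jordbær': 1, 'Nødder': 1}
```

In a real Hadoop job the reducer for "Kirsebær" would receive the list of all 1s emitted for that key from every mapper; here `Counter` collapses grouping and summing into a single process.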
![Page 6: Hadoop and its applications at the State and University Library, SCAPE Information Day, 25 June 2014](https://reader034.fdocuments.net/reader034/viewer/2022051323/547a039db4af9fce158b4989/html5/thumbnails/6.jpg)
The Zoo
• HDFS – data locality
• MapReduce
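Data locality means running each map task on a node that already stores a replica of its HDFS block, so the input is read from local disk rather than over the network. The greedy scheduler below is a simplified Python sketch of that idea, not Hadoop's actual scheduler (which also weighs rack locality, among other things); the function names and the block and node names are hypothetical.

```python
from typing import Dict, Set


def assign_map_tasks(
    block_replicas: Dict[str, Set[str]],
    free_slots: Dict[str, int],
) -> Dict[str, str]:
    """Greedily place one map task per block, preferring a node holding a replica."""
    slots = dict(free_slots)  # work on a copy; the caller's dict is untouched
    assignment: Dict[str, str] = {}
    for block in sorted(block_replicas):
        local = [n for n in sorted(block_replicas[block]) if slots.get(n, 0) > 0]
        # Fall back to any node with a free slot: the block is then read remotely.
        candidates = local or [n for n in sorted(slots) if slots[n] > 0]
        if not candidates:
            raise RuntimeError(f"no free slot for {block}")
        node = candidates[0]
        slots[node] -= 1
        assignment[block] = node
    return assignment


def local_fraction(assignment: Dict[str, str], block_replicas: Dict[str, Set[str]]) -> float:
    """Share of tasks that run on a node storing their own block."""
    local = sum(1 for block, node in assignment.items() if node in block_replicas[block])
    return local / len(assignment) if assignment else 0.0


slots = {"node1": 1, "node2": 1, "node3": 1, "node4": 1}
hdfs = {"blk_1": {"node1", "node2"}, "blk_2": {"node2", "node3"}, "blk_3": {"node1", "node3"}}
placed = assign_map_tasks(hdfs, slots)
print(placed, local_fraction(placed, hdfs))  # every task is data-local: 1.0

no_local_disks = {block: set() for block in hdfs}
placed_remote = assign_map_tasks(no_local_disks, slots)
print(local_fraction(placed_remote, no_local_disks))  # 0.0
```

The second case mimics compute nodes with no storage of their own, as on the library's blade servers: no node holds a replica on its own disk, so every task's input has to cross the network.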
![Page 7: Hadoop and its applications at the State and University Library, SCAPE Information Day, 25 June 2014](https://reader034.fdocuments.net/reader034/viewer/2022051323/547a039db4af9fce158b4989/html5/thumbnails/7.jpg)
Hadoop at the Library
![Page 8: Hadoop and its applications at the State and University Library, SCAPE Information Day, 25 June 2014](https://reader034.fdocuments.net/reader034/viewer/2022051323/547a039db4af9fce158b4989/html5/thumbnails/8.jpg)
Can it be done?
• Blade servers with no local storage
• Storage exclusively on NAS
• We've done several experiments
(Diagram: existing infrastructure, CPU and Storage)
![Page 9: Hadoop and its applications at the State and University Library, SCAPE Information Day, 25 June 2014](https://reader034.fdocuments.net/reader034/viewer/2022051323/547a039db4af9fce158b4989/html5/thumbnails/9.jpg)
Cluster topology
• 4 CPU nodes, each with
  • Two 6-core CPUs: Intel® Xeon® Processor X5670 with 12M Cache, 2.93 GHz, and 6.40 GT/s Intel® QPI
  • 96 GB RAM
  • 2 Gbit Ethernet interface
  • CentOS
  • NFS mount point on NAS for HDFS
• Reachable NAS storage: ~4 PB
Science Museum/Science & Society Picture Library
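Because the blades have no local disks and HDFS sits on an NFS mount on the NAS, every byte a job reads travels over Ethernet, so network bandwidth bounds throughput. A back-of-the-envelope calculation in Python from the cluster figures, assuming the RAM and Ethernet figures are per node, the full 2 Gbit/s line rate with no NFS, TCP or NAS overhead, and decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes); real throughput will be lower. The function names are hypothetical.

```python
from typing import Dict


def cluster_capacity(
    nodes: int = 4,
    cpus_per_node: int = 2,
    cores_per_cpu: int = 6,
    ram_gb_per_node: int = 96,       # assumed to be per node
    nic_gbit_per_node: float = 2.0,  # assumed per node, at full line rate
) -> Dict[str, float]:
    """Aggregate compute, memory and an upper bound on network read rate."""
    cores_per_node = cpus_per_node * cores_per_cpu
    return {
        "cores": nodes * cores_per_node,
        "ram_gb": nodes * ram_gb_per_node,
        "ram_gb_per_core": ram_gb_per_node / cores_per_node,
        # Gbit/s -> bytes/s: multiply by 1e9, divide by 8 bits per byte.
        "bytes_per_s": nodes * nic_gbit_per_node * 1e9 / 8,
    }


def scan_seconds(data_bytes: float, bytes_per_s: float) -> float:
    """Seconds needed to read data_bytes once at a constant aggregate rate."""
    return data_bytes / bytes_per_s


cap = cluster_capacity()
print(cap)  # {'cores': 48, 'ram_gb': 384, 'ram_gb_per_core': 8.0, 'bytes_per_s': 1000000000.0}
print(round(scan_seconds(1e12, cap["bytes_per_s"]) / 60, 1))      # 16.7 minutes per TB
print(round(scan_seconds(4e15, cap["bytes_per_s"]) / 86_400, 1))  # 46.3 days for ~4 PB
```

Under these optimistic assumptions the cluster reads at most about 1 GB/s, so a single pass over all reachable NAS storage would take more than six weeks.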
![Page 10: Hadoop and its applications at the State and University Library, SCAPE Information Day, 25 June 2014](https://reader034.fdocuments.net/reader034/viewer/2022051323/547a039db4af9fce158b4989/html5/thumbnails/10.jpg)
Cloudera Hadoop Distribution
![Page 11: Hadoop and its applications at the State and University Library, SCAPE Information Day, 25 June 2014](https://reader034.fdocuments.net/reader034/viewer/2022051323/547a039db4af9fce158b4989/html5/thumbnails/11.jpg)
Interface
![Page 12: Hadoop and its applications at the State and University Library, SCAPE Information Day, 25 June 2014](https://reader034.fdocuments.net/reader034/viewer/2022051323/547a039db4af9fce158b4989/html5/thumbnails/12.jpg)
References
• http://hadoop.apache.org
• http://www.cloudera.com
• http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf
![Page 13: Hadoop and its applications at the State and University Library, SCAPE Information Day, 25 June 2014](https://reader034.fdocuments.net/reader034/viewer/2022051323/547a039db4af9fce158b4989/html5/thumbnails/13.jpg)