Working with Hadoop

Requirements

• Virtual machine software
  – VMware
  – VirtualBox

• Virtual machine images
  – Download from Cloudera (founded by leaders in the field, including the father of Hadoop)

Start the Virtual Machine

Inside the Virtual machine

• CentOS 6.4
• JDK
• Hadoop 2.5.0
• Eclipse 4.2.6 (Juno)

Basics of HDFS (routine)

• With Terminal
  – hadoop
  – hadoop version
  – hadoop jar
  – hadoop fs …
  – hadoop fs -ls: list files in HDFS (the user's home directory when no path is given)
  – hadoop fs -put / -get / -mkdir / -rmdir ...
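
The commands above can be tried in one pass. This is only a sketch: the function name hdfs_tour, the file notes.txt, and the directory /user/cloudera/demo are made up for illustration and do not come from the slides. The commands run only if hadoop is on the PATH, as it should be inside the VM.

```shell
# Everyday HDFS commands, wrapped in a function so nothing runs until called.
# hdfs_tour, notes.txt and /user/cloudera/demo are illustrative names only.
hdfs_tour() {
  hadoop version                                          # show the installed Hadoop version
  hadoop fs -ls /                                         # list the HDFS root directory
  hadoop fs -mkdir -p /user/cloudera/demo                 # create a directory (-p also creates parents)
  hadoop fs -put notes.txt /user/cloudera/demo            # upload a local file to HDFS
  hadoop fs -get /user/cloudera/demo/notes.txt copy.txt   # download it again under a new name
  hadoop fs -rm /user/cloudera/demo/notes.txt             # delete the file in HDFS
  hadoop fs -rmdir /user/cloudera/demo                    # remove the now-empty directory
}

if command -v hadoop >/dev/null 2>&1; then
  echo "hello" > notes.txt
  hdfs_tour
else
  echo "hadoop not found on PATH; run this inside the VM"
fi
```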

Copy Files from Windows to VM

• WinSCP (see demo at bin\scp_ssh\winscp575)
  – Protocol: SCP
  – Hostname: get from ifconfig in Terminal
  – Username/Password: cloudera/cloudera

Copy Files from VM (CentOS) to HDFS

• hadoop fs -put localfiles /user/cloudera

Copy Files from Windows to HDFS

• Via HUE services

Using web server – port 8888 (File manager)

Hadoop Administration

• http://hostname:50070/dfshealth.html#tab-overview

WordCount Example in Hadoop

• #1: Via guidelines in Cloudera website
• #2: Directly in Eclipse (Preferred)

WordCount in Cloudera Website

• http://www.cloudera.com/content/cloudera/en/documentation/hadoop-tutorial/CDH5/Hadoop-Tutorial/ht_wordcount1.html

• Source code downloaded from http://tiny.cloudera.com/hadoopTutorialSample

• Source code details and explanations: http://www.cloudera.com/content/cloudera/en/documentation/hadoop-tutorial/CDH5/Hadoop-Tutorial/ht_wordcount1_source.html

WordCount in Cloudera Website

• Create directory in HDFS
  – $ hadoop fs -mkdir /user/cloudera
  – $ hadoop fs -chown cloudera /user/cloudera
  – $ hadoop fs -mkdir /user/cloudera/wordcount /user/cloudera/wordcount/input

• Create sample text
  – 1: Directly in CentOS
    $ echo "Hadoop is an elephant" > file0
    $ echo "Hadoop is as yellow as can be" > file1
    $ echo "Oh what a yellow fellow is Hadoop" > file2
    And then move to HDFS
    $ hadoop fs -put file* /user/cloudera/wordcount/input
  – 2: Create in Windows and copy to HDFS via HUE
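
Before running the job, it can help to know roughly what WordCount should report for these three files. The sketch below counts whitespace-separated words locally with standard Unix tools, without Hadoop. The helper name count_words is hypothetical, and the real job's tokenization may differ slightly (for example in how it treats punctuation or case).

```shell
# Local preview of the expected word counts (no Hadoop involved).
# count_words is a hypothetical helper name, not part of Hadoop.
count_words() {
  # Put one word per line, drop empty lines, then count each distinct word.
  cat "$@" | tr -s '[:space:]' '\n' | grep -v '^$' | sort | uniq -c |
    awk '{ print $2 "\t" $1 }'   # print as word<TAB>count
}

workdir=$(mktemp -d)
echo "Hadoop is an elephant" > "$workdir/file0"
echo "Hadoop is as yellow as can be" > "$workdir/file1"
echo "Oh what a yellow fellow is Hadoop" > "$workdir/file2"

count_words "$workdir"/file*
```

For the sample text, "Hadoop" and "is" each appear three times, and "as" and "yellow" twice, so the job output should show similar counts.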

WordCount in Cloudera Website

• Compilation error

WordCount Example in Hadoop

• #1: Via guidelines in Cloudera website
• #2: Directly in Eclipse (Preferred)

WordCount in Eclipse environment

• http://kishorer.in/2014/10/22/running-a-wordcount-mapreduce-example-in-hadoop-2-4-1-single-node-cluster-in-ubuntu-14-04-64-bit/

• https://www.youtube.com/watch?v=hJsaChh2Yhk (some parts differ for the Cloudera VM)

Update source code (from website)

Adding JAR files to Project

/usr/lib/hadoop; /usr/lib/hadoop/lib; /usr/lib/hadoop-mapreduce; /usr/lib/hadoop-mapreduce/lib
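
The same JAR files can be collected from the terminal, for example to compile outside Eclipse. This is a sketch only: build_classpath is a hypothetical helper, and the directories are assumed to be absolute paths under /usr/lib, as on the VM.

```shell
# Join every .jar found in the given directories into one colon-separated classpath.
# build_classpath is a hypothetical helper name.
build_classpath() {
  classpath=""
  for dir in "$@"; do
    for jar in "$dir"/*.jar; do
      [ -e "$jar" ] || continue                 # skip directories without any .jar
      classpath="${classpath:+$classpath:}$jar" # add ':' only between entries
    done
  done
  printf '%s\n' "$classpath"
}

build_classpath /usr/lib/hadoop /usr/lib/hadoop/lib \
  /usr/lib/hadoop-mapreduce /usr/lib/hadoop-mapreduce/lib
```

The result could be passed to javac with the -cp option. The built-in command hadoop classpath prints a comparable list and may be simpler when it is available.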

Run Config: Run → Run Configurations

File → Export

Update Properties in jar file

Prepare for run

• Make HDFS directory

Copy sample input to HDFS (via HUE)

Run the example (in .jar folder)
(Make sure to remove the output folder before use)
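
A minimal sketch of this step from the terminal. The names wordcount.jar and WordCount are placeholders for whatever was exported from Eclipse. The function run_wordcount and the HADOOP_CMD variable are also hypothetical, added so the commands can be previewed with HADOOP_CMD=echo before running them for real.

```shell
# Remove any old output folder, then submit the job.
# run_wordcount and HADOOP_CMD are hypothetical; wordcount.jar and WordCount are placeholders.
run_wordcount() {
  jar=$1; main_class=$2; input=$3; output=$4
  # Hadoop refuses to start a job whose output directory already exists.
  "${HADOOP_CMD:-hadoop}" fs -rm -r -f "$output"
  "${HADOOP_CMD:-hadoop}" jar "$jar" "$main_class" "$input" "$output"
}

if command -v hadoop >/dev/null 2>&1 && [ -f wordcount.jar ]; then
  run_wordcount wordcount.jar WordCount \
    /user/cloudera/wordcount/input /user/cloudera/wordcount/output
else
  echo "hadoop or wordcount.jar not found; nothing was run"
fi
```

If the exported JAR already names its main class in the manifest, the class argument should be left out of the hadoop jar command.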

View the result
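
Besides the HUE file manager, the result can be printed in the terminal. This sketch assumes the output folder /user/cloudera/wordcount/output. show_result and HADOOP_CMD are hypothetical names used only for illustration.

```shell
# Print the job output stored in HDFS.
# show_result and HADOOP_CMD are hypothetical; the output path is an assumption.
show_result() {
  # Reducers write their results to part-* files; the quotes stop the local
  # shell from expanding the pattern, so the HDFS shell expands it instead.
  "${HADOOP_CMD:-hadoop}" fs -cat "$1/part-*"
}

if command -v hadoop >/dev/null 2>&1; then
  show_result /user/cloudera/wordcount/output
else
  echo "hadoop not found on PATH; run this inside the VM"
fi
```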

Other sources

• Very nice example @ https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html
