Hadoop single cluster installation
Hadoop Single Cluster Installation
Minh Tran – Software Architect
05/2013
Prerequisites
• Ubuntu Server 10.04 (Lucid Lynx)
• JDK 6u34 for Linux
• Hadoop 1.0.4
• VMware Player / VMware Workstation / VMware Server
• Ubuntu Server VMware image: http://www.thoughtpolice.co.uk/vmware/#ubuntu10.04 (login: notroot / thoughtpolice)
Install SSH
• sudo apt-get update
• sudo apt-get install openssh-server
Install JDK
• wget -c -O jdk-6u34-linux-i586.bin http://download.oracle.com/otn/java/jdk/6u34-b04/jdk-6u34-linux-i586.bin?AuthParam=1347897296_c6dd13e0af9e099dc731937f95c1cd01
• chmod +x jdk-6u34-linux-i586.bin
• ./jdk-6u34-linux-i586.bin
• sudo mv jdk1.6.0_34 /usr/local
• sudo ln -s /usr/local/jdk1.6.0_34 /usr/local/jdk
Create group / account for Hadoop
• sudo addgroup hadoop
• sudo adduser --ingroup hadoop hduser
Install Local Hadoop
• wget http://mirrors.digipower.vn/apache/hadoop/common/hadoop-1.0.4/hadoop-1.0.4.tar.gz
• tar -zxvf hadoop-1.0.4.tar.gz
• sudo mv hadoop-1.0.4 /usr/local
• sudo chown -R hduser:hadoop /usr/local/hadoop-1.0.4
• sudo ln -s /usr/local/hadoop-1.0.4 /usr/local/hadoop
Install Apache Ant
• wget http://mirrors.digipower.vn/apache/ant/binaries/apache-ant-1.9.0-bin.tar.gz
• tar -zxvf apache-ant-1.9.0-bin.tar.gz
• sudo mv apache-ant-1.9.0 /usr/local
• sudo ln -s /usr/local/apache-ant-1.9.0 /usr/local/apache-ant
Modify environment variables
• su - hduser
• vi .bashrc
• export JAVA_HOME=/usr/local/jdk
• export HADOOP_PREFIX=/usr/local/hadoop
• export PATH=${JAVA_HOME}/bin:${HADOOP_PREFIX}/bin:${PATH}
• . .bashrc
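After sourcing .bashrc, a quick sanity check confirms the new directories actually landed on the PATH. This is a minimal sketch that re-creates the exports from the slide (same paths) and reports the result:

```shell
# Re-create the exports from the slide and confirm the JDK bin
# directory is on the PATH (paths as configured above).
export JAVA_HOME=/usr/local/jdk
export HADOOP_PREFIX=/usr/local/hadoop
export PATH=${JAVA_HOME}/bin:${HADOOP_PREFIX}/bin:${PATH}
case ":$PATH:" in
  *":$JAVA_HOME/bin:"*) echo "JAVA_HOME on PATH" ;;
  *)                    echo "JAVA_HOME missing from PATH" ;;
esac
```

If the check fails, the usual cause is editing .bashrc as the wrong user: the exports must go into hduser's .bashrc, not root's.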
Try the first example

hduser@ubuntu:/usr/local/hadoop$ cd $HADOOP_PREFIX
hduser@ubuntu:/usr/local/hadoop$ hadoop jar hadoop-examples-1.0.4.jar pi 2 10
Number of Maps = 2
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Starting Job
13/04/03 15:01:40 INFO mapred.FileInputFormat: Total input paths to process : 2
13/04/03 15:01:41 INFO mapred.JobClient: Running job: job_201304031458_0003
13/04/03 15:01:42 INFO mapred.JobClient: map 0% reduce 0%
13/04/03 15:02:00 INFO mapred.JobClient: map 100% reduce 0%
13/04/03 15:02:15 INFO mapred.JobClient: map 100% reduce 100%
13/04/03 15:02:19 INFO mapred.JobClient: Job complete: job_201304031458_0003
13/04/03 15:02:19 INFO mapred.JobClient: Counters: 30
13/04/03 15:02:19 INFO mapred.JobClient: Job Counters
…
13/04/03 15:02:19 INFO mapred.JobClient: Reduce output records=0
13/04/03 15:02:19 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1118670848
13/04/03 15:02:19 INFO mapred.JobClient: Map output records=4
Job Finished in 39.148 seconds
Estimated value of Pi is 3.80000000000000000000
Setup Single Node Cluster
• Disabling IPv6
• Configuring SSH
• Configuration
– hadoop-env.sh
– conf/*-site.xml
• Start / stop the node cluster
• Running a MapReduce job
Disabling IPv6

• Open /etc/sysctl.conf and add the following lines:
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
• Reboot your machine
• Verify whether IPv6 is enabled or disabled:
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
(0 – enabled, 1 – disabled)
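The verification step can be wrapped in a small script that prints a readable verdict. The /proc path is the one from the slide; on kernels built without IPv6 the file may be absent, which the sketch treats as a separate case:

```shell
# Report whether IPv6 is disabled: 1 means disabled, 0 means enabled,
# "absent" means the kernel exposes no IPv6 toggle at all.
v=$(cat /proc/sys/net/ipv6/conf/all/disable_ipv6 2>/dev/null || echo "absent")
if [ "$v" = "1" ]; then
  echo "IPv6 is disabled"
else
  echo "IPv6 not disabled (value: $v)"
fi
```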
Configuring SSH
• Create an SSH key for hduser on localhost:
su - hduser
ssh-keygen -t rsa -P ""
• Add the public key id_rsa.pub to the authorized keys on localhost:
touch ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
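The two steps above can be combined into one idempotent sketch. SSH_DIR is an added variable (not in the slides) that defaults to the real ~/.ssh; point it elsewhere to rehearse without touching your keys:

```shell
# Generate a passwordless RSA key (if none exists) and authorize it,
# guarding against duplicate entries in authorized_keys.
SSH_DIR=${SSH_DIR:-$HOME/.ssh}
mkdir -p "$SSH_DIR" && chmod 700 "$SSH_DIR"
[ -f "$SSH_DIR/id_rsa" ] || ssh-keygen -t rsa -P "" -f "$SSH_DIR/id_rsa" -q
touch "$SSH_DIR/authorized_keys" && chmod 600 "$SSH_DIR/authorized_keys"
grep -qxF "$(cat "$SSH_DIR/id_rsa.pub")" "$SSH_DIR/authorized_keys" \
  || cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
# Afterwards, `ssh localhost` should log in without prompting for a password.
```

Running it twice is safe: the grep guard keeps authorized_keys free of duplicate keys.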
Configuration
• Edit /usr/local/hadoop/conf/hadoop-env.sh and add the following line:
export JAVA_HOME=/usr/local/jdk
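The same edit can be scripted so that re-running it never adds a duplicate line. ENV_FILE is an added variable for rehearsal on a scratch copy; on the real box, set it to /usr/local/hadoop/conf/hadoop-env.sh:

```shell
# Append the JAVA_HOME export only if it is not already present.
ENV_FILE=${ENV_FILE:-./hadoop-env.sh}   # real file: /usr/local/hadoop/conf/hadoop-env.sh
touch "$ENV_FILE"
grep -q '^export JAVA_HOME=' "$ENV_FILE" \
  || echo 'export JAVA_HOME=/usr/local/jdk' >> "$ENV_FILE"
grep '^export JAVA_HOME=' "$ENV_FILE"
```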
Configuration (cont.)
• Create folders to store the node's data:
sudo mkdir -p /hadoop_data/name
sudo mkdir -p /hadoop_data/data
sudo mkdir -p /hadoop_data/temp
sudo chown hduser:hadoop /hadoop_data/name
sudo chown hduser:hadoop /hadoop_data/data
sudo chown hduser:hadoop /hadoop_data/temp
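The six commands above collapse naturally into one loop. BASE is an added variable that defaults to a local scratch path for rehearsal; on the real server, use BASE=/hadoop_data and run with sudo:

```shell
# Create the name, data, and temp directories in one pass.
BASE=${BASE:-./hadoop_data}
for d in name data temp; do
  mkdir -p "$BASE/$d"
  # on the real server, additionally: sudo chown hduser:hadoop "$BASE/$d"
done
ls "$BASE"
```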
conf/core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop_data/temp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system. A URI whose scheme
    and authority determine the FileSystem implementation. The uri's scheme
    determines the config property (fs.SCHEME.impl) naming the FileSystem
    implementation class. The uri's authority is used to determine the host,
    port, etc. for a filesystem.</description>
  </property>
</configuration>
conf/mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs at.
    If "local", then jobs are run in-process as a single map and reduce task.
    </description>
  </property>
</configuration>
conf/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <!-- Path to store namespace and transaction logs -->
    <value>/hadoop_data/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <!-- Path to store data blocks in datanode -->
    <value>/hadoop_data/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication. The actual number of replications
    can be specified when the file is created. The default is used if
    replication is not specified in create time.</description>
  </property>
</configuration>
Format a new filesystem

notroot@ubuntu:/usr/local/hadoop/conf$ su - hduser
Password:
hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format
13/04/03 13:41:24 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu.localdomain/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.4
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012
************************************************************/
Re-format filesystem in /hadoop_data/name ? (Y or N) Y
13/04/03 13:41:26 INFO util.GSet: VM type = 32-bit
13/04/03 13:41:26 INFO util.GSet: 2% max memory = 19.33375 MB
13/04/03 13:41:26 INFO util.GSet: capacity = 2^22 = 4194304 entries
…
13/04/03 13:41:28 INFO common.Storage: Storage directory /hadoop_data/name has been successfully formatted.
13/04/03 13:41:28 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu.localdomain/127.0.1.1
************************************************************/
Do not format a running Hadoop file system as you will lose all the data currently in the cluster (in HDFS)!
Start Single Node Cluster

hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh
starting namenode, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-secondarynamenode-ubuntu.out
starting jobtracker, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-tasktracker-ubuntu.out
How to verify the Hadoop processes

• A nifty tool for checking whether the expected Hadoop processes are running is jps (part of the Sun JDK):
hduser@ubuntu:~$ jps
1203 NameNode
1833 Jps
1615 JobTracker
1541 SecondaryNameNode
1362 DataNode
1788 TaskTracker
• You can also check with netstat whether Hadoop is listening on the configured ports:
notroot@ubuntu:/usr/local/hadoop/conf$ sudo netstat -plten | grep java
tcp 0 0 127.0.0.1:54310 0.0.0.0:* LISTEN 1001 7167 2438/java
tcp 0 0 127.0.0.1:54311 0.0.0.0:* LISTEN 1001 7949 2874/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 1001 7898 2791/java
tcp 0 0 0.0.0.0:50030 0.0.0.0:* LISTEN 1001 8035 2874/java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 1001 7202 2438/java
tcp 0 0 0.0.0.0:57143 0.0.0.0:* LISTEN 1001 7585 2791/java
tcp 0 0 0.0.0.0:41943 0.0.0.0:* LISTEN 1001 7222 2608/java
tcp 0 0 0.0.0.0:58936 0.0.0.0:* LISTEN 1001 6969 2438/java
tcp 0 0 127.0.0.1:50234 0.0.0.0:* LISTEN 1001 8158 3050/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 1001 7697 2608/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 1001 7775 2608/java
tcp 0 0 0.0.0.0:40067 0.0.0.0:* LISTEN 1001 7764 2874/java
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 1001 7939 2608/java
Stop your single node cluster

hduser@ubuntu:~$ /usr/local/hadoop/bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
Running a MapReduce job
• We will use three ebooks from Project Gutenberg for this example:
– The Outline of Science, Vol. 1 (of 4) by J. Arthur Thomson
– The Notebooks of Leonardo Da Vinci
– Ulysses by James Joyce
• Download each ebook as a text file in Plain Text UTF-8 encoding and store the files in /tmp/gutenberg
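The download step can be sketched as below. The slide gives no URLs, so the ones here are an assumption based on Project Gutenberg's cache layout; the file names match the HDFS listing later in the deck, and the `|| true` lets the snippet run even offline:

```shell
# Fetch the three ebooks into /tmp/gutenberg. URL layout is assumed
# (gutenberg.org/cache/epub/<id>/pg<id>.txt) and may change.
mkdir -p /tmp/gutenberg
for id in 20417 4300 5000; do
  wget -q -O "/tmp/gutenberg/pg$id.txt" \
    "https://www.gutenberg.org/cache/epub/$id/pg$id.txt" || true
done
ls /tmp/gutenberg
```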
Running a MapReduce job (cont.)
• Copy these files into HDFS:
hduser@ubuntu:~$ hadoop dfs -copyFromLocal /tmp/gutenberg /user/hduser/gutenberg
hduser@ubuntu:~$ hadoop dfs -ls /user/hduser/gutenberg
Found 3 items
-rw-r--r-- 1 hduser supergroup 661807 2013-04-03 14:01 /user/hduser/gutenberg/pg20417.txt
-rw-r--r-- 1 hduser supergroup 1540092 2013-04-03 14:01 /user/hduser/gutenberg/pg4300.txt
-rw-r--r-- 1 hduser supergroup 1391684 2013-04-03 14:01 /user/hduser/gutenberg/pg5000.txt
Running a MapReduce job (cont.)
hduser@ubuntu:~$ cd /usr/local/hadoop
hduser@ubuntu:/usr/local/hadoop$ hadoop jar hadoop-examples-1.0.4.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output
13/04/03 14:02:45 INFO input.FileInputFormat: Total input paths to process : 3
13/04/03 14:02:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/04/03 14:02:45 WARN snappy.LoadSnappy: Snappy native library not loaded
13/04/03 14:02:45 INFO mapred.JobClient: Running job: job_201304031352_0001
13/04/03 14:02:46 INFO mapred.JobClient: map 0% reduce 0%
13/04/03 14:03:09 INFO mapred.JobClient: map 66% reduce 0%
13/04/03 14:03:32 INFO mapred.JobClient: map 100% reduce 0%
13/04/03 14:03:47 INFO mapred.JobClient: map 100% reduce 100%
13/04/03 14:03:53 INFO mapred.JobClient: Job complete: job_201304031352_0001
13/04/03 14:03:53 INFO mapred.JobClient: Counters: 29
13/04/03 14:03:53 INFO mapred.JobClient: Job Counters
13/04/03 14:03:53 INFO mapred.JobClient: Launched reduce tasks=1
…
13/04/03 14:03:53 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=59114
13/04/03 14:03:53 INFO mapred.JobClient: SPLIT_RAW_BYTES=361
13/04/03 14:03:53 INFO mapred.JobClient: Reduce input records=102321
13/04/03 14:03:53 INFO mapred.JobClient: Reduce input groups=82334
13/04/03 14:03:53 INFO mapred.JobClient: Combine output records=102321
13/04/03 14:03:53 INFO mapred.JobClient: Physical memory (bytes) snapshot=576069632
13/04/03 14:03:53 INFO mapred.JobClient: Reduce output records=82334
13/04/03 14:03:53 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1490481152
13/04/03 14:03:53 INFO mapred.JobClient: Map output records=629172
Check the result

• hduser@ubuntu:/usr/local/hadoop$ hadoop dfs -ls /user/hduser/gutenberg-output
Found 3 items
-rw-r--r-- 1 hduser supergroup 0 2013-04-03 14:03 /user/hduser/gutenberg-output/_SUCCESS
drwxr-xr-x - hduser supergroup 0 2013-04-03 14:02 /user/hduser/gutenberg-output/_logs
-rw-r--r-- 1 hduser supergroup 880829 2013-04-03 14:03 /user/hduser/gutenberg-output/part-r-00000
• hduser@ubuntu:/usr/local/hadoop$ hadoop dfs -cat /user/hduser/gutenberg-output/part-r-00000 | more
"(Lo)cra" 1
"1490 1
"1498," 1
"35" 1
"40," 1
"A 2
"AS-IS". 1
"A_ 1
"Absoluti 1
"Alack! 1
"Alack!" 1
"Alla 1
"Allegorical 1
"Alpha 1
"Alpha," 1
…
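To see what the wordcount job is computing without a running cluster, the same counting can be approximated with standard shell tools; the stand-in input file here is invented for the demo. Splitting on whitespace plays the role of the map phase, sort is the shuffle, and uniq -c is the reduce:

```shell
# Local stand-in for the MapReduce wordcount: split words (map),
# sort them (shuffle), then count duplicates (reduce).
printf 'to be or not to be\n' > /tmp/wc-demo.txt
tr -s '[:space:]' '\n' < /tmp/wc-demo.txt | sort | uniq -c | sort -rn
```

The real job differs mainly in scale: each mapper handles one input split, and the framework performs the sort between the map and reduce phases.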
Hadoop Interfaces
• NameNode Web UI: http://192.168.65.134:50070/
• JobTracker Web UI: http://192.168.65.134:50030/
• TaskTracker Web UI: http://192.168.65.134:50060/
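The three UIs can also be probed from the shell. HOST is an added variable defaulting to localhost (the slides use the VM's address 192.168.65.134); a running cluster answers 200, a dead one shows 000:

```shell
# Probe each web UI port and print the HTTP status code it returns.
HOST=${HOST:-localhost}
for port in 50070 50030 50060; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "http://$HOST:$port/" || true)
  echo "port $port -> HTTP ${code:-000}"
done
```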
NameNode Web UI
JobTracker Web UI
TaskTracker Web UI
Troubleshooting
• VMware Ubuntu image lost eth0 after moving it http://www.whiteboardcoder.com/2012/03/vmware-ubuntu-image-lost-eth0-after.html
• Hadoop Troubleshooting: http://wiki.apache.org/hadoop/TroubleShooting
• Error when formatting the Hadoop filesystem: http://askubuntu.com/questions/35551/error-when-formatting-the-hadoop-filesystem
THANK YOU