Ha Do Op Tutorial
-
8/14/2019 Ha Do Op Tutorial
1/13
Hands-On Hadoop Tutorial
Chris Sosa
Wolfgang Richter
May 23, 2008
-
General Information
Hadoop uses HDFS, a distributed file system based on GFS, as its shared filesystem
HDFS architecture divides files into large chunks (~64 MB) distributed across data servers
HDFS has a global namespace
-
General Information (contd)
A script is provided for your convenience
  Run source /localtmp/hadoop/setupVars from centurion064
  Changes all uses of {somePath}/command to just command
Go to http://www.cs.virginia.edu/~cbs6n/hadoop for web access. These slides and more information are also available there.
Once you use the DFS (put something in it), relative paths are from /usr/{your usr id}. For example, if your id is tb28, your home dir is /usr/tb28
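
The relative-path rule above can be sketched in a few lines of Python. This is only an illustration of the rule as the slide states it, not Hadoop code: the function name `resolve_dfs_path` and the file names are made up for the example, and the `/usr/{id}` home directory is taken directly from the slide.

```python
import posixpath


def resolve_dfs_path(path: str, user_id: str) -> str:
    """Resolve a DFS path as the slide describes it.

    Relative paths are taken from /usr/{user_id}; absolute paths are
    returned unchanged (apart from normalization).
    """
    home = posixpath.join("/usr", user_id)
    # posixpath.join drops `home` when `path` is already absolute.
    return posixpath.normpath(posixpath.join(home, path))


# Hypothetical file names, using the tb28 id from the slide.
print(resolve_dfs_path("input/data.txt", "tb28"))    # /usr/tb28/input/data.txt
print(resolve_dfs_path("/shared/data.txt", "tb28"))  # /shared/data.txt
```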
-
Master Node
Hadoop is currently configured with centurion064 as the master node
Master node:
  Keeps track of namespace and metadata about items
  Keeps track of MapReduce jobs in the system
-
Slave Nodes
centurion064 also acts as a slave node
Slave nodes:
  Manage blocks of data sent from master node
  In terms of GFS, these are the chunkservers
Currently centurion060 is also a slave node
-
Hadoop Paths
Hadoop is locally installed on each machine
  Installed location is /localtmp/hadoop/hadoop-0.15.3
  Slave nodes store their data in /localtmp/hadoop/hadoop-dfs (this is automatically created by the DFS)
  /localtmp/hadoop is owned by group gbg (someone in this group or a cs admin must administer this)
Files are divided into 64 MB chunks (this is configurable)
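
To make the chunking arithmetic concrete, here is a minimal Python sketch that splits a file size into chunk sizes. It is not HDFS code: the name `chunk_sizes` is hypothetical, and it assumes the final chunk holds only the leftover bytes instead of being padded to the full chunk size.

```python
DEFAULT_CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the size quoted on the slide


def chunk_sizes(file_size: int, chunk_size: int = DEFAULT_CHUNK_SIZE) -> list[int]:
    """Return the size in bytes of each chunk a file would be split into."""
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size must be > 0")
    full_chunks, remainder = divmod(file_size, chunk_size)
    sizes = [chunk_size] * full_chunks
    if remainder:
        # Assumption: the last chunk only stores the remaining bytes.
        sizes.append(remainder)
    return sizes


mb = 1024 * 1024
sizes = chunk_sizes(200 * mb)
print(len(sizes))       # 4 chunks: three full 64 MB chunks plus one partial
print(sizes[-1] // mb)  # 8 (MB in the last chunk)
```

Passing a different `chunk_size` models the "configurable" part of the slide.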
-
Starting / Stopping Hadoop
For the purposes of this tutorial, we assume you have run the setupVars script from earlier
start-all.sh starts all slave nodes and master node
stop-all.sh stops all slave nodes and master node
-
Using HDFS (1/2)
hadoop dfs
  [-ls ]
  [-du ]
  [-cp ]
  [-rm ]
  [-put ]
  [-copyFromLocal ]
  [-moveFromLocal ]
  [-get [-crc] ]
  [-cat ]
  [-copyToLocal [-crc] ]
  [-moveToLocal [-crc] ]
  [-mkdir ]
  [-touchz ]
  [-test -[ezd] ]
  [-stat [format] ]
  [-help [cmd]]
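
As a rough illustration of how these options are used, the Python sketch below builds `hadoop dfs` command lines and, by default, only prints them. The helper names `dfs_command` and `run_dfs` and the file names are invented for this example; actually running a command assumes `hadoop` is on your PATH (for instance after sourcing setupVars).

```python
import shlex
import subprocess


def dfs_command(option: str, *args: str) -> list[str]:
    """Build the argument list for `hadoop dfs -<option> <args...>`."""
    return ["hadoop", "dfs", f"-{option}", *args]


def run_dfs(option: str, *args: str, dry_run: bool = True) -> str:
    """Return the command line as text (dry run) or execute it and return stdout."""
    cmd = dfs_command(option, *args)
    if dry_run:
        return shlex.join(cmd)
    # Only reached when dry_run=False; requires a working Hadoop installation.
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# Hypothetical file names; relative DFS paths land under your DFS home dir.
print(run_dfs("put", "local.txt", "input/local.txt"))
print(run_dfs("ls", "input"))
print(run_dfs("cat", "input/local.txt"))
```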
-
Using HDFS (2/2)
Want to reformat?
  Easy
  hadoop namenode -format
Basically we see most commands look similar
  hadoop some command options
If you just type hadoop you get all possible commands (including undocumented ones, hooray)
-
To Add Another Slave
This adds another data node / job execution site to the pool
  Hadoop dynamically uses the filesystem underneath it
  If more space is available on the HDD, HDFS will try to use it when it needs to
Modify the slaves file
  In centurion064:/localtmp/hadoop/hadoop-0.15.3/conf
Copy code installation dir to newMachine:/localtmp/hadoop/hadoop-0.15.3 (very small)
Restart Hadoop
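
The "modify the slaves file" step could be scripted along these lines. This is a sketch under assumptions: it treats the slaves file as plain text with one hostname per line and no comments, the helper name `add_slave` is made up, and `centurion061` is a hypothetical new machine. The demo works on a temporary copy, not the real conf directory.

```python
import tempfile
from pathlib import Path


def add_slave(slaves_file, hostname: str) -> bool:
    """Append hostname to the slaves file unless it is already listed.

    Returns True if the file was changed.
    """
    path = Path(slaves_file)
    hosts = path.read_text().split() if path.exists() else []
    if hostname in hosts:
        return False
    hosts.append(hostname)
    path.write_text("\n".join(hosts) + "\n")
    return True


# Demo on a throwaway file so nothing in the real installation is touched.
with tempfile.TemporaryDirectory() as tmp:
    slaves = Path(tmp) / "slaves"
    slaves.write_text("centurion064\ncenturion060\n")
    add_slave(slaves, "centurion061")  # hypothetical new slave
    print(slaves.read_text())
```

After editing the real file you would still copy the installation directory to the new machine and restart Hadoop, as the slide says.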
-
Configure Hadoop
Can configure in {$installation dir}/conf
  hadoop-default.xml for global
  hadoop-site.xml for site specific (overrides global)
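
The override rule can be illustrated with a small Python sketch that reads two configuration documents and lets site values win. This is not how Hadoop itself loads configuration; the helper names are hypothetical, and the XML snippets are trimmed-down stand-ins whose property values are chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text: str) -> dict:
    """Map each <property>'s <name> to its <value>."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def effective_config(default_xml: str, site_xml: str) -> dict:
    """Start from the global defaults, then apply site-specific overrides."""
    config = read_properties(default_xml)
    config.update(read_properties(site_xml))
    return config


# Stand-in for hadoop-default.xml (illustrative values).
DEFAULT_XML = """<configuration>
  <property><name>dfs.block.size</name><value>67108864</value></property>
  <property><name>dfs.replication</name><value>3</value></property>
</configuration>"""

# Stand-in for hadoop-site.xml: overrides only the block size.
SITE_XML = """<configuration>
  <property><name>dfs.block.size</name><value>134217728</value></property>
</configuration>"""

config = effective_config(DEFAULT_XML, SITE_XML)
print(config["dfs.block.size"])   # 134217728 (site value wins)
print(config["dfs.replication"])  # 3 (falls back to the default)
```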
-
That's it for Configuration!
-
Real-time Access