How to Set Up a Hadoop Cluster Using Oracle Solaris


Retrieved 2/21/2014 from http://www.oracle.com/technetwork/systems/hands-on-labs/hol-setup-hadoop-solaris-2041770.html


How to Set Up a Hadoop Cluster Using Oracle Solaris — Hands-On Labs of the System Admin and Developer Community of OTN

by Orgad Kimchi

How to set up a Hadoop cluster using the Oracle Solaris Zones, ZFS, and network virtualization technologies.

Published October 2013

Table of Contents

Lab Introduction
Prerequisites
System Requirements
Summary of Lab Exercises
The Case for Hadoop
Exercise 1: Install Hadoop
Exercise 2: Edit the Hadoop Configuration Files
Exercise 3: Configure the Network Time Protocol
Exercise 4: Create the Virtual Network Interfaces
Exercise 5: Create the NameNode and Secondary NameNode Zones
Exercise 6: Set Up the DataNode Zones
Exercise 7: Configure the NameNode
Exercise 8: Set Up SSH
Exercise 9: Format HDFS from the NameNode
Exercise 10: Start the Hadoop Cluster
Exercise 11: Run a MapReduce Job
Exercise 12: Use ZFS Encryption
Exercise 13: Use Oracle Solaris DTrace for Performance Monitoring
Summary
See Also
About the Author

Expected duration: 180 minutes

Lab Introduction

This hands-on lab presents exercises that demonstrate how to set up an Apache Hadoop cluster using Oracle Solaris 11 technologies such as Oracle Solaris Zones, ZFS, and network virtualization. Key topics include the Hadoop Distributed File System (HDFS) and the Hadoop MapReduce programming model.

We will also cover the Hadoop installation process and the cluster building blocks: the NameNode, a secondary NameNode, and DataNodes. In addition, you will see how you can combine Oracle Solaris 11 technologies for better scalability and data security, and you will learn how to load data into the Hadoop cluster and run a MapReduce job.

Prerequisites

This hands-on lab is appropriate for system administrators who will be setting up or maintaining a Hadoop cluster in production or development environments. Basic Linux or Oracle Solaris system administration experience is a prerequisite. Prior knowledge of Hadoop is not required.

System Requirements

This hands-on lab is run on Oracle Solaris 11 in Oracle VM VirtualBox. The lab is self-contained; all you need is in the Oracle VM VirtualBox instance.

For those attending the lab at Oracle OpenWorld, your laptops are already preloaded with the correct Oracle VM VirtualBox image.

If you want to try this lab outside of Oracle OpenWorld, you will need an Oracle Solaris 11 system. Do the following to set up your machine:

1. If you do not have Oracle Solaris 11, download it here.
2. Download the Oracle Solaris 11.1 VirtualBox Template (file size: 1.7 GB).
3. Install the template as described here. (Note: On step 4 of Exercise 2 for installing the template, set the RAM size to 4 GB in order to get good performance.)

Notes for Oracle OpenWorld Attendees

Each attendee will have his or her own laptop for the lab. The login name and password for this lab are provided in a "one-pager." Oracle Solaris 11 uses the GNOME desktop. If you have used the desktops on Linux or other UNIX operating systems, the interface should be familiar. Here are some quick basics in case the interface is new for you.

In order to open a terminal window in the GNOME desktop system, right-click the background of the desktop and select Open Terminal in the pop-up menu. The following source code editors are provided on the lab machines: vi (type vi in a terminal window) and emacs (type emacs in a terminal window).

Summary of Lab Exercises

This hands-on lab consists of 13 exercises covering various Oracle Solaris and Apache Hadoop technologies:

1. Install Hadoop.
2. Edit the Hadoop configuration files.
3. Configure the Network Time Protocol.
4. Create the virtual network interfaces (VNICs).
5. Create the NameNode and the secondary NameNode zones.
6. Set up the DataNode zones.
7. Configure the NameNode.
8. Set up SSH.
9. Format HDFS from the NameNode.
10. Start the Hadoop cluster.
11. Run a MapReduce job.
12. Secure data at rest using ZFS encryption.
13. Use Oracle Solaris DTrace for performance monitoring.

The Case for Hadoop

The Apache Hadoop software is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

To store data, Hadoop uses the Hadoop Distributed File System (HDFS), which provides high-throughput access to application data and is suitable for applications that have large data sets.

For more information about Hadoop and HDFS, see http://hadoop.apache.org/.
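The "simple programming models" mentioned above are typified by MapReduce's classic word count. As a rough analogy only (outside Hadoop entirely), the map, shuffle, and reduce phases can be mimicked with a plain UNIX pipeline:

```shell
# "map": emit one word per line; "shuffle": sort groups identical words
# together; "reduce": uniq -c counts each group.
echo "to be or not to be" | tr ' ' '\n' | sort | uniq -c | sort -rn
# 'to' and 'be' are counted twice; 'or' and 'not' once.
```

Hadoop performs the same grouping-and-aggregation pattern, but partitioned across DataNodes and at far larger scale.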

The Hadoop cluster building blocks are as follows:

- NameNode: The centerpiece of HDFS. It stores the file system metadata, directs the slave DataNode daemons to perform the low-level I/O tasks, and also runs the JobTracker process.
- Secondary NameNode: Performs internal checks of the NameNode transaction log.
- DataNodes: Nodes that store the data in HDFS. They are also known as slaves, and each runs the TaskTracker process.

In the example presented in this lab, all the Hadoop cluster building blocks will be installed using the Oracle Solaris Zones, ZFS, and network virtualization technologies. Figure 1 shows the architecture:
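In Hadoop 1.x terms, the role-to-daemon mapping just described can be summarized with a small shell sketch (the node names match the zones we create later in this lab; the snippet only prints the mapping, it does not start any daemons):

```shell
# Print which Hadoop 1.x daemons run on each cluster role.
for node in name-node sec-name-node data-node1 data-node2 data-node3
do
    case $node in
        name-node)     echo "$node: NameNode, JobTracker" ;;
        sec-name-node) echo "$node: SecondaryNameNode" ;;
        data-node*)    echo "$node: DataNode, TaskTracker" ;;
    esac
done
```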

    Figure1

Exercise 1: Install Hadoop

In Oracle VM VirtualBox, enable a bidirectional "shared clipboard" between the host and the guest in order to enable copying and pasting text from this file.

    Figure2

Open a terminal window by right-clicking any point in the background of the desktop and selecting Open Terminal in the pop-up menu.

    Figure3

Next, switch to the root user using the following command.

Note: For Oracle OpenWorld attendees, the root password has been provided in the one-pager associated with this lab. For those running this lab outside of Oracle OpenWorld, enter the root password you entered when you followed the steps in the "System Requirements" section.

root@global_zone:~# su -
Password:
Oracle Corporation      SunOS 5.11      11.1    September 2012

Set up the virtual network interface card (VNIC) in order to enable network access to the global zone from the non-global zones.

Note: Oracle OpenWorld attendees can skip this step (because the preloaded Oracle VM VirtualBox image already provides configured VNICs) and go directly to step 16, "Browse the lab supplement materials."

root@global_zone:~# dladm create-vnic -l net0 vnic0
root@global_zone:~# ipadm create-ip vnic0
root@global_zone:~# ipadm create-addr -T static -a local=192.168.1.100/24 vnic0/addr

Verify the VNIC creation:

root@global_zone:~# ipadm show-addr vnic0
ADDROBJ           TYPE     STATE        ADDR
vnic0/addr        static   ok           192.168.1.100/24

Create the hadoophol directory; we will use it to store the lab supplement materials associated with this lab, such as scripts and input files.


root@global_zone:~# mkdir -p /usr/local/hadoophol

Create the Bin directory; we will put the Hadoop binary file there.

root@global_zone:~# mkdir /usr/local/hadoophol/Bin

In this lab, we will use the Apache Hadoop Release 1.2.1 (23 July 2013). You can download the Hadoop binary file using a web browser: open the Firefox web browser from the desktop and download the file.

    Figure4

Copy the Hadoop tarball to /usr/local/hadoophol/Bin:

    root@global_zone:~# cp /export/home/oracle/Downloads/hadoop-1.2.1.tar.gz /usr/local/hadoophol/Bin/

Note: By default, the file is downloaded to the user's Downloads directory.

Next, we are going to create the lab scripts, so create a directory for them:

root@global_zone:~# mkdir /usr/local/hadoophol/Scripts

Create the createzone script using your favorite editor, as shown in Listing 1. We will use this script to set up the Oracle Solaris Zones.

    root@global_zone:~# vi /usr/local/hadoophol/Scripts/createzone

Listing 1

#!/bin/ksh

# FILENAME: createzone
# Create a zone with a VNIC
# Usage:
#   createzone <zone name> <VNIC name>

if [ $# != 2 ]
then
    echo "Usage: createzone <zone name> <VNIC name>"
    exit 1
fi

ZONENAME=$1
VNICNAME=$2

zonecfg -z $ZONENAME > /dev/null 2>&1 << EOF
create
set autoboot=true
set limitpriv=default,dtrace_proc,dtrace_user,sys_time
set zonepath=/zones/$ZONENAME
add fs
set dir=/usr/local
set special=/usr/local
set type=lofs
set options=[ro,nodevices]
end
add net
set physical=$VNICNAME
end
verify
commit
exit
EOF


Create the verifycluster script, as shown in Listing 2. We will use it later to verify that every zone can see the shared /usr/local file system and reach every cluster member.

root@global_zone:~# vi /usr/local/hadoophol/Scripts/verifycluster

Listing 2

#!/bin/ksh

# FILENAME: verifycluster
# Verify the cluster: every zone must see /usr/local and be able
# to ping every cluster member.

RET=1
for transaction in _
do
    for i in name-node sec-name-node data-node1 data-node2 data-node3
    do
        cmd="zlogin $i ls /usr/local > /dev/null 2>&1"
        eval $cmd || break 2
    done

    for i in name-node sec-name-node data-node1 data-node2 data-node3
    do
        cmd="zlogin $i ping name-node > /dev/null 2>&1"
        eval $cmd || break 2
    done

    for i in name-node sec-name-node data-node1 data-node2 data-node3
    do
        cmd="zlogin $i ping sec-name-node > /dev/null 2>&1"
        eval $cmd || break 2
    done

    for i in name-node sec-name-node data-node1 data-node2 data-node3
    do
        cmd="zlogin $i ping data-node1 > /dev/null 2>&1"
        eval $cmd || break 2
    done

    for i in name-node sec-name-node data-node1 data-node2 data-node3
    do
        cmd="zlogin $i ping data-node2 > /dev/null 2>&1"
        eval $cmd || break 2
    done

    for i in name-node sec-name-node data-node1 data-node2 data-node3
    do
        cmd="zlogin $i ping data-node3 > /dev/null 2>&1"
        eval $cmd || break 2
    done

    RET=0
done

if [ $RET == 0 ] ; then
    echo "The cluster is verified"
else
    echo "Error: unable to verify the cluster"
fi
exit $RET
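The single-pass `for transaction in _` wrapper in the script above is a ksh/bash idiom for "run a batch of checks and abort the whole batch on the first failure." A minimal, self-contained illustration (the check_* functions are stand-ins for the zlogin tests):

```shell
# The outer loop runs exactly once; 'break 2' exits BOTH loops on the
# first failed check, so RET=0 is reached only if every check passes.
check_ok()  { return 0; }   # stand-in for a zlogin test that succeeds
check_bad() { return 1; }   # stand-in for a zlogin test that fails

RET=1
for transaction in _
do
    for step in check_ok check_bad check_ok
    do
        $step || break 2
    done
    RET=0
done
echo "RET=$RET"   # prints RET=1: check_bad aborts the batch before RET=0
```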

Create the Doc directory; we will put the Hadoop input files there.

root@global_zone:~# mkdir /usr/local/hadoophol/Doc

Download the following eBook from Project Gutenberg as a plain-text file with UTF-8 encoding: The Outline of Science, Vol. 1 (of 4) by J. Arthur Thomson. Copy the downloaded file (pg20417.txt) into the /usr/local/hadoophol/Doc directory.

root@global_zone:~# cp ~oracle/Downloads/pg20417.txt /usr/local/hadoophol/Doc/

Browse the lab supplement materials by typing the following on the command line:

root@global_zone:~# cd /usr/local/hadoophol

On the command line, type ls -l to see the content of the directory:

root@global_zone:~# ls -l
total 9
drwxr-xr-x   2 root     root           2 Jul  8 15:11 Bin
drwxr-xr-x   2 root     root           2 Jul  8 15:11 Doc
drwxr-xr-x   2 root     root           2 Jul  8 15:12 Scripts

You can see the following directory structure:

- Bin: The Hadoop binary location
- Doc: The Hadoop input files
- Scripts: The lab scripts

Copy the Hadoop tarball into /usr/local:

root@global_zone:~# cp /usr/local/hadoophol/Bin/hadoop-1.2.1.tar.gz /usr/local

Unpack the tarball:

root@global_zone:~# cd /usr/local
root@global_zone:~# tar -xvzf /usr/local/hadoop-1.2.1.tar.gz

Create the hadoop group:

root@global_zone:~# groupadd hadoop

Add the hadoop user:

root@global_zone:~# useradd -m -g hadoop hadoop

Set the hadoop user's password. You can use whatever password you want, but be sure you remember it.

root@global_zone:~# passwd hadoop

Create a symlink for the Hadoop binaries:

root@global_zone:~# ln -s /usr/local/hadoop-1.2.1 /usr/local/hadoop

Give ownership to the hadoop user:

root@global_zone:~# chown -R hadoop:hadoop /usr/local/hadoop*

Change the permissions:

root@global_zone:~# chmod -R 755 /usr/local/hadoop*

Exercise 2: Edit the Hadoop Configuration Files

In this exercise, we will edit the Hadoop configuration files, which are shown in Table 1:


Table 1. Hadoop Configuration Files

File Name        Description
hadoop-env.sh    Specifies environment variable settings used by Hadoop.
core-site.xml    Specifies parameters relevant to all Hadoop daemons and clients.
hdfs-site.xml    Specifies parameters used by the HDFS daemons and clients.
mapred-site.xml  Specifies parameters used by the MapReduce daemons and clients.
masters          Contains a list of machines that run the Secondary NameNode.
slaves           Contains a list of machine names that run the DataNode and TaskTracker pair of daemons.

To learn more about how the Hadoop framework is controlled by these configuration files, see http://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/conf/Configuration.html.

Run the following command to change to the conf directory:

root@global_zone:~# cd /usr/local/hadoop/conf

Run the following commands to change the hadoop-env.sh script:

Note: The cluster configuration will share the Hadoop directory structure (/usr/local/hadoop) across the zones as a read-only file system. Every Hadoop cluster node needs to be able to write its logs to an individual directory. The directory /var/log/hadoop is a best-practice location for every Oracle Solaris Zone.

root@global_zone:~# echo "export JAVA_HOME=/usr/java" >> hadoop-env.sh
root@global_zone:~# echo "export HADOOP_LOG_DIR=/var/log/hadoop" >> hadoop-env.sh

Edit the masters file to replace the localhost entry with the line shown in Listing 3:

    root@global_zone:~# vi masters

Listing 3

sec-name-node

Edit the slaves file to replace the localhost entry with the lines shown in Listing 4:

    root@global_zone:~# vi slaves

Listing 4

data-node1
data-node2
data-node3

Edit the core-site.xml file so it looks like Listing 5:

root@global_zone:~# vi core-site.xml

Note: fs.default.name is the URI that describes the NameNode address (protocol specifier, hostname, and port) for the cluster. Each DataNode instance will register with this NameNode and make its data available through it. In addition, the DataNodes send heartbeats to the NameNode to confirm that each DataNode is operating and that the block replicas it hosts are available.

Listing 5

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://name-node</value>
  </property>
</configuration>

Edit the hdfs-site.xml file so it looks like Listing 6:

    root@global_zone:~# vi hdfs-site.xml

    Notes:

- dfs.data.dir is the path on the local file system in which the DataNode instance should store its data.
- dfs.name.dir is the path on the local file system of the NameNode instance where the NameNode metadata is stored. It is used only by the NameNode instance to find its information.
- dfs.replication is the default replication factor for each block of data in the file system. (For a production cluster, this should usually be left at its default value of 3.)

Listing 6

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/hdfs/data/</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/hdfs/name/</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
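Because each block is stored dfs.replication times, raw HDFS capacity must be provisioned at roughly the replication factor times the logical data size. A quick shell sanity check (the 100 GB figure is just an example):

```shell
replication=3      # matches dfs.replication in Listing 6
dataset_gb=100     # example logical dataset size, not from the lab
raw_gb=$((dataset_gb * replication))
echo "${dataset_gb} GB of data occupies about ${raw_gb} GB of raw HDFS capacity"
```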

Edit the mapred-site.xml file so it looks like Listing 7:

    root@global_zone:~# vi mapred-site.xml

Note: mapred.job.tracker is a host:port string specifying the JobTracker's RPC address.

Listing 7

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>name-node:8021</value>
  </property>
</configuration>

Exercise 3: Configure the Network Time Protocol


We should ensure that the system clocks on the Hadoop zones are synchronized, by using the Network Time Protocol (NTP).

Note: It is best to select an NTP server that can be a dedicated time-synchronization source, so that other services are not negatively affected if the machine is brought down for planned maintenance.

In the following example, the global zone is configured as an NTP server.

Configure an NTP server:

root@global_zone:~# cd /etc/inet
root@global_zone:~# cp ntp.server ntp.conf
root@global_zone:~# chmod +w /etc/inet/ntp.conf
root@global_zone:~# touch /var/ntp/ntp.drift

Edit the NTP server configuration file, as shown in Listing 8:

    root@global_zone:~# vi /etc/inet/ntp.conf

Listing 8

server 127.127.1.0 prefer
broadcast 224.0.1.1 ttl 4
enable auth monitor
driftfile /var/ntp/ntp.drift
statsdir /var/ntp/ntpstats/
filegen peerstats file peerstats type day enable
filegen loopstats file loopstats type day enable
filegen clockstats file clockstats type day enable
keys /etc/inet/ntp.keys
trustedkey 0
requestkey 0
controlkey 0

Enable the NTP server service:

root@global_zone:~# svcadm enable ntp

Verify that the NTP server is online by using the following command:

root@global_zone:~# svcs -a | grep ntp
online         16:04:15 svc:/network/ntp:default

Exercise 4: Create the Virtual Network Interfaces

Concept Break: Oracle Solaris 11 Networking Virtualization Technology

Oracle Solaris provides a reliable, secure, and scalable infrastructure to meet the growing needs of data center implementations. Its powerful network stack architecture, also known as Project Crossbow, provides the following:

- Network virtualization with virtual NICs (VNICs) and virtual switching
- Tight integration with Oracle Solaris Zones and Oracle Solaris 10 Zones
- Network resource management, which provides an efficient and easy way to manage integrated QoS to enforce bandwidth limits on VNICs and traffic flows
- An optimized network stack that reacts to network load levels
- The ability to build a "data center in a box"

Oracle Solaris Zones on the same system can benefit from very high network I/O throughput (up to four times faster) with very low latency compared to systems with, say, 1 Gb physical network connections. For a Hadoop cluster, this means that the DataNodes can replicate the HDFS blocks much faster.

For more information about network virtualization benchmarks, see "How to Control Your Application's Network Bandwidth."

Create a series of virtual network interfaces (VNICs) for the different zones:

root@global_zone:~# dladm create-vnic -l net0 name_node1
root@global_zone:~# dladm create-vnic -l net0 secondary_name1
root@global_zone:~# dladm create-vnic -l net0 data_node1
root@global_zone:~# dladm create-vnic -l net0 data_node2
root@global_zone:~# dladm create-vnic -l net0 data_node3

Verify the VNICs' creation:

root@global_zone:~# dladm show-vnic
LINK             OVER   SPEED  MACADDRESS        MACADDRTYPE  VID
name_node1       net0   1000   2:8:20:c6:3e:f1   random       0
secondary_name1  net0   1000   2:8:20:b9:80:45   random       0
data_node1       net0   1000   2:8:20:30:1c:3a   random       0
data_node2       net0   1000   2:8:20:a8:b1:16   random       0
data_node3       net0   1000   2:8:20:df:89:81   random       0

We can see that we have five VNICs now. Figure 5 shows the architecture layout:

    Figure5
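The five dladm invocations above are identical except for the VNIC name, so they can be generated in a loop. A sketch that only echoes the commands rather than running them (drop the echo, or pipe the output to sh as root, to execute them):

```shell
# Generate the create-vnic command for each Hadoop zone's VNIC.
for vnic in name_node1 secondary_name1 data_node1 data_node2 data_node3
do
    echo dladm create-vnic -l net0 $vnic
done
```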

Exercise 5: Create the NameNode and Secondary NameNode Zones

Concept Break: Oracle Solaris Zones

Oracle Solaris Zones let you isolate one application from others on the same OS, allowing you to create an isolated environment in which users can log in and do what they want from inside an Oracle Solaris Zone without affecting anything outside that zone. In addition, Oracle Solaris Zones are secure from external attacks and internal malicious programs. Each Oracle Solaris Zone contains a complete resource-controlled environment that allows you to allocate resources such as CPU, memory, networking, and storage.

If you are the administrator who owns the system, you can choose to closely manage all the Oracle Solaris Zones, or you can assign rights to other administrators for specific Oracle Solaris Zones. This flexibility lets you tailor an entire computing environment to the needs of a particular application, all within the same OS.

For more information about Oracle Solaris Zones, see "How to Get Started Creating Oracle Solaris Zones in Oracle Solaris 11."

All the Hadoop nodes for this lab will be installed using Oracle Solaris Zones.

If you don't already have a file system for the NameNode and Secondary NameNode zones, run the following command:

root@global_zone:~# zfs create -o mountpoint=/zones rpool/zones

Verify the ZFS file system creation:

root@global_zone:~# zfs list rpool/zones
NAME          USED  AVAIL  REFER  MOUNTPOINT
rpool/zones    31K  51.4G    31K  /zones

Create the name-node zone:

root@global_zone:~# zonecfg -z name-node
Use 'create' to begin configuring a new zone.
zonecfg:name-node> create
create: Using system default template 'SYSdefault'
zonecfg:name-node> set autoboot=true
zonecfg:name-node> set limitpriv=default,dtrace_proc,dtrace_user,sys_time
zonecfg:name-node> set zonepath=/zones/name-node
zonecfg:name-node> add fs
zonecfg:name-node:fs> set dir=/usr/local
zonecfg:name-node:fs> set special=/usr/local
zonecfg:name-node:fs> set type=lofs
zonecfg:name-node:fs> set options=[ro,nodevices]
zonecfg:name-node:fs> end
zonecfg:name-node> add net
zonecfg:name-node:net> set physical=name_node1
zonecfg:name-node:net> end
zonecfg:name-node> verify
zonecfg:name-node> exit

(Optional) You can create the name-node zone using the following script, which will create the zone configuration file. For arguments, the script needs the zone name and the VNIC name, for example: createzone <zone name> <VNIC name>.

root@global_zone:~# /usr/local/hadoophol/Scripts/createzone name-node name_node1

Create the sec-name-node zone:

root@global_zone:~# zonecfg -z sec-name-node
Use 'create' to begin configuring a new zone.
zonecfg:sec-name-node> create
create: Using system default template 'SYSdefault'
zonecfg:sec-name-node> set autoboot=true
zonecfg:sec-name-node> set limitpriv=default,dtrace_proc,dtrace_user,sys_time
zonecfg:sec-name-node> set zonepath=/zones/sec-name-node
zonecfg:sec-name-node> add fs
zonecfg:sec-name-node:fs> set dir=/usr/local
zonecfg:sec-name-node:fs> set special=/usr/local
zonecfg:sec-name-node:fs> set type=lofs
zonecfg:sec-name-node:fs> set options=[ro,nodevices]
zonecfg:sec-name-node:fs> end
zonecfg:sec-name-node> add net
zonecfg:sec-name-node:net> set physical=secondary_name1
zonecfg:sec-name-node:net> end
zonecfg:sec-name-node> verify
zonecfg:sec-name-node> exit

(Optional) You can create the sec-name-node zone using the following script, which will create the zone configuration file. For arguments, the script needs the zone name and the VNIC name, for example: createzone <zone name> <VNIC name>.

root@global_zone:~# /usr/local/hadoophol/Scripts/createzone sec-name-node secondary_name1

Exercise 6: Set Up the DataNode Zones

In this exercise, we will leverage the integration between the Oracle Solaris Zones virtualization technology and the ZFS file system that is built into Oracle Solaris.

Table 2 shows a summary of the Hadoop zones configuration we will create:

Table 2. Zone Summary

Function            Zone Name      ZFS Mount Point        VNIC Name        IP Address
NameNode            name-node      /zones/name-node       name_node1       192.168.1.1
Secondary NameNode  sec-name-node  /zones/sec-name-node   secondary_name1  192.168.1.2
DataNode            data-node1     /zones/data-node1      data_node1       192.168.1.3
DataNode            data-node2     /zones/data-node2      data_node2       192.168.1.4
DataNode            data-node3     /zones/data-node3      data_node3       192.168.1.5
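The address plan in Table 2 is regular: the IPs run from 192.168.1.1 through 192.168.1.5 in zone order, and we will need the same name-to-address mapping again when populating /etc/hosts in Exercise 7. A sketch that emits the host entries from one place:

```shell
# Reproduce the Table 2 address plan: IPs are 192.168.1.<n> in zone order.
i=1
for host in name-node sec-name-node data-node1 data-node2 data-node3
do
    echo "192.168.1.$i $host"
    i=$((i + 1))
done
```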

Create the data-node1 zone:

root@global_zone:~# zonecfg -z data-node1
Use 'create' to begin configuring a new zone.
zonecfg:data-node1> create
create: Using system default template 'SYSdefault'
zonecfg:data-node1> set autoboot=true
zonecfg:data-node1> set limitpriv=default,dtrace_proc,dtrace_user,sys_time
zonecfg:data-node1> set zonepath=/zones/data-node1
zonecfg:data-node1> add fs
zonecfg:data-node1:fs> set dir=/usr/local
zonecfg:data-node1:fs> set special=/usr/local
zonecfg:data-node1:fs> set type=lofs
zonecfg:data-node1:fs> set options=[ro,nodevices]
zonecfg:data-node1:fs> end
zonecfg:data-node1> add net
zonecfg:data-node1:net> set physical=data_node1
zonecfg:data-node1:net> end
zonecfg:data-node1> verify
zonecfg:data-node1> commit
zonecfg:data-node1> exit

(Optional) You can create the data-node1 zone using the following script:


root@global_zone:~# /usr/local/hadoophol/Scripts/createzone data-node1 data_node1

Create the data-node2 zone:

root@global_zone:~# zonecfg -z data-node2
Use 'create' to begin configuring a new zone.
zonecfg:data-node2> create
create: Using system default template 'SYSdefault'
zonecfg:data-node2> set autoboot=true
zonecfg:data-node2> set limitpriv=default,dtrace_proc,dtrace_user,sys_time
zonecfg:data-node2> set zonepath=/zones/data-node2
zonecfg:data-node2> add fs
zonecfg:data-node2:fs> set dir=/usr/local
zonecfg:data-node2:fs> set special=/usr/local
zonecfg:data-node2:fs> set type=lofs
zonecfg:data-node2:fs> set options=[ro,nodevices]
zonecfg:data-node2:fs> end
zonecfg:data-node2> add net
zonecfg:data-node2:net> set physical=data_node2
zonecfg:data-node2:net> end
zonecfg:data-node2> verify
zonecfg:data-node2> commit
zonecfg:data-node2> exit

(Optional) You can create the data-node2 zone using the following script:

root@global_zone:~# /usr/local/hadoophol/Scripts/createzone data-node2 data_node2

Create the data-node3 zone:

root@global_zone:~# zonecfg -z data-node3
Use 'create' to begin configuring a new zone.
zonecfg:data-node3> create
create: Using system default template 'SYSdefault'
zonecfg:data-node3> set autoboot=true
zonecfg:data-node3> set limitpriv=default,dtrace_proc,dtrace_user,sys_time
zonecfg:data-node3> set zonepath=/zones/data-node3
zonecfg:data-node3> add fs
zonecfg:data-node3:fs> set dir=/usr/local
zonecfg:data-node3:fs> set special=/usr/local
zonecfg:data-node3:fs> set type=lofs
zonecfg:data-node3:fs> set options=[ro,nodevices]
zonecfg:data-node3:fs> end
zonecfg:data-node3> add net
zonecfg:data-node3:net> set physical=data_node3
zonecfg:data-node3:net> end
zonecfg:data-node3> verify
zonecfg:data-node3> commit
zonecfg:data-node3> exit

(Optional) You can create the data-node3 zone using the following script:

root@global_zone:~# /usr/local/hadoophol/Scripts/createzone data-node3 data_node3

Exercise 7: Configure the NameNode

Now, install the name-node zone; later, we will clone it in order to accelerate zone creation time.

root@global_zone:~# zoneadm -z name-node install
The following ZFS file system(s) have been created:
    rpool/zones/name-node
Progress being logged to /var/log/zones/zoneadm.20130106T134835Z.name-node.install
       Image: Preparing at /zones/name-node/root.

Boot the name-node zone:

root@global_zone:~# zoneadm -z name-node boot

Check the status of the zones we've created:

root@global_zone:~# zoneadm list -cv
  ID NAME           STATUS      PATH                   BRAND    IP
   0 global         running     /                      solaris  shared
   1 name-node      running     /zones/name-node       solaris  excl
   - sec-name-node  configured  /zones/sec-name-node   solaris  excl
   - data-node1     configured  /zones/data-node1      solaris  excl
   - data-node2     configured  /zones/data-node2      solaris  excl
   - data-node3     configured  /zones/data-node3      solaris  excl

Log in to the name-node zone:

root@global_zone:~# zlogin -C name-node

Provide the zone host information by using the following configuration for the name-node zone:

- For the host name, use name-node.
- Select manual network configuration.
- Ensure the network interface name_node1 has an IP address of 192.168.1.1 and a netmask of 255.255.255.0.
- Ensure the name service is based on your network configuration. In this lab, we will use /etc/hosts for name resolution, so we won't set up DNS for host name resolution. Select Do not configure DNS.
- For Alternate Name Service, select None.
- For Time Zone Region, select Americas.
- For Time Zone Location, select United States.
- For Time Zone, select Pacific Time.
- Enter your root password.

After finishing the zone setup, you will get the login prompt. Log in to the zone as user root.

name-node console login: root
Password:

Developing for Hadoop requires a Java programming environment. You can install Java Development Kit (JDK) 6 using the following command:

root@name-node:~# pkg install jdk-6

Verify the Java installation:

root@name-node:~# which java
/usr/bin/java

root@name-node:~# java -version
java version "1.6.0_35"
Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
Java HotSpot(TM) Client VM (build 20.10-b01, mixed mode)

Create a Hadoop user inside the name-node zone:


root@name-node:~# groupadd hadoop
root@name-node:~# useradd -m -g hadoop hadoop
root@name-node:~# passwd hadoop

Note: The password should be the same password as you entered in step 22 of Exercise 1, when you set the hadoop user's password.

Create a directory for the Hadoop log files:

root@name-node:~# mkdir /var/log/hadoop
root@name-node:~# chown hadoop:hadoop /var/log/hadoop

Configure an NTP client, as shown in the following example:

Install the NTP package:

root@name-node:~# pkg install ntp

Create the NTP client configuration files:

root@name-node:~# cd /etc/inet
root@name-node:~# cp ntp.client ntp.conf
root@name-node:~# chmod +w /etc/inet/ntp.conf
root@name-node:~# touch /var/ntp/ntp.drift

Edit the NTP client configuration file, as shown in Listing 9:

    root@name-node:~# vi /etc/inet/ntp.conf

Note: In this lab, we are using the global zone as a time server, so we add its name (for example, global-zone) to /etc/inet/ntp.conf.

Listing 9

server global-zone prefer
driftfile /var/ntp/ntp.drift
statsdir /var/ntp/ntpstats/
filegen peerstats file peerstats type day enable
filegen loopstats file loopstats type day enable

Add the Hadoop cluster members' host names and IP addresses to /etc/hosts, as shown in Listing 10:

    root@name-node:~# vi /etc/hosts

Listing 10

::1 localhost
127.0.0.1 localhost loghost
192.168.1.1 name-node
192.168.1.2 sec-name-node
192.168.1.3 data-node1
192.168.1.4 data-node2
192.168.1.5 data-node3
192.168.1.100 global-zone

Enable the NTP client service:

root@name-node:~# svcadm enable ntp

Verify the NTP client status:

root@name-node:~# svcs ntp
STATE          STIME    FMRI
online         11:15:59 svc:/network/ntp:default

Check whether the NTP client can synchronize its clock with the NTP server:

root@name-node:~# ntpq -p

Exercise 8: Set Up SSH

Set up SSH key-based authentication for the Hadoop user on the name-node zone in order to enable password-less login to the Secondary NameNode and the DataNodes:

root@name-node:~# su - hadoop
hadoop@name-node $ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
hadoop@name-node $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Edit $HOME/.profile to append to the end of the file the lines shown in Listing 11:

    hadoop@name-node $ vi $HOME/.profile

Listing 11

# Set JAVA_HOME
export JAVA_HOME=/usr/java
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:/usr/local/hadoop/bin

Then run the following command:

hadoop@name-node $ source $HOME/.profile

Check that Hadoop runs by typing the following command:

hadoop@name-node:~$ hadoop version
Hadoop 1.2.1
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152
Compiled by mattf on Mon Jul 22 15:23:09 PDT 2013
From source with checksum 6923c86528809c4e7e6f493b6b413a9a

Note: Press ~. to exit from the name-node console and return to the global zone.

You can verify that you are in the global zone using the zonename command:

root@global_zone:~# zonename
global

From the global zone, run the following commands to create the sec-name-node zone as a clone of name-node:

root@global_zone:~# zoneadm -z name-node shutdown
root@global_zone:~# zoneadm -z sec-name-node clone name-node

Boot the sec-name-node zone:

root@global_zone:~# zoneadm -z sec-name-node boot
root@global_zone:~# zlogin -C sec-name-node

As we experienced previously, the system configuration tool is launched (see Figure 6), so do the final configuration for the sec-name-node zone:

Note: All the zones must have the same time zone configuration and the same root password.


    Figure6

- For the host name, use sec-name-node.
- Select manual network configuration and, for the network interface, use secondary_name1.
- Use an IP address of 192.168.1.2 and a netmask of 255.255.255.0.
- Select Do not configure DNS in the DNS name service window.
- Ensure Alternate Name Service is set to None.
- For Time Zone Region, select Americas.
- For Time Zone Location, select United States.
- For Time Zone, select Pacific Time.
- Enter your root password.

Note: Press ~. to exit from the sec-name-node console and return to the global zone.

Perform similar steps for data-node1, data-node2, and data-node3:

Do the following for data-node1:

root@global_zone:~# zoneadm -z data-node1 clone name-node
root@global_zone:~# zoneadm -z data-node1 boot
root@global_zone:~# zlogin -C data-node1

- For the host name, use data-node1.
- Select manual network configuration and, for the network interface, use data_node1.
- Use an IP address of 192.168.1.3 and a netmask of 255.255.255.0.
- Select Do not configure DNS in the DNS name service window.
- Ensure Alternate Name Service is set to None.
- For Time Zone Region, select Americas.
- For Time Zone Location, select United States.
- For Time Zone, select Pacific Time.
- Enter your root password.

Do the following for data-node2:

root@global_zone:~# zoneadm -z data-node2 clone name-node
root@global_zone:~# zoneadm -z data-node2 boot
root@global_zone:~# zlogin -C data-node2

- For the host name, use data-node2.
- For the network interface, use data_node2.
- Use an IP address of 192.168.1.4 and a netmask of 255.255.255.0.
- Select Do not configure DNS in the DNS name service window.
- Ensure Alternate Name Service is set to None.
- For Time Zone Region, select Americas.
- For Time Zone Location, select United States.
- For Time Zone, select Pacific Time.
- Enter your root password.

Do the following for data-node3:

root@global_zone:~# zoneadm -z data-node3 clone name-node
root@global_zone:~# zoneadm -z data-node3 boot
root@global_zone:~# zlogin -C data-node3

    For the hostname, use data-node3. For the network interface, use data_node3. Use an IP address of 192.168.1.5 and a netmask of 255.255.255.0. Select Do not configure DNS in the DNS name service window. Ensure Alternate Name Service is set to None. For Time Zone Region, select Americas. For Time Zone Location, select United States. For Time Zone, select Pacific Time. Enter your root password.

    Boot the name-node zone:

    root@global_zone:~# zoneadm -z name-node boot

    Verify that all the zones are up and running:

    root@global_zone:~# zoneadm list -cv
      ID NAME            STATUS    PATH                   BRAND    IP
       0 global          running   /                      solaris  shared
      10 sec-name-node   running   /zones/sec-name-node   solaris  excl
      12 data-node1      running   /zones/data-node1      solaris  excl
      14 data-node2      running   /zones/data-node2      solaris  excl
      16 data-node3      running   /zones/data-node3      solaris  excl
      17 name-node       running   /zones/name-node       solaris  excl

    To verify your SSH access without using a password for the Hadoop user, do the following.

    From name-node, log in via SSH into name-node (that is, to itself):

    root@global_zone:~# zlogin name-node
    root@name-node:~# su - hadoop


    hadoop@name-node $ ssh name-node

    The authenticity of host 'name-node (192.168.1.1)' can't be established.
    RSA key fingerprint is 04:93:a9:e0:b7:8c:d7:8b:51:b8:42:d7:9f:e1:80:ca.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'name-node,192.168.1.1' (RSA) to the list of known hosts.

    Now, try to log in to sec-name-node and the DataNodes (data-node1, data-node2, and data-node3). Try logging in to the hosts again using SSH. You shouldn't get a prompt to add the host to the known keys list.

    Edit the /etc/hosts files inside sec-name-node and the DataNodes in order to add the name-node entry:

    root@global_zone:~# zlogin sec-name-node 'echo "192.168.1.1 name-node" >> /etc/hosts'
    root@global_zone:~# zlogin data-node1 'echo "192.168.1.1 name-node" >> /etc/hosts'
    root@global_zone:~# zlogin data-node2 'echo "192.168.1.1 name-node" >> /etc/hosts'
    root@global_zone:~# zlogin data-node3 'echo "192.168.1.1 name-node" >> /etc/hosts'

    Verify name resolution by ensuring that the global zone and all the Hadoop zones have the host entries shown in Listing 12 in /etc/hosts:

    # cat /etc/hosts

    Listing 12:

    ::1           localhost
    127.0.0.1     localhost loghost
    192.168.1.1   name-node
    192.168.1.2   sec-name-node
    192.168.1.3   data-node1
    192.168.1.4   data-node2
    192.168.1.5   data-node3
    192.168.1.100 global-zone

    Note: If you are using the global zone as an NTP server, you must also add its hostname and IP address to /etc/hosts.

    Verify the cluster using the verifycluster script:

    root@global_zone:~# /usr/local/hadoophol/Scripts/verifycluster

    If the cluster setup is fine, you will get a cluster is verified message.

    Note: If the verifycluster script fails with an error message, check that the /etc/hosts file in every zone includes all the zone names as described in Step 12, and then rerun the verifycluster script.
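    If you want to see the kind of check the verifycluster script performs, the /etc/hosts test can be sketched in a few lines of shell. The function below is a hypothetical stand-in (not the lab's actual script): it reports any cluster host name missing from the /etc/hosts file given as its argument, and prints nothing when all names are present.

    ```shell
    # check_hosts: report cluster names missing from the hosts file in $1.
    # A minimal sketch only; the lab's verifycluster script may test more.
    check_hosts() {
      for name in name-node sec-name-node data-node1 data-node2 data-node3; do
        # Exact-field match so, e.g., "sec-name-node" does not satisfy "name-node".
        awk -v n="$name" '{for (i = 2; i <= NF; i++) if ($i == n) found = 1}
                          END {exit !found}' "$1" \
          || echo "missing: $name"
      done
    }
    ```

    Run it in each zone (for example, zlogin data-node1 and then check_hosts /etc/hosts); empty output means that zone's file is complete.
    
    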

    Exercise 9: Format HDFS from the NameNode

    Concept Break: Hadoop Distributed File System (HDFS)

    HDFS is a distributed, scalable file system. HDFS stores metadata on the NameNode. Application data is stored on the DataNodes, and each DataNode serves up blocks of data over the network using a block protocol specific to HDFS. The file system uses the TCP/IP layer for communication. Clients use Remote Procedure Call (RPC) to communicate with each other.

    The DataNodes do not rely on data protection mechanisms, such as RAID, to make the data durable. Instead, the file content is replicated on multiple DataNodes for reliability.

    With the default replication value (3), which is set up in the hdfs-site.xml file, data is stored on three nodes. DataNodes can talk to each other in order to rebalance data, to move copies around, and to keep the replication of data high. In Figure 7, we can see that every data block is replicated across three DataNodes based on the replication value.
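    As a point of reference, the replication factor is the dfs.replication property in hdfs-site.xml (the Hadoop 1.x property name); a minimal fragment, shown here with the default value of 3, looks like this:

    ```xml
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
        <description>Number of DataNodes holding a copy of each HDFS block.</description>
      </property>
    </configuration>
    ```
    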

    An advantage of using HDFS is data awareness between the JobTracker and the TaskTrackers. The JobTracker schedules map or reduce jobs to TaskTrackers with an awareness of the data location. For example, if node A contains data (x,y,z) and node B contains data (a,b,c), the JobTracker will schedule node B to perform map or reduce tasks on (a,b,c), and node A will be scheduled to perform map or reduce tasks on (x,y,z). This reduces the amount of traffic that goes over the network and prevents unnecessary data transfer. This data awareness can have a significant impact on job completion times, which has been demonstrated when running data-intensive jobs.

    For more information about Hadoop HDFS, see https://en.wikipedia.org/wiki/Hadoop.

    Figure 7

    To format HDFS, run the following commands and answer Y at the prompt:

    root@global_zone:~# zlogin name-node
    root@name-node:~# mkdir -p /hdfs/name
    root@name-node:~# chown -R hadoop:hadoop /hdfs
    root@name-node:~# su - hadoop
    hadoop@name-node:$ /usr/local/hadoop/bin/hadoop namenode -format
    13/10/13 09:10:52 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = name-node/192.168.1.1
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 1.2.1
    STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
    STARTUP_MSG:   java = 1.6.0_35
    ************************************************************/

    hadoop@name-node:$ Re-format filesystem in /hdfs/name ? (Y or N) Y

    On every DataNode (data-node1, data-node2, and data-node3), create a Hadoop data directory to store the HDFS blocks:

    root@global_zone:~# zlogin data-node1
    root@data-node1:~# mkdir -p /hdfs/data
    root@data-node1:~# chown -R hadoop:hadoop /hdfs


    root@global_zone:~# zlogin data-node2
    root@data-node2:~# mkdir -p /hdfs/data
    root@data-node2:~# chown -R hadoop:hadoop /hdfs

    root@global_zone:~# zlogin data-node3
    root@data-node3:~# mkdir -p /hdfs/data
    root@data-node3:~# chown -R hadoop:hadoop /hdfs

    Exercise 10: Start the Hadoop Cluster

    Table 3 describes the startup scripts.

    Table 3. Startup Scripts

    File Name         Description
    start-dfs.sh      Starts the HDFS daemons, the NameNode, and the DataNodes. Use this before start-mapred.sh.
    stop-dfs.sh       Stops the Hadoop DFS daemons.
    start-mapred.sh   Starts the Hadoop MapReduce daemons, the JobTracker, and the TaskTrackers.
    stop-mapred.sh    Stops the Hadoop MapReduce daemons.

    From the name-node zone, start the Hadoop DFS daemons, the NameNode, and the DataNodes using the following commands:

    root@global_zone:~# zlogin name-node
    root@name-node:~# su - hadoop
    hadoop@name-node:$ start-dfs.sh
    starting namenode, logging to /var/log/hadoop/hadoop--namenode-name-node.out
    data-node2: starting datanode, logging to /var/log/hadoop/hadoop-hadoop-datanode-data-node2.out
    data-node1: starting datanode, logging to /var/log/hadoop/hadoop-hadoop-datanode-data-node1.out
    data-node3: starting datanode, logging to /var/log/hadoop/hadoop-hadoop-datanode-data-node3.out
    sec-name-node: starting secondarynamenode, logging to /var/log/hadoop/hadoop-hadoop-secondarynamenode-sec-name-node.out

    Start the Hadoop Map/Reduce daemons, the JobTracker, and the TaskTrackers using the following command:

    hadoop@name-node:$ start-mapred.sh
    starting jobtracker, logging to /var/log/hadoop/hadoop--jobtracker-name-node.out
    data-node1: starting tasktracker, logging to /var/log/hadoop/hadoop-hadoop-tasktracker-data-node1.out
    data-node3: starting tasktracker, logging to /var/log/hadoop/hadoop-hadoop-tasktracker-data-node3.out
    data-node2: starting tasktracker, logging to /var/log/hadoop/hadoop-hadoop-tasktracker-data-node2.out

    To view a comprehensive status report, execute the following command to check the cluster status. The command will output basic statistics about the cluster health, such as NameNode details, the status of each DataNode, and disk capacity amounts.

    hadoop@name-node:$ hadoop dfsadmin -report
    Configured Capacity: 171455269888 (159.68 GB)
    Present Capacity: 169711053357 (158.06 GB)
    DFS Remaining: 169711028736 (158.06 GB)
    DFS Used: 24621 (24.04 KB)
    DFS Used%: 0%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    -------------------------------------------------
    Datanodes available: 3 (3 total, 0 dead)
    ...

    You should see that three DataNodes are available.
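    If you want to script this check rather than read the report by eye, a small filter can pull the live DataNode count out of the report. The function below is a convenience sketch (it is not part of Hadoop); it reads 'hadoop dfsadmin -report' output on stdin.

    ```shell
    # live_datanodes: print the number of live DataNodes from a
    # 'hadoop dfsadmin -report' fed on stdin, e.g.:
    #   hadoop dfsadmin -report | live_datanodes
    live_datanodes() {
      # Split on spaces and '(' so "Datanodes available: 3 (3 total, 0 dead)"
      # yields the live count as the third field.
      awk -F'[ (]' '/^Datanodes available:/ {print $3; exit}'
    }
    ```
    
    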

    Note: You can find the same information on the NameNode web status page (shown in Figure 8) at http://<NameNode IP address>:50070/dfshealth.jsp. The NameNode IP address is 192.168.1.1.

    Figure 8

    Exercise 11: Run a MapReduce Job

    Concept Break: MapReduce

    MapReduce is a framework for processing parallelizable problems across huge data sets using a cluster of computers.

    The essential idea of MapReduce is using two functions to work on data from a source: grabbing the data with the Map() function, and then processing it across a cluster of computers with the Reduce() function. Specifically, Map() will apply a function to all the members of a data set and post a result set, which Reduce() will then collate and resolve.

    Map() and Reduce() can be run in parallel and across multiple systems.

    For more information about MapReduce, see http://en.wikipedia.org/wiki/MapReduce.
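    For intuition, the same word-count computation can be expressed with standard shell tools as a local, single-machine analogue: the "map" step emits one word per line, and the "reduce" step groups identical words and counts them. This is only a sketch of the model (it assumes plain whitespace-separated text), not what Hadoop executes.

    ```shell
    # A single-machine analogue of the WordCount MapReduce job.
    wordcount() {
      tr -s ' \t' '\n\n' |   # map: emit one word per line
        sort |               # shuffle: bring identical words together
        uniq -c              # reduce: count each distinct word
    }
    ```

    For example, printf 'hadoop solaris hadoop\n' | wordcount reports a count of 2 for hadoop and 1 for solaris.
    
    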

    We will use the WordCount example, which reads text files and counts how often words occur. The input and output consist of text files, each line of which contains a word and the number of times the word occurred, separated by a tab. For more information about WordCount, see http://wiki.apache.org/hadoop/WordCount.

    Create the input data directory; we will put the input files there.

    hadoop@name-node:$ hadoop fs -mkdir /input-data

    Verify the directory creation:

    hadoop@name-node:$ hadoop dfs -ls /
    Found 1 items
    drwxr-xr-x   - hadoop supergroup          0 2013-10-13 23:45 /input-data

    Copy the pg20417.txt file you downloaded earlier to HDFS using the following command:

    Note: Oracle OpenWorld attendees can find the pg20417.txt file in the /usr/local/hadoophol/Doc directory.

    hadoop@name-node:$ hadoop dfs -copyFromLocal /usr/local/hadoophol/Doc/pg20417.txt /input-data

    Verify that the file is located on HDFS:

    hadoop@name-node:$ hadoop dfs -ls /input-data
    Found 1 items
    -rw-r--r--   3 hadoop supergroup     674570 2013-10-13 10:20 /input-data/pg20417.txt

    Create the output directory; the MapReduce job will put its outputs in this directory:

    hadoop@name-node:$ hadoop fs -mkdir /output-data

    Start the MapReduce job using the following command:

    hadoop@name-node:$ hadoop jar /usr/local/hadoop/hadoop-examples-1.2.1.jar wordcount /input-data/pg20417.txt /output-data/output1

    13/10/13 10:23:08 INFO input.FileInputFormat: Total input paths to process : 1
    13/10/13 10:23:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    13/10/13 10:23:08 WARN snappy.LoadSnappy: Snappy native library not loaded
    13/10/13 10:23:09 INFO mapred.JobClient: Running job: job_201310130918_0010
    13/10/13 10:23:10 INFO mapred.JobClient:  map 0% reduce 0%
    13/10/13 10:23:19 INFO mapred.JobClient:  map 100% reduce 0%
    13/10/13 10:23:29 INFO mapred.JobClient:  map 100% reduce 33%
    13/10/13 10:23:31 INFO mapred.JobClient:  map 100% reduce 100%
    13/10/13 10:23:34 INFO mapred.JobClient: Job complete: job_201310130918_0010
    13/10/13 10:23:34 INFO mapred.JobClient: Counters: 26

    The program takes about 60 seconds to execute on the cluster.

    All of the files in the input directory (input-data in the command line shown above) are read, and the counts for the words in the input are written to the output directory (called output-data/output1).
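    Because each output line is a word and its count separated by a tab, a quick way to find the most frequent words is to sort by the count field. The helper below is a sketch (not part of Hadoop); feed it the part-r-00000 output, for example via hadoop dfs -cat /output-data/output1/part-r-00000 | top_words | head.

    ```shell
    # top_words: sort WordCount output lines (word<TAB>count) by count,
    # highest count first.
    top_words() {
      sort -k2,2nr   # field 2 is the count; n = numeric, r = descending
    }
    ```
    
    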

    Verify the output data:

    hadoop@name-node:$ hadoop dfs -ls /output-data/output1
    Found 3 items
    -rw-r--r--   3 hadoop supergroup          0 2013-10-13 10:30 /output-data/output1/_SUCCESS
    drwxr-xr-x   - hadoop supergroup          0 2013-10-13 10:30 /output-data/output1/_logs
    -rw-r--r--   3 hadoop supergroup     196192 2013-10-13 10:30 /output-data/output1/part-r-00000

    Exercise 12: Use ZFS Encryption

    Concept Break: ZFS Encryption

    Oracle Solaris 11 adds transparent data encryption functionality to ZFS. All data and file system metadata (such as ownership, access control lists, quota information, and so on) is encrypted when stored persistently in the ZFS pool.

    A ZFS pool can support a mix of encrypted and unencrypted ZFS datasets (file systems and ZVOLs). Data encryption is completely transparent to applications and other Oracle Solaris file services, such as NFS or CIFS. Since encryption is a first-class feature of ZFS, we are able to support compression, encryption, and deduplication together. Encryption key management for encrypted datasets can be delegated to users, Oracle Solaris Zones, or both. Oracle Solaris with ZFS encryption provides a very flexible system for securing data at rest, and it doesn't require any application changes or qualification.

    For more information about ZFS encryption, see "How to Manage ZFS Data Encryption."

    The output data can contain sensitive information, so use ZFS encryption to protect the output data.

    Create the encrypted ZFS dataset:

    Note: You need to provide the passphrase; it must be at least eight characters.

    root@name-node:~# zfs create -o encryption=on rpool/export/output

    Enter passphrase for 'rpool/export/output':
    Enter again:

    Verify that the ZFS dataset is encrypted:

    root@name-node:~# zfs get all rpool/export/output | grep encry
    rpool/export/output  encryption  on  local

    Change the ownership:

    root@name-node:~# chown hadoop:hadoop /export/output

    Copy the output file from HDFS into ZFS:

    root@name-node:~# su - hadoop
    Oracle Corporation      SunOS 5.11      11.1    September 2012

    hadoop@name-node:$ hadoop dfs -getmerge /output-data/output1 /export/output

    Analyze the output text file. Each line contains a word and the number of times the word occurred, separated by a tab.

    hadoop@name-node:$ head /export/output/output1
    "A          2
    "Alpha      1
    "Alpha,"    1
    "An         2
    "And        1
    "BOILING"   2
    "Batesian"  1
    "Beta       2

    Protect the output text file by unmounting the ZFS dataset, and then unload the wrapping key for the encrypted dataset using the following command:

    root@name-node:~# zfs key -u rpool/export/output


    If the command is successful, the dataset is not accessible and it is unmounted.

    If you want to mount this ZFS file system, you need to provide the passphrase:

    root@name-node:~# zfs mount rpool/export/output
    Enter passphrase for 'rpool/export/output':

    By using a passphrase, you ensure that only those who know the passphrase can observe the output file.

    Exercise 13: Use Oracle Solaris DTrace for Performance Monitoring

    Concept Break: Oracle Solaris DTrace

    Oracle Solaris DTrace is a comprehensive, advanced tracing tool for troubleshooting systemic problems in real time. Administrators, integrators, and developers can use DTrace to dynamically and safely observe live production systems, including both applications and the operating system itself, for performance issues.

    DTrace allows you to explore a system to understand how it works, track down problems across many layers of software, and locate the cause of any aberrant behavior. Whether it's at a high-level global overview, such as memory consumption or CPU time, or at a much finer-grained level, such as what specific function calls are being made, DTrace can provide operational insights that have been missing in the data center by enabling you to do the following:

    Insert 80,000+ probe points across all facets of the operating system.
    Instrument user-level and system-level software.
    Use a powerful and easy-to-use scripting language and command-line interfaces.

    For more information about DTrace, see http://www.oracle.com/technetwork/server-storage/solaris11/technologies/dtrace-1930301.html.

    Open another terminal window and log in to name-node as user hadoop. Run the following MapReduce job:

    hadoop@name-node:$ hadoop jar /usr/local/hadoop/hadoop-examples-1.2.1.jar wordcount /input-data/pg20417.txt /output-data/output2

    While the Hadoop job is running, determine what processes are executed on the NameNode.

    In the terminal window, run the following DTrace command:

    root@global-zone:~# dtrace -n 'proc:::exec-success/strstr(zonename,"name-node")>0/ { trace(curpsinfo->pr_psargs); }'

    dtrace: description 'proc:::exec-success' matched 1 probe

    CPU     ID    FUNCTION:NAME
      0   4473  exec_common:exec-success  /usr/bin/env bash /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/hadoop-exa
      0   4473  exec_common:exec-success  bash /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/hadoop-examples-1.1.2.j
      0   4473  exec_common:exec-success  dirname /usr/local/hadoop-1.1.2/libexec/--
      0   4473  exec_common:exec-success  dirname /usr/local/hadoop-1.1.2/libexec/--
      0   4473  exec_common:exec-success  sed -e s/ /_/g
      1   4473  exec_common:exec-success  dirname /usr/local/hadoop/bin/hadoop
      1   4473  exec_common:exec-success  dirname -- /usr/local/hadoop/bin/../libexec/hadoop-config.sh
      1   4473  exec_common:exec-success  basename -- /usr/local/hadoop/bin/../libexec/hadoop-config.sh
      1   4473  exec_common:exec-success  basename /usr/local/hadoop-1.1.2/libexec/--
      1   4473  exec_common:exec-success  uname
      1   4473  exec_common:exec-success  /usr/java/bin/java -Xmx32m org.apache.hadoop.util.PlatformName
      1   4473  exec_common:exec-success  /usr/java/bin/java -Xmx32m org.apache.hadoop.util.PlatformName
      0   4473  exec_common:exec-success  /usr/java/bin/java -Dproc_jar -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop -Dhado
      0   4473  exec_common:exec-success  /usr/java/bin/java -Dproc_jar -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop -Dhado
    ^C

    Note: Press Ctrl-C in order to see the DTrace output.

    While the Hadoop job is running, determine what files are written on the NameNode.

    Note: If the MapReduce job is finished, you can run another job with a different output directory (for example, /output-data/output3).

    For example:

    hadoop@name-node:$ hadoop jar /usr/local/hadoop/hadoop-examples-1.2.1.jar wordcount /input-data/pg20417.txt /output-data/output3

    root@global-zone:~# dtrace -n 'syscall::write:entry/strstr(zonename,"name-node")>0/ {@write[fds[arg0].fi_pathname]=count();}'

    dtrace: description 'syscall::write:entry' matched 1 probe
    ^C

    /zones/name-node/root/tmp/hadoop-hadoop/mapred/local/jobTracker/.job_201307181457_0007.xml.crc        1
    /zones/name-node/root/var/log/hadoop/history/.job_201307181457_0007_conf.xml.crc                      1
    /zones/name-node/root/dev/pts/3                                                                       5
    /zones/name-node/root/var/log/hadoop/job_201307181457_0007_conf.xml                                   6
    /zones/name-node/root/tmp/hadoop-hadoop/mapred/local/jobTracker/job_201307181457_0007.xml             8
    /zones/name-node/root/var/log/hadoop/history/job_201307181457_0007_conf.xml                          11
    /zones/name-node/root/var/log/hadoop/hadoop--jobtracker-name-node.log                                13
    /zones/name-node/root/hdfs/name/current/edits.new                                                    25
    /zones/name-node/root/var/log/hadoop/hadoop--namenode-name-node.log                                  45
    /zones/name-node/root/dev/poll                                                                      207
                                                                                                    3131655

    Note: Press Ctrl-C in order to see the DTrace output.

    While the Hadoop job is running, determine what processes are executed on the DataNode:

    root@global-zone:~# dtrace -n 'proc:::exec-success/strstr(zonename,"data-node1")>0/ { trace(curpsinfo->pr_psargs); }'

    dtrace: description 'proc:::exec-success' matched 1 probe

    CPU     ID    FUNCTION:NAME
      0   8833  exec_common:exec-success  dirname /usr/local/hadoop/bin/hadoop
      0   8833  exec_common:exec-success  dirname /usr/local/hadoop/libexec/--
      0   8833  exec_common:exec-success  sed -e s/ /_/g
      1   8833  exec_common:exec-success  dirname -- /usr/local/hadoop/bin/../libexec/hadoop-config.sh
      2   8833  exec_common:exec-success  basename /usr/local/hadoop/libexec/--
      2   8833  exec_common:exec-success  /usr/java/bin/java -Xmx32m org.apache.hadoop.util.PlatformName
      2   8833  exec_common:exec-success  /usr/java/bin/java -Xmx32m org.apache.hadoop.util.PlatformName
      3   8833  exec_common:exec-success  /usr/bin/env bash /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/hadoop-exa
      3   8833  exec_common:exec-success  bash /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/hadoop-examples-1.0.4.j
      3   8833  exec_common:exec-success  basename -- /usr/local/hadoop/bin/../libexec/hadoop-config.sh
      3   8833  exec_common:exec-success  dirname /usr/local/hadoop/libexec/--
      3   8833  exec_common:exec-success  uname
      3   8833  exec_common:exec-success  /usr/java/bin/java -Dproc_jar -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop -Dhado


      3   8833  exec_common:exec-success  /usr/java/bin/java -Dproc_jar -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop -Dhado
    ^C

    While the Hadoop job is running, determine what files are written on the DataNode:

    (There were 222 lines of output, which were reduced for readability.)

    root@global-zone:~# dtrace -n 'syscall::write:entry/strstr(zonename,"data-node1")>0/ {@write[fds[arg0].fi_pathname]=count();}'

    dtrace: description 'syscall::write:entry' matched 1 probe

    ^C
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_-5404946161781239203             1
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_-5404946161781239203_1103.meta   1
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_-6136035696057459536             1
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_-6136035696057459536_1102.meta   1
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_-8420966433041064066             1
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_-8420966433041064066_1105.meta   1
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_1792925233420187481              1
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_1792925233420187481_1101.meta    1
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_4108435250688953064              1
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_4108435250688953064_1106.meta    1
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_8503732348705847964

    Determine the total amount of HDFS data written for the DataNodes:

    root@global-zone:~# dtrace -n 'syscall::write:entry / ( strstr(zonename,"data-node1")!=0 || strstr(zonename,"data-node2")!=0 || strstr(zonename,"data-node3")!=0 ) && strstr(fds[arg0].fi_pathname,"hdfs")!=0 && strstr(fds[arg0].fi_pathname,"blocksBeingWritten")>0/ { @write[fds[arg0].fi_pathname]=sum(arg2); }'
    ^C

    Summary

    In this lab, we learned how to set up a Hadoop cluster using Oracle Solaris 11 technologies such as Oracle Solaris Zones, ZFS, network virtualization, and DTrace.

    See Also

    Hadoop and HDFS
    Hadoop framework
    "How to Control Your Application's Network Bandwidth"
    "How to Get Started Creating Oracle Solaris Zones in Oracle Solaris 11"
    "How to Set Up a Hadoop Cluster Using Oracle Solaris Zones"
    "How to Build Native Hadoop Libraries for Oracle Solaris 11"
    MapReduce
    WordCount
    "How to Manage ZFS Data Encryption"
    DTrace

    About the Author

    Orgad Kimchi is a principal software engineer on the ISV Engineering team at Oracle (formerly Sun Microsystems). For 6 years he has specialized in virtualization, big data, and cloud computing technologies.

    Revision 1.0, 10/21/2013
