How to Set Up a Hadoop Cluster Using Oracle Solaris


Retrieved 2/21/2014 from http://www.oracle.com/technetwork/systems/hands-on-labs/hol-setup-hadoop-solaris-2041770.html


How to Set Up a Hadoop Cluster Using Oracle Solaris — Hands-On Labs of the System Admin and Developer Community of OTN

by Orgad Kimchi

How to set up a Hadoop cluster using the Oracle Solaris Zones, ZFS, and network virtualization technologies.

Published October 2013

Table of Contents

Lab Introduction
Prerequisites
System Requirements
Summary of Lab Exercises
The Case for Hadoop
Exercise 1: Install Hadoop
Exercise 2: Edit the Hadoop Configuration Files
Exercise 3: Configure the Network Time Protocol
Exercise 4: Create the Virtual Network Interfaces
Exercise 5: Create the NameNode and Secondary NameNode Zones
Exercise 6: Set Up the DataNode Zones
Exercise 7: Configure the NameNode
Exercise 8: Set Up SSH
Exercise 9: Format HDFS from the NameNode
Exercise 10: Start the Hadoop Cluster
Exercise 11: Run a MapReduce Job
Exercise 12: Use ZFS Encryption
Exercise 13: Use Oracle Solaris DTrace for Performance Monitoring
Summary
See Also
About the Author

Expected duration: 180 minutes

Lab Introduction

This hands-on lab presents exercises that demonstrate how to set up an Apache Hadoop cluster using Oracle Solaris 11 technologies such as Oracle Solaris Zones, ZFS, and network virtualization. Key topics include the Hadoop Distributed File System (HDFS) and the Hadoop MapReduce programming model.

We will also cover the Hadoop installation process and the cluster building blocks: the NameNode, a secondary NameNode, and DataNodes. In addition, you will see how you can combine Oracle Solaris 11 technologies for better scalability and data security, and you will learn how to load data into the Hadoop cluster and run a MapReduce job.

Prerequisites

This hands-on lab is appropriate for system administrators who will be setting up or maintaining a Hadoop cluster in production or development environments. Basic Linux or Oracle Solaris system administration experience is a prerequisite. Prior knowledge of Hadoop is not required.

System Requirements

This hands-on lab is run on Oracle Solaris 11 in Oracle VM VirtualBox. The lab is self-contained; all you need is in the Oracle VM VirtualBox instance.

For those attending the lab at Oracle OpenWorld, your laptops are already preloaded with the correct Oracle VM VirtualBox image.

If you want to try this lab outside of Oracle OpenWorld, you will need an Oracle Solaris 11 system. Do the following to set up your machine:

1. If you do not have Oracle Solaris 11, download it here.
2. Download the Oracle Solaris 11.1 VirtualBox Template (file size: 1.7 GB).
3. Install the template as described here. (Note: On step 4 of Exercise 2 for installing the template, set the RAM size to 4 GB in order to get good performance.)

Notes for Oracle OpenWorld Attendees

Each attendee will have his or her own laptop for the lab. The login name and password for this lab are provided in a "one-pager." Oracle Solaris 11 uses the GNOME desktop. If you have used the desktops on Linux or other UNIX operating systems, the interface should be familiar. Here are some quick basics in case the interface is new for you.

In order to open a terminal window in the GNOME desktop system, right-click the background of the desktop and select Open Terminal in the pop-up menu. The following source code editors are provided on the lab machines: vi (type vi in a terminal window) and emacs (type emacs in a terminal window).

Summary of Lab Exercises

This hands-on lab consists of 13 exercises covering various Oracle Solaris and Apache Hadoop technologies:

1. Install Hadoop.
2. Edit the Hadoop configuration files.
3. Configure the Network Time Protocol.
4. Create the virtual network interfaces (VNICs).
5. Create the NameNode and the secondary NameNode zones.
6. Set up the DataNode zones.
7. Configure the NameNode.
8. Set up SSH.
9. Format HDFS from the NameNode.
10. Start the Hadoop cluster.
11. Run a MapReduce job.
12. Secure data at rest using ZFS encryption.
13. Use Oracle Solaris DTrace for performance monitoring.

The Case for Hadoop

The Apache Hadoop software is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

To store data, Hadoop uses the Hadoop Distributed File System (HDFS), which provides high-throughput access to application data and is suitable for applications that have large data sets.

For more information about Hadoop and HDFS, see http://hadoop.apache.org/.
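The "simple programming models" mentioned above are typified by MapReduce's classic word count. As a rough analogy only (outside Hadoop entirely), the map, shuffle, and reduce phases can be mimicked with a plain UNIX pipeline:

```shell
# "map": emit one word per line; "shuffle": sort groups identical words
# together; "reduce": uniq -c counts each group.
echo "to be or not to be" | tr ' ' '\n' | sort | uniq -c | sort -rn
# 'to' and 'be' are counted twice; 'or' and 'not' once.
```

Hadoop performs the same grouping-and-aggregation pattern, but partitioned across DataNodes and at far larger scale.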

The Hadoop cluster building blocks are as follows:

- NameNode: The centerpiece of HDFS. It stores the file system metadata, directs the slave DataNode daemons to perform the low-level I/O tasks, and also runs the JobTracker process.
- Secondary NameNode: Performs internal checks of the NameNode transaction log.
- DataNodes: Nodes that store the data in HDFS. They are also known as slaves, and each runs the TaskTracker process.

In the example presented in this lab, all the Hadoop cluster building blocks will be installed using the Oracle Solaris Zones, ZFS, and network virtualization technologies. Figure 1 shows the architecture:
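In Hadoop 1.x terms, the role-to-daemon mapping just described can be summarized with a small shell sketch (the node names match the zones we create later in this lab; the snippet only prints the mapping, it does not start any daemons):

```shell
# Print which Hadoop 1.x daemons run on each cluster role.
for node in name-node sec-name-node data-node1 data-node2 data-node3
do
    case $node in
        name-node)     echo "$node: NameNode, JobTracker" ;;
        sec-name-node) echo "$node: SecondaryNameNode" ;;
        data-node*)    echo "$node: DataNode, TaskTracker" ;;
    esac
done
```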

    Figure1

Exercise 1: Install Hadoop

In Oracle VM VirtualBox, enable a bidirectional "shared clipboard" between the host and the guest in order to enable copying and pasting text from this file.

    Figure2

Open a terminal window by right-clicking any point in the background of the desktop and selecting Open Terminal in the pop-up menu.

    Figure3

Next, switch to the root user using the following command.

Note: For Oracle OpenWorld attendees, the root password has been provided in the one-pager associated with this lab. For those running this lab outside of Oracle OpenWorld, enter the root password you entered when you followed the steps in the "System Requirements" section.

root@global_zone:~# su -
Password:
Oracle Corporation      SunOS 5.11      11.1    September 2012

Set up the virtual network interface card (VNIC) in order to enable network access to the global zone from the non-global zones.

Note: Oracle OpenWorld attendees can skip this step (because the preloaded Oracle VM VirtualBox image already provides configured VNICs) and go directly to step 16, "Browse the lab supplement materials."

root@global_zone:~# dladm create-vnic -l net0 vnic0
root@global_zone:~# ipadm create-ip vnic0
root@global_zone:~# ipadm create-addr -T static -a local=192.168.1.100/24 vnic0/addr

Verify the VNIC creation:

root@global_zone:~# ipadm show-addr vnic0
ADDROBJ           TYPE     STATE        ADDR
vnic0/addr        static   ok           192.168.1.100/24

Create the hadoophol directory; we will use it to store the lab supplement materials associated with this lab, such as scripts and input files.


root@global_zone:~# mkdir -p /usr/local/hadoophol

Create the Bin directory; we will put the Hadoop binary file there.

root@global_zone:~# mkdir /usr/local/hadoophol/Bin

In this lab, we will use the Apache Hadoop Release 1.2.1 (23 July 2013). You can download the Hadoop binary file using a web browser: open the Firefox web browser from the desktop and download the file.

    Figure4

Copy the Hadoop tarball to /usr/local/hadoophol/Bin:

    root@global_zone:~# cp /export/home/oracle/Downloads/hadoop-1.2.1.tar.gz /usr/local/hadoophol/Bin/

Note: By default, the file is downloaded to the user's Downloads directory.

Next, we are going to create the lab scripts, so create a directory for them:

root@global_zone:~# mkdir /usr/local/hadoophol/Scripts

Create the createzone script using your favorite editor, as shown in Listing 1. We will use this script to set up the Oracle Solaris Zones.

    root@global_zone:~# vi /usr/local/hadoophol/Scripts/createzone

Listing 1

#!/bin/ksh

# FILENAME: createzone
# Create a zone with a VNIC
# Usage:
#   createzone <zone name> <VNIC name>

if [ $# != 2 ]
then
    echo "Usage: createzone <zone name> <VNIC name>"
    exit 1
fi

ZONENAME=$1
VNICNAME=$2

zonecfg -z $ZONENAME > /dev/null 2>&1 << EOF
create
set autoboot=true
set limitpriv=default,dtrace_proc,dtrace_user,sys_time
set zonepath=/zones/$ZONENAME
add fs
set dir=/usr/local
set special=/usr/local
set type=lofs
set options=[ro,nodevices]
end
add net
set physical=$VNICNAME
end
verify
commit
exit
EOF


Create the verifycluster script, as shown in Listing 2. We will use it later to verify that every zone can see the shared /usr/local file system and reach every cluster member.

root@global_zone:~# vi /usr/local/hadoophol/Scripts/verifycluster

Listing 2

#!/bin/ksh

# FILENAME: verifycluster
# Verify the cluster: every zone must see /usr/local and be able
# to ping every cluster member.

RET=1
for transaction in _
do
    for i in name-node sec-name-node data-node1 data-node2 data-node3
    do
        cmd="zlogin $i ls /usr/local > /dev/null 2>&1"
        eval $cmd || break 2
    done

    for i in name-node sec-name-node data-node1 data-node2 data-node3
    do
        cmd="zlogin $i ping name-node > /dev/null 2>&1"
        eval $cmd || break 2
    done

    for i in name-node sec-name-node data-node1 data-node2 data-node3
    do
        cmd="zlogin $i ping sec-name-node > /dev/null 2>&1"
        eval $cmd || break 2
    done

    for i in name-node sec-name-node data-node1 data-node2 data-node3
    do
        cmd="zlogin $i ping data-node1 > /dev/null 2>&1"
        eval $cmd || break 2
    done

    for i in name-node sec-name-node data-node1 data-node2 data-node3
    do
        cmd="zlogin $i ping data-node2 > /dev/null 2>&1"
        eval $cmd || break 2
    done

    for i in name-node sec-name-node data-node1 data-node2 data-node3
    do
        cmd="zlogin $i ping data-node3 > /dev/null 2>&1"
        eval $cmd || break 2
    done

    RET=0
done

if [ $RET == 0 ] ; then
    echo "The cluster is verified"
else
    echo "Error: unable to verify the cluster"
fi
exit $RET
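The single-pass `for transaction in _` wrapper in the script above is a ksh/bash idiom for "run a batch of checks and abort the whole batch on the first failure." A minimal, self-contained illustration (the check_* functions are stand-ins for the zlogin tests):

```shell
# The outer loop runs exactly once; 'break 2' exits BOTH loops on the
# first failed check, so RET=0 is reached only if every check passes.
check_ok()  { return 0; }   # stand-in for a zlogin test that succeeds
check_bad() { return 1; }   # stand-in for a zlogin test that fails

RET=1
for transaction in _
do
    for step in check_ok check_bad check_ok
    do
        $step || break 2
    done
    RET=0
done
echo "RET=$RET"   # prints RET=1: check_bad aborts the batch before RET=0
```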

Create the Doc directory; we will put the Hadoop input files there.

root@global_zone:~# mkdir /usr/local/hadoophol/Doc

Download the following eBook from Project Gutenberg as a plain-text file with UTF-8 encoding: The Outline of Science, Vol. 1 (of 4) by J. Arthur Thomson. Copy the downloaded file (pg20417.txt) into the /usr/local/hadoophol/Doc directory.

root@global_zone:~# cp ~oracle/Downloads/pg20417.txt /usr/local/hadoophol/Doc/

Browse the lab supplement materials by typing the following on the command line:

root@global_zone:~# cd /usr/local/hadoophol

On the command line, type ls -l to see the content of the directory:

root@global_zone:~# ls -l
total 9
drwxr-xr-x   2 root     root           2 Jul  8 15:11 Bin
drwxr-xr-x   2 root     root           2 Jul  8 15:11 Doc
drwxr-xr-x   2 root     root           2 Jul  8 15:12 Scripts

You can see the following directory structure:

- Bin: The Hadoop binary location
- Doc: The Hadoop input files
- Scripts: The lab scripts

Copy the Hadoop tarball into /usr/local:

root@global_zone:~# cp /usr/local/hadoophol/Bin/hadoop-1.2.1.tar.gz /usr/local

Unpack the tarball:

root@global_zone:~# cd /usr/local
root@global_zone:~# tar -xvzf /usr/local/hadoop-1.2.1.tar.gz

Create the hadoop group:

root@global_zone:~# groupadd hadoop

Add the hadoop user:

root@global_zone:~# useradd -m -g hadoop hadoop

Set the hadoop user's password. You can use whatever password you want, but be sure you remember it.

root@global_zone:~# passwd hadoop

Create a symlink for the Hadoop binaries:

root@global_zone:~# ln -s /usr/local/hadoop-1.2.1 /usr/local/hadoop

Give ownership to the hadoop user:

root@global_zone:~# chown -R hadoop:hadoop /usr/local/hadoop*

Change the permissions:

root@global_zone:~# chmod -R 755 /usr/local/hadoop*

Exercise 2: Edit the Hadoop Configuration Files

In this exercise, we will edit the Hadoop configuration files, which are shown in Table 1:


Table 1. Hadoop Configuration Files

File Name        Description
hadoop-env.sh    Specifies environment variable settings used by Hadoop.
core-site.xml    Specifies parameters relevant to all Hadoop daemons and clients.
hdfs-site.xml    Specifies parameters used by the HDFS daemons and clients.
mapred-site.xml  Specifies parameters used by the MapReduce daemons and clients.
masters          Contains a list of machines that run the Secondary NameNode.
slaves           Contains a list of machine names that run the DataNode and TaskTracker pair of daemons.

To learn more about how the Hadoop framework is controlled by these configuration files, see http://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/conf/Configuration.html.

Run the following command to change to the conf directory:

root@global_zone:~# cd /usr/local/hadoop/conf

Run the following commands to change the hadoop-env.sh script:

Note: The cluster configuration will share the Hadoop directory structure (/usr/local/hadoop) across the zones as a read-only file system. Every Hadoop cluster node needs to be able to write its logs to an individual directory. The directory /var/log/hadoop is a best-practice location for every Oracle Solaris Zone.

root@global_zone:~# echo "export JAVA_HOME=/usr/java" >> hadoop-env.sh
root@global_zone:~# echo "export HADOOP_LOG_DIR=/var/log/hadoop" >> hadoop-env.sh

Edit the masters file to replace the localhost entry with the line shown in Listing 3:

    root@global_zone:~# vi masters

Listing 3

sec-name-node

Edit the slaves file to replace the localhost entry with the lines shown in Listing 4:

    root@global_zone:~# vi slaves

Listing 4

data-node1
data-node2
data-node3

Edit the core-site.xml file so it looks like Listing 5:

root@global_zone:~# vi core-site.xml

Note: fs.default.name is the URI that describes the NameNode address (protocol specifier, hostname, and port) for the cluster. Each DataNode instance will register with this NameNode and make its data available through it. In addition, the DataNodes send heartbeats to the NameNode to confirm that each DataNode is operating and that the block replicas it hosts are available.

Listing 5

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://name-node</value>
  </property>
</configuration>

Edit the hdfs-site.xml file so it looks like Listing 6:

    root@global_zone:~# vi hdfs-site.xml

    Notes:

- dfs.data.dir is the path on the local file system in which the DataNode instance should store its data.
- dfs.name.dir is the path on the local file system of the NameNode instance where the NameNode metadata is stored. It is used only by the NameNode instance to find its information.
- dfs.replication is the default replication factor for each block of data in the file system. (For a production cluster, this should usually be left at its default value of 3.)

Listing 6

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/hdfs/data/</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/hdfs/name/</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
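Because each block is stored dfs.replication times, raw HDFS capacity must be provisioned at roughly the replication factor times the logical data size. A quick shell sanity check (the 100 GB figure is just an example):

```shell
replication=3      # matches dfs.replication in Listing 6
dataset_gb=100     # example logical dataset size, not from the lab
raw_gb=$((dataset_gb * replication))
echo "${dataset_gb} GB of data occupies about ${raw_gb} GB of raw HDFS capacity"
```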

Edit the mapred-site.xml file so it looks like Listing 7:

    root@global_zone:~# vi mapred-site.xml

Note: mapred.job.tracker is a host:port string specifying the JobTracker's RPC address.

Listing 7

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>name-node:8021</value>
  </property>
</configuration>

Exercise 3: Configure the Network Time Protocol


We should ensure that the system clocks on the Hadoop zones are synchronized, by using the Network Time Protocol (NTP).

Note: It is best to select an NTP server that can be a dedicated time-synchronization source, so that other services are not negatively affected if the machine is brought down for planned maintenance.

In the following example, the global zone is configured as an NTP server.

Configure an NTP server:

root@global_zone:~# cd /etc/inet
root@global_zone:~# cp ntp.server ntp.conf
root@global_zone:~# chmod +w /etc/inet/ntp.conf
root@global_zone:~# touch /var/ntp/ntp.drift

Edit the NTP server configuration file, as shown in Listing 8:

    root@global_zone:~# vi /etc/inet/ntp.conf

Listing 8

server 127.127.1.0 prefer
broadcast 224.0.1.1 ttl 4
enable auth monitor
driftfile /var/ntp/ntp.drift
statsdir /var/ntp/ntpstats/
filegen peerstats file peerstats type day enable
filegen loopstats file loopstats type day enable
filegen clockstats file clockstats type day enable
keys /etc/inet/ntp.keys
trustedkey 0
requestkey 0
controlkey 0

Enable the NTP server service:

root@global_zone:~# svcadm enable ntp

Verify that the NTP server is online by using the following command:

root@global_zone:~# svcs -a | grep ntp
online         16:04:15 svc:/network/ntp:default

Exercise 4: Create the Virtual Network Interfaces

Concept Break: Oracle Solaris 11 Networking Virtualization Technology

Oracle Solaris provides a reliable, secure, and scalable infrastructure to meet the growing needs of data center implementations. Its powerful network stack architecture, also known as Project Crossbow, provides the following:

- Network virtualization with virtual NICs (VNICs) and virtual switching
- Tight integration with Oracle Solaris Zones and Oracle Solaris 10 Zones
- Network resource management, which provides an efficient and easy way to manage integrated QoS to enforce bandwidth limits on VNICs and traffic flows
- An optimized network stack that reacts to network load levels
- The ability to build a "data center in a box"

Oracle Solaris Zones on the same system can benefit from very high network I/O throughput (up to four times faster) with very low latency compared to systems with, say, 1 Gb physical network connections. For a Hadoop cluster, this means that the DataNodes can replicate the HDFS blocks much faster.

For more information about network virtualization benchmarks, see "How to Control Your Application's Network Bandwidth."

Create a series of virtual network interfaces (VNICs) for the different zones:

root@global_zone:~# dladm create-vnic -l net0 name_node1
root@global_zone:~# dladm create-vnic -l net0 secondary_name1
root@global_zone:~# dladm create-vnic -l net0 data_node1
root@global_zone:~# dladm create-vnic -l net0 data_node2
root@global_zone:~# dladm create-vnic -l net0 data_node3

Verify the VNICs' creation:

root@global_zone:~# dladm show-vnic
LINK             OVER   SPEED  MACADDRESS        MACADDRTYPE  VID
name_node1       net0   1000   2:8:20:c6:3e:f1   random       0
secondary_name1  net0   1000   2:8:20:b9:80:45   random       0
data_node1       net0   1000   2:8:20:30:1c:3a   random       0
data_node2       net0   1000   2:8:20:a8:b1:16   random       0
data_node3       net0   1000   2:8:20:df:89:81   random       0

We can see that we have five VNICs now. Figure 5 shows the architecture layout:

    Figure5
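The five dladm invocations above are identical except for the VNIC name, so they can be generated in a loop. A sketch that only echoes the commands rather than running them (drop the echo, or pipe the output to sh as root, to execute them):

```shell
# Generate the create-vnic command for each Hadoop zone's VNIC.
for vnic in name_node1 secondary_name1 data_node1 data_node2 data_node3
do
    echo dladm create-vnic -l net0 $vnic
done
```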

Exercise 5: Create the NameNode and Secondary NameNode Zones

Concept Break: Oracle Solaris Zones

Oracle Solaris Zones let you isolate one application from others on the same OS, allowing you to create an isolated environment in which users can log in and do what they want from inside an Oracle Solaris Zone without affecting anything outside that zone. In addition, Oracle Solaris Zones are secure from external attacks and internal malicious programs. Each Oracle Solaris Zone contains a complete resource-controlled environment that allows you to allocate resources such as CPU, memory, networking, and storage.

If you are the administrator who owns the system, you can choose to closely manage all the Oracle Solaris Zones, or you can assign rights to other administrators for specific Oracle Solaris Zones. This flexibility lets you tailor an entire computing environment to the needs of a particular application, all within the same OS.

For more information about Oracle Solaris Zones, see "How to Get Started Creating Oracle Solaris Zones in Oracle Solaris 11."

All the Hadoop nodes for this lab will be installed using Oracle Solaris Zones.

If you don't already have a file system for the NameNode and Secondary NameNode zones, run the following command:

root@global_zone:~# zfs create -o mountpoint=/zones rpool/zones

Verify the ZFS file system creation:

root@global_zone:~# zfs list rpool/zones
NAME          USED  AVAIL  REFER  MOUNTPOINT
rpool/zones    31K  51.4G    31K  /zones

Create the name-node zone:

root@global_zone:~# zonecfg -z name-node
Use 'create' to begin configuring a new zone.
zonecfg:name-node> create
create: Using system default template 'SYSdefault'
zonecfg:name-node> set autoboot=true
zonecfg:name-node> set limitpriv=default,dtrace_proc,dtrace_user,sys_time
zonecfg:name-node> set zonepath=/zones/name-node
zonecfg:name-node> add fs
zonecfg:name-node:fs> set dir=/usr/local
zonecfg:name-node:fs> set special=/usr/local
zonecfg:name-node:fs> set type=lofs
zonecfg:name-node:fs> set options=[ro,nodevices]
zonecfg:name-node:fs> end
zonecfg:name-node> add net
zonecfg:name-node:net> set physical=name_node1
zonecfg:name-node:net> end
zonecfg:name-node> verify
zonecfg:name-node> exit

(Optional) You can create the name-node zone using the following script, which will create the zone configuration file. For arguments, the script needs the zone name and the VNIC name, for example: createzone <zone name> <VNIC name>.

root@global_zone:~# /usr/local/hadoophol/Scripts/createzone name-node name_node1

Create the sec-name-node zone:

root@global_zone:~# zonecfg -z sec-name-node
Use 'create' to begin configuring a new zone.
zonecfg:sec-name-node> create
create: Using system default template 'SYSdefault'
zonecfg:sec-name-node> set autoboot=true
zonecfg:sec-name-node> set limitpriv=default,dtrace_proc,dtrace_user,sys_time
zonecfg:sec-name-node> set zonepath=/zones/sec-name-node
zonecfg:sec-name-node> add fs
zonecfg:sec-name-node:fs> set dir=/usr/local
zonecfg:sec-name-node:fs> set special=/usr/local
zonecfg:sec-name-node:fs> set type=lofs
zonecfg:sec-name-node:fs> set options=[ro,nodevices]
zonecfg:sec-name-node:fs> end
zonecfg:sec-name-node> add net
zonecfg:sec-name-node:net> set physical=secondary_name1
zonecfg:sec-name-node:net> end
zonecfg:sec-name-node> verify
zonecfg:sec-name-node> exit

(Optional) You can create the sec-name-node zone using the following script, which will create the zone configuration file. For arguments, the script needs the zone name and the VNIC name, for example: createzone <zone name> <VNIC name>.

root@global_zone:~# /usr/local/hadoophol/Scripts/createzone sec-name-node secondary_name1

Exercise 6: Set Up the DataNode Zones

In this exercise, we will leverage the integration between the Oracle Solaris Zones virtualization technology and the ZFS file system that is built into Oracle Solaris.

Table 2 shows a summary of the Hadoop zones configuration we will create:

Table 2. Zone Summary

Function            Zone Name      ZFS Mount Point        VNIC Name        IP Address
NameNode            name-node      /zones/name-node       name_node1       192.168.1.1
Secondary NameNode  sec-name-node  /zones/sec-name-node   secondary_name1  192.168.1.2
DataNode            data-node1     /zones/data-node1      data_node1       192.168.1.3
DataNode            data-node2     /zones/data-node2      data_node2       192.168.1.4
DataNode            data-node3     /zones/data-node3      data_node3       192.168.1.5
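The address plan in Table 2 is regular: the IPs run from 192.168.1.1 through 192.168.1.5 in zone order, and we will need the same name-to-address mapping again when populating /etc/hosts in Exercise 7. A sketch that emits the host entries from one place:

```shell
# Reproduce the Table 2 address plan: IPs are 192.168.1.<n> in zone order.
i=1
for host in name-node sec-name-node data-node1 data-node2 data-node3
do
    echo "192.168.1.$i $host"
    i=$((i + 1))
done
```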

Create the data-node1 zone:

root@global_zone:~# zonecfg -z data-node1
Use 'create' to begin configuring a new zone.
zonecfg:data-node1> create
create: Using system default template 'SYSdefault'
zonecfg:data-node1> set autoboot=true
zonecfg:data-node1> set limitpriv=default,dtrace_proc,dtrace_user,sys_time
zonecfg:data-node1> set zonepath=/zones/data-node1
zonecfg:data-node1> add fs
zonecfg:data-node1:fs> set dir=/usr/local
zonecfg:data-node1:fs> set special=/usr/local
zonecfg:data-node1:fs> set type=lofs
zonecfg:data-node1:fs> set options=[ro,nodevices]
zonecfg:data-node1:fs> end
zonecfg:data-node1> add net
zonecfg:data-node1:net> set physical=data_node1
zonecfg:data-node1:net> end
zonecfg:data-node1> verify
zonecfg:data-node1> commit
zonecfg:data-node1> exit

(Optional) You can create the data-node1 zone using the following script:


root@global_zone:~# /usr/local/hadoophol/Scripts/createzone data-node1 data_node1

Create the data-node2 zone:

root@global_zone:~# zonecfg -z data-node2
Use 'create' to begin configuring a new zone.
zonecfg:data-node2> create
create: Using system default template 'SYSdefault'
zonecfg:data-node2> set autoboot=true
zonecfg:data-node2> set limitpriv=default,dtrace_proc,dtrace_user,sys_time
zonecfg:data-node2> set zonepath=/zones/data-node2
zonecfg:data-node2> add fs
zonecfg:data-node2:fs> set dir=/usr/local
zonecfg:data-node2:fs> set special=/usr/local
zonecfg:data-node2:fs> set type=lofs
zonecfg:data-node2:fs> set options=[ro,nodevices]
zonecfg:data-node2:fs> end
zonecfg:data-node2> add net
zonecfg:data-node2:net> set physical=data_node2
zonecfg:data-node2:net> end
zonecfg:data-node2> verify
zonecfg:data-node2> commit
zonecfg:data-node2> exit

(Optional) You can create the data-node2 zone using the following script:

root@global_zone:~# /usr/local/hadoophol/Scripts/createzone data-node2 data_node2

Create the data-node3 zone:

root@global_zone:~# zonecfg -z data-node3
Use 'create' to begin configuring a new zone.
zonecfg:data-node3> create
create: Using system default template 'SYSdefault'
zonecfg:data-node3> set autoboot=true
zonecfg:data-node3> set limitpriv=default,dtrace_proc,dtrace_user,sys_time
zonecfg:data-node3> set zonepath=/zones/data-node3
zonecfg:data-node3> add fs
zonecfg:data-node3:fs> set dir=/usr/local
zonecfg:data-node3:fs> set special=/usr/local
zonecfg:data-node3:fs> set type=lofs
zonecfg:data-node3:fs> set options=[ro,nodevices]
zonecfg:data-node3:fs> end
zonecfg:data-node3> add net
zonecfg:data-node3:net> set physical=data_node3
zonecfg:data-node3:net> end
zonecfg:data-node3> verify
zonecfg:data-node3> commit
zonecfg:data-node3> exit

(Optional) You can create the data-node3 zone using the following script:

root@global_zone:~# /usr/local/hadoophol/Scripts/createzone data-node3 data_node3

Exercise 7: Configure the NameNode

Now, install the name-node zone; later, we will clone it in order to accelerate zone creation time.

root@global_zone:~# zoneadm -z name-node install
The following ZFS file system(s) have been created:
    rpool/zones/name-node
Progress being logged to /var/log/zones/zoneadm.20130106T134835Z.name-node.install
       Image: Preparing at /zones/name-node/root.

Boot the name-node zone:

root@global_zone:~# zoneadm -z name-node boot

Check the status of the zones we've created:

root@global_zone:~# zoneadm list -cv
  ID NAME           STATUS      PATH                   BRAND    IP
   0 global         running     /                      solaris  shared
   1 name-node      running     /zones/name-node       solaris  excl
   - sec-name-node  configured  /zones/sec-name-node   solaris  excl
   - data-node1     configured  /zones/data-node1      solaris  excl
   - data-node2     configured  /zones/data-node2      solaris  excl
   - data-node3     configured  /zones/data-node3      solaris  excl

Log in to the name-node zone:

root@global_zone:~# zlogin -C name-node

Provide the zone host information by using the following configuration for the name-node zone:

- For the host name, use name-node.
- Select manual network configuration.
- Ensure the network interface name_node1 has an IP address of 192.168.1.1 and a netmask of 255.255.255.0.
- Ensure the name service is based on your network configuration. In this lab, we will use /etc/hosts for name resolution, so we won't set up DNS for host name resolution. Select Do not configure DNS.
- For Alternate Name Service, select None.
- For Time Zone Region, select Americas.
- For Time Zone Location, select United States.
- For Time Zone, select Pacific Time.
- Enter your root password.

After finishing the zone setup, you will get the login prompt. Log in to the zone as user root.

name-node console login: root
Password:

Developing for Hadoop requires a Java programming environment. You can install Java Development Kit (JDK) 6 using the following command:

root@name-node:~# pkg install jdk-6

Verify the Java installation:

root@name-node:~# which java
/usr/bin/java

root@name-node:~# java -version
java version "1.6.0_35"
Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
Java HotSpot(TM) Client VM (build 20.10-b01, mixed mode)

Create a Hadoop user inside the name-node zone:


root@name-node:~# groupadd hadoop
root@name-node:~# useradd -m -g hadoop hadoop
root@name-node:~# passwd hadoop

Note: The password should be the same password as you entered in step 22 of Exercise 1, when you set the hadoop user's password.

Create a directory for the Hadoop log files:

root@name-node:~# mkdir /var/log/hadoop
root@name-node:~# chown hadoop:hadoop /var/log/hadoop

Configure an NTP client, as shown in the following example:

Install the NTP package:

root@name-node:~# pkg install ntp

Create the NTP client configuration files:

root@name-node:~# cd /etc/inet
root@name-node:~# cp ntp.client ntp.conf
root@name-node:~# chmod +w /etc/inet/ntp.conf
root@name-node:~# touch /var/ntp/ntp.drift

Edit the NTP client configuration file, as shown in Listing 9:

    root@name-node:~# vi /etc/inet/ntp.conf

Note: In this lab, we are using the global zone as a time server, so we add its name (for example, global-zone) to /etc/inet/ntp.conf.

Listing 9

server global-zone prefer
driftfile /var/ntp/ntp.drift
statsdir /var/ntp/ntpstats/
filegen peerstats file peerstats type day enable
filegen loopstats file loopstats type day enable

Add the Hadoop cluster members' host names and IP addresses to /etc/hosts, as shown in Listing 10:

    root@name-node:~# vi /etc/hosts

Listing 10

::1 localhost
127.0.0.1 localhost loghost
192.168.1.1 name-node
192.168.1.2 sec-name-node
192.168.1.3 data-node1
192.168.1.4 data-node2
192.168.1.5 data-node3
192.168.1.100 global-zone

Enable the NTP client service:

root@name-node:~# svcadm enable ntp

Verify the NTP client status:

root@name-node:~# svcs ntp
STATE          STIME    FMRI
online         11:15:59 svc:/network/ntp:default

Check whether the NTP client can synchronize its clock with the NTP server:

root@name-node:~# ntpq -p

Exercise 8: Set Up SSH

Set up SSH key-based authentication for the Hadoop user on the name-node zone in order to enable password-less login to the Secondary NameNode and the DataNodes:

root@name-node:~# su - hadoop
hadoop@name-node $ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
hadoop@name-node $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Edit $HOME/.profile to append to the end of the file the lines shown in Listing 11:

    hadoop@name-node $ vi $HOME/.profile

Listing 11

# Set JAVA_HOME
export JAVA_HOME=/usr/java
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:/usr/local/hadoop/bin

Then run the following command:

hadoop@name-node $ source $HOME/.profile

Check that Hadoop runs by typing the following command:

hadoop@name-node:~$ hadoop version
Hadoop 1.2.1
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152
Compiled by mattf on Mon Jul 22 15:23:09 PDT 2013
From source with checksum 6923c86528809c4e7e6f493b6b413a9a

Note: Press ~. to exit from the name-node console and return to the global zone.

You can verify that you are in the global zone using the zonename command:

root@global_zone:~# zonename
global

From the global zone, run the following commands to create the sec-name-node zone as a clone of name-node:

root@global_zone:~# zoneadm -z name-node shutdown
root@global_zone:~# zoneadm -z sec-name-node clone name-node

Boot the sec-name-node zone:

root@global_zone:~# zoneadm -z sec-name-node boot
root@global_zone:~# zlogin -C sec-name-node

As we experienced previously, the system configuration tool is launched (see Figure 6), so do the final configuration for the sec-name-node zone:

Note: All the zones must have the same time zone configuration and the same root password.


    Figure6

- For the host name, use sec-name-node.
- Select manual network configuration and, for the network interface, use secondary_name1.
- Use an IP address of 192.168.1.2 and a netmask of 255.255.255.0.
- Select Do not configure DNS in the DNS name service window.
- Ensure Alternate Name Service is set to None.
- For Time Zone Region, select Americas.
- For Time Zone Location, select United States.
- For Time Zone, select Pacific Time.
- Enter your root password.

Note: Press ~. to exit from the sec-name-node console and return to the global zone.

Perform similar steps for data-node1, data-node2, and data-node3:

Do the following for data-node1:

root@global_zone:~# zoneadm -z data-node1 clone name-node
root@global_zone:~# zoneadm -z data-node1 boot
root@global_zone:~# zlogin -C data-node1

- For the host name, use data-node1.
- Select manual network configuration and, for the network interface, use data_node1.
- Use an IP address of 192.168.1.3 and a netmask of 255.255.255.0.
- Select Do not configure DNS in the DNS name service window.
- Ensure Alternate Name Service is set to None.
- For Time Zone Region, select Americas.
- For Time Zone Location, select United States.
- For Time Zone, select Pacific Time.
- Enter your root password.

Do the following for data-node2:

root@global_zone:~# zoneadm -z data-node2 clone name-node
root@global_zone:~# zoneadm -z data-node2 boot
root@global_zone:~# zlogin -C data-node2

- For the host name, use data-node2.
- For the network interface, use data_node2.
- Use an IP address of 192.168.1.4 and a netmask of 255.255.255.0.
- Select Do not configure DNS in the DNS name service window.
- Ensure Alternate Name Service is set to None.
- For Time Zone Region, select Americas.
- For Time Zone Location, select United States.
- For Time Zone, select Pacific Time.
- Enter your root password.

Do the following for data-node3:

root@global_zone:~# zoneadm -z data-node3 clone name-node
root@global_zone:~# zoneadm -z data-node3 boot
root@global_zone:~# zlogin -C data-node3

    For the hostname, use data-node3. For the network interface, use data_node3. Use an IP address of 192.168.1.5 and a netmask of 255.255.255.0. Select Do not configure DNS in the DNS name service window. Ensure Alternate Name Service is set to None. For Time Zone Region, select Americas. For Time Zone Location, select United States. For Time Zone, select Pacific Time. Enter your root password.

    Boot the name-node zone:

    root@global_zone:~# zoneadm -z name-node boot

    Verify that all the zones are up and running:

    root@global_zone:~# zoneadm list -cv
      ID NAME            STATUS    PATH                   BRAND    IP
       0 global          running   /                      solaris  shared
      10 sec-name-node   running   /zones/sec-name-node   solaris  excl
      12 data-node1      running   /zones/data-node1      solaris  excl
      14 data-node2      running   /zones/data-node2      solaris  excl
      16 data-node3      running   /zones/data-node3      solaris  excl
      17 name-node       running   /zones/name-node       solaris  excl

    To verify your SSH access without using a password for the Hadoop user, do the following.

    From name-node, log in via SSH into name-node (that is, to itself):

    root@global_zone:~# zlogin name-node
    root@name-node:~# su - hadoop


    hadoop@name-node $ ssh name-node

    The authenticity of host 'name-node (192.168.1.1)' can't be established.
    RSA key fingerprint is 04:93:a9:e0:b7:8c:d7:8b:51:b8:42:d7:9f:e1:80:ca.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'name-node,192.168.1.1' (RSA) to the list of known hosts.

    Now, try to log in to sec-name-node and the DataNodes (data-node1, data-node2, and data-node3). Try logging in to the hosts again using SSH. You shouldn't get a prompt to add the host to the known keys list.

    Edit the /etc/hosts files inside sec-name-node and the DataNodes in order to add the name-node entry:

    root@global_zone:~# zlogin sec-name-node 'echo "192.168.1.1 name-node" >> /etc/hosts'
    root@global_zone:~# zlogin data-node1 'echo "192.168.1.1 name-node" >> /etc/hosts'
    root@global_zone:~# zlogin data-node2 'echo "192.168.1.1 name-node" >> /etc/hosts'
    root@global_zone:~# zlogin data-node3 'echo "192.168.1.1 name-node" >> /etc/hosts'

    Verify name resolution by ensuring that the global zone and all the Hadoop zones have the host entries shown in Listing 12 in /etc/hosts:

    # cat /etc/hosts

    Listing 12:

    ::1           localhost
    127.0.0.1     localhost loghost
    192.168.1.1   name-node
    192.168.1.2   sec-name-node
    192.168.1.3   data-node1
    192.168.1.4   data-node2
    192.168.1.5   data-node3
    192.168.1.100 global-zone

    Note: If you are using the global zone as an NTP server, you must also add its hostname and IP address to /etc/hosts.

    Verify the cluster using the verifycluster script:

    root@global_zone:~# /usr/local/hadoophol/Scripts/verifycluster

    If the cluster setup is fine, you will get a cluster is verified message.

    Note: If the verifycluster script fails with an error message, check that the /etc/hosts file in every zone includes all the zone names as described in Step 12, and then rerun the verifycluster script.
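    If you want to see the kind of check the verifycluster script performs, the /etc/hosts test can be sketched in a few lines of shell. The function below is a hypothetical stand-in (not the lab's actual script): it reports any cluster host name missing from the /etc/hosts file given as its argument, and prints nothing when all names are present.

    ```shell
    # check_hosts: report cluster names missing from the hosts file in $1.
    # A minimal sketch only; the lab's verifycluster script may test more.
    check_hosts() {
      for name in name-node sec-name-node data-node1 data-node2 data-node3; do
        # Exact-field match so, e.g., "sec-name-node" does not satisfy "name-node".
        awk -v n="$name" '{for (i = 2; i <= NF; i++) if ($i == n) found = 1}
                          END {exit !found}' "$1" \
          || echo "missing: $name"
      done
    }
    ```

    Run it in each zone (for example, zlogin data-node1 and then check_hosts /etc/hosts); empty output means that zone's file is complete.
    
    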

    Exercise 9: Format HDFS from the NameNode

    Concept Break: Hadoop Distributed File System (HDFS)

    HDFS is a distributed, scalable file system. HDFS stores metadata on the NameNode. Application data is stored on the DataNodes, and each DataNode serves up blocks of data over the network using a block protocol specific to HDFS. The file system uses the TCP/IP layer for communication. Clients use Remote Procedure Call (RPC) to communicate with each other.

    The DataNodes do not rely on data protection mechanisms, such as RAID, to make the data durable. Instead, the file content is replicated on multiple DataNodes for reliability.

    With the default replication value (3), which is set up in the hdfs-site.xml file, data is stored on three nodes. DataNodes can talk to each other in order to rebalance data, to move copies around, and to keep the replication of data high. In Figure 7, we can see that every data block is replicated across three DataNodes based on the replication value.
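    As a point of reference, the replication factor is the dfs.replication property in hdfs-site.xml (the Hadoop 1.x property name); a minimal fragment, shown here with the default value of 3, looks like this:

    ```xml
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
        <description>Number of DataNodes holding a copy of each HDFS block.</description>
      </property>
    </configuration>
    ```
    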

    An advantage of using HDFS is data awareness between the JobTracker and the TaskTrackers. The JobTracker schedules map or reduce jobs to TaskTrackers with an awareness of the data location. For example, if node A contains data (x,y,z) and node B contains data (a,b,c), the JobTracker will schedule node B to perform map or reduce tasks on (a,b,c), and node A will be scheduled to perform map or reduce tasks on (x,y,z). This reduces the amount of traffic that goes over the network and prevents unnecessary data transfer. This data awareness can have a significant impact on job completion times, which has been demonstrated when running data-intensive jobs.

    For more information about Hadoop HDFS, see https://en.wikipedia.org/wiki/Hadoop.

    Figure 7

    To format HDFS, run the following commands and answer Y at the prompt:

    root@global_zone:~# zlogin name-node
    root@name-node:~# mkdir -p /hdfs/name
    root@name-node:~# chown -R hadoop:hadoop /hdfs
    root@name-node:~# su - hadoop
    hadoop@name-node:$ /usr/local/hadoop/bin/hadoop namenode -format
    13/10/13 09:10:52 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = name-node/192.168.1.1
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 1.2.1
    STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
    STARTUP_MSG:   java = 1.6.0_35
    ************************************************************/

    hadoop@name-node:$ Re-format filesystem in /hdfs/name ? (Y or N) Y

    On every DataNode (data-node1, data-node2, and data-node3), create a Hadoop data directory to store the HDFS blocks:

    root@global_zone:~# zlogin data-node1
    root@data-node1:~# mkdir -p /hdfs/data
    root@data-node1:~# chown -R hadoop:hadoop /hdfs


    root@global_zone:~# zlogin data-node2
    root@data-node2:~# mkdir -p /hdfs/data
    root@data-node2:~# chown -R hadoop:hadoop /hdfs

    root@global_zone:~# zlogin data-node3
    root@data-node3:~# mkdir -p /hdfs/data
    root@data-node3:~# chown -R hadoop:hadoop /hdfs

    Exercise 10: Start the Hadoop Cluster

    Table 3 describes the startup scripts.

    Table 3. Startup Scripts

    File Name         Description
    start-dfs.sh      Starts the HDFS daemons, the NameNode, and the DataNodes. Use this before start-mapred.sh.
    stop-dfs.sh       Stops the Hadoop DFS daemons.
    start-mapred.sh   Starts the Hadoop MapReduce daemons, the JobTracker, and the TaskTrackers.
    stop-mapred.sh    Stops the Hadoop MapReduce daemons.

    From the name-node zone, start the Hadoop DFS daemons, the NameNode, and the DataNodes using the following commands:

    root@global_zone:~# zlogin name-node
    root@name-node:~# su - hadoop
    hadoop@name-node:$ start-dfs.sh
    starting namenode, logging to /var/log/hadoop/hadoop--namenode-name-node.out
    data-node2: starting datanode, logging to /var/log/hadoop/hadoop-hadoop-datanode-data-node2.out
    data-node1: starting datanode, logging to /var/log/hadoop/hadoop-hadoop-datanode-data-node1.out
    data-node3: starting datanode, logging to /var/log/hadoop/hadoop-hadoop-datanode-data-node3.out
    sec-name-node: starting secondarynamenode, logging to /var/log/hadoop/hadoop-hadoop-secondarynamenode-sec-name-node.out

    Start the Hadoop Map/Reduce daemons, the JobTracker, and the TaskTrackers using the following command:

    hadoop@name-node:$ start-mapred.sh
    starting jobtracker, logging to /var/log/hadoop/hadoop--jobtracker-name-node.out
    data-node1: starting tasktracker, logging to /var/log/hadoop/hadoop-hadoop-tasktracker-data-node1.out
    data-node3: starting tasktracker, logging to /var/log/hadoop/hadoop-hadoop-tasktracker-data-node3.out
    data-node2: starting tasktracker, logging to /var/log/hadoop/hadoop-hadoop-tasktracker-data-node2.out

    To view a comprehensive status report, execute the following command to check the cluster status. The command will output basic statistics about the cluster health, such as NameNode details, the status of each DataNode, and disk capacity amounts.

    hadoop@name-node:$ hadoop dfsadmin -report
    Configured Capacity: 171455269888 (159.68 GB)
    Present Capacity: 169711053357 (158.06 GB)
    DFS Remaining: 169711028736 (158.06 GB)
    DFS Used: 24621 (24.04 KB)
    DFS Used%: 0%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    -------------------------------------------------
    Datanodes available: 3 (3 total, 0 dead)
    ...

    You should see that three DataNodes are available.
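    If you want to script this check rather than read the report by eye, a small filter can pull the live DataNode count out of the report. The function below is a convenience sketch (it is not part of Hadoop); it reads 'hadoop dfsadmin -report' output on stdin.

    ```shell
    # live_datanodes: print the number of live DataNodes from a
    # 'hadoop dfsadmin -report' fed on stdin, e.g.:
    #   hadoop dfsadmin -report | live_datanodes
    live_datanodes() {
      # Split on spaces and '(' so "Datanodes available: 3 (3 total, 0 dead)"
      # yields the live count as the third field.
      awk -F'[ (]' '/^Datanodes available:/ {print $3; exit}'
    }
    ```
    
    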

    Note: You can find the same information on the NameNode web status page (shown in Figure 8) at http://<NameNode IP address>:50070/dfshealth.jsp. The NameNode IP address is 192.168.1.1.

    Figure 8

    Exercise 11: Run a MapReduce Job

    Concept Break: MapReduce

    MapReduce is a framework for processing parallelizable problems across huge data sets using a cluster of computers.

    The essential idea of MapReduce is using two functions to work on data from a source: grabbing the data with the Map() function, and then processing it across a cluster of computers with the Reduce() function. Specifically, Map() will apply a function to all the members of a data set and post a result set, which Reduce() will then collate and resolve.

    Map() and Reduce() can be run in parallel and across multiple systems.

    For more information about MapReduce, see http://en.wikipedia.org/wiki/MapReduce.
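    For intuition, the same word-count computation can be expressed with standard shell tools as a local, single-machine analogue: the "map" step emits one word per line, and the "reduce" step groups identical words and counts them. This is only a sketch of the model (it assumes plain whitespace-separated text), not what Hadoop executes.

    ```shell
    # A single-machine analogue of the WordCount MapReduce job.
    wordcount() {
      tr -s ' \t' '\n\n' |   # map: emit one word per line
        sort |               # shuffle: bring identical words together
        uniq -c              # reduce: count each distinct word
    }
    ```

    For example, printf 'hadoop solaris hadoop\n' | wordcount reports a count of 2 for hadoop and 1 for solaris.
    
    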

    We will use the WordCount example, which reads text files and counts how often words occur. The input and output consist of text files, each line of which contains a word and the number of times the word occurred, separated by a tab. For more information about WordCount, see http://wiki.apache.org/hadoop/WordCount.

    Create the input data directory; we will put the input files there.

    hadoop@name-node:$ hadoop fs -mkdir /input-data

    Verify the directory creation:

    hadoop@name-node:$ hadoop dfs -ls /
    Found 1 items
    drwxr-xr-x   - hadoop supergroup          0 2013-10-13 23:45 /input-data

    Copy the pg20417.txt file you downloaded earlier to HDFS using the following command:

    Note: Oracle OpenWorld attendees can find the pg20417.txt file in the /usr/local/hadoophol/Doc directory.

    hadoop@name-node:$ hadoop dfs -copyFromLocal /usr/local/hadoophol/Doc/pg20417.txt /input-data

    Verify that the file is located on HDFS:

    hadoop@name-node:$ hadoop dfs -ls /input-data
    Found 1 items
    -rw-r--r--   3 hadoop supergroup     674570 2013-10-13 10:20 /input-data/pg20417.txt

    Create the output directory; the MapReduce job will put its outputs in this directory:

    hadoop@name-node:$ hadoop fs -mkdir /output-data

    Start the MapReduce job using the following command:

    hadoop@name-node:$ hadoop jar /usr/local/hadoop/hadoop-examples-1.2.1.jar wordcount /input-data/pg20417.txt /output-data/output1

    13/10/13 10:23:08 INFO input.FileInputFormat: Total input paths to process : 1
    13/10/13 10:23:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    13/10/13 10:23:08 WARN snappy.LoadSnappy: Snappy native library not loaded
    13/10/13 10:23:09 INFO mapred.JobClient: Running job: job_201310130918_0010
    13/10/13 10:23:10 INFO mapred.JobClient:  map 0% reduce 0%
    13/10/13 10:23:19 INFO mapred.JobClient:  map 100% reduce 0%
    13/10/13 10:23:29 INFO mapred.JobClient:  map 100% reduce 33%
    13/10/13 10:23:31 INFO mapred.JobClient:  map 100% reduce 100%
    13/10/13 10:23:34 INFO mapred.JobClient: Job complete: job_201310130918_0010
    13/10/13 10:23:34 INFO mapred.JobClient: Counters: 26

    The program takes about 60 seconds to execute on the cluster.

    All of the files in the input directory (input-data in the command line shown above) are read, and the counts for the words in the input are written to the output directory (called output-data/output1).
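    Because each output line is a word and its count separated by a tab, a quick way to find the most frequent words is to sort by the count field. The helper below is a sketch (not part of Hadoop); feed it the part-r-00000 output, for example via hadoop dfs -cat /output-data/output1/part-r-00000 | top_words | head.

    ```shell
    # top_words: sort WordCount output lines (word<TAB>count) by count,
    # highest count first.
    top_words() {
      sort -k2,2nr   # field 2 is the count; n = numeric, r = descending
    }
    ```
    
    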

    Verify the output data:

    hadoop@name-node:$ hadoop dfs -ls /output-data/output1
    Found 3 items
    -rw-r--r--   3 hadoop supergroup          0 2013-10-13 10:30 /output-data/output1/_SUCCESS
    drwxr-xr-x   - hadoop supergroup          0 2013-10-13 10:30 /output-data/output1/_logs
    -rw-r--r--   3 hadoop supergroup     196192 2013-10-13 10:30 /output-data/output1/part-r-00000

    Exercise 12: Use ZFS Encryption

    Concept Break: ZFS Encryption

    Oracle Solaris 11 adds transparent data encryption functionality to ZFS. All data and file system metadata (such as ownership, access control lists, quota information, and so on) is encrypted when stored persistently in the ZFS pool.

    A ZFS pool can support a mix of encrypted and unencrypted ZFS datasets (file systems and ZVOLs). Data encryption is completely transparent to applications and other Oracle Solaris file services, such as NFS or CIFS. Since encryption is a first-class feature of ZFS, we are able to support compression, encryption, and deduplication together. Encryption key management for encrypted datasets can be delegated to users, Oracle Solaris Zones, or both. Oracle Solaris with ZFS encryption provides a very flexible system for securing data at rest, and it doesn't require any application changes or qualification.

    For more information about ZFS encryption, see "How to Manage ZFS Data Encryption."

    The output data can contain sensitive information, so use ZFS encryption to protect the output data.

    Create the encrypted ZFS dataset:

    Note: You need to provide the passphrase; it must be at least eight characters.

    root@name-node:~# zfs create -o encryption=on rpool/export/output

    Enter passphrase for 'rpool/export/output':
    Enter again:

    Verify that the ZFS dataset is encrypted:

    root@name-node:~# zfs get all rpool/export/output | grep encry
    rpool/export/output  encryption  on  local

    Change the ownership:

    root@name-node:~# chown hadoop:hadoop /export/output

    Copy the output file from HDFS into ZFS:

    root@name-node:~# su - hadoop
    Oracle Corporation      SunOS 5.11      11.1    September 2012

    hadoop@name-node:$ hadoop dfs -getmerge /output-data/output1 /export/output

    Analyze the output text file. Each line contains a word and the number of times the word occurred, separated by a tab.

    hadoop@name-node:$ head /export/output/output1
    "A          2
    "Alpha      1
    "Alpha,"    1
    "An         2
    "And        1
    "BOILING"   2
    "Batesian"  1
    "Beta       2

    Protect the output text file by unmounting the ZFS dataset, and then unload the wrapping key for the encrypted dataset using the following command:

    root@name-node:~# zfs key -u rpool/export/output


    If the command is successful, the dataset is not accessible and it is unmounted.

    If you want to mount this ZFS file system, you need to provide the passphrase:

    root@name-node:~# zfs mount rpool/export/output
    Enter passphrase for 'rpool/export/output':

    By using a passphrase, you ensure that only those who know the passphrase can observe the output file.

    Exercise 13: Use Oracle Solaris DTrace for Performance Monitoring

    Concept Break: Oracle Solaris DTrace

    Oracle Solaris DTrace is a comprehensive, advanced tracing tool for troubleshooting systemic problems in real time. Administrators, integrators, and developers can use DTrace to dynamically and safely observe live production systems, including both applications and the operating system itself, for performance issues.

    DTrace allows you to explore a system to understand how it works, track down problems across many layers of software, and locate the cause of any aberrant behavior. Whether it's at a high-level global overview, such as memory consumption or CPU time, or at a much finer-grained level, such as what specific function calls are being made, DTrace can provide operational insights that have been missing in the data center by enabling you to do the following:

    Insert 80,000+ probe points across all facets of the operating system.
    Instrument user-level and system-level software.
    Use a powerful and easy-to-use scripting language and command-line interfaces.

    For more information about DTrace, see http://www.oracle.com/technetwork/server-storage/solaris11/technologies/dtrace-1930301.html.

    Open another terminal window and log in to name-node as user hadoop. Run the following MapReduce job:

    hadoop@name-node:$ hadoop jar /usr/local/hadoop/hadoop-examples-1.2.1.jar wordcount /input-data/pg20417.txt /output-data/output2

    While the Hadoop job is running, determine what processes are executed on the NameNode.

    In the terminal window, run the following DTrace command:

    root@global-zone:~# dtrace -n 'proc:::exec-success/strstr(zonename,"name-node")>0/ { trace(curpsinfo->pr_psargs); }'

    dtrace: description 'proc:::exec-success' matched 1 probe

    CPU     ID    FUNCTION:NAME
      0   4473  exec_common:exec-success  /usr/bin/env bash /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/hadoop-exa
      0   4473  exec_common:exec-success  bash /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/hadoop-examples-1.1.2.j
      0   4473  exec_common:exec-success  dirname /usr/local/hadoop-1.1.2/libexec/--
      0   4473  exec_common:exec-success  dirname /usr/local/hadoop-1.1.2/libexec/--
      0   4473  exec_common:exec-success  sed -e s/ /_/g
      1   4473  exec_common:exec-success  dirname /usr/local/hadoop/bin/hadoop
      1   4473  exec_common:exec-success  dirname -- /usr/local/hadoop/bin/../libexec/hadoop-config.sh
      1   4473  exec_common:exec-success  basename -- /usr/local/hadoop/bin/../libexec/hadoop-config.sh
      1   4473  exec_common:exec-success  basename /usr/local/hadoop-1.1.2/libexec/--
      1   4473  exec_common:exec-success  uname
      1   4473  exec_common:exec-success  /usr/java/bin/java -Xmx32m org.apache.hadoop.util.PlatformName
      1   4473  exec_common:exec-success  /usr/java/bin/java -Xmx32m org.apache.hadoop.util.PlatformName
      0   4473  exec_common:exec-success  /usr/java/bin/java -Dproc_jar -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop -Dhado
      0   4473  exec_common:exec-success  /usr/java/bin/java -Dproc_jar -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop -Dhado
    ^C

    Note: Press Ctrl-C in order to see the DTrace output.

    While the Hadoop job is running, determine what files are written on the NameNode.

    Note: If the MapReduce job is finished, you can run another job with a different output directory (for example, /output-data/output3).

    For example:

    hadoop@name-node:$ hadoop jar /usr/local/hadoop/hadoop-examples-1.2.1.jar wordcount /input-data/pg20417.txt /output-data/output3

    root@global-zone:~# dtrace -n 'syscall::write:entry/strstr(zonename,"name-node")>0/ {@write[fds[arg0].fi_pathname]=count();}'

    dtrace: description 'syscall::write:entry' matched 1 probe
    ^C

    /zones/name-node/root/tmp/hadoop-hadoop/mapred/local/jobTracker/.job_201307181457_0007.xml.crc        1
    /zones/name-node/root/var/log/hadoop/history/.job_201307181457_0007_conf.xml.crc                      1
    /zones/name-node/root/dev/pts/3                                                                       5
    /zones/name-node/root/var/log/hadoop/job_201307181457_0007_conf.xml                                   6
    /zones/name-node/root/tmp/hadoop-hadoop/mapred/local/jobTracker/job_201307181457_0007.xml             8
    /zones/name-node/root/var/log/hadoop/history/job_201307181457_0007_conf.xml                          11
    /zones/name-node/root/var/log/hadoop/hadoop--jobtracker-name-node.log                                13
    /zones/name-node/root/hdfs/name/current/edits.new                                                    25
    /zones/name-node/root/var/log/hadoop/hadoop--namenode-name-node.log                                  45
    /zones/name-node/root/dev/poll                                                                      207
                                                                                                    3131655

    Note: Press Ctrl-C in order to see the DTrace output.

    While the Hadoop job is running, determine what processes are executed on the DataNode:

    root@global-zone:~# dtrace -n 'proc:::exec-success/strstr(zonename,"data-node1")>0/ { trace(curpsinfo->pr_psargs); }'

    dtrace: description 'proc:::exec-success' matched 1 probe

    CPU     ID    FUNCTION:NAME
      0   8833  exec_common:exec-success  dirname /usr/local/hadoop/bin/hadoop
      0   8833  exec_common:exec-success  dirname /usr/local/hadoop/libexec/--
      0   8833  exec_common:exec-success  sed -e s/ /_/g
      1   8833  exec_common:exec-success  dirname -- /usr/local/hadoop/bin/../libexec/hadoop-config.sh
      2   8833  exec_common:exec-success  basename /usr/local/hadoop/libexec/--
      2   8833  exec_common:exec-success  /usr/java/bin/java -Xmx32m org.apache.hadoop.util.PlatformName
      2   8833  exec_common:exec-success  /usr/java/bin/java -Xmx32m org.apache.hadoop.util.PlatformName
      3   8833  exec_common:exec-success  /usr/bin/env bash /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/hadoop-exa
      3   8833  exec_common:exec-success  bash /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/hadoop-examples-1.0.4.j
      3   8833  exec_common:exec-success  basename -- /usr/local/hadoop/bin/../libexec/hadoop-config.sh
      3   8833  exec_common:exec-success  dirname /usr/local/hadoop/libexec/--
      3   8833  exec_common:exec-success  uname
      3   8833  exec_common:exec-success  /usr/java/bin/java -Dproc_jar -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop -Dhado


      3   8833  exec_common:exec-success  /usr/java/bin/java -Dproc_jar -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop -Dhado
    ^C

    While the Hadoop job is running, determine what files are written on the DataNode:

    (There were 222 lines of output, which were reduced for readability.)

    root@global-zone:~# dtrace -n 'syscall::write:entry/strstr(zonename,"data-node1")>0/ {@write[fds[arg0].fi_pathname]=count();}'

    dtrace: description 'syscall::write:entry' matched 1 probe

    ^C
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_-5404946161781239203             1
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_-5404946161781239203_1103.meta   1
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_-6136035696057459536             1
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_-6136035696057459536_1102.meta   1
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_-8420966433041064066             1
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_-8420966433041064066_1105.meta   1
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_1792925233420187481              1
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_1792925233420187481_1101.meta    1
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_4108435250688953064              1
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_4108435250688953064_1106.meta    1
    /zones/data-node1/root/hdfs/data/blocksBeingWritten/blk_8503732348705847964

    Determine the total amount of HDFS data written for the DataNodes:

    root@global-zone:~# dtrace -n 'syscall::write:entry / ( strstr(zonename,"data-node1")!=0 || strstr(zonename,"data-node2")!=0 || strstr(zonename,"data-node3")!=0 ) && strstr(fds[arg0].fi_pathname,"hdfs")!=0 && strstr(fds[arg0].fi_pathname,"blocksBeingWritten")>0/ { @write[fds[arg0].fi_pathname]=sum(arg2); }'
    ^C

    Summary

    In this lab, we learned how to set up a Hadoop cluster using Oracle Solaris 11 technologies such as Oracle Solaris Zones, ZFS, network virtualization, and DTrace.

    See Also

    Hadoop and HDFS
    Hadoop framework
    "How to Control Your Application's Network Bandwidth"
    "How to Get Started Creating Oracle Solaris Zones in Oracle Solaris 11"
    "How to Set Up a Hadoop Cluster Using Oracle Solaris Zones"
    "How to Build Native Hadoop Libraries for Oracle Solaris 11"
    MapReduce
    WordCount
    "How to Manage ZFS Data Encryption"
    DTrace

    About the Author

    Orgad Kimchi is a principal software engineer on the ISV Engineering team at Oracle (formerly Sun Microsystems). For 6 years he has specialized in virtualization, big data, and cloud computing technologies.

    Revision 1.0, 10/21/2013
