Cloudera CCD-333.docx

  • 8/20/2019 Cloudera CCD-333.docx

* All intermediate values associated with a given output key are subsequently grouped by the framework, and passed to the Reducer(s) to determine the final output.

Reference: Questions & Answers for Hadoop MapReduce developers, Where is the Mapper Output (intermediate key-value data) stored?
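The grouping step described above can be illustrated with a small, framework-free sketch. This is plain Python, not Hadoop code; the names `group_by_key` and `sum_reducer` and the sample pairs are hypothetical, chosen only to show how intermediate values that share a key end up in one list handed to a reducer.

```python
from collections import defaultdict

def group_by_key(pairs):
    """Group intermediate (key, value) pairs by key, with keys in sorted order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Hadoop presents keys to each reducer in sorted order; mimic that here.
    return dict(sorted(grouped.items()))

def sum_reducer(key, values):
    """Toy reducer: collapse all values for one key into their sum."""
    return key, sum(values)

# Hypothetical intermediate output collected from several mappers.
intermediate = [("b", 1), ("a", 1), ("b", 1), ("c", 1), ("a", 1)]
grouped = group_by_key(intermediate)
results = [sum_reducer(key, values) for key, values in grouped.items()]
```

Each reducer call sees one key together with every value emitted for it, which is what lets the reducer determine the final output for that key.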

Note:
* A MapReduce job with m mappers and r reducers involves up to m * r distinct copy operations, …

A. No, a combiner would not be useful in this case.
B. Yes.
C. Yes, but the number of unique keys must be known in advance.
D. Yes, as long as all the keys fit into memory on each node.
E. Yes, as long as all the integer values that share the same key fit into memory on each node.

Answer: B
Explanation:

QUESTION NO:
What happens in a MapReduce job when you set the number of reducers to zero?
A. No reducer executes, but the mappers generate no output.
B. No reducer executes, and the output of each mapper is written to a separate file in HDFS.
C. No reducer executes, but the outputs of all the mappers are gathered together and written to a single file in HDFS.
D. Setting the number of reducers to zero is invalid, and an exception is thrown.

Answer: B
Explanation:
* It is legal to set the number of reduce-tasks to zero if no reduction is desired. In this case the outputs of the map-tasks go directly to the FileSystem, into the output path set by setOutputPath(Path). The framework does not sort the map-outputs before writing them out to the FileSystem.
* Often, you may want to process input data using a map function only. To do this, simply set mapreduce.job.reduces to zero. The MapReduce framework will not create any reducer tasks.
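The effect of the reducer count on the job's output files can be sketched with a toy simulation. This is plain Python, not the Hadoop API: `run_job` is a hypothetical helper, the CRC-based partitioning is only a deterministic stand-in for a hash partitioner, and the reduce logic itself is omitted. The file names follow Hadoop's `part-m-NNNNN` / `part-r-NNNNN` naming pattern.

```python
import zlib

def run_job(mapper_outputs, num_reducers):
    """Return {output_file_name: records} for a toy job.

    mapper_outputs holds one list of (key, value) pairs per map task.
    """
    files = {}
    if num_reducers == 0:
        # Map-only job: each mapper's output goes straight to its own file,
        # unsorted and without any shuffle.
        for i, records in enumerate(mapper_outputs):
            files[f"part-m-{i:05d}"] = list(records)
        return files

    # Otherwise every mapper may send data to every reducer,
    # which is where the "up to m * r copy operations" comes from.
    partitions = [[] for _ in range(num_reducers)]
    for records in mapper_outputs:
        for key, value in records:
            # Deterministic stand-in for a hash partitioner (assumption).
            index = zlib.crc32(key.encode("utf-8")) % num_reducers
            partitions[index].append((key, value))
    for j, records in enumerate(partitions):
        # One output file per reduce task; input arrives sorted by key.
        files[f"part-r-{j:05d}"] = sorted(records)
    return files

# Hypothetical output of three map tasks.
mapper_outputs = [[("b", 1), ("a", 1)], [("c", 1)], [("a", 1)]]
map_only = run_job(mapper_outputs, 0)
two_reducers = run_job(mapper_outputs, 2)
```

With zero reducers the sketch yields one file per mapper, in the order the mapper emitted its records; with r reducers it yields r files regardless of how many mappers ran.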

as soon as they are available. The programmer defined reduce method is called only after all the mappers have finished.

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, When is the reducers are started in a MapReduce job?
http://www.fromdev.com/2010/12/interview-questions-hadoop-mapreduce.html (question no. 1…)

QUESTION NO:
What happens in a MapReduce job when you set the number of reducers to one?
A. A single reducer gathers and processes all the output from all the mappers. The output is written in as many separate files as there are mappers.
B. A single reducer gathers and processes all the output from all the mappers. The output is written to a single file in HDFS.
C. Setting the number of reducers to one creates a processing bottleneck, and since the number of reducers as specified by the programmer is used as a reference value only, the MapReduce runtime provides a default setting for the number of reducers.
D. Setting the number of reducers to one is invalid, and an exception is thrown.

Answer: B
Explanation:
With the default output format, each reduce task writes exactly one output file, so a job with a single reducer produces a single file in HDFS. The zero-reducer case behaves differently:
* It is legal to set the number of reduce-tasks to zero if no reduction is desired. In this case the outputs of the map-tasks go directly to the FileSystem, into the output path set by setOutputPath(Path). The framework does not sort the map-outputs before writing them out to the FileSystem.
* Often, you may want to process input data using a map function only. To do this, simply set mapreduce.job.reduces to zero. The MapReduce framework will not create any reducer tasks. Rather, the outputs of the mapper tasks will be the final output of the job.

QUESTION NO:
In the standard word count MapReduce algorithm, why might using a combiner reduce the overall …

mappers that need to run.
C. Because combiners perform local aggregation of word counts, and then transfer that data to reducers without writing the intermediate data to disk.
D. Because combiners perform local aggregation of word counts, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers.

Answer: D
Explanation:
* Simply speaking, a combiner can be considered as a “mini reducer” that will be applied, potentially several times, still during the map phase, before sending the new (hopefully reduced) set of key/value pairs to the reducer(s). This is why a combiner must implement the Reducer interface (or extend the Reducer class as of hadoop 0.20).
* Combiners are used to increase the efficiency of a MapReduce program. They are used to aggregate intermediate map output locally on individual mapper outputs. Combiners can help you reduce the amount of data that needs to be transferred across to the reducers. You can use your reducer code as a combiner if the operation performed is commutative and associative. The execution of a combiner is not guaranteed: Hadoop may or may not execute a combiner. Also, if required, it may execute it more than once. Therefore your MapReduce jobs should not depend on the combiner's execution.

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What are combiners? When should I use a combiner in my MapReduce …
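The saving from local aggregation can be made concrete with a small word count sketch. It is plain Python rather than Hadoop code; the function names and the two input splits are hypothetical, and the "shuffle" is modelled simply as the number of key-value pairs each mapper would hand to the reducers.

```python
from collections import Counter

def map_words(line):
    """Mapper: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]

def combine(pairs):
    """Combiner: sum counts per word within one mapper's output."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())

def reduce_counts(all_pairs):
    """Reducer: sum counts per word across all mappers."""
    totals = Counter()
    for word, count in all_pairs:
        totals[word] += count
    return dict(totals)

# Hypothetical input splits, one per mapper.
splits = ["the cat and the hat", "the end of the story"]
without = [map_words(line) for line in splits]
with_combiner = [combine(pairs) for pairs in without]

# Pairs that would cross the network in each case.
shuffled_without = sum(len(pairs) for pairs in without)
shuffled_with = sum(len(pairs) for pairs in with_combiner)

result_without = reduce_counts(p for pairs in without for p in pairs)
result_with = reduce_counts(p for pairs in with_combiner for p in pairs)
```

Because addition is commutative and associative, the final counts are the same whether the combiner runs or not; only the number of shuffled pairs changes.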

HDFS are those that deal with large data sets. These applications write their data only once but they read it one or more times and require these reads to be satisfied at streaming speeds. HDFS …

A. …NameNode for the block location(s). The NameNode returns the block location(s) to the client. The client reads the data directly off the DataNode(s).
B. The client queries all DataNodes in parallel. The DataNode that contains the requested data responds directly to the client. The client reads the data directly off the DataNode.
C. The client contacts the NameNode for the block location(s). The NameNode then queries the DataNodes for block locations. The DataNodes respond to the NameNode, and the NameNode

redirects the client to the DataNode that holds the requested data block(s). The client then reads the data directly off the DataNode.
D. …NameNode for the block location(s). The NameNode contacts the DataNode that holds the requested data block. Data is transferred from the DataNode to the NameNode, and then from the NameNode to the client.

Answer: A
Explanation:
The Client communication to HDFS happens using the Hadoop HDFS API. Client applications talk to the NameNode whenever they wish to locate a file, or when they want to add/copy/move/delete a file on HDFS. The NameNode responds to successful requests by returning a list of relevant DataNode servers where the data lives. Client applications can talk directly to a DataNode, once the NameNode has provided the location of the data.

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, How the Client communicates with HDFS?

QUESTION NO:
You need to create a job that does frequency analysis on input data. You will do this by writing a Mapper that uses TextInputFormat and splits each value (a line of text from an input file) into individual characters. For each one of these characters, you will emit the character as a key and an IntWritable as the value. Since this will produce proportionally more intermediate data than input data, which resources could you expect to be likely bottlenecks?
A. Processor and RAM
B. Processor and disk I/O
C. Disk I/O and network I/O
D. Processor and network I/O

Answer: B
Explanation:

QUESTION NO:
Which of the following statements best describes how a large (100 GB) file is stored in HDFS?
A. The file is divided into variable size blocks, which are stored on multiple data nodes. Each block is replicated three times by default.
B. The file is replicated three times by default. Each copy of the file is stored on a separate

datanodes.
C. The master copy of the file is stored on a single datanode. The replica copies are divided into …

…odes, each with a single 1 TB hard drive. You utilize all your disk capacity for HDFS, reserving none for MapReduce. You implement default replication settings. What is the storage capacity of your Hadoop cluster (assuming no compression)?
A. about 3 TB
B. about … TB
C. about 10 TB

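D. about 11 TB

Answer: A
Explanation:
In the default configuration there are a total of 3 copies of each data block on HDFS: 2 copies are stored on datanodes in the same rack and the 3rd copy in a different rack. Usable capacity is therefore roughly one third of the raw disk capacity.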
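The arithmetic behind this answer can be sketched as a one-line calculation. The helper name `usable_capacity_tb` is hypothetical, and the node count in the example (10) is an illustrative assumption, not a figure taken from the question text. The sketch also ignores filesystem overhead and space reserved for non-HDFS use.

```python
def usable_capacity_tb(num_nodes, disk_tb_per_node, replication=3):
    """Approximate usable HDFS capacity in TB.

    Every block is stored `replication` times, so raw capacity is
    divided by the replication factor (3 is the HDFS default).
    """
    raw_tb = num_nodes * disk_tb_per_node
    return raw_tb / replication

# Illustrative values only: 10 nodes is an assumed count, 1 TB per node
# matches the drive size mentioned in the question.
example = usable_capacity_tb(num_nodes=10, disk_tb_per_node=1)
```

Lowering the replication factor raises usable capacity proportionally, at the cost of fault tolerance.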

Note:
HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file. An application can specify the number of replicas of a file. The replication…