Transcript of "Fundamentals of MapReduce" (UMIACS, jimmylin/cloud-computing/2009-03-short-cou…)

  • 1

    Fundamentals of MapReduce

    Jimmy Lin, The iSchool, University of Maryland

    Monday, March 30, 2009

    This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.

    Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under Creative Commons Attribution 3.0 License)

    This afternoon...

    Introduction to MapReduce

    MapReduce “killer app” #1: text retrieval

    MapReduce “killer app” #2: graph algorithms

    Introduction to MapReduce

    Introduction to MapReduce: Topics

    Functional programming

    MapReduce

    Distributed file system

    Functional Programming

    MapReduce = functional programming meets distributed processing on steroids

    Not a new idea… dates back to the 50’s (or even 30’s)

    What is functional programming?

    Computation as application of functions

    Theoretical foundation provided by lambda calculus

    How is it different?

    Traditional notions of "data" and "instructions" are not applicable

    Data flows are implicit in the program

    Different orders of execution are possible

    Exemplified by LISP and ML

    Overview of Lisp

    Lisp ≠ Lost In Silly Parentheses

    We'll focus on a particular dialect: "Scheme"

    Lists are primitive data types:

    '(1 2 3 4 5)
    '((a 1) (b 2) (c 3))

    Functions written in prefix notation:

    (+ 1 2) → 3
    (* 3 4) → 12
    (sqrt (+ (* 3 3) (* 4 4))) → 5
    (define x 3) → x
    (* x 5) → 15

  • 2

    Functions

    Functions = lambda expressions bound to variables

    Syntactic sugar for defining functions:

    (define foo
      (lambda (x y)
        (sqrt (+ (* x x) (* y y)))))

    The above expression is equivalent to:

    (define (foo x y)
      (sqrt (+ (* x x) (* y y))))

    Once defined, the function can be applied:

    (foo 3 4) → 5

    Other Features

    In Scheme, everything is an s-expression

    No distinction between "data" and "code"

    Easy to write self-modifying code

    Higher-order functions: functions that take other functions as arguments

    (define (bar f x) (f (f x)))

    (define (baz x) (* x x))

    (bar baz 2) → 16

    Doesn’t matter what f is, just apply it twice.

    Recursion is your friend

    Simple factorial example:

    (define (factorial n)
      (if (= n 1)
          1
          (* n (factorial (- n 1)))))

    (factorial 6) → 720

    Even iteration is written with recursive calls!

    (define (factorial-iter n)
      (define (aux n top product)
        (if (= n top)
            (* n product)
            (aux (+ n 1) top (* n product))))
      (aux 1 n 1))

    (factorial-iter 6) → 720

    Lisp → MapReduce?

    What does this have to do with MapReduce?

    After all, Lisp is about processing lists

    Two important concepts in functional programming:

    Map: do something to everything in a list

    Fold: combine results of a list in some way

    Map

    Map is a higher-order function

    How map works:

    Function is applied to every element in a list

    Result is a new list

    [Figure: a function f applied independently to each of five list elements, yielding a new five-element list.]
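The same idea in Python (an illustrative sketch; the slides use Scheme):

```python
# Map applies a function independently to every element of a list,
# producing a new list; no application depends on any other.
squares = list(map(lambda x: x * x, [1, 2, 3, 4, 5]))
print(squares)  # → [1, 4, 9, 16, 25]
```

Because each application is independent, the order of evaluation does not matter.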

    Fold

    Fold is also a higher-order function

    How fold works:

    Accumulator set to initial value

    Function applied to list element and the accumulator

    Result stored in the accumulator

    Repeated for every item in the list

    Result is the final value in the accumulator

    [Figure: f applied successively across the list, threading the accumulator from the initial value to the final value.]
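In Python, fold is available as `functools.reduce` (an illustrative sketch):

```python
from functools import reduce

# Fold: the accumulator starts at the initial value 0, and the
# function is applied to the accumulator and each element in turn.
total = reduce(lambda acc, x: acc + x, [1, 2, 3, 4, 5], 0)
print(total)  # → 15
```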

  • 3

    Map/Fold in Action

    Simple map example:

    (map (lambda (x) (* x x)) '(1 2 3 4 5)) → '(1 4 9 16 25)

    Fold examples:

    (fold + 0 '(1 2 3 4 5)) → 15
    (fold * 1 '(1 2 3 4 5)) → 120

    Sum of squares:

    (define (sum-of-squares v)
      (fold + 0 (map (lambda (x) (* x x)) v)))

    (sum-of-squares '(1 2 3 4 5)) → 55

    Lisp → MapReduce

    Let's assume a long list of records: imagine if...

    We can parallelize map operations

    We have a mechanism for bringing map results back together in the fold operation

    That's MapReduce!

    Observations:

    No limit to map parallelization since maps are independent

    We can reorder folding if the fold function is commutative and associative

    Typical Problem

    Iterate over a large number of records

    Extract something of interest from each

    Shuffle and sort intermediate results

    Aggregate intermediate results

    Generate final output

    Key idea: provide an abstraction at the point of these two operations

    MapReduce

    Programmers specify two functions:

    map (k, v) → <k', v'>*
    reduce (k', v') → <k', v'>*

    All v' with the same k' are reduced together

    Usually, programmers also specify:

    partition (k', number of partitions) → partition for k'

    Often a simple hash of the key, e.g., hash(k') mod n

    Allows reduce operations for different keys in parallel

    Implementations:

    Google has a proprietary implementation in C++

    Hadoop is an open-source implementation in Java
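The hash(k') mod n partitioner can be sketched in a few lines of Python (illustrative, not Hadoop's actual Partitioner API); a stable hash such as CRC32 keeps the key-to-reducer assignment deterministic across runs:

```python
import zlib

def partition(key, num_partitions):
    # hash(k') mod n: all values sharing a key land in the same partition,
    # so they can be reduced together by a single reducer.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# The same key always maps to the same partition:
assert partition("apple", 4) == partition("apple", 4)
```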

    [Figure: four mappers consume input pairs (k1,v1) … (k6,v6) and emit intermediate pairs such as a→1, b→2, c→3, c→6, a→5, c→2, b→7, c→9; "Shuffle and Sort" aggregates values by key (a→{1,5}, b→{2,7}, c→{2,3,6,9}); three reducers then produce final outputs (r1,s1), (r2,s2), (r3,s3).]

    Recall these problems?

    How do we assign work units to workers?

    What if we have more work units than workers?

    What if workers need to share partial results?

    How do we aggregate partial results?

    How do we know all the workers have finished?

    What if workers die?

  • 4

    MapReduce Runtime

    Handles scheduling

    Assigns workers to map and reduce tasks

    Handles "data distribution"

    Moves the process to the data

    Handles synchronization

    Gathers, sorts, and shuffles intermediate data

    Handles faults

    Detects worker failures and restarts

    Everything happens on top of a distributed FS (later)

    “Hello World”: Word Count

    Map(String input_key, String input_value):
      // input_key: document name
      // input_value: document contents
      for each word w in input_value:
        EmitIntermediate(w, "1");

    Reduce(String key, Iterator intermediate_values):
      // key: a word, same for input and output
      // intermediate_values: a list of counts
      int result = 0;
      for each v in intermediate_values:
        result += ParseInt(v);
      Emit(AsString(result));
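The same word-count job can be simulated in single-process Python (a sketch of the semantics, not of a real distributed runtime); grouping intermediate pairs by key stands in for the shuffle:

```python
from collections import defaultdict

def map_fn(doc_name, contents):
    # emit (word, 1) for every word in the document
    for word in contents.split():
        yield word, 1

def reduce_fn(word, counts):
    # sum all counts emitted for this word
    return word, sum(counts)

def mapreduce(docs):
    groups = defaultdict(list)
    for name, text in docs.items():
        for k, v in map_fn(name, text):
            groups[k].append(v)          # "shuffle and sort": group by key
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

counts = mapreduce({"d1": "a b a", "d2": "b c"})
print(counts)  # → {'a': 2, 'b': 2, 'c': 1}
```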

    [Figure: MapReduce execution overview, redrawn from Dean and Ghemawat (OSDI 2004). (1) The user program forks a master and workers; (2) the master assigns map and reduce tasks; (3) map workers read input splits 0-4 and (4) write intermediate files to local disk; (5) reduce workers remotely read the intermediate data and (6) write output files 0 and 1. Phases: input files → map phase → intermediate files (on local disk) → reduce phase → output files.]

    Bandwidth Optimization

    Issue: large number of key-value pairs

    Solution: use "Combiner" functions

    Executed on same machine as mapper

    Results in a "mini-reduce" right after the map phase

    Reduces key-value pairs to save bandwidth
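For word count, the combiner is just the reducer applied locally to one mapper's output, since addition is commutative and associative (an illustrative sketch):

```python
from collections import Counter

def combine(pairs):
    # pairs: (word, count) tuples emitted by a single mapper;
    # sum counts locally before anything crosses the network.
    combined = Counter()
    for word, n in pairs:
        combined[word] += n
    return list(combined.items())

print(combine([("a", 1), ("b", 1), ("a", 1)]))  # → [('a', 2), ('b', 1)]
```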

    Skew Problem

    Issue: reduce is only as fast as the slowest map

    Solution: redundantly execute map operations, use results of first to finish

    Addresses hardware problems... But not issues related to inherent distribution of data

    How do we get data to the workers?

    [Figure: compute nodes connected to separate storage over the network (NAS, SAN). What's the problem here?]

  • 5

    Distributed File System

    Don't move data to workers… move workers to the data!

    Store data on the local disks of nodes in the cluster

    Start up the workers on the node that has the data local

    Why?

    Not enough RAM to hold all the data in memory

    Disk access is slow, but disk throughput is good

    A distributed file system is the answer

    GFS (Google File System)

    HDFS for Hadoop

    GFS: Assumptions

    Commodity hardware over "exotic" hardware

    High component failure rates

    Inexpensive commodity components fail all the time

    "Modest" number of HUGE files

    Files are write-once, mostly appended to (perhaps concurrently)

    Large streaming reads over random access

    High sustained throughput over low latency

    GFS slides adapted from material by Dean et al.

    GFS: Design Decisions

    Files stored as chunks

    Fixed size (64MB)

    Reliability through replication

    Each chunk replicated across 3+ chunkservers

    Single master to coordinate access, keep metadata

    Simple centralized management

    No data caching

    Little benefit due to large data sets, streaming reads

    Simplify the API

    Push some of the issues onto the client

    [Figure: GFS architecture, redrawn from Ghemawat et al. (SOSP 2003). The application calls a GFS client, which sends (file name, chunk index) to the GFS master; the master holds the file namespace (e.g., /foo/bar → chunk 2ef0) and replies with (chunk handle, chunk location). The client then requests (chunk handle, byte range) directly from GFS chunkservers, which store chunks in the Linux file system and return chunk data. The master exchanges instructions and chunkserver state with the chunkservers.]

    Single Master

    We know this is a:

    Single point of failure

    Scalability bottleneck

    GFS solutions:

    Shadow masters

    Minimize master involvement

    • Never move data through it, use only for metadata (and cache metadata at clients)

    • Large chunk size

    • Master delegates authority to primary replicas in data mutations (chunk leases)

    Simple, and good enough!

    Master's Responsibilities (1/2)

    Metadata storage

    Namespace management/locking

    Periodic communication with chunkservers

    Give instructions, collect state, track cluster health

    Chunk creation, re-replication, rebalancing

    Balance space utilization and access speed

    Spread replicas across racks to reduce correlated failures

    Re-replicate data if redundancy falls below threshold

    Rebalance data to smooth out storage and request load

  • 6

    Master's Responsibilities (2/2)

    Garbage collection

    Simpler, more reliable than traditional file delete

    Master logs the deletion, renames the file to a hidden name

    Lazily garbage collects hidden files

    Stale replica deletion

    Detect "stale" replicas using chunk version numbers

    Metadata

    Global metadata is stored on the master

    File and chunk namespaces

    Mapping from files to chunks

    Locations of each chunk's replicas

    All in memory (64 bytes / chunk)

    Fast

    Easily accessible

    Master has an operation log for persistent logging of critical metadata updates

    Persistent on local disk

    Replicated

    Checkpoints for faster recovery

    Mutations

    Mutation = write or append

    Must be done for all replicas

    Goal: minimize master involvement

    Lease mechanism:

    Master picks one replica as primary; gives it a "lease" for mutations

    Primary defines a serial order of mutations

    All replicas follow this order

    Data flow decoupled from control flow

    Parallelization Problems

    How do we assign work units to workers?

    What if we have more work units than workers?

    What if workers need to share partial results?

    How do we aggregate partial results?

    How do we know all the workers have finished?

    What if workers die?

    How is MapReduce different?

    Questions?

    MapReduce "killer app" #1: Text Retrieval

  • 7

    Text Retrieval: Topics

    Introduction to information retrieval (IR)

    Boolean retrieval

    Ranked retrieval

    Text retrieval with MapReduce

    The Information Retrieval Cycle

    [Figure: the information retrieval cycle: source selection (resource) → query formulation (query) → search (results) → selection (documents) → examination (information) → delivery; with feedback loops for source reselection and for system, vocabulary, concept, and document discovery.]

    The Central Problem in Search

    The searcher and the author each express the same concepts in their own terms: the searcher's concepts become query terms, and the author's concepts become document terms.

    Do these represent the same concepts?

    "tragic love story" vs. "fateful star-crossed romance"

    Architecture of IR Systems

    [Figure: offline, a representation function turns documents into document representations stored in an index; online, a representation function turns the query into a query representation; a comparison function matches the two against the index and returns hits.]

    How do we represent text?

    Remember: computers don't "understand" anything!

    "Bag of words"

    Treat all the words in a document as index terms for that document

    Assign a "weight" to each term based on "importance"

    Disregard order, structure, meaning, etc. of the words

    Simple, yet effective!

    Assumptions:

    Term occurrence is independent

    Document relevance is independent

    "Words" are well-defined

    What’s a word?

    天主教教宗若望保祿二世因感冒再度住進醫院。這是他今年第二度因同樣的病因住院。 الناطق باسم -وقال مارك ريجيف

    إن شارون قبل -الخارجية اإلسرائيلية الدعوة وسيقوم للمرة األولى بزيارةتونس، التي آانت لفترة طويلة المقر

    1982الرسمي لمنظمة التحرير الفلسطينية بعد خروجها من لبنان عام .

    Выступая в Мещанском суде Москвы экс-глава ЮКОСа заявил, что не совершал ничего противозаконного, в чем обвиняет его генпрокуратура России.

    भारत सरकार ने आिथर्क सवेर्क्षण में िवत्तीय वषर् 2005-06 में सात फ़ीसदीिवकास दर हािसल करने का आकलन िकया है और कर सुधार पर ज़ोर िदया है

    日米連合で台頭中国に対処…アーミテージ前副長官提言

    조재영기자= 서울시는 25일이명박시장이 `행정중심복합도시'' 건설안에대해 `군대라도동원해막고싶은심정''이라고말했다는일부언론의보도를부인했다.

  • 8

    Sample Document

    McDonald's slims down spuds

    Fast-food chain to reduce certain types of fat in its french fries with new cooking oil.

    NEW YORK (CNN/Money) - McDonald's Corp. is cutting the amount of "bad" fat in its french fries nearly in half, the fast-food chain said Tuesday as it moves to make all its fried menu items healthier. But does that mean the popular shoestring fries won't taste the same? The company says no. "It's a win-win for our customers because they are getting the same great french fry taste along with an even healthier nutrition profile," said Mike Roberts, president of McDonald's USA.

    But others are not so sure. McDonald's will not specifically discuss the kind of oil it plans to use, but at least one nutrition expert says playing with the formula could mean a different taste. Shares of Oak Brook, Ill.-based McDonald's (MCD: down $0.54 to $23.22, Research, Estimates) were lower Tuesday afternoon. It was unclear Tuesday whether competitors Burger King and Wendy's International (WEN: down $0.80 to $34.91, Research, Estimates) would follow suit. Neither company could immediately be reached for comment.

    "Bag of Words" view of this document:

    16 × said
    14 × McDonalds
    12 × fat
    11 × fries
    8 × new
    6 × company, french, nutrition
    5 × food, oil, percent, reduce, taste, Tuesday

    Boolean Retrieval

    Users express queries as a Boolean expression

    AND, OR, NOT

    Can be arbitrarily nested

    Retrieval is based on the notion of sets

    Any given query divides the collection into two sets: retrieved, not-retrieved

    Pure Boolean systems do not define an ordering of the results

    Representing Documents

    Document 1: The quick brown fox jumped over the lazy dog's back.

    Document 2: Now is the time for all good men to come to the aid of their party.

    Stopword list: the, is, for, to, of

    Term-document incidence matrix (1 = term occurs in document):

    Term    Doc 1   Doc 2
    quick     1       0
    brown     1       0
    fox       1       0
    over      1       0
    lazy      1       0
    dog       1       0
    back      1       0
    jump      1       0
    now       0       1
    time      0       1
    all       0       1
    good      0       1
    men       0       1
    come      0       1
    aid       0       1
    their     0       1
    party     0       1

    Inverted Index

    Store, for each term, the sorted list of documents in which it appears (its postings). Over the eight-document collection, for example:

    Term     Postings
    aid      4, 8
    all      2, 4, 6
    back     1, 3, 7
    brown    1, 3, 5, 7
    come     2, 4, 6, 8
    dog      3, 5
    fox      3, 5, 7

    (and likewise one postings list per remaining term: good, jump, lazy, men, now, over, party, quick, their, time)

    Boolean Retrieval

    To execute a Boolean query:

    Build query syntax tree: ( fox OR dog ) AND quick becomes AND( OR(fox, dog), quick )

    For each clause, look up postings:

    fox → 3, 5, 7
    dog → 3, 5

    Traverse postings and apply the Boolean operator:

    fox OR dog = union → 3, 5, 7

    Efficiency analysis:

    Postings traversal is linear (assuming sorted postings)

    Start with the shortest posting first
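The linear traversal can be sketched in Python (illustrative; assumes sorted postings lists, and quick's postings are assumed purely for the example since the slide only gives fox and dog):

```python
def union(a, b):
    # OR: linear merge of two sorted postings lists
    out, i, j = [], 0, 0
    while i < len(a) or j < len(b):
        if j == len(b) or (i < len(a) and a[i] < b[j]):
            out.append(a[i]); i += 1
        elif i == len(a) or b[j] < a[i]:
            out.append(b[j]); j += 1
        else:
            out.append(a[i]); i += 1; j += 1
    return out

def intersect(a, b):
    # AND: walk both lists, keeping only docids present in each
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

fox, dog = [3, 5, 7], [3, 5]   # postings from the slide
quick = [1, 3, 5]              # assumed for illustration
print(intersect(union(fox, dog), quick))  # → [3, 5]
```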

    Extensions

    Implementing proximity operators

    Store word offset in postings

    Handling term variations

    Stem words: love, loving, loves … → lov

  • 9

    Strengths and Weaknesses

    Strengths:

    Precise, if you know the right strategies

    Precise, if you have an idea of what you're looking for

    Implementations are fast and efficient

    Weaknesses:

    Users must learn Boolean logic

    Boolean logic insufficient to capture the richness of language

    No control over size of result set: either too many hits or none

    When do you stop reading? All documents in the result set are considered "equally good"

    What about partial matches? Documents that "don't quite match" the query may be useful also

    Ranked Retrieval

    Order documents by how likely they are to be relevant to the information need

    Estimate relevance(q, d_i)

    Sort documents by relevance

    Display sorted results

    User model:

    Present hits one screen at a time, best results first

    At any point, users can decide to stop looking

    How do we estimate relevance?

    Assume a document is relevant if it has a lot of query terms

    Replace relevance(q, d_i) with sim(q, d_i)

    Compute similarity of vector representations

    Vector Space Model

    [Figure: documents d1…d5 as vectors in a space with term axes t1, t2, t3; the angles θ and φ between vectors measure closeness.]

    Assumption: Documents that are "close together" in vector space "talk about" the same things

    Therefore, retrieve documents based on how close the document is to the query (i.e., similarity ~ "closeness")

    Similarity Metric

    How about |d1 – d2|?

    Instead of Euclidean distance, use the "angle" between the vectors

    It all boils down to the inner product (dot product) of vectors:

    sim(d_j, d_k) = cos(θ) = (d_j · d_k) / (|d_j| |d_k|)
                  = Σ_{i=1}^{n} w_{i,j} w_{i,k} / ( sqrt(Σ_{i=1}^{n} w_{i,j}²) · sqrt(Σ_{i=1}^{n} w_{i,k}²) )
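As a Python sketch, with documents represented as equal-length term-weight vectors:

```python
import math

def cosine(dj, dk):
    # inner product of the weight vectors, normalized by their lengths
    num = sum(wj * wk for wj, wk in zip(dj, dk))
    den = math.sqrt(sum(w * w for w in dj)) * math.sqrt(sum(w * w for w in dk))
    return num / den

print(cosine([1.0, 0.0], [1.0, 0.0]))  # → 1.0 (same direction)
print(cosine([1.0, 0.0], [0.0, 1.0]))  # → 0.0 (orthogonal: no shared terms)
```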

    Term Weighting

    Term weights consist of two components:

    Local: how important is the term in this document?

    Global: how important is the term in the collection?

    Here's the intuition:

    Terms that appear often in a document should get high weights

    Terms that appear in many documents should get low weights

    How do we capture this mathematically?

    Term frequency (local)

    Inverse document frequency (global)

    TF.IDF Term Weighting

    w_{i,j} = tf_{i,j} · log(N / n_i)

    where:
    w_{i,j}  = weight assigned to term i in document j
    tf_{i,j} = number of occurrences of term i in document j
    N        = number of documents in the entire collection
    n_i      = number of documents with term i
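A direct Python transcription of the weight (base-10 log, matching the idf values in the example that follows; the inputs are assumed for illustration):

```python
import math

def tfidf(tf, N, n_i):
    # w_ij = tf_ij * log10(N / n_i)
    return tf * math.log10(N / n_i)

# A term occurring 3 times in a document, appearing in 1 of 2 documents:
print(round(tfidf(3, 2, 1), 3))  # → 0.903
# A term appearing in every document gets zero weight:
print(tfidf(5, 4, 4))  # → 0.0
```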

  • 10

    TF.IDF Example

    [Table: a worked TF.IDF example over four documents. For each term (complicated, contaminated, fallout, information, interesting, nuclear, retrieval, siberia) the original slide lists its (document, tf) postings and the resulting weights, built from idf values such as 0.301, 0.125, 0.602, and 0.000.]

    Sketch: Scoring Algorithm

    Initialize accumulators to hold document scores

    For each query term t in the user's query:

    Fetch t's postings

    For each document, score_doc += w_{t,d} × w_{t,q}

    Apply length normalization to the scores at the end

    Return top N documents
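The sketch above in Python (illustrative; postings map each term to (docid, weight) pairs, and the query weights and document lengths are assumed inputs):

```python
from collections import defaultdict

def score(query, postings, doc_length, top_n=10):
    acc = defaultdict(float)                      # accumulators
    for term, w_tq in query.items():
        for doc, w_td in postings.get(term, []):  # fetch t's postings
            acc[doc] += w_td * w_tq
    for doc in acc:
        acc[doc] /= doc_length[doc]               # length normalization
    return sorted(acc.items(), key=lambda kv: -kv[1])[:top_n]

postings = {"fox": [(1, 0.5), (3, 0.8)], "dog": [(3, 0.4)]}
ranked = score({"fox": 1.0, "dog": 1.0}, postings, {1: 1.0, 3: 2.0})
print([doc for doc, _ in ranked])  # → [3, 1]
```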

    MapReduce it?

    The indexing problem:

    Must be relatively fast, but need not be real time

    For Web, incremental updates are important

    The retrieval problem:

    Must have sub-second response

    For Web, only need relatively few results

    Indexing: Performance Analysis

    Inverted indexing is fundamental to all IR models

    Fundamentally, a large sorting problem

    Terms usually fit in memory

    Postings usually don't

    How is it done on a single machine?

    How large is the inverted index?

    Size of vocabulary

    Size of postings

    Vocabulary Size: Heaps’ Law

    V = K · n^β

    V is vocabulary size
    n is corpus size (number of documents)
    K and β are constants

    Typically, K is between 10 and 100, β is between 0.4 and 0.6

    When adding new documents, the system is likely to have seen most terms already… but the postings keep growing

    Postings Size: Zipf's Law

    George Kingsley Zipf (1902-1950) observed the following relation between frequency and rank:

    f · r = c, or f = c / r

    f = frequency
    r = rank
    c = constant

    In other words:

    A few elements occur very frequently

    Many elements occur very infrequently

    Zipfian distributions:

    English words

    Library book checkout patterns

    Website popularity (almost anything on the Web)

  • 11

    Word Frequency in English

    Frequency of the 50 most common words in English (sample of 19 million words):

    the   1130021    from     96900    or      54958
    of     547311    he       94585    about   53713
    to     516635    million  93515    market  52110
    a      464736    year     90104    they    51359
    in     390819    its      86774    this    50933
    and    387703    be       85588    would   50828
    that   204351    was      83398    you     49281
    for    199340    company  83070    which   48273
    is     152483    an       76974    bank    47940
    said   148302    has      74405    stock   47401
    it     134323    are      74097    trade   47310
    on     121173    have     73132    his     47116
    by     118863    but      71887    more    46244
    as     109135    will     71494    who     42142
    at     101779    say      66807    one     41635
    mr     101679    new      64456    their   40910
    with   101210    share    63925

    Does it fit Zipf's Law? The following shows rf × 1000, where r is the rank of word w in the sample and f is the frequency of word w in the sample:

    the    59    from     92    or      101
    of     58    he       95    about   102
    to     82    million  98    market  101
    a      98    year    100    they    103
    in    103    its     100    this    105
    and   122    be      104    would   107
    that   75    was     105    you     106
    for    84    company 109    which   107
    is     72    an      105    bank    109
    said   78    has     106    stock   110
    it     78    are     109    trade   112
    on     77    have    112    his     114
    by     81    but     114    more    114
    as     80    will    117    who     106
    at     80    say     113    one     107
    mr     86    new     112    their   108
    with   91    share   114

    MapReduce: Index Construction

    Map over all documents

    Emit term as key, (docid, tf) as value

    Emit other information as necessary (e.g., term position)

    Reduce:

    Trivial: each value represents a posting!

    Might want to sort the postings (e.g., by docid or tf)

    MapReduce does all the heavy lifting!
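A single-process Python sketch of this indexing job (illustrative; the grouping dictionary stands in for MapReduce's shuffle):

```python
from collections import Counter, defaultdict

def map_fn(docid, text):
    # emit (term, (docid, tf)) for each distinct term in the document
    for term, tf in Counter(text.split()).items():
        yield term, (docid, tf)

def build_index(docs):
    index = defaultdict(list)
    for docid, text in docs.items():
        for term, posting in map_fn(docid, text):
            index[term].append(posting)   # shuffle: group values by term
    for term in index:
        index[term].sort()                # reduce: sort postings by docid
    return dict(index)

print(build_index({1: "a b a", 2: "b"}))  # → {'a': [(1, 2)], 'b': [(1, 1), (2, 1)]}
```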

    Query Execution

    MapReduce is meant for large-data batch processing

    Not suitable for real-time operations requiring low latency

    The solution: "the secret sauce"

    Most likely involves document partitioning

    Lots of system engineering: e.g., caching, load balancing, etc.

    MapReduce: Query Execution

    High-throughput batch query execution:

    Instead of sequentially accumulating scores per query term:

    Have mappers traverse postings in parallel, emitting partial score components

    Reducers serve as the accumulators, summing contributions for each query term

    MapReduce does all the heavy lifting:

    Replace random access with sequential reads

    Amortize over lots of queries

    Examine multiple postings in parallel

    Questions?

  • 12

    MapReduce "killer app" #2: Graph Algorithms

    Graph Algorithms: Topics

    Introduction to graph algorithms and graph representations

    Single Source Shortest Path (SSSP) problem

    Refresher: Dijkstra's algorithm

    Breadth-First Search with MapReduce

    PageRank

    What's a graph?

    G = (V, E), where:

    V represents the set of vertices (nodes)

    E represents the set of edges (links)

    Both vertices and edges may contain additional information

    Different types of graphs:

    Directed vs. undirected edges

    Presence or absence of cycles

    Graphs are everywhere:

    Hyperlink structure of the Web

    Physical structure of computers on the Internet

    Interstate highway system

    Social networks

    Some Graph Problems

    Finding shortest paths

    Routing Internet traffic and UPS trucks

    Finding minimum spanning trees

    Telco laying down fiber

    Finding max flow

    Airline scheduling

    Identifying "special" nodes and communities

    Breaking up terrorist cells, spread of avian flu

    Bipartite matching

    Monster.com, Match.com

    And of course... PageRank

    Graphs and MapReduce

    Graph algorithms typically involve:

    Performing computation at each node

    Processing node-specific data, edge-specific data, and link structure

    Traversing the graph in some manner

    Key questions:

    How do you represent graph data in MapReduce?

    How do you traverse a graph in MapReduce?

    Representing Graphs

    G = (V, E)

    A poor representation for computational purposes

    Two common representations:

    Adjacency matrix

    Adjacency list

  • 13

    Adjacency Matrices

    Represent a graph as an n × n square matrix M

    n = |V|

    M_ij = 1 means a link from node i to j

         1  2  3  4
    1    0  1  0  1
    2    1  0  1  1
    3    1  0  0  0
    4    1  0  1  0

    Adjacency Matrices: Critique

    Advantages:

    Naturally encapsulates iteration over nodes

    Rows and columns correspond to inlinks and outlinks

    Disadvantages:

    Lots of zeros for sparse matrices

    Lots of wasted space

    Adjacency Lists

    Take adjacency matrices… and throw away all the zeros

    1: 2, 4
    2: 1, 3, 4
    3: 1
    4: 1, 3
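The conversion is mechanical; in Python, using the example graph above:

```python
# Adjacency matrix of the four-node example graph
M = [
    [0, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 1, 0],
]

# Keep only the nonzero entries of each row (nodes numbered from 1)
adj = {i + 1: [j + 1 for j, x in enumerate(row) if x]
       for i, row in enumerate(M)}
print(adj)  # → {1: [2, 4], 2: [1, 3, 4], 3: [1], 4: [1, 3]}
```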

    Adjacency Lists: Critique

    Advantages:

    Much more compact representation

    Easy to compute over outlinks

    Graph structure can be broken up and distributed

    Disadvantages:

    Much more difficult to compute over inlinks

    Single Source Shortest Path

    Problem: find shortest path from a source node to one or more target nodes

    First, a refresher: Dijkstra's Algorithm

    Dijkstra's Algorithm Example

    [Figure: six snapshots of Dijkstra's algorithm on a five-node weighted directed graph (edge weights 10, 5, 1, 2, 3, 2, 9, 7, 4, 6; example from CLR). The source starts at distance 0 and all other nodes at ∞; repeatedly settling the closest unsettled node and relaxing its outgoing edges updates the tentative distances from (∞, ∞, ∞, ∞) to (10, ∞, 5, ∞), then (8, 14, 5, 7), then (8, 13, 5, 7), and finally (8, 9, 5, 7), which no further relaxation improves.]

    Single Source Shortest Path

    Problem: find shortest path from a source node to one or more target nodes

    Single-processor machine: Dijkstra's algorithm

    MapReduce: parallel Breadth-First Search (BFS)

  • 15

    Finding the Shortest Path

    First, consider equal edge weights

    Solution to the problem can be defined inductively

    Here's the intuition:

    DistanceTo(startNode) = 0

    For all nodes n directly reachable from startNode, DistanceTo(n) = 1

    For all nodes n reachable from some other set of nodes S, DistanceTo(n) = 1 + min(DistanceTo(m), m ∈ S)

    From Intuition to Algorithm

    A map task receives:

    Key: node n

    Value: D (distance from start), points-to (list of nodes reachable from n)

    ∀p ∈ points-to: emit (p, D+1)

    The reduce task gathers possible distances to a given p and selects the minimum one
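One iteration of this map/reduce pair can be sketched in single-process Python (illustrative; the three-node graph and its representation are assumptions for the example):

```python
from collections import defaultdict

INF = float("inf")

# graph: node -> (distance-so-far, adjacency list)
def bfs_iteration(graph):
    candidates = defaultdict(list)
    for n, (d, adj) in graph.items():
        candidates[n].append(d)          # carry the current distance along
        if d < INF:
            for p in adj:
                candidates[p].append(d + 1)   # map: emit (p, D+1)
    # reduce: keep the minimum candidate distance, preserving structure
    return {n: (min(candidates[n]), adj)
            for n, (_, adj) in graph.items()}

g = {1: (0, [2, 3]), 2: (INF, [3]), 3: (INF, [])}
g = bfs_iteration(g)
print({n: d for n, (d, _) in g.items()})  # → {1: 0, 2: 1, 3: 1}
```

Feeding the output back into the same function advances the frontier by one more hop each time.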

    Multiple Iterations Needed

    This MapReduce task advances the "known frontier" by one hop

    Subsequent iterations include more reachable nodes as the frontier advances

    Multiple iterations are needed to explore the entire graph

    Feed output back into the same MapReduce task

    Preserving graph structure:

    Problem: Where did the points-to list go?

    Solution: Mapper emits (n, points-to) as well

    Visualizing Parallel BFS

    [Figure: breadth-first search frontier expanding from the source node: neighbors at distance 1 are discovered in the first iteration, then nodes at distance 2, 3, and 4 in subsequent iterations.]

    Termination

    Does the algorithm ever terminate?

    Eventually, all nodes will be discovered, all edges will be considered (in a connected graph)

    When do we stop?

    Weighted Edges

    Now add positive weights to the edges

    Simple change: the points-to list in the map task includes a weight w for each pointed-to node

    Emit (p, D+w_p) instead of (p, D+1) for each node p

    Does this ever terminate?

    Yes! Eventually, no better distances will be found. When the distances stop changing, we stop

    Mapper should emit (n, D) to ensure that the "current distance" is carried into the reducer

  • 16

    Comparison to Dijkstra

    Dijkstra's algorithm is more efficient

    At any step it only pursues edges from the minimum-cost path inside the frontier

    MapReduce explores all paths in parallel

    Divide and conquer

    Throw more hardware at the problem

    General Approach

    MapReduce is adept at manipulating graphs

    Store graphs as adjacency lists

    Graph algorithms with MapReduce:

    Each map task receives a node and its outlinks

    Map task computes some function of the link structure, emits value with target as the key

    Reduce task collects keys (target nodes) and aggregates

    Iterate multiple MapReduce cycles until some termination condition

    Remember to "pass" graph structure from one iteration to the next

    Random Walks Over the Web

    Model:

    User starts at a random Web page

    User randomly clicks on links, surfing from page to page

    What's the amount of time that will be spent on any given page?

    This is PageRank

    PageRank: Defined

    Given page x with in-bound links t1…tn, where:

    C(t) is the out-degree of t

    α is the probability of a random jump

    N is the total number of nodes in the graph

    PR(x) = α (1/N) + (1 − α) Σ_{i=1}^{n} PR(t_i) / C(t_i)

    Computing PageRank

    Properties of PageRank:

    Can be computed iteratively

    Effects at each iteration are local

    Sketch of algorithm:

    Start with seed PR_i values

    Each page distributes PR_i "credit" to all pages it links to

    Each target page adds up "credit" from multiple in-bound links to compute PR_{i+1}

    Iterate until values converge
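One sweep of that sketch in Python (illustrative; the three-node graph and α = 0.15 are assumed values):

```python
def pagerank_iteration(pr, links, alpha=0.15):
    # Each page distributes its current PR across its outlinks;
    # each page then gets a random-jump share plus the credit it received.
    N = len(pr)
    credit = {n: 0.0 for n in pr}
    for n, targets in links.items():
        for t in targets:
            credit[t] += pr[n] / len(targets)
    return {n: alpha / N + (1 - alpha) * credit[n] for n in pr}

pr = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}          # seed values
links = {1: [2, 3], 2: [3], 3: [1]}
pr = pagerank_iteration(pr, links)
print(pr)  # node 3, with two in-bound links, gains the most credit
```

Repeating the sweep until the values stop changing gives the converged PageRank vector; note the values remain a probability distribution after each sweep.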

    PageRank in MapReduce

    Map: distribute PageRank "credit" to link targets

    Reduce: gather up PageRank "credit" from multiple sources to compute new PageRank value

    Iterate until convergence

  • 17

    PageRank: Issues

    Is PageRank guaranteed to converge? How quickly?

    What is the "correct" value of α, and how sensitive is the algorithm to it?

    What about dangling links?

    How do you know when to stop?

    Questions?