Transcript of "Fundamentals of MapReduce" (UMIACS, jimmylin/cloud-computing/2009-03-short-cou…)

  • 1

    Fundamentals of MapReduce

    Jimmy Lin, The iSchool, University of Maryland

    Monday, March 30, 2009

    This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.

    Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under Creative Commons Attribution 3.0 License)

    This afternoon...

    Introduction to MapReduce

    MapReduce “killer app” #1: text retrieval

    MapReduce “killer app” #2: graph algorithms

    Introduction to MapReduce

    Introduction to MapReduce: Topics

    Functional programming

    MapReduce

    Distributed file system

    Functional Programming

    MapReduce = functional programming meets distributed processing on steroids

    Not a new idea… dates back to the 50’s (or even 30’s)

    What is functional programming?

    Computation as application of functions

    Theoretical foundation provided by lambda calculus

    How is it different?

    Traditional notions of "data" and "instructions" are not applicable

    Data flows are implicit in the program

    Different orders of execution are possible

    Exemplified by LISP and ML

    Overview of Lisp

    Lisp ≠ Lost In Silly Parentheses

    We'll focus on a particular dialect: "Scheme"

    Lists are primitive data types:

    '(1 2 3 4 5)
    '((a 1) (b 2) (c 3))

    Functions written in prefix notation:

    (+ 1 2) → 3
    (* 3 4) → 12
    (sqrt (+ (* 3 3) (* 4 4))) → 5
    (define x 3) → x
    (* x 5) → 15

  • 2

    Functions

    Functions = lambda expressions bound to variables

    Syntactic sugar for defining functions:

    (define foo
      (lambda (x y)
        (sqrt (+ (* x x) (* y y)))))

    The above expression is equivalent to:

    (define (foo x y)
      (sqrt (+ (* x x) (* y y))))

    Once defined, the function can be applied:

    (foo 3 4) → 5

    Other Features

    In Scheme, everything is an s-expression

    No distinction between "data" and "code"

    Easy to write self-modifying code

    Higher-order functions: functions that take other functions as arguments

    (define (bar f x) (f (f x)))

    (define (baz x) (* x x))

    (bar baz 2) → 16

    Doesn’t matter what f is, just apply it twice.

    Recursion is your friend

    Simple factorial example:

    (define (factorial n)
      (if (= n 1)
          1
          (* n (factorial (- n 1)))))

    (factorial 6) → 720

    Even iteration is written with recursive calls!

    (define (factorial-iter n)
      (define (aux n top product)
        (if (= n top)
            (* n product)
            (aux (+ n 1) top (* n product))))
      (aux 1 n 1))

    (factorial-iter 6) → 720

    Lisp → MapReduce?

    What does this have to do with MapReduce?

    After all, Lisp is about processing lists

    Two important concepts in functional programming:

    Map: do something to everything in a list

    Fold: combine results of a list in some way

    Map

    Map is a higher-order function

    How map works:

    Function is applied to every element in a list

    Result is a new list

    [Figure: a function f applied independently to each of five list elements, yielding a new five-element list.]
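The same idea in Python (an illustrative sketch; the slides use Scheme):

```python
# Map applies a function independently to every element of a list,
# producing a new list; no application depends on any other.
squares = list(map(lambda x: x * x, [1, 2, 3, 4, 5]))
print(squares)  # → [1, 4, 9, 16, 25]
```

Because each application is independent, the order of evaluation does not matter.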

    Fold

    Fold is also a higher-order function

    How fold works:

    Accumulator set to initial value

    Function applied to list element and the accumulator

    Result stored in the accumulator

    Repeated for every item in the list

    Result is the final value in the accumulator

    [Figure: f applied successively across the list, threading the accumulator from the initial value to the final value.]
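In Python, fold is available as `functools.reduce` (an illustrative sketch):

```python
from functools import reduce

# Fold: the accumulator starts at the initial value 0, and the
# function is applied to the accumulator and each element in turn.
total = reduce(lambda acc, x: acc + x, [1, 2, 3, 4, 5], 0)
print(total)  # → 15
```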

  • 3

    Map/Fold in Action

    Simple map example:

    (map (lambda (x) (* x x)) '(1 2 3 4 5)) → '(1 4 9 16 25)

    Fold examples:

    (fold + 0 '(1 2 3 4 5)) → 15
    (fold * 1 '(1 2 3 4 5)) → 120

    Sum of squares:

    (define (sum-of-squares v)
      (fold + 0 (map (lambda (x) (* x x)) v)))

    (sum-of-squares '(1 2 3 4 5)) → 55

    Lisp → MapReduce

    Let's assume a long list of records: imagine if...

    We can parallelize map operations

    We have a mechanism for bringing map results back together in the fold operation

    That's MapReduce!

    Observations:

    No limit to map parallelization since maps are independent

    We can reorder folding if the fold function is commutative and associative

    Typical Problem

    Iterate over a large number of records

    Extract something of interest from each

    Shuffle and sort intermediate results

    Aggregate intermediate results

    Generate final output

    Key idea: provide an abstraction at the point of these two operations

    MapReduce

    Programmers specify two functions:

    map (k, v) → <k', v'>*
    reduce (k', v') → <k', v'>*

    All v' with the same k' are reduced together

    Usually, programmers also specify:

    partition (k', number of partitions) → partition for k'

    Often a simple hash of the key, e.g., hash(k') mod n

    Allows reduce operations for different keys in parallel

    Implementations:

    Google has a proprietary implementation in C++

    Hadoop is an open-source implementation in Java
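The hash(k') mod n partitioner can be sketched in a few lines of Python (illustrative, not Hadoop's actual Partitioner API); a stable hash such as CRC32 keeps the key-to-reducer assignment deterministic across runs:

```python
import zlib

def partition(key, num_partitions):
    # hash(k') mod n: all values sharing a key land in the same partition,
    # so they can be reduced together by a single reducer.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# The same key always maps to the same partition:
assert partition("apple", 4) == partition("apple", 4)
```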

    [Figure: four mappers consume input pairs (k1,v1) … (k6,v6) and emit intermediate pairs such as a→1, b→2, c→3, c→6, a→5, c→2, b→7, c→9; "Shuffle and Sort" aggregates values by key (a→{1,5}, b→{2,7}, c→{2,3,6,9}); three reducers then produce final outputs (r1,s1), (r2,s2), (r3,s3).]

    Recall these problems?

    How do we assign work units to workers?

    What if we have more work units than workers?

    What if workers need to share partial results?

    How do we aggregate partial results?

    How do we know all the workers have finished?

    What if workers die?

  • 4

    MapReduce Runtime

    Handles scheduling

    Assigns workers to map and reduce tasks

    Handles "data distribution"

    Moves the process to the data

    Handles synchronization

    Gathers, sorts, and shuffles intermediate data

    Handles faults

    Detects worker failures and restarts

    Everything happens on top of a distributed FS (later)

    “Hello World”: Word Count

    Map(String input_key, String input_value):
      // input_key: document name
      // input_value: document contents
      for each word w in input_value:
        EmitIntermediate(w, "1");

    Reduce(String key, Iterator intermediate_values):
      // key: a word, same for input and output
      // intermediate_values: a list of counts
      int result = 0;
      for each v in intermediate_values:
        result += ParseInt(v);
      Emit(AsString(result));
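The same word-count job can be simulated in single-process Python (a sketch of the semantics, not of a real distributed runtime); grouping intermediate pairs by key stands in for the shuffle:

```python
from collections import defaultdict

def map_fn(doc_name, contents):
    # emit (word, 1) for every word in the document
    for word in contents.split():
        yield word, 1

def reduce_fn(word, counts):
    # sum all counts emitted for this word
    return word, sum(counts)

def mapreduce(docs):
    groups = defaultdict(list)
    for name, text in docs.items():
        for k, v in map_fn(name, text):
            groups[k].append(v)          # "shuffle and sort": group by key
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

counts = mapreduce({"d1": "a b a", "d2": "b c"})
print(counts)  # → {'a': 2, 'b': 2, 'c': 1}
```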

    [Figure: MapReduce execution overview, redrawn from Dean and Ghemawat (OSDI 2004). (1) The user program forks a master and workers; (2) the master assigns map and reduce tasks; (3) map workers read input splits 0-4 and (4) write intermediate files to local disk; (5) reduce workers remotely read the intermediate data and (6) write output files 0 and 1. Phases: input files → map phase → intermediate files (on local disk) → reduce phase → output files.]

    Bandwidth Optimization

    Issue: large number of key-value pairs

    Solution: use "Combiner" functions

    Executed on same machine as mapper

    Results in a "mini-reduce" right after the map phase

    Reduces key-value pairs to save bandwidth
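For word count, the combiner is just the reducer applied locally to one mapper's output, since addition is commutative and associative (an illustrative sketch):

```python
from collections import Counter

def combine(pairs):
    # pairs: (word, count) tuples emitted by a single mapper;
    # sum counts locally before anything crosses the network.
    combined = Counter()
    for word, n in pairs:
        combined[word] += n
    return list(combined.items())

print(combine([("a", 1), ("b", 1), ("a", 1)]))  # → [('a', 2), ('b', 1)]
```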

    Skew Problem

    Issue: reduce is only as fast as the slowest map

    Solution: redundantly execute map operations, use results of first to finish

    Addresses hardware problems... But not issues related to inherent distribution of data

    How do we get data to the workers?

    [Figure: compute nodes connected to separate storage over the network (NAS, SAN). What's the problem here?]

  • 5

    Distributed File System

    Don't move data to workers… move workers to the data!

    Store data on the local disks of nodes in the cluster

    Start up the workers on the node that has the data local

    Why?

    Not enough RAM to hold all the data in memory

    Disk access is slow, but disk throughput is good

    A distributed file system is the answer

    GFS (Google File System)

    HDFS for Hadoop

    GFS: Assumptions

    Commodity hardware over "exotic" hardware

    High component failure rates

    Inexpensive commodity components fail all the time

    "Modest" number of HUGE files

    Files are write-once, mostly appended to (perhaps concurrently)

    Large streaming reads over random access

    High sustained throughput over low latency

    GFS slides adapted from material by Dean et al.

    GFS: Design Decisions

    Files stored as chunks

    Fixed size (64MB)

    Reliability through replication

    Each chunk replicated across 3+ chunkservers

    Single master to coordinate access, keep metadata

    Simple centralized management

    No data caching

    Little benefit due to large data sets, streaming reads

    Simplify the API

    Push some of the issues onto the client

    [Figure: GFS architecture, redrawn from Ghemawat et al. (SOSP 2003). The application calls a GFS client, which sends (file name, chunk index) to the GFS master; the master holds the file namespace (e.g., /foo/bar → chunk 2ef0) and replies with (chunk handle, chunk location). The client then requests (chunk handle, byte range) directly from GFS chunkservers, which store chunks in the Linux file system and return chunk data. The master exchanges instructions and chunkserver state with the chunkservers.]

    Single Master

    We know this is a:

    Single point of failure

    Scalability bottleneck

    GFS solutions:

    Shadow masters

    Minimize master involvement

    • Never move data through it, use only for metadata (and cache metadata at clients)

    • Large chunk size

    • Master delegates authority to primary replicas in data mutations (chunk leases)

    Simple, and good enough!

    Master's Responsibilities (1/2)

    Metadata storage

    Namespace management/locking

    Periodic communication with chunkservers

    Give instructions, collect state, track cluster health

    Chunk creation, re-replication, rebalancing

    Balance space utilization and access speed

    Spread replicas across racks to reduce correlated failures

    Re-replicate data if redundancy falls below threshold

    Rebalance data to smooth out storage and request load

  • 6

    Master's Responsibilities (2/2)

    Garbage collection

    Simpler, more reliable than traditional file delete

    Master logs the deletion, renames the file to a hidden name

    Lazily garbage collects hidden files

    Stale replica deletion

    Detect "stale" replicas using chunk version numbers

    Metadata

    Global metadata is stored on the master

    File and chunk namespaces

    Mapping from files to chunks

    Locations of each chunk's replicas

    All in memory (64 bytes / chunk)

    Fast

    Easily accessible

    Master has an operation log for persistent logging of critical metadata updates

    Persistent on local disk

    Replicated

    Checkpoints for faster recovery

    Mutations

    Mutation = write or append

    Must be done for all replicas

    Goal: minimize master involvement

    Lease mechanism:

    Master picks one replica as primary; gives it a "lease" for mutations

    Primary defines a serial order of mutations

    All replicas follow this order

    Data flow decoupled from control flow

    Parallelization Problems

    How do we assign work units to workers?

    What if we have more work units than workers?

    What if workers need to share partial results?

    How do we aggregate partial results?

    How do we know all the workers have finished?

    What if workers die?

    How is MapReduce different?

    Questions?

    MapReduce "killer app" #1: Text Retrieval

  • 7

    Text Retrieval: Topics

    Introduction to information retrieval (IR)

    Boolean retrieval

    Ranked retrieval

    Text retrieval with MapReduce

    The Information Retrieval Cycle

    [Figure: the information retrieval cycle: source selection (resource) → query formulation (query) → search (results) → selection (documents) → examination (information) → delivery; with feedback loops for source reselection and for system, vocabulary, concept, and document discovery.]

    The Central Problem in Search

    The searcher and the author each express the same concepts in their own terms: the searcher's concepts become query terms, and the author's concepts become document terms.

    Do these represent the same concepts?

    "tragic love story" vs. "fateful star-crossed romance"

    Architecture of IR Systems

    [Figure: offline, a representation function turns documents into document representations stored in an index; online, a representation function turns the query into a query representation; a comparison function matches the two against the index and returns hits.]

    How do we represent text?

    Remember: computers don't "understand" anything!

    "Bag of words"

    Treat all the words in a document as index terms for that document

    Assign a "weight" to each term based on "importance"

    Disregard order, structure, meaning, etc. of the words

    Simple, yet effective!

    Assumptions:

    Term occurrence is independent

    Document relevance is independent

    "Words" are well-defined

    What’s a word?

    天主教教宗若望保祿二世因感冒再度住進醫院。這是他今年第二度因同樣的病因住院。 الناطق باسم -وقال مارك ريجيف

    إن شارون قبل -الخارجية اإلسرائيلية الدعوة وسيقوم للمرة األولى بزيارةتونس، التي آانت لفترة طويلة المقر

    1982الرسمي لمنظمة التحرير الفلسطينية بعد خروجها من لبنان عام .

    Выступая в Мещанском суде Москвы экс-глава ЮКОСа заявил, что не совершал ничего противозаконного, в чем обвиняет его генпрокуратура России.

    भारत सरकार ने आिथर्क सवेर्क्षण में िवत्तीय वषर् 2005-06 में सात फ़ीसदीिवकास दर हािसल करने का आकलन िकया है और कर सुधार पर ज़ोर िदया है

    日米連合で台頭中国に対処…アーミテージ前副長官提言

    조재영기자= 서울시는 25일이명박시장이 `행정중심복합도시'' 건설안에대해 `군대라도동원해막고싶은심정''이라고말했다는일부언론의보도를부인했다.

  • 8

    Sample Document

    McDonald's slims down spuds

    Fast-food chain to reduce certain types of fat in its french fries with new cooking oil.

    NEW YORK (CNN/Money) - McDonald's Corp. is cutting the amount of "bad" fat in its french fries nearly in half, the fast-food chain said Tuesday as it moves to make all its fried menu items healthier. But does that mean the popular shoestring fries won't taste the same? The company says no. "It's a win-win for our customers because they are getting the same great french fry taste along with an even healthier nutrition profile," said Mike Roberts, president of McDonald's USA.

    But others are not so sure. McDonald's will not specifically discuss the kind of oil it plans to use, but at least one nutrition expert says playing with the formula could mean a different taste. Shares of Oak Brook, Ill.-based McDonald's (MCD: down $0.54 to $23.22, Research, Estimates) were lower Tuesday afternoon. It was unclear Tuesday whether competitors Burger King and Wendy's International (WEN: down $0.80 to $34.91, Research, Estimates) would follow suit. Neither company could immediately be reached for comment.

    "Bag of Words" view of this document:

    16 × said
    14 × McDonalds
    12 × fat
    11 × fries
    8 × new
    6 × company, french, nutrition
    5 × food, oil, percent, reduce, taste, Tuesday

    Boolean Retrieval

    Users express queries as a Boolean expression

    AND, OR, NOT

    Can be arbitrarily nested

    Retrieval is based on the notion of sets

    Any given query divides the collection into two sets: retrieved, not-retrieved

    Pure Boolean systems do not define an ordering of the results

    Representing Documents

    Document 1: The quick brown fox jumped over the lazy dog's back.

    Document 2: Now is the time for all good men to come to the aid of their party.

    Stopword list: the, is, for, to, of

    Term-document incidence matrix (1 = term occurs in document):

    Term    Doc 1   Doc 2
    quick     1       0
    brown     1       0
    fox       1       0
    over      1       0
    lazy      1       0
    dog       1       0
    back      1       0
    jump      1       0
    now       0       1
    time      0       1
    all       0       1
    good      0       1
    men       0       1
    come      0       1
    aid       0       1
    their     0       1
    party     0       1

    Inverted Index

    Store, for each term, the sorted list of documents in which it appears (its postings). Over the eight-document collection, for example:

    Term     Postings
    aid      4, 8
    all      2, 4, 6
    back     1, 3, 7
    brown    1, 3, 5, 7
    come     2, 4, 6, 8
    dog      3, 5
    fox      3, 5, 7

    (and likewise one postings list per remaining term: good, jump, lazy, men, now, over, party, quick, their, time)

    Boolean Retrieval

    To execute a Boolean query:

    Build query syntax tree: ( fox OR dog ) AND quick becomes AND( OR(fox, dog), quick )

    For each clause, look up postings:

    fox → 3, 5, 7
    dog → 3, 5

    Traverse postings and apply the Boolean operator:

    fox OR dog = union → 3, 5, 7

    Efficiency analysis:

    Postings traversal is linear (assuming sorted postings)

    Start with the shortest posting first
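The linear traversal can be sketched in Python (illustrative; assumes sorted postings lists, and quick's postings are assumed purely for the example since the slide only gives fox and dog):

```python
def union(a, b):
    # OR: linear merge of two sorted postings lists
    out, i, j = [], 0, 0
    while i < len(a) or j < len(b):
        if j == len(b) or (i < len(a) and a[i] < b[j]):
            out.append(a[i]); i += 1
        elif i == len(a) or b[j] < a[i]:
            out.append(b[j]); j += 1
        else:
            out.append(a[i]); i += 1; j += 1
    return out

def intersect(a, b):
    # AND: walk both lists, keeping only docids present in each
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

fox, dog = [3, 5, 7], [3, 5]   # postings from the slide
quick = [1, 3, 5]              # assumed for illustration
print(intersect(union(fox, dog), quick))  # → [3, 5]
```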

    Extensions

    Implementing proximity operators

    Store word offset in postings

    Handling term variations

    Stem words: love, loving, loves … → lov

  • 9

    Strengths and Weaknesses

    Strengths:

    Precise, if you know the right strategies

    Precise, if you have an idea of what you're looking for

    Implementations are fast and efficient

    Weaknesses:

    Users must learn Boolean logic

    Boolean logic insufficient to capture the richness of language

    No control over size of result set: either too many hits or none

    When do you stop reading? All documents in the result set are considered "equally good"

    What about partial matches? Documents that "don't quite match" the query may be useful also

    Ranked Retrieval

    Order documents by how likely they are to be relevant to the information need

    Estimate relevance(q, d_i)

    Sort documents by relevance

    Display sorted results

    User model:

    Present hits one screen at a time, best results first

    At any point, users can decide to stop looking

    How do we estimate relevance?

    Assume a document is relevant if it has a lot of query terms

    Replace relevance(q, d_i) with sim(q, d_i)

    Compute similarity of vector representations

    Vector Space Model

    [Figure: documents d1…d5 as vectors in a space with term axes t1, t2, t3; the angles θ and φ between vectors measure closeness.]

    Assumption: Documents that are "close together" in vector space "talk about" the same things

    Therefore, retrieve documents based on how close the document is to the query (i.e., similarity ~ "closeness")

    Similarity Metric

    How about |d1 – d2|?

    Instead of Euclidean distance, use the "angle" between the vectors

    It all boils down to the inner product (dot product) of vectors:

    sim(d_j, d_k) = cos(θ) = (d_j · d_k) / (|d_j| |d_k|)
                  = Σ_{i=1}^{n} w_{i,j} w_{i,k} / ( sqrt(Σ_{i=1}^{n} w_{i,j}²) · sqrt(Σ_{i=1}^{n} w_{i,k}²) )
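As a Python sketch, with documents represented as equal-length term-weight vectors:

```python
import math

def cosine(dj, dk):
    # inner product of the weight vectors, normalized by their lengths
    num = sum(wj * wk for wj, wk in zip(dj, dk))
    den = math.sqrt(sum(w * w for w in dj)) * math.sqrt(sum(w * w for w in dk))
    return num / den

print(cosine([1.0, 0.0], [1.0, 0.0]))  # → 1.0 (same direction)
print(cosine([1.0, 0.0], [0.0, 1.0]))  # → 0.0 (orthogonal: no shared terms)
```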

    Term Weighting

    Term weights consist of two components:

    Local: how important is the term in this document?

    Global: how important is the term in the collection?

    Here's the intuition:

    Terms that appear often in a document should get high weights

    Terms that appear in many documents should get low weights

    How do we capture this mathematically?

    Term frequency (local)

    Inverse document frequency (global)

    TF.IDF Term Weighting

    w_{i,j} = tf_{i,j} · log(N / n_i)

    where:
    w_{i,j}  = weight assigned to term i in document j
    tf_{i,j} = number of occurrences of term i in document j
    N        = number of documents in the entire collection
    n_i      = number of documents with term i
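A direct Python transcription of the weight (base-10 log, matching the idf values in the example that follows; the inputs are assumed for illustration):

```python
import math

def tfidf(tf, N, n_i):
    # w_ij = tf_ij * log10(N / n_i)
    return tf * math.log10(N / n_i)

# A term occurring 3 times in a document, appearing in 1 of 2 documents:
print(round(tfidf(3, 2, 1), 3))  # → 0.903
# A term appearing in every document gets zero weight:
print(tfidf(5, 4, 4))  # → 0.0
```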

  • 10

    TF.IDF Example

    [Table: a worked TF.IDF example over four documents. For each term (complicated, contaminated, fallout, information, interesting, nuclear, retrieval, siberia) the original slide lists its (document, tf) postings and the resulting weights, built from idf values such as 0.301, 0.125, 0.602, and 0.000.]

    Sketch: Scoring Algorithm

    Initialize accumulators to hold document scores

    For each query term t in the user's query:

    Fetch t's postings

    For each document, score_doc += w_{t,d} × w_{t,q}

    Apply length normalization to the scores at the end

    Return top N documents
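The sketch above in Python (illustrative; postings map each term to (docid, weight) pairs, and the query weights and document lengths are assumed inputs):

```python
from collections import defaultdict

def score(query, postings, doc_length, top_n=10):
    acc = defaultdict(float)                      # accumulators
    for term, w_tq in query.items():
        for doc, w_td in postings.get(term, []):  # fetch t's postings
            acc[doc] += w_td * w_tq
    for doc in acc:
        acc[doc] /= doc_length[doc]               # length normalization
    return sorted(acc.items(), key=lambda kv: -kv[1])[:top_n]

postings = {"fox": [(1, 0.5), (3, 0.8)], "dog": [(3, 0.4)]}
ranked = score({"fox": 1.0, "dog": 1.0}, postings, {1: 1.0, 3: 2.0})
print([doc for doc, _ in ranked])  # → [3, 1]
```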

    MapReduce it?

    The indexing problem:

    Must be relatively fast, but need not be real time

    For Web, incremental updates are important

    The retrieval problem:

    Must have sub-second response

    For Web, only need relatively few results

    Indexing: Performance Analysis

    Inverted indexing is fundamental to all IR models

    Fundamentally, a large sorting problem

    Terms usually fit in memory

    Postings usually don't

    How is it done on a single machine?

    How large is the inverted index?

    Size of vocabulary

    Size of postings

    Vocabulary Size: Heaps’ Law

    V = K · n^β

    V is vocabulary size
    n is corpus size (number of documents)
    K and β are constants

    Typically, K is between 10 and 100, β is between 0.4 and 0.6

    When adding new documents, the system is likely to have seen most terms already… but the postings keep growing

    Postings Size: Zipf's Law

    George Kingsley Zipf (1902-1950) observed the following relation between frequency and rank:

    f · r = c, or f = c / r

    f = frequency
    r = rank
    c = constant

    In other words:

    A few elements occur very frequently

    Many elements occur very infrequently

    Zipfian distributions:

    English words

    Library book checkout patterns

    Website popularity (almost anything on the Web)

  • 11

    Word Frequency in English

    Frequency of the 50 most common words in English (sample of 19 million words):

    the   1130021    from     96900    or      54958
    of     547311    he       94585    about   53713
    to     516635    million  93515    market  52110
    a      464736    year     90104    they    51359
    in     390819    its      86774    this    50933
    and    387703    be       85588    would   50828
    that   204351    was      83398    you     49281
    for    199340    company  83070    which   48273
    is     152483    an       76974    bank    47940
    said   148302    has      74405    stock   47401
    it     134323    are      74097    trade   47310
    on     121173    have     73132    his     47116
    by     118863    but      71887    more    46244
    as     109135    will     71494    who     42142
    at     101779    say      66807    one     41635
    mr     101679    new      64456    their   40910
    with   101210    share    63925

    Does it fit Zipf's Law? The following shows rf × 1000, where r is the rank of word w in the sample and f is the frequency of word w in the sample:

    the    59    from     92    or      101
    of     58    he       95    about   102
    to     82    million  98    market  101
    a      98    year    100    they    103
    in    103    its     100    this    105
    and   122    be      104    would   107
    that   75    was     105    you     106
    for    84    company 109    which   107
    is     72    an      105    bank    109
    said   78    has     106    stock   110
    it     78    are     109    trade   112
    on     77    have    112    his     114
    by     81    but     114    more    114
    as     80    will    117    who     106
    at     80    say     113    one     107
    mr     86    new     112    their   108
    with   91    share   114

    MapReduce: Index Construction

    Map over all documents

    Emit term as key, (docid, tf) as value

    Emit other information as necessary (e.g., term position)

    Reduce:

    Trivial: each value represents a posting!

    Might want to sort the postings (e.g., by docid or tf)

    MapReduce does all the heavy lifting!
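A single-process Python sketch of this indexing job (illustrative; the grouping dictionary stands in for MapReduce's shuffle):

```python
from collections import Counter, defaultdict

def map_fn(docid, text):
    # emit (term, (docid, tf)) for each distinct term in the document
    for term, tf in Counter(text.split()).items():
        yield term, (docid, tf)

def build_index(docs):
    index = defaultdict(list)
    for docid, text in docs.items():
        for term, posting in map_fn(docid, text):
            index[term].append(posting)   # shuffle: group values by term
    for term in index:
        index[term].sort()                # reduce: sort postings by docid
    return dict(index)

print(build_index({1: "a b a", 2: "b"}))  # → {'a': [(1, 2)], 'b': [(1, 1), (2, 1)]}
```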

    Query Execution

    MapReduce is meant for large-data batch processing

    Not suitable for real-time operations requiring low latency

    The solution: "the secret sauce"

    Most likely involves document partitioning

    Lots of system engineering: e.g., caching, load balancing, etc.

    MapReduce: Query Execution

    High-throughput batch query execution:

    Instead of sequentially accumulating scores per query term:

    Have mappers traverse postings in parallel, emitting partial score components

    Reducers serve as the accumulators, summing contributions for each query term

    MapReduce does all the heavy lifting:

    Replace random access with sequential reads

    Amortize over lots of queries

    Examine multiple postings in parallel

    Questions?

  • 12

    MapReduce "killer app" #2: Graph Algorithms

    Graph Algorithms: Topics

    Introduction to graph algorithms and graph representations

    Single Source Shortest Path (SSSP) problem

    Refresher: Dijkstra's algorithm

    Breadth-First Search with MapReduce

    PageRank

    What's a graph?

    G = (V, E), where:

    V represents the set of vertices (nodes)

    E represents the set of edges (links)

    Both vertices and edges may contain additional information

    Different types of graphs:

    Directed vs. undirected edges

    Presence or absence of cycles

    Graphs are everywhere:

    Hyperlink structure of the Web

    Physical structure of computers on the Internet

    Interstate highway system

    Social networks

    Some Graph Problems

    Finding shortest paths

    Routing Internet traffic and UPS trucks

    Finding minimum spanning trees

    Telco laying down fiber

    Finding max flow

    Airline scheduling

    Identifying "special" nodes and communities

    Breaking up terrorist cells, spread of avian flu

    Bipartite matching

    Monster.com, Match.com

    And of course... PageRank

    Graphs and MapReduce

    Graph algorithms typically involve:

    Performing computation at each node

    Processing node-specific data, edge-specific data, and link structure

    Traversing the graph in some manner

    Key questions:

    How do you represent graph data in MapReduce?

    How do you traverse a graph in MapReduce?

    Representing Graphs

    G = (V, E)

    A poor representation for computational purposes

    Two common representations:

    Adjacency matrix

    Adjacency list

  • 13

    Adjacency Matrices

    Represent a graph as an n × n square matrix M

    n = |V|

    M_ij = 1 means a link from node i to j

         1  2  3  4
    1    0  1  0  1
    2    1  0  1  1
    3    1  0  0  0
    4    1  0  1  0

    Adjacency Matrices: Critique

    Advantages:

    Naturally encapsulates iteration over nodes

    Rows and columns correspond to inlinks and outlinks

    Disadvantages:

    Lots of zeros for sparse matrices

    Lots of wasted space

    Adjacency Lists

    Take adjacency matrices… and throw away all the zeros

    1: 2, 4
    2: 1, 3, 4
    3: 1
    4: 1, 3
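The conversion is mechanical; in Python, using the example graph above:

```python
# Adjacency matrix of the four-node example graph
M = [
    [0, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 1, 0],
]

# Keep only the nonzero entries of each row (nodes numbered from 1)
adj = {i + 1: [j + 1 for j, x in enumerate(row) if x]
       for i, row in enumerate(M)}
print(adj)  # → {1: [2, 4], 2: [1, 3, 4], 3: [1], 4: [1, 3]}
```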

    Adjacency Lists: Critique

    Advantages:

    Much more compact representation

    Easy to compute over outlinks

    Graph structure can be broken up and distributed

    Disadvantages:

    Much more difficult to compute over inlinks

    Single Source Shortest Path

    Problem: find shortest path from a source node to one or more target nodes

    First, a refresher: Dijkstra's Algorithm

    Dijkstra's Algorithm Example

    [Figure: six snapshots of Dijkstra's algorithm on a five-node weighted directed graph (edge weights 10, 5, 1, 2, 3, 2, 9, 7, 4, 6; example from CLR). The source starts at distance 0 and all other nodes at ∞; repeatedly settling the closest unsettled node and relaxing its outgoing edges updates the tentative distances from (∞, ∞, ∞, ∞) to (10, ∞, 5, ∞), then (8, 14, 5, 7), then (8, 13, 5, 7), and finally (8, 9, 5, 7), which no further relaxation improves.]

    Single Source Shortest Path

    Problem: find shortest path from a source node to one or more target nodes

    Single-processor machine: Dijkstra's algorithm

    MapReduce: parallel Breadth-First Search (BFS)

  • 15

    Finding the Shortest Path

    First, consider equal edge weights

    Solution to the problem can be defined inductively

    Here's the intuition:

    DistanceTo(startNode) = 0

    For all nodes n directly reachable from startNode, DistanceTo(n) = 1

    For all nodes n reachable from some other set of nodes S, DistanceTo(n) = 1 + min(DistanceTo(m), m ∈ S)

    From Intuition to Algorithm

    A map task receives:

    Key: node n

    Value: D (distance from start), points-to (list of nodes reachable from n)

    ∀p ∈ points-to: emit (p, D+1)

    The reduce task gathers possible distances to a given p and selects the minimum one
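One iteration of this map/reduce pair can be sketched in single-process Python (illustrative; the three-node graph and its representation are assumptions for the example):

```python
from collections import defaultdict

INF = float("inf")

# graph: node -> (distance-so-far, adjacency list)
def bfs_iteration(graph):
    candidates = defaultdict(list)
    for n, (d, adj) in graph.items():
        candidates[n].append(d)          # carry the current distance along
        if d < INF:
            for p in adj:
                candidates[p].append(d + 1)   # map: emit (p, D+1)
    # reduce: keep the minimum candidate distance, preserving structure
    return {n: (min(candidates[n]), adj)
            for n, (_, adj) in graph.items()}

g = {1: (0, [2, 3]), 2: (INF, [3]), 3: (INF, [])}
g = bfs_iteration(g)
print({n: d for n, (d, _) in g.items()})  # → {1: 0, 2: 1, 3: 1}
```

Feeding the output back into the same function advances the frontier by one more hop each time.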

    Multiple Iterations Needed

    This MapReduce task advances the "known frontier" by one hop

    Subsequent iterations include more reachable nodes as the frontier advances

    Multiple iterations are needed to explore the entire graph

    Feed output back into the same MapReduce task

    Preserving graph structure:

    Problem: Where did the points-to list go?

    Solution: Mapper emits (n, points-to) as well

    Visualizing Parallel BFS

    [Figure: breadth-first search frontier expanding from the source node: neighbors at distance 1 are discovered in the first iteration, then nodes at distance 2, 3, and 4 in subsequent iterations.]

    Termination

    Does the algorithm ever terminate?

    Eventually, all nodes will be discovered, all edges will be considered (in a connected graph)

    When do we stop?

    Weighted Edges

    Now add positive weights to the edges

    Simple change: the points-to list in the map task includes a weight w for each pointed-to node

    Emit (p, D+w_p) instead of (p, D+1) for each node p

    Does this ever terminate?

    Yes! Eventually, no better distances will be found. When the distances stop changing, we stop

    Mapper should emit (n, D) to ensure that the "current distance" is carried into the reducer

  • 16

    Comparison to Dijkstra

    Dijkstra's algorithm is more efficient

    At any step it only pursues edges from the minimum-cost path inside the frontier

    MapReduce explores all paths in parallel

    Divide and conquer

    Throw more hardware at the problem

    General Approach

    MapReduce is adept at manipulating graphs

    Store graphs as adjacency lists

    Graph algorithms with MapReduce:

    Each map task receives a node and its outlinks

    Map task computes some function of the link structure, emits value with target as the key

    Reduce task collects keys (target nodes) and aggregates

    Iterate multiple MapReduce cycles until some termination condition

    Remember to "pass" graph structure from one iteration to the next

    Random Walks Over the Web

    Model:

    User starts at a random Web page

    User randomly clicks on links, surfing from page to page

    What's the amount of time that will be spent on any given page?

    This is PageRank

    PageRank: Defined

    Given page x with in-bound links t1…tn, where:

    C(t) is the out-degree of t

    α is the probability of a random jump

    N is the total number of nodes in the graph

    PR(x) = α (1/N) + (1 − α) Σ_{i=1}^{n} PR(t_i) / C(t_i)

    Computing PageRank

    Properties of PageRank:

    Can be computed iteratively

    Effects at each iteration are local

    Sketch of algorithm:

    Start with seed PR_i values

    Each page distributes PR_i "credit" to all pages it links to

    Each target page adds up "credit" from multiple in-bound links to compute PR_{i+1}

    Iterate until values converge
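One sweep of that sketch in Python (illustrative; the three-node graph and α = 0.15 are assumed values):

```python
def pagerank_iteration(pr, links, alpha=0.15):
    # Each page distributes its current PR across its outlinks;
    # each page then gets a random-jump share plus the credit it received.
    N = len(pr)
    credit = {n: 0.0 for n in pr}
    for n, targets in links.items():
        for t in targets:
            credit[t] += pr[n] / len(targets)
    return {n: alpha / N + (1 - alpha) * credit[n] for n in pr}

pr = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}          # seed values
links = {1: [2, 3], 2: [3], 3: [1]}
pr = pagerank_iteration(pr, links)
print(pr)  # node 3, with two in-bound links, gains the most credit
```

Repeating the sweep until the values stop changing gives the converged PageRank vector; note the values remain a probability distribution after each sweep.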

    PageRank in MapReduce

    Map: distribute PageRank "credit" to link targets

    Reduce: gather up PageRank "credit" from multiple sources to compute new PageRank value

    Iterate until convergence

  • 17

    PageRank: Issues

    Is PageRank guaranteed to converge? How quickly?

    What is the "correct" value of α, and how sensitive is the algorithm to it?

    What about dangling links?

    How do you know when to stop?

    Questions?