Cluster Computing with Dryad Mihai Budiu, MSR-SVC LiveLabs, March 2008.

Cluster Computing with Dryad

Mihai Budiu, MSR-SVC LiveLabs, March 2008

Goal

The Dryad Project

http://research.microsoft.com/research/sv/dryad

Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks

Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly

European Conference on Computer Systems (EuroSys), Lisbon, Portugal, March 21-23, 2007

http://research.microsoft.com/research/sv/dryad/eurosys07.pdf

Outline

• Dryad Design
• Implementation
• Policies as Plug-ins
• Building on Dryad

Design Space

[Figure: design-space chart. Axis labels: throughput vs. latency, and Internet vs. private data center. Regions: data-parallel, shared memory. Systems plotted: Dryad, Search, HPC, Grid, Transaction.]

Data Partitioning

[Figure: data partitioning; labels: RAM, DATA.]

2-D Piping

• Unix Pipes: 1-D
  grep | sed | sort | awk | perl
• Dryad: 2-D
  grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50
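The 2-D pipe idea can be sketched in a few lines of Python. This is only an illustration, not Dryad code: the names `repartition` and `run_2d_pipeline`, the round-robin redistribution between stages, and the toy stage functions are all assumptions made for the example. Each stage is a plain sequential function, run once per partition, with its own replication width.

```python
from typing import Callable, Iterable, List, Tuple

Stage = Callable[[Iterable[str]], Iterable[str]]


def repartition(parts: List[List[str]], width: int) -> List[List[str]]:
    """Spread all items round-robin over `width` new partitions (assumed policy)."""
    out: List[List[str]] = [[] for _ in range(width)]
    i = 0
    for part in parts:
        for item in part:
            out[i % width].append(item)
            i += 1
    return out


def run_2d_pipeline(parts: List[List[str]],
                    stages: List[Tuple[Stage, int]]) -> List[List[str]]:
    """Run (function, replication) pairs in order, e.g. grep^4 | upper^2 | sort^1."""
    for fn, width in stages:
        parts = repartition(parts, width)
        # Each partition is processed independently by one copy of the stage.
        parts = [list(fn(p)) for p in parts]
    return parts


def grep_error(lines):
    return (l for l in lines if "error" in l)


def upper(lines):
    return (l.upper() for l in lines)


def sort_lines(lines):
    return sorted(lines)


logs = [["error a", "ok b"], ["error c", "ok d", "error b"]]
result = run_2d_pipeline(logs, [(grep_error, 4), (upper, 2), (sort_lines, 1)])
print(result)  # [['ERROR A', 'ERROR B', 'ERROR C']]
```

The second element of each pair plays the role of the superscript in the 2-D notation: it says how many copies of the stage run side by side.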

Dryad = Execution Layer

Job (Application)  ≈  Pipeline
Dryad              ≈  Shell
Cluster            ≈  Machine



Virtualized 2-D Pipelines

• 2D DAG
• multi-machine
• virtualized

Dryad Job Structure

grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50

[Figure: job graph. Input files feed vertices (processes) running grep, sed, sort, awk and perl; vertices are grouped into stages and connected by channels; the last stage writes the output files.]

Channels

Finite streams of items

• distributed filesystem files (persistent)
• SMB/NTFS files (temporary)
• TCP pipes (inter-machine)
• memory FIFOs (intra-machine)

[Figure: vertex X sends items to vertex M over a channel.]
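A small Python sketch of the channel idea: the vertex code sees the same finite stream of items regardless of transport. The class names `FifoChannel` and `FileChannel`, the `write`/`close`/`read` interface, and the use of pickle for serialization are assumptions made for this illustration and are not Dryad's API.

```python
import os
import pickle
import tempfile
from collections import deque
from typing import Any, Iterator


class FifoChannel:
    """In-memory channel (stands in for an intra-machine memory FIFO)."""

    def __init__(self) -> None:
        self._items: deque = deque()

    def write(self, item: Any) -> None:
        self._items.append(item)

    def close(self) -> None:
        pass  # nothing to flush for an in-memory queue

    def read(self) -> Iterator[Any]:
        while self._items:
            yield self._items.popleft()


class FileChannel:
    """File-backed channel (stands in for a temporary file)."""

    def __init__(self) -> None:
        fd, self.path = tempfile.mkstemp()
        self._out = os.fdopen(fd, "wb")

    def write(self, item: Any) -> None:
        pickle.dump(item, self._out)

    def close(self) -> None:
        self._out.close()

    def read(self) -> Iterator[Any]:
        with open(self.path, "rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:  # end of the finite stream
                    break
        os.remove(self.path)


def producer(channel) -> None:
    for i in range(3):
        channel.write({"id": i})
    channel.close()


def consumer(channel):
    return [item["id"] for item in channel.read()]


results = []
for channel in (FifoChannel(), FileChannel()):
    producer(channel)
    results.append(consumer(channel))
print(results)  # [[0, 1, 2], [0, 1, 2]]
```

The producer and consumer are unchanged between the two runs; only the channel object differs, which is the property that lets the transport be chosen separately from the vertex code.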

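Architecture

[Figure: the job manager holds the job schedule and talks over the control plane to the name server (NS) and the process daemons (PD) in the cluster; vertices (V) exchange data over the data plane (files, TCP, FIFO, network).]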

Staging

1. Build
2. Send .exe
3. Start JM
4. Query cluster resources
5. Generate graph
6. Initialize vertices
7. Serialize vertices
8. Monitor vertex execution

[Figure labels: JM code, vertex code, cluster services.]

Fault Tolerance

Outline

• Dryad Design
• Implementation
• Policies and Resource Management
• Building on Dryad

Policy Managers

[Figure: inside the job manager, stage R has an R manager, stage X has an X manager, and the connection R-X has an R-X manager.]

Duplicate Execution Manager

[Figure: X[0], X[1] and X[3] are completed vertices; X[2] is a slow vertex, and X'[2] is its duplicate vertex.]

Duplication Policy = f(running times, data volumes)
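The slide only states that the duplication policy is a function of running times and data volumes. One hypothetical choice of f, sketched below, compares a running vertex against the median seconds-per-byte of completed vertices in the same stage. The function name `should_duplicate`, the `slack` factor of 2 and the minimum of two completed vertices are assumptions for illustration, not Dryad's actual policy.

```python
from statistics import median
from typing import List, Tuple


def should_duplicate(completed: List[Tuple[float, int]],
                     running_elapsed: float,
                     running_bytes: int,
                     slack: float = 2.0) -> bool:
    """Decide whether to start a duplicate of a running vertex.

    completed: (seconds, bytes) for vertices of the stage that already finished.
    A vertex counts as slow when it has run longer than `slack` times the
    duration predicted from the median seconds-per-byte of its peers.
    """
    if len(completed) < 2:
        return False  # too little evidence to call anything slow
    rates = [secs / max(nbytes, 1) for secs, nbytes in completed]
    expected = median(rates) * max(running_bytes, 1)
    return running_elapsed > slack * expected


done = [(10.0, 100), (12.0, 100), (11.0, 100)]
print(should_duplicate(done, 15.0, 100))  # False: within 2x of its peers
print(should_duplicate(done, 30.0, 100))  # True: far slower than its peers
print(should_duplicate(done, 30.0, 300))  # False: slow only because it has 3x the data
```

Scaling by data volume is what keeps a vertex with a large input from being mistaken for a straggler.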

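Aggregation Manager

[Figure: static graph: sources S feed T directly. Dynamic graph: each S is tagged with its rack number (#1, #2, #1, #3, #3, #2), an aggregator A is inserted per rack (#1, #2, #3), and the aggregators feed T.]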
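The dynamic rewrite in the figure can be sketched as a grouping step: sources are grouped by rack, one aggregator is inserted per rack, and only the aggregators connect to T. The function name `build_aggregation_tree` and the vertex names S0..S5 and A1..A3 are made up for this example; the rack assignment mirrors the labels in the figure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_aggregation_tree(source_racks: Dict[str, int]
                           ) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Insert one aggregator per rack between the sources and the sink T.

    source_racks maps a source vertex name to its rack number.
    Returns (edges, aggregators), where edges is a list of (from, to) pairs.
    """
    by_rack: Dict[int, List[str]] = defaultdict(list)
    for source, rack in sorted(source_racks.items()):
        by_rack[rack].append(source)

    edges: List[Tuple[str, str]] = []
    aggregators: List[str] = []
    for rack, sources in sorted(by_rack.items()):
        agg = f"A{rack}"
        aggregators.append(agg)
        edges.extend((s, agg) for s in sources)  # rack-local traffic
        edges.append((agg, "T"))                 # one cross-rack edge per rack
    return edges, aggregators


racks = {"S0": 1, "S1": 2, "S2": 1, "S3": 3, "S4": 3, "S5": 2}
edges, aggs = build_aggregation_tree(racks)
print(aggs)                                # ['A1', 'A2', 'A3']
print([e for e in edges if e[1] == "T"])   # [('A1', 'T'), ('A2', 'T'), ('A3', 'T')]
```

With six sources on three racks, T receives three inputs instead of six, and each source sends its data only within its own rack.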

Data Distribution (Group By)

[Figure: m source vertices each connect to n destination vertices, giving m x n channels.]

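Range-Distribution Manager

[Figure: static graph: sources S feed distributors D, which split the key range [0-100) between two T vertices at a boundary not known in advance ([0-?) and [?-100)). Dynamic graph: a histogram vertex (Hist) examines the data and fixes the ranges as [0-30) and [30-100).]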
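A sketch of how a histogram over sampled keys can fix the unknown boundary: choose split points so the sample is spread evenly, then route each key by binary search. The function names and the sample values are invented for this example; the resulting split at 30 matches the [0-30), [30-100) ranges in the figure.

```python
import bisect
from typing import List


def choose_boundaries(sample: List[int], n_parts: int) -> List[int]:
    """Pick n_parts - 1 split points so the sampled keys are spread evenly."""
    ordered = sorted(sample)
    return [ordered[len(ordered) * i // n_parts] for i in range(1, n_parts)]


def route(key: int, boundaries: List[int]) -> int:
    """Index of the half-open range [b[i-1], b[i]) that contains key."""
    return bisect.bisect_right(boundaries, key)


# Skewed sample: half of the keys fall below 30.
sample = [5, 12, 18, 25, 29, 30, 47, 61, 88, 95]
bounds = choose_boundaries(sample, 2)
print(bounds)             # [30]
print(route(29, bounds))  # 0, i.e. range [0-30)
print(route(30, bounds))  # 1, i.e. range [30-100)
```

A static split at 50 would have sent seven of the ten sampled keys to the first destination; the data-dependent boundary balances them five and five.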

Goal: Declarative Programming

[Figure: a static plan with vertices S, X and T next to the dynamic plan derived from it, with replicated S, X and T vertices.]



Software Stack

[Figure: layered software stack. Labels, roughly bottom to top: Windows Server machines; cluster services; distributed filesystem and CIFS/NTFS; Dryad (C++), with job queueing and monitoring alongside; Distributed Shell (legacy code: sed, awk, grep, etc.), PSQL, Perl, SQL Server with SSIS, and DryadLINQ (C#); on top, queries, vectors and machine learning (C#).]

SkyServer Query 18

select distinct U.ObjID
into results
from photoPrimary U, neighbors N, photoPrimary L
where U.ObjID = N.ObjID
  and L.ObjID = N.NeighborObjID
  and U.ObjID < L.ObjID
  and abs((U.u-U.g)-(L.u-L.g))<0.05
  and abs((U.g-U.r)-(L.g-L.r))<0.05
  and abs((U.r-U.i)-(L.r-L.i))<0.05
  and abs((U.i-U.z)-(L.i-L.z))<0.05

[Figure: Dryad query plan with inputs U, N and L and stages X (n), D (n), M (4n), S (4n), Y and H.]

SkyServer Q18 Performance

[Figure: speed-up (times, axis 0 to 16) versus number of computers (axis 0 to 10) for three series: Dryad In-Memory, Dryad Two-pass, SQLServer 2005.]

DryadLINQ

• Declarative programming
• Integration with Visual Studio
• Integration with .Net
• Type safety
• Automatic serialization
• Job graph optimizations
  – static
  – dynamic
• Conciseness

LINQ

Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };

DryadLINQ = LINQ + Dryad

Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };

[Figure: the C# query is turned into a query plan (a Dryad job) whose vertex code is C#; data flows from collection through the plan to results.]
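The reason a where/select query distributes well is that it can run on each partition independently and the outputs can simply be concatenated. The Python sketch below mirrors the query above; `Record`, `is_legal`, `hash_key` and the sample data are hypothetical stand-ins, and nothing here is the DryadLINQ API.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    key: str
    value: int


def is_legal(key: str) -> bool:
    """Hypothetical predicate standing in for IsLegal."""
    return key.isalnum()


def hash_key(key: str) -> str:
    """Hypothetical stand-in for Hash: a short stable digest of the key."""
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:8]


def query(partition: List[Record]) -> List[Tuple[str, int]]:
    """Equivalent of: from c in partition where IsLegal(c.key) select (Hash(c.key), c.value)."""
    return [(hash_key(c.key), c.value) for c in partition if is_legal(c.key)]


partitions = [[Record("a1", 1), Record("??", 2)], [Record("b2", 3)]]

# "Vertex code": the same query runs on every partition, then outputs are concatenated.
results = [row for part in partitions for row in query(part)]

# Running the query on the whole collection at once gives the same rows.
whole = query([rec for part in partitions for rec in part])
print(results == whole)  # True
```

Operators that need data from several partitions, such as sort or group-by, are the ones that require re-partitioning between stages.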

Sort & Map-Reduce in DryadLINQ

[Figure: sort plan. Sources S are sampled (Sampl) to split the key range [0-100) into [0-30) and [30-100); distributors D route records by range to the Sort vertices.]

PLINQ

public static IEnumerable<TSource> DryadSort<TSource, TKey>(
    IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IComparer<TKey> comparer,
    bool isDescending)
{
    return source.AsParallel().OrderBy(keySelector, comparer);
}

Machine Learning in DryadLINQ

[Figure: layered stack, bottom to top: Dryad, DryadLINQ, Large Vector, then Machine learning and Data analysis.]

Very Large Vector Library

PartitionedVector<T>

[Figure: a vector of T elements split across several partitions.]

Scalar<T>

[Figure: a single T element.]

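Operations on Large Vectors: Map 1

[Figure: f is applied to each element T of a vector and yields an element U of the result vector.]

f preserves partitioning

Map 2 (Pairwise)

[Figure: f takes matching elements T and U from two vectors and yields an element V.]

Map 3 (Vector-Scalar)

[Figure: f takes each element T of a vector together with a scalar U and yields an element V.]

Reduce (Fold)

[Figure: f combines the U elements within each partition, then combines the partial results into a single U.]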
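The four operations above can be sketched on a list-of-lists stand-in for a partitioned vector. The class below is an illustration only: the method names `map2`, `map_scalar` and `fold` are assumptions, and the library's real API is not shown on the slides. The fold assumes f is associative and `initial` is its identity, so that per-partition partial results can be combined.

```python
from functools import reduce
from typing import Any, Callable, List


class PartitionedVector:
    """Toy partitioned vector: one Python list per partition."""

    def __init__(self, partitions: List[List[Any]]) -> None:
        self.partitions = [list(p) for p in partitions]

    def map(self, f: Callable[[Any], Any]) -> "PartitionedVector":
        # Map 1: element-wise, so the partitioning is preserved.
        return PartitionedVector([[f(t) for t in p] for p in self.partitions])

    def map2(self, other: "PartitionedVector",
             f: Callable[[Any, Any], Any]) -> "PartitionedVector":
        # Map 2 (pairwise): both vectors must be partitioned identically.
        return PartitionedVector(
            [[f(t, u) for t, u in zip(p, q)]
             for p, q in zip(self.partitions, other.partitions)])

    def map_scalar(self, scalar: Any,
                   f: Callable[[Any, Any], Any]) -> "PartitionedVector":
        # Map 3 (vector-scalar): the scalar is sent to every partition.
        return PartitionedVector(
            [[f(t, scalar) for t in p] for p in self.partitions])

    def fold(self, f: Callable[[Any, Any], Any], initial: Any) -> Any:
        # Reduce: fold each partition, then fold the partial results.
        partials = [reduce(f, p, initial) for p in self.partitions]
        return reduce(f, partials, initial)


v = PartitionedVector([[1, 2], [3, 4, 5]])
squares = v.map(lambda t: t * t)
print(squares.partitions)                                # [[1, 4], [9, 16, 25]]
print(v.map2(squares, lambda t, u: t + u).partitions)    # [[2, 6], [12, 20, 30]]
print(v.map_scalar(10, lambda t, s: t * s).partitions)   # [[10, 20], [30, 40, 50]]
print(v.fold(lambda a, b: a + b, 0))                     # 15
```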

Linear Algebra

$T, U, V = \mathbb{R}^m, \mathbb{R}^n, \mathbb{R}^{m \times n}$

Linear Regression

• Data: $x_t \in \mathbb{R}^n,\ y_t \in \mathbb{R}^m,\ t \in \{1, \dots, n\}$
• Find: $A \in \mathbb{R}^{m \times n}$
• S.t.: $A x_t \approx y_t$

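Analytic Solution

$$A = \Big(\sum_t y_t x_t^T\Big)\Big(\sum_t x_t x_t^T\Big)^{-1}$$

[Figure: Map: each partition computes X[i]×X[i]^T and Y[i]×X[i]^T, for X[0], X[1], X[2] and Y[0], Y[1], Y[2]. Reduce: each group of products is summed (Σ); the sum of the X products is inverted ([ ]^-1) and multiplied (*) with the sum of the Y products to give A.]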

Linear Regression Code

Vectors x = input(0), y = input(1);
Matrices xx = x.PairwiseOuterProduct(x);
OneMatrix xxs = xx.Sum();
Matrices yx = y.PairwiseOuterProduct(x);
OneMatrix yxs = yx.Sum();
OneMatrix xxinv = xxs.Map(a => a.Inverse());
OneMatrix A = yxs.Map(xxinv, (a, b) => a.Multiply(b));

$$A = \Big(\sum_t y_t x_t^T\Big)\Big(\sum_t x_t x_t^T\Big)^{-1}$$
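The same map/reduce structure can be checked on a tiny example in plain Python. This sketch assumes x in R^2 and y in R^1, uses exact fractions to avoid rounding, and inverts the 2x2 sum in closed form; the helper names and the four data points (generated from A = [2, -1]) are made up for the illustration.

```python
from fractions import Fraction
from functools import reduce


def outer(a, b):
    """Outer product a b^T of two vectors given as lists."""
    return [[ai * bj for bj in b] for ai in a]


def add(m1, m2):
    return [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def partial_sums(partition):
    """Map step: per-partition sums of x x^T (2x2) and y x^T (1x2)."""
    xx = [[Fraction(0)] * 2 for _ in range(2)]
    yx = [[Fraction(0)] * 2]
    for x, y in partition:
        xx = add(xx, outer(x, x))
        yx = add(yx, outer(y, x))
    return xx, yx


# Two partitions of (x, y) pairs, all satisfying y = 2*x[0] - 1*x[1].
partitions = [
    [([1, 0], [2]), ([0, 1], [-1])],
    [([1, 1], [1]), ([2, 1], [3])],
]

# Reduce step: add up the per-partition sums.
partials = [partial_sums(p) for p in partitions]
xxs = reduce(add, (xx for xx, _ in partials))
yxs = reduce(add, (yx for _, yx in partials))

A = matmul(yxs, inverse_2x2(xxs))
print(A == [[2, -1]])  # True
```

Only the two small sums cross partition boundaries; the inverse and the final product run on a single matrix, as in the `OneMatrix` steps of the code above.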

Expectation Maximization (Gaussians)

• 160 lines
• 3 iterations shown

Conclusions

• Dryad = distributed execution environment
• Application-independent (semantics oblivious)
• Supports rich software ecosystem
  – Relational algebra
  – Map-reduce
  – LINQ
  – Etc.
• DryadLINQ = A Dryad provider for LINQ
• This is only the beginning!

Backup Slides

Dryad vs. Map-Reduce

• Many similarities

Dryad                     Map-Reduce
Execution layer           Exe + app. model
Job = arbitrary DAG       Map+sort+reduce
Plug-in policies          Few policies
Program = graph gen.      Program = map+reduce
Complex (features)        Simple
New (< 2 years)           Mature (> 4 years)
Still growing             Widely deployed
Internal                  Hadoop

Small Cluster Support

[Figure: graphs of Sort and Merge vertices; labels: grouping vertices, fast channels.]

SkyServer DB query

• Took SQL plan
• Manually coded in Dryad
• Manually partitioned data

[Figure: query plan graph with inputs U and N and stages X (n), D (n), M (4n), S (4n), Y and H, annotated with the steps below.]

u: objid, color
n: objid, neighborobjid
[partition by objid]

select u.color, n.neighborobjid
from u join n
where u.objid = n.objid

(u.color, n.neighborobjid)
[re-partition by n.neighborobjid]
[order by n.neighborobjid]
[distinct]
[merge outputs]

select u.objid
from u join <temp>
where u.objid = <temp>.neighborobjid and |u.color - <temp>.color| < d

Optimization

[Figure: two versions of the query plan graph, built from vertices U, N, X, D, M, S, Y and H, shown side by side for the optimization.]

Query histogram computation

• Input: log file (n partitions)
• Extract queries from log partitions
• Re-partition by hash of query (k buckets)
• Compute histogram within each bucket
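The four steps above can be sketched sequentially in Python. The log format (timestamp, tab, query), the use of CRC32 as the hash, and the function names are assumptions made for this example; in a distributed run, each partition and each bucket would be handled by a separate vertex.

```python
import zlib
from collections import Counter
from typing import List


def extract_queries(log_partition: List[str]) -> List[str]:
    """Assumed log format: 'timestamp<TAB>query'; other lines are skipped."""
    return [line.split("\t", 1)[1] for line in log_partition if "\t" in line]


def bucket_of(query: str, k: int) -> int:
    """Hash-partition a query into one of k buckets (CRC32 is stable across runs)."""
    return zlib.crc32(query.encode("utf-8")) % k


def histogram(log_partitions: List[List[str]], k: int) -> List[Counter]:
    buckets: List[List[str]] = [[] for _ in range(k)]
    for part in log_partitions:              # one pass per input partition
        for q in extract_queries(part):
            buckets[bucket_of(q, k)].append(q)
    return [Counter(b) for b in buckets]     # histogram within each bucket


logs = [["1\tdryad", "2\tlinq", "bad line"], ["3\tdryad", "4\tdryad"]]
per_bucket = histogram(logs, k=3)
total = sum(per_bucket, Counter())
print(total["dryad"], total["linq"])  # 3 1
```

Because equal queries always hash to the same bucket, the per-bucket histograms are disjoint and need no further merging beyond concatenation.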

Naïve histogram topology

[Figure: n Q vertices feed k R vertices; insets show the sub-vertices that make up each Q and each R.]

P   parse lines
D   hash distribute
S   quicksort
C   count occurrences
MS  merge sort

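Efficient histogram topology

[Figure: n Q' vertices feed k R vertices, which feed T vertices and a final stage of k R vertices; insets show the sub-vertices that make up each Q', each R and each T.]

P   parse lines
D   hash distribute
S   quicksort
C   count occurrences
MS  merge sort
M   non-deterministic merge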

Final histogram refinement

[Figure: refined graph with stages Q', R, T and R. Vertex counts shown: 99,713; 10,405; 217; 450. Data volumes shown: 10.2 TB, 154 GB, 118 GB, 33.4 GB.]

• 1,800 computers
• 43,171 vertices
• 11,072 processes
• 11.5 minutes
