Cluster Computing with Dryad Mihai Budiu, MSR-SVC LiveLabs, March 2008.
Cluster Computing with Dryad
Mihai Budiu, MSR-SVC
LiveLabs, March 2008
2
Goal
3
The Dryad Project
http://research.microsoft.com/research/sv/dryad
Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly
European Conference on Computer Systems (EuroSys), Lisbon, Portugal, March 21-23, 2007
4
Outline
• Dryad Design
• Implementation
• Policies as Plug-ins
• Building on Dryad
5
Design Space
[Figure: design-space chart. Labels: Throughput, Latency; Internet, Private data center; Data-parallel, Shared memory. Systems shown: Dryad, Search, HPC, Grid, Transaction.]
6
Data Partitioning
[Figure labels: RAM, DATA.]
7
2-D Piping
• Unix Pipes: 1-D
  grep | sed | sort | awk | perl
• Dryad: 2-D
  grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50
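One way to read the 2-D notation is as a list of stages, each replicated a given number of times. The following is a minimal sketch, assuming each stage is spelled `name^count`; that spelling and the function name `parse_pipeline` are chosen here for illustration and are not Dryad syntax.

```python
def parse_pipeline(spec):
    """Parse 'grep^1000 | sed^500' into [(stage_name, replica_count), ...].

    A stage without a '^count' suffix is treated as a single instance,
    which is the ordinary 1-D Unix pipe case.
    """
    stages = []
    for part in spec.split("|"):
        name, _, count = part.strip().partition("^")
        stages.append((name, int(count) if count else 1))
    return stages


stages = parse_pipeline("grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50")
total_vertices = sum(count for _, count in stages)
print(stages)
print(total_vertices)
```

The second dimension is the replica count: the pipe order gives the stage sequence, and the count gives the width of each stage.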
8
Dryad = Execution Layer
Job (Application)  ≈  Pipeline
Dryad              ≈  Shell
Cluster            ≈  Machine
9
Outline
• Dryad Design
• Implementation
• Policies as Plug-ins
• Building on Dryad
14
Virtualized 2-D Pipelines
• 2D DAG
• multi-machine
• virtualized
15
Dryad Job Structure
[Figure: job graph for grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50, built from grep, sed, sort, awk, and perl vertices. Labels: Input files, Vertices (processes), Channels, Stage, Output files.]
16
Channels
[Figure labels: X, M, Items.]
Finite streams of items
• distributed filesystem files (persistent)
• SMB/NTFS files (temporary)
• TCP pipes (inter-machine)
• memory FIFOs (intra-machine)
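Because every channel type carries a finite stream of items, vertex code can be written against a plain iterator and stay unaware of the transport. The following is a minimal single-threaded sketch; the class `FifoChannel` and the function `upper_vertex` are hypothetical names, not part of Dryad.

```python
from collections import deque


class FifoChannel:
    """In-memory FIFO carrying a finite stream of items (illustrative only)."""

    def __init__(self):
        self._items = deque()
        self._closed = False

    def write(self, item):
        if self._closed:
            raise ValueError("channel already closed")
        self._items.append(item)

    def close(self):
        # Closing marks the end of the stream, which is what makes it finite.
        self._closed = True

    def __iter__(self):
        # Single-threaded sketch: the writer must finish before the reader starts.
        if not self._closed:
            raise RuntimeError("stream is not complete yet")
        while self._items:
            yield self._items.popleft()


def upper_vertex(inputs):
    """A vertex that only sees an iterable of items, whatever the transport."""
    for item in inputs:
        yield item.upper()


channel = FifoChannel()
channel.write("a")
channel.write("b")
channel.close()
from_fifo = list(upper_vertex(channel))
from_list = list(upper_vertex(["a", "b"]))  # stands in for a file-backed channel
print(from_fifo, from_list)
```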
17
Architecture
[Figure: architecture diagram. Labels: Job manager, job schedule, control plane, data plane (Files, TCP, FIFO, Network), cluster, NS, PD, V.]
Staging
1. Build
2. Send .exe
3. Start JM
4. Query cluster resources
5. Generate graph
6. Initialize vertices
7. Serialize vertices
8. Monitor vertex execution
[Figure labels: JM code, vertex code, Cluster services.]
Fault Tolerance
20
Outline
• Dryad Design
• Implementation
• Policies and Resource Management
• Building on Dryad
21
Policy Managers
[Figure labels: Stage R (vertices R), Stage X (vertices X), Connection R-X, Job Manager, R manager, X manager, R-X manager.]
Duplicate Execution Manager
[Figure: vertices X[0], X[1], X[3], X[2], and X'[2]. Labels: Completed vertices, Slow vertex, Duplicate vertex.]
Duplication Policy = f(running times, data volumes)
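The slide only states that the duplication policy is a function of running times and data volumes. One possible such function is sketched below: it flags a running vertex when its time per input byte exceeds a multiple of the median for completed vertices in the same stage. The formula, the `slowdown` threshold, and the name `pick_duplicates` are assumptions made for illustration.

```python
from statistics import median


def pick_duplicates(running, completed, slowdown=2.0):
    """Return ids of running vertices that look slow enough to duplicate.

    running:   {vertex_id: (elapsed_seconds, input_bytes)}
    completed: [(total_seconds, input_bytes), ...] for finished vertices
    slowdown:  assumed threshold; not taken from the slides
    """
    if not completed:
        return []  # nothing to compare against yet
    # Normalize running time by data volume so large inputs are not penalized.
    typical = median(secs / max(nbytes, 1) for secs, nbytes in completed)
    return sorted(
        vertex
        for vertex, (elapsed, nbytes) in running.items()
        if elapsed / max(nbytes, 1) > slowdown * typical
    )


completed = [(10.0, 100), (12.0, 100), (11.0, 100)]
running = {"X[2]": (50.0, 100), "X[4]": (15.0, 100)}
to_duplicate = pick_duplicates(running, completed)
print(to_duplicate)
```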
23
Aggregation Manager
[Figure: static and dynamic versions of a graph with source vertices S, aggregation vertices A, and a destination vertex T. Sources are labeled by rack # (1, 2, 3).]
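The rack labels in the figure suggest grouping sources by rack so that each group can feed one intermediate aggregator. The following is a minimal sketch of that grouping step, assuming one aggregator per rack; the function name `group_by_rack` and the source ids are hypothetical.

```python
from collections import defaultdict


def group_by_rack(source_racks):
    """Map each rack number to the sources it hosts.

    source_racks: {source_id: rack_number}
    Each resulting group would feed one aggregator (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for source, rack in source_racks.items():
        groups[rack].append(source)
    return {rack: sorted(members) for rack, members in sorted(groups.items())}


# Six sources with the rack numbers shown in the figure: 1, 2, 1, 3, 3, 2.
placement = {"S0": 1, "S1": 2, "S2": 1, "S3": 3, "S4": 3, "S5": 2}
aggregators = group_by_rack(placement)
print(aggregators)
```

The grouping can only be computed once the placement of the source vertices is known, which is why the figure distinguishes a static graph from a dynamic one.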
24
Data Distribution (Group By)
[Figure: m Source vertices and n Dest vertices, connected by m x n channels.]
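25
Range-Distribution Manager
[Figure: static and dynamic versions of a graph with vertices S, D, and T. In the static graph the destination ranges are [0-?) and [?-100). In the dynamic graph a Hist vertex over [0-100) produces the ranges [0-30) and [30-100).]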
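The figure shows range boundaries that are unknown statically and filled in at run time from a histogram. The following is a minimal sketch of that idea, assuming the boundaries are taken as quantiles of a sample of keys; the functions `choose_boundaries` and `route` and the sample values are made up for illustration.

```python
import bisect


def choose_boundaries(sample_keys, num_ranges):
    """Pick num_ranges - 1 split points so each range gets about equal sample mass."""
    keys = sorted(sample_keys)
    return [keys[(i * len(keys)) // num_ranges] for i in range(1, num_ranges)]


def route(key, boundaries):
    """Index of the destination whose half-open range [lo, hi) contains key."""
    return bisect.bisect_right(boundaries, key)


sample = [3, 8, 15, 22, 30, 45, 70, 95]  # skewed toward small keys
boundaries = choose_boundaries(sample, 2)
print(boundaries)
print(route(29, boundaries), route(30, boundaries))
```

With this sample the single split point falls at 30, so keys in [0-30) go to the first destination and keys in [30-100) to the second.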
26
Goal: Declarative Programming
[Figure: static and dynamic versions of a job graph with vertices S, T, and X.]
27
Outline
• Dryad Design
• Implementation
• Policies as Plug-ins
• Building on Dryad
28
Software Stack
[Figure: software stack diagram. Labels: Windows Server (x4), Cluster Services, Distributed Filesystem, CIFS/NTFS, Dryad, Distributed Shell, PSQL, DryadLINQ, Perl, SQL server, C++ (x2), legacy code, sed, awk, grep, etc., SSIS, Queries, C# (x2), Vectors, Machine Learning. Side label: Job queueing, monitoring.]
29
SkyServer Query 18
[Figure: query plan graph with vertex stages U, N, L, X, H, Y, S, M, D; stage widths n and 4n.]
select distinct U.ObjID
into results
from photoPrimary U, neighbors N, photoPrimary L
where U.ObjID = N.ObjID
  and L.ObjID = N.NeighborObjID
  and U.ObjID < L.ObjID
  and abs((U.u-U.g)-(L.u-L.g))<0.05
  and abs((U.g-U.r)-(L.g-L.r))<0.05
  and abs((U.r-U.i)-(L.r-L.i))<0.05
  and abs((U.i-U.z)-(L.i-L.z))<0.05
30
SkyServer Q18 Performance
[Figure: Speed-up (times), axis 0.0 to 16.0, versus Number of Computers, axis 0 to 10. Series: Dryad In-Memory, Dryad Two-pass, SQLServer 2005.]
31
DryadLINQ
• Declarative programming
• Integration with Visual Studio
• Integration with .NET
• Type safety
• Automatic serialization
• Job graph optimizations
  – static
  – dynamic
• Conciseness
32
LINQ
Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);
// An anonymous-type member initialized from a method call needs an explicit name.
var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };
33
DryadLINQ = LINQ + Dryad
Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);
// An anonymous-type member initialized from a method call needs an explicit name.
var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };
[Figure labels: C#, collection, results, Vertex code (C#), Query plan (Dryad job), Data.]
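A where/select query like the one above touches each element independently, so it can run separately on every partition of the collection. The sketch below shows that per-partition evaluation in Python; it is not how DryadLINQ generates vertex code, and `run_query` and the sample predicates are hypothetical.

```python
def run_query(partitions, is_legal, hash_key):
    """Evaluate the where/select query independently on each partition.

    partitions: list of lists of (key, value) pairs
    Returns one result list per partition, mirroring one vertex per partition.
    """
    def vertex(partition):
        return [(hash_key(key), value) for key, value in partition if is_legal(key)]

    return [vertex(partition) for partition in partitions]


data = [[(1, "a"), (-2, "b")], [(3, "c")]]
results = run_query(data, is_legal=lambda k: k > 0, hash_key=lambda k: "h%d" % k)
print(results)
```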
34
Sort & Map-Reduce in DryadLINQ
[Figure: sort job graph with vertices S, D, Sampl, and Sort. Range labels: [0-100), then [0-30) and [30-100).]
35
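PLINQ
// Requires: using System; using System.Collections.Generic; using System.Linq;
public static IEnumerable<TSource> DryadSort<TSource, TKey>(
    IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IComparer<TKey> comparer,
    bool isDescending)
{
    // Sort in parallel with PLINQ, honoring the requested direction.
    var query = source.AsParallel();
    return isDescending
        ? query.OrderByDescending(keySelector, comparer)
        : query.OrderBy(keySelector, comparer);
}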
36
Machine Learning in DryadLINQ
[Figure: layered stack. Labels: Dryad, DryadLINQ, Large Vector, Machine learning, Data analysis.]
37
Very Large Vector Library
PartitionedVector<T> [Figure: several partitions holding elements of type T.]
Scalar<T> [Figure: a single element of type T.]
38
Operations on Large Vectors: Map 1
f: T → U
[Figure: f maps a vector of T to a vector of U.]
f preserves partitioning
39
Map 2 (Pairwise)
f: (T, U) → V
[Figure: f combines a vector of T and a vector of U into a vector of V.]
40
Map 3 (Vector-Scalar)
f: (T, U) → V
[Figure: f combines a vector of T and a scalar U into a vector of V.]
41
Reduce (Fold)
f: (U, U) → U
[Figure: f is applied across a vector of U, in several steps, to produce a single U.]
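A fold over a partitioned vector can be done in two levels: fold each partition locally, then fold the partial results. The following is a minimal sketch, assuming `f` is associative so the grouping does not change the answer; `partitioned_fold` is a hypothetical name, not the library's API.

```python
from functools import reduce


def partitioned_fold(partitions, f):
    """Fold each non-empty partition with f, then fold the per-partition results.

    Assumes f is associative and that at least one partition is non-empty.
    """
    partials = [reduce(f, partition) for partition in partitions if partition]
    return reduce(f, partials)


vector = [[1, 2, 3], [4, 5], [6]]
total = partitioned_fold(vector, lambda a, b: a + b)
largest = partitioned_fold(vector, max)
print(total, largest)
```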
42
Linear Algebra
$T, U, V \in \{\mathbb{R}^{m},\ \mathbb{R}^{n},\ \mathbb{R}^{m \times n}\}$
43
Linear Regression
• Data: $x_t \in \mathbb{R}^{n},\ y_t \in \mathbb{R}^{m},\ t \in \{1, \dots, n\}$
• Find: $A \in \mathbb{R}^{m \times n}$
• S.t.: $A x_t \approx y_t$
44
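Analytic Solution
$A = \left(\sum_t y_t x_t^{T}\right)\left(\sum_t x_t x_t^{T}\right)^{-1}$
[Figure: dataflow over partitions X[0], X[1], X[2] and Y[0], Y[1], Y[2]. Map: X×XT and Y×XT per partition. Reduce: Σ over each set of products, then [ ]-1 and * to produce A.]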
45
Linear Regression Code
$A = \left(\sum_t y_t x_t^{T}\right)\left(\sum_t x_t x_t^{T}\right)^{-1}$
Vectors x = input(0), y = input(1);
Matrices xx = x.PairwiseOuterProduct(x);
OneMatrix xxs = xx.Sum();
Matrices yx = y.PairwiseOuterProduct(x);
OneMatrix yxs = yx.Sum();
OneMatrix xxinv = xxs.Map(a => a.Inverse());
OneMatrix A = yxs.Map(xxinv, (a, b) => a.Multiply(b));
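The same map-then-reduce computation can be checked on a tiny example. The sketch below uses plain Python lists and handles only inputs x of dimension 2, since it inverts the 2x2 sum in closed form. All helper names are hypothetical, and the data are made up so that y = 2·x0 + 3·x1 exactly.

```python
def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def partial_sums(xs, ys):
    """Map step for one partition: (sum of x x^T, sum of y x^T)."""
    n, m = len(xs[0]), len(ys[0])
    xx = [[0.0] * n for _ in range(n)]
    yx = [[0.0] * n for _ in range(m)]
    for x, y in zip(xs, ys):
        xx = mat_add(xx, outer(x, x))
        yx = mat_add(yx, outer(y, x))
    return xx, yx


def regress(partitions):
    """Reduce step: add the partial sums, then A = (sum y x^T)(sum x x^T)^-1."""
    sums = [partial_sums(xs, ys) for xs, ys in partitions]
    xxs, yxs = sums[0]
    for xx, yx in sums[1:]:
        xxs, yxs = mat_add(xxs, xx), mat_add(yxs, yx)
    return mat_mul(yxs, inverse_2x2(xxs))


# Two partitions of (x, y) pairs generated from y = 2*x0 + 3*x1.
partitions = [
    ([[1.0, 0.0], [0.0, 1.0]], [[2.0], [3.0]]),
    ([[1.0, 1.0], [2.0, 1.0]], [[5.0], [7.0]]),
]
A = regress(partitions)
print(A)
```

Only the two small sums cross partition boundaries, which is what makes the map step data-parallel.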
Expectation Maximization (Gaussians)
46
• 160 lines
• 3 iterations shown
Conclusions
• Dryad = distributed execution environment
• Application-independent (semantics oblivious)
• Supports rich software ecosystem
  – Relational algebra
  – Map-reduce
  – LINQ
  – Etc.
• DryadLINQ = A Dryad provider for LINQ
• This is only the beginning!
47
START
48
Backup Slides
49
Dryad vs. Map-Reduce
• Many similarities
Dryad                   | Map-Reduce
Execution layer         | Exe + app. model
Job = arbitrary DAG     | Map+sort+reduce
Plug-in policies        | Few policies
Program=graph gen.      | Program=map+reduce
Complex ( features)     | Simple
New (< 2 years)         | Mature (> 4 years)
Still growing           | Widely deployed
Internal                | Hadoop
50
Small Cluster Support
[Figure: graphs of Sort and Merge vertices. Annotations: Grouping vertices, Fast channels.]
SkyServer DB query
• Took SQL plan
• Manually coded in Dryad
• Manually partitioned data
[Figure: query plan graph with vertex stages U, N, X, H, Y, S, M, D; stage widths n and 4n. Annotations on the graph follow.]
u: objid, color
n: objid, neighborobjid
[partition by objid]
select u.color, n.neighborobjid
from u join n
where u.objid = n.objid
(u.color, n.neighborobjid)
[re-partition by n.neighborobjid]
[order by n.neighborobjid]
[distinct]
[merge outputs]
select u.objid
from u join <temp>
where u.objid = <temp>.neighborobjid
  and |u.color - <temp>.color| < d
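Optimization
[Figure: two versions of the SkyServer query plan graph. One uses stages U, N, X, H, Y, S, M, D with widths n and 4n; the other shows repeated M and S vertices between Y and X.]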
Query histogram computation
• Input: log file (n partitions)
• Extract queries from log partitions
• Re-partition by hash of query (k buckets)
• Compute histogram within each bucket
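The four steps can be sketched on a single machine as follows. The log format (tab-separated timestamp and query), the use of `zlib.crc32` as the hash, and the function names are assumptions made for illustration; the slides do not specify them.

```python
import zlib
from collections import Counter


def extract_queries(log_partition):
    """Pull the query out of each line; assumed format is 'timestamp<TAB>query'."""
    return [line.split("\t", 1)[1] for line in log_partition if "\t" in line]


def histogram(log_partitions, k):
    """Re-partition queries into k buckets by hash, then count within each bucket."""
    buckets = [[] for _ in range(k)]
    for partition in log_partitions:
        for query in extract_queries(partition):
            # crc32 is stable across runs, unlike Python's built-in hash() for str.
            buckets[zlib.crc32(query.encode("utf-8")) % k].append(query)
    return [Counter(bucket) for bucket in buckets]


logs = [
    ["t1\tdryad", "t2\tlinq", "malformed line"],
    ["t3\tdryad", "t4\tdryad"],
]
per_bucket = histogram(logs, k=4)
merged = sum(per_bucket, Counter())
print(merged)
```

Because equal queries hash to the same bucket, each bucket's counts are already final and the buckets can be processed independently.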
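Naïve histogram topology
Legend: P = parse lines; D = hash distribute; S = quicksort; C = count occurrences; MS = merge sort
[Figure: Q vertices (n of them) feed R vertices (k of them). Each Q is built from P, S, C, and D. Each R is built from MS and C.]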
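Efficient histogram topology
Legend: P = parse lines; D = hash distribute; S = quicksort; C = count occurrences; MS = merge sort; M = non-deterministic merge
[Figure: stages Q', T, and R, with widths n and k. Each Q' is built from M, P, S, and C. Each T is built from MS, C, and D. Each R is built from MS and C.]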
Final histogram refinement
[Figure: refined graph with stages Q', T, and R. Labels shown: R 450, T 217, 450, 10,405, 99,713. Data volumes shown: 33.4 GB, 118 GB, 154 GB, 10.2 TB.]
• 1,800 computers
• 43,171 vertices
• 11,072 processes
• 11.5 minutes