Cluster Computing with DryadLINQ
description
Transcript of Cluster Computing with DryadLINQ
![Page 1: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/1.jpg)
Cluster Computing with DryadLINQ
Mihai BudiuMicrosoft Research Silicon Valley
IEEE Cloud Computing ConferenceJuly 19, 2008
![Page 2: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/2.jpg)
2
Goal
![Page 3: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/3.jpg)
3
Design Space
ThroughputLatency
Internet
Privatedata
center
Data-parallel
Sharedmemory
DryadSearch
HPC
Grid
Transaction
![Page 4: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/4.jpg)
4
Data Partitioning
RAM
DATA
DATA
![Page 5: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/5.jpg)
5
Data-Parallel Computation
Storage
Execution
Application
Parallel Databases
Map-Reduce
GFSBigTable
CosmosNTFS
Dryad
DryadLINQScope,PSQL
Sawzall, Pig
![Page 6: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/6.jpg)
6
• Introduction• Dryad • DryadLINQ• Applications
Outline
![Page 7: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/7.jpg)
7
2-D Piping• Unix Pipes: 1-D
grep | sed | sort | awk | perl
• Dryad: 2-D grep1000 | sed500 | sort1000 | awk500 | perl50
![Page 8: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/8.jpg)
8
Virtualized 2-D Pipelines
![Page 9: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/9.jpg)
9
Virtualized 2-D Pipelines
![Page 10: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/10.jpg)
10
Virtualized 2-D Pipelines
![Page 11: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/11.jpg)
11
Virtualized 2-D Pipelines
![Page 12: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/12.jpg)
12
Virtualized 2-D Pipelines• 2D DAG• multi-machine• virtualized
![Page 13: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/13.jpg)
13
Architecture
Files, TCP, FIFO, Networkjob schedule
data plane
control plane
NS PD PDPD
V V V
Job manager cluster
![Page 14: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/14.jpg)
Fault Tolerance
![Page 15: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/15.jpg)
15
• Introduction• Dryad • DryadLINQ• Applications
Outline
![Page 16: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/16.jpg)
16
DryadLINQ
Dryad
![Page 17: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/17.jpg)
17
LINQ = C# + Queries
Collection<T> collection;bool IsLegal(Key);string Hash(Key);
var results = from c in collection where IsLegal(c.key) select new { Hash(c.key), c.value};
![Page 18: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/18.jpg)
18
Collection<T> collection;bool IsLegal(Key k);string Hash(Key);
var results = from c in collection where IsLegal(c.key) select new { Hash(c.key), c.value};
DryadLINQ = LINQ + Dryad
C#
collection
results
C# C# C#
Vertexcode
Queryplan(Dryad job)Data
![Page 19: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/19.jpg)
19
Data Model
Partition
Collection
C# objects
![Page 20: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/20.jpg)
20
Language Summary
WhereSelectGroupByOrderByAggregateJoinApplyMaterialize
![Page 21: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/21.jpg)
21
Demo
Done
![Page 22: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/22.jpg)
22
Example: Histogrampublic static IQueryable<Pair> Histogram( IQueryable<LineRecord> input, int k){ var words = input.SelectMany(x => x.line.Split(' ')); var groups = words.GroupBy(x => x); var counts = groups.Select(x => new Pair(x.Key, x.Count())); var ordered = counts.OrderByDescending(x => x.count); var top = ordered.Take(k); return top;}
“A line of words of wisdom”
[“A”, “line”, “of”, “words”, “of”, “wisdom”]
[[“A”], [“line”], [“of”, “of”], [“words”], [“wisdom”]]
[ {“A”, 1}, {“line”, 1}, {“of”, 2}, {“words”, 1}, {“wisdom”, 1}]
[{“of”, 2}, {“A”, 1}, {“line”, 1}, {“words”, 1}, {“wisdom”, 1}]
[{“of”, 2}, {“A”, 1}, {“line”, 1}]
![Page 23: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/23.jpg)
23
Histogram Plan
SelectManyHashDistribute
MergeGroupBy
Select
OrderByDescendingTake
MergeSortTake
![Page 24: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/24.jpg)
24
Map-Reduce in DryadLINQ
public static IQueryable<S> MapReduce<T,M,K,S>( this IQueryable<T> input, Expression<Func<T, IEnumerable<M>>> mapper, Expression<Func<M,K>> keySelector, Expression<Func<IGrouping<K,M>,S>> reducer) { var map = input.SelectMany(mapper); var group = map.GroupBy(keySelector); var result = group.Select(reducer); return result;}
![Page 25: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/25.jpg)
25
Map-Reduce Plan
M
R
G
M
Q
G1
R
D
MS
G2
R
static dynamic
X
X
M
Q
G1
R
D
MS
G2
R
X
M
Q
G1
R
D
MS
G2
R
X
M
Q
G1
R
D
M
Q
G1
R
D
MS
G2
R
X
M
Q
G1
R
D
MS
G2
R
X
M
Q
G1
R
D
MS
G2
R
MS
G2
R
map
sort
groupby
reduce
distribute
mergesort
groupby
reduce
mergesort
groupby
reduce
consumer
map
parti
al a
ggre
gatio
nre
ducedynamic
![Page 26: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/26.jpg)
26
• Introduction• Dryad • DryadLINQ• Applications
Outline
![Page 27: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/27.jpg)
27
Applications
Dryad
DryadLINQ
Distributed Data Structures
Machine learningGraphsLog
analysisImage
processingCombinatorialoptimization
Raytracing
Dataanalysis
![Page 28: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/28.jpg)
28
E.g: Linear Algebra
T U Vnmm ,,=, ,
T
![Page 29: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/29.jpg)
Expectation Maximization (Gaussians)
29
• 160 lines • 3 iterations shown
![Page 30: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/30.jpg)
Conclusions
• Dryad = distributed execution environment• Supports rich software ecosystem
• DryadLINQ = Compiles LINQ to Dryad• C# objects and declarative programming• .Net and Visual Studio
for distributed programming
30
![Page 31: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/31.jpg)
31
Dryad Job Structure
grep
sed
sortawk
perlgrep
grepsed
sort
sort
awk
Inputfiles
Vertices (processes)
Outputfiles
ChannelsStage
![Page 32: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/32.jpg)
32
Linear Regression
• Data
• Find
• S.t.
mt
nt yx ,
mnA
tt yAx
},...,1{ nt
![Page 33: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/33.jpg)
33
Analytic Solution
X×XT X×XT X×XT Y×XT Y×XT Y×XT
Σ
X[0] X[1] X[2] Y[0] Y[1] Y[2]
Σ
[ ]-1
*
A
1))(( Ttt t
Ttt t xxxyA
Map
Reduce
![Page 34: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/34.jpg)
34
Linear Regression Code
Vectors x = input(0), y = input(1);Matrices xx = x.Map(x, (a,b) => a.OuterProd(b));OneMatrix xxs = xx.Sum();Matrices yx = y.Map(x, (a,b) => a.OuterProd(b));OneMatrix yxs = yx.Sum();OneMatrix xxinv = xxs.Map(a => a.Inverse());OneMatrix A = yxs.Map(xxinv, (a, b) => a.Mult(b));
1))(( Ttt t
Ttt t xxxyA
![Page 35: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/35.jpg)
35
Dryad = Execution Layer
Job (application)
Dryad
Cluster
Pipeline
Shell
Machine≈
![Page 36: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/36.jpg)
36
• Many similarities• Exe + app. model• Map+sort+reduce• Few policies• Program=map+reduce• Simple• Mature (> 4 years)• Widely deployed• Hadoop
Dryad Map-Reduce
• Execution layer• Job = arbitrary DAG• Plug-in policies• Program=graph gen.• Complex ( features)• New (< 2 years)• Still growing• Internal
![Page 37: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/37.jpg)
37
PLINQ
public static IEnumerable<TSource> DryadSort<TSource, TKey>(IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IComparer<TKey> comparer, bool isDescending){
return source.AsParallel().OrderBy(keySelector, comparer);}
![Page 38: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/38.jpg)
38
Operations on Large Vectors: Map 1
U
T
T Uf
f
f preserves partitioning
![Page 39: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/39.jpg)
39
V
Map 2 (Pairwise)
T Uf
V
U
T
f
![Page 40: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/40.jpg)
40
Map 3 (Vector-Scalar)T U
fV
V
40
U
T
f
![Page 41: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/41.jpg)
Reduce (Fold)
41
U UU
U
f
f f f
fU U U
U
![Page 42: Cluster Computing with DryadLINQ](https://reader035.fdocuments.net/reader035/viewer/2022062521/568156e0550346895dc486d0/html5/thumbnails/42.jpg)
42
Software Stack
Windows Server
Cluster Services
Distributed Filesystem (Cosmos)
Dryad
Distributed Shell (Nebula)
PSQL
DryadLINQ
PerlSQL
server
C++
Windows Server
Windows Server
Windows Server
C++
CIFS/NTFS
legacycode
sed, awk, grep, etc.
SSISScope
C# Data structures
Applications
C#
Job
queu
eing
, mon
itorin
g