Programming clusters with DryadLINQ
Mihai Budiu, Microsoft Research, Silicon Valley
HPC & GPU Supercomputing Group of Silicon Valley, Dec 5, 2011
Design Space
[Figure: design-space diagram. Axes: Throughput (batch) vs. Latency (interactive), and Internet vs. Datacenter. Labeled regions: Data-parallel, Shared memory, Search, HPC, Grid, Transaction.]
Software Stack
[Figure: layered stack, bottom to top: Windows Server (on each machine); Cluster (Windows HPC Cluster); Storage (Distributed Storage Catalog); Execution (Dryad); Language (DryadLINQ); Applications.]
Data-Parallel Computation
[Figure: comparison of data-parallel stacks, by layer.
Application: SQL; ≈SQL; LINQ, SQL; Sawzall, Java
Language: Sawzall, FlumeJava; DryadLINQ, Scope; Pig, Hive
Execution: Parallel Databases; Map-Reduce; Dryad; Hadoop
Storage: GFS, BigTable; DSC; HDFS]
Dryad
• Execution layer of Bing analytics
• Continuously deployed since 2006
• Running on >> 10^4 machines
• Sifting through > 10 PB of data daily
• Runs on clusters of > 3000 machines
• Handles jobs with > 10^5 processes each
• Platform for a rich software ecosystem
The Dryad by Evelyn De Morgan.
Dryad = Execution Layer
Job (application) ≈ Pipeline
Dryad ≈ Shell
Cluster ≈ Machine
2-D Piping
• Unix Pipes: 1-D
grep | sed | sort | awk | perl
• Dryad: 2-D
grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50
Virtualized 2-D Pipelines
• 2D DAG
• multi-machine
• virtualized
Dryad Job Structure
[Figure: job graph built from grep, sed, sort, awk, and perl vertices. Labels: Input files, Vertices (processes), Channels, Stage, Output files.]
Dryad System Architecture
[Figure: system architecture. Labels: graph manager (job schedule); control plane; cluster with NS, Sched, and RE nodes; vertices (V); data plane (Files, TCP, FIFO, Network).]
Fault Tolerance
LINQ + Dryad => DryadLINQ
LINQ = .NET + Queries
Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);
var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };
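The query on this slide can be tried locally with LINQ to Objects. The following is a minimal sketch, not the speaker's code: the `Record` type and the bodies of `IsLegal` and `Hash` are hypothetical stand-ins (the slide only declares their signatures), and the key is assumed to be a string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqQuerySketch
{
    // Hypothetical element type; the slide only shows fields named key and value.
    public sealed class Record
    {
        public string key;
        public int value;
    }

    // Hypothetical predicate: treats any non-empty key as legal.
    public static bool IsLegal(string key) => !string.IsNullOrEmpty(key);

    // Hypothetical "hash": upper-cases the key so the effect is easy to see.
    public static string Hash(string key) => key.ToUpperInvariant();

    public static List<KeyValuePair<string, int>> Run(IEnumerable<Record> collection)
    {
        // Same shape as the slide's query. The anonymous type needs an explicit
        // member name (hash) because Hash(c.key) is a method call.
        var results = from c in collection
                      where IsLegal(c.key)
                      select new { hash = Hash(c.key), c.value };

        // Convert to a named type so the result can leave this method.
        return results
            .Select(r => new KeyValuePair<string, int>(r.hash, r.value))
            .ToList();
    }
}
```

Calling `LinqQuerySketch.Run` on a small in-memory list filters out records whose key fails `IsLegal` and projects the rest, which is the behavior the slide's query describes for a single machine.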
Collections
class Collection<T> : IEnumerable<T>;
Elements of type T
Iterator (current element)
DryadLINQ Data Model
[Figure: a collection is divided into partitions; the elements are .NET objects.]
DryadLINQ = LINQ + Dryad
Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);
var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };
[Figure: the C# query from collection to results is turned into a query plan (Dryad job) and C# vertex code, which run over the data.]
Demo
Example: counting lines
var table = PartitionedTable.Get<LineRecord>(file);
int count = table.Count();
Query plan stages: Parse, Count / Sum
Example: counting words
var table = PartitionedTable.Get<LineRecord>(file);
int count = table
    .SelectMany(l => l.line.Split(' '))
    .Count();
Query plan stages: Parse, SelectMany, Count / Sum
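The plan labels for the two counting examples suggest a two-stage shape: each partition is counted on its own, then the partial counts are summed. The sketch below imitates that shape in memory. It is an illustration only: partitions are modeled as nested sequences of strings, and `CountLines` and `CountWords` are hypothetical helper names, not DryadLINQ APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartialCountSketch
{
    // Stage 1 counts the lines of each partition independently;
    // stage 2 sums the per-partition counts.
    public static long CountLines(IEnumerable<IEnumerable<string>> partitions)
    {
        IEnumerable<long> partialCounts = partitions.Select(p => p.LongCount());
        return partialCounts.Sum();
    }

    // Same shape, with a SelectMany inside each partition to split lines
    // into words before counting.
    public static long CountWords(IEnumerable<IEnumerable<string>> partitions)
    {
        IEnumerable<long> partialCounts = partitions.Select(
            p => p.SelectMany(line => line.Split(' ')).LongCount());
        return partialCounts.Sum();
    }
}
```

Only the small per-partition counts cross the boundary between the two stages, which is why a count of this kind parallelizes well.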
Example: counting unique words
var table = PartitionedTable.Get<LineRecord>(file);
int count = table
    .SelectMany(l => l.line.Split(' '))
    .GroupBy(w => w)
    .Count();
Query plan stages: GroupBy; Count / HashPartition
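Hash partitioning makes a distributed unique-word count possible because equal words always land in the same bucket, so each bucket can be counted independently. The following in-memory sketch illustrates that idea under assumptions: the bucket count is a free parameter, partitions are nested sequences of strings, and `CountUniqueWords` is a hypothetical name rather than anything DryadLINQ exposes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashPartitionSketch
{
    public static int CountUniqueWords(
        IEnumerable<IEnumerable<string>> partitions, int buckets)
    {
        // Split every line of every partition into words.
        var words = partitions
            .SelectMany(p => p)
            .SelectMany(line => line.Split(' '));

        // Hash-partition: identical words get identical hash codes,
        // so all copies of a word end up in the same bucket.
        var byBucket = words.GroupBy(
            w => (w.GetHashCode() & int.MaxValue) % buckets);

        // Inside each bucket, group by word and count the groups,
        // then add up the per-bucket results.
        return byBucket
            .Select(bucket => bucket.GroupBy(w => w).Count())
            .Sum();
    }
}
```

Because no word can appear in two buckets, summing the per-bucket counts gives the same answer as a single `GroupBy(w => w).Count()` over all the words.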
Example: word histogram
var table = PartitionedTable.Get<LineRecord>(file);
var result = table.SelectMany(l => l.line.Split(' '))
    .GroupBy(w => w)
    .Select(g => new { word = g.Key, count = g.Count() });
Query plan stages: GroupBy; Count / GroupBy / Count / HashPartition
Example: high-frequency words
var table = PartitionedTable.Get<LineRecord>(file);
var result = table.SelectMany(l => l.line.Split(' '))
    .GroupBy(w => w)
    .Select(g => new { word = g.Key, count = g.Count() })
    .OrderByDescending(t => t.count)
    .Take(100);
Query plan stages: Sort; Take / Mergesort; Take
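The "Sort; Take" and "Mergesort; Take" labels suggest that each partition keeps only its own top entries before the partial results are combined. The sketch below shows that pattern in memory. It assumes the (word, count) pairs are already final, meaning each word appears in exactly one partition, and it re-sorts the combined candidates instead of doing a true merge of sorted runs. `TopK` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TopKSketch
{
    public static List<KeyValuePair<string, int>> TopK(
        IEnumerable<IEnumerable<KeyValuePair<string, int>>> partitions, int k)
    {
        // Stage 1: sort each partition by descending count and keep its top k.
        var localTop = partitions.Select(
            p => p.OrderByDescending(t => t.Value).Take(k));

        // Stage 2: combine the candidates and keep the global top k.
        // (A real merge of the sorted runs would avoid re-sorting.)
        return localTop
            .SelectMany(p => p)
            .OrderByDescending(t => t.Value)
            .Take(k)
            .ToList();
    }
}
```

At most k entries per partition reach the second stage, so the amount of data combined at the end depends on k and the number of partitions, not on the vocabulary size.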
Example: words by frequency
var table = PartitionedTable.Get<LineRecord>(file);
var result = table.SelectMany(l => l.line.Split(' '))
    .GroupBy(w => w)
    .Select(g => new { word = g.Key, count = g.Count() })
    .OrderByDescending(t => t.count);
Query plan stages: Sample / Histogram / Broadcast / Range-partition / Sort
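The stage names (Sample, Histogram, Broadcast, Range-partition, Sort) describe a sample-based distributed sort. The following is a rough in-memory sketch of that technique, not the DryadLINQ implementation. It makes several simplifying assumptions: it sorts plain integers in ascending order rather than word counts in descending order, it samples every second element, and the "broadcast" is just a shared local list. `SortByRange` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeSortSketch
{
    public static List<int> SortByRange(
        IEnumerable<IEnumerable<int>> partitions, int buckets)
    {
        var parts = partitions.Select(p => p.ToList()).ToList();

        // Sample: take every second element of each partition.
        var samples = parts
            .SelectMany(p => p.Where((x, i) => i % 2 == 0))
            .OrderBy(x => x)
            .ToList();

        // Histogram: pick buckets-1 splitters at even quantiles of the sample.
        var splitters = new List<int>();
        for (int b = 1; b < buckets && samples.Count > 0; b++)
            splitters.Add(samples[b * samples.Count / buckets]);

        // Broadcast + range-partition: every partition routes its elements
        // with the same splitters, so bucket i only holds values that are
        // no larger than any value in bucket i+1.
        var ranges = Enumerable.Range(0, buckets)
            .Select(_ => new List<int>())
            .ToList();
        foreach (var p in parts)
        {
            foreach (var x in p)
            {
                // BinarySearch returns the bitwise complement of the
                // insertion index when x is not one of the splitters.
                int idx = splitters.BinarySearch(x);
                ranges[idx >= 0 ? idx : ~idx].Add(x);
            }
        }

        // Sort: each range is sorted on its own; concatenating the ranges
        // in order yields a globally sorted sequence.
        return ranges.SelectMany(r => r.OrderBy(x => x)).ToList();
    }
}
```

Sampling matters because the splitters decide how evenly the work is spread: splitters drawn from the data's own distribution keep the ranges roughly balanced even when the values are skewed.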
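Example: Map-Reduce
// Needs System.Linq.Expressions: the arguments are expression trees so that
// the IQueryable operators apply and the result stays an IQueryable<S>.
public static IQueryable<S> MapReduce<T, M, K, S>(
    IQueryable<T> input,
    Expression<Func<T, IEnumerable<M>>> mapper,
    Expression<Func<M, K>> keySelector,
    Expression<Func<IGrouping<K, M>, S>> reducer)
{
    var map = input.SelectMany(mapper);
    var group = map.GroupBy(keySelector);
    var result = group.Select(reducer);
    return result;
}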
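As a usage illustration, the Map-Reduce helper can express word count. The sketch below runs against an in-memory `AsQueryable()` source rather than a cluster. It departs from the slide in two ways: the function parameters are declared as `Expression<Func<...>>`, because with plain `Func` parameters the calls would bind to the `IEnumerable` operators and the result would not be an `IQueryable<S>`, and the `WordCount` wrapper is a hypothetical addition.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class MapReduceSketch
{
    public static IQueryable<S> MapReduce<T, M, K, S>(
        IQueryable<T> input,
        Expression<Func<T, IEnumerable<M>>> mapper,
        Expression<Func<M, K>> keySelector,
        Expression<Func<IGrouping<K, M>, S>> reducer)
    {
        var map = input.SelectMany(mapper);       // map: one input, many outputs
        var group = map.GroupBy(keySelector);     // shuffle: group by key
        var result = group.Select(reducer);       // reduce: one result per group
        return result;
    }

    // Hypothetical wrapper: word count expressed with MapReduce.
    public static Dictionary<string, int> WordCount(IEnumerable<string> lines)
    {
        var counts = MapReduce<string, string, string, KeyValuePair<string, int>>(
            lines.AsQueryable(),
            // An explicit char array avoids Split overloads with optional
            // parameters, which are not allowed inside expression trees.
            line => line.Split(new[] { ' ' }),
            word => word,
            g => new KeyValuePair<string, int>(g.Key, g.Count()));

        return counts.ToDictionary(kv => kv.Key, kv => kv.Value);
    }
}
```

The three operators line up with the classic phases: `SelectMany` is the map, `GroupBy` is the shuffle, and `Select` over the groups is the reduce.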
Expectation Maximization
• 160 lines
• 3 iterations shown
Probabilistic Index Maps
Images
features
Language Summary
Where
Select
GroupBy
OrderBy
Aggregate
Join
Using PLINQ
[Figure labels: Query, LINQ to HPC, PLINQ, Local query]
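PLINQ runs a LINQ query in parallel on the cores of a single machine. As a small illustration of what a PLINQ local query looks like, the sketch below computes the word histogram with `AsParallel()`. The `Histogram` helper is a hypothetical name, and the example says nothing about how LINQ to HPC itself invokes PLINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlinqSketch
{
    public static Dictionary<string, int> Histogram(IEnumerable<string> lines)
    {
        return lines
            .AsParallel()                            // switch to the PLINQ operators
            .SelectMany(line => line.Split(' '))     // split lines into words
            .GroupBy(word => word)                   // group identical words
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

The query text is the same as the sequential LINQ version apart from `AsParallel()`, which is the point of the programming model: the operators stay the same while the execution strategy changes.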
So, What Good is it For?
Application: Kinect Training
Input device
Depth computation
Source: http://nuit-blanche.blogspot.com/2010/11/unsing-kinect-for-compressive-sensing.html
Vision Input: Depth Map + RGB
Why is it hard?
Background segmentation
Player separation
Body Part Classifier
Skeletal Tracking Pipeline
Depth map
Sensor
Body Part Identification
Skeleton
The Classifier
Input: Depth map
Output: Body parts
Classifier
Runs on GPU @ 320x240
Motion Capture
Ground Truth Data
3D avatar model
Learn from Data
Classifier
Training examples
Machine learning
Kinect Training
Dryad
DryadLINQ
Machine learning
Decision forest
Millions of examples needed
Cluster
Highly efficient parallelization
[Figure: plot with axes "time" and "machine".]
Conclusions
[Figure labels: Cluster, LINQ, Dryad, =]