Problem(Based(Benchmarks:(and( their(role(in(parallel...
Transcript of Problem(Based(Benchmarks:(and( their(role(in(parallel...
![Page 1: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/1.jpg)
Problem Based Benchmarks: and their role in parallel algorithms
Guy Blelloch Carnegie Mellon University
Alenex 2012 1
Also: Jeremy Fineman, Phil Gibbons (Intel), Julian Shun, Harsha Vardham Simhadri, …
![Page 2: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/2.jpg)
Outline
• The challenge with parallel algorithms • The problem based benchmark suite • How they do on modern mulAprocessors
Alenex 2012 2
![Page 3: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/3.jpg)
16 core processor
Page 3 Alenex 2012
![Page 4: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/4.jpg)
64 core blade servers ($6K) (shared memory)
Page 4
x 4 =
Alenex 2012
![Page 5: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/5.jpg)
1024 “cuda” cores
Alenex 2012 5
![Page 6: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/6.jpg)
6 Alenex 2012
Up to 300K servers
![Page 7: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/7.jpg)
Alenex 2012 7
![Page 8: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/8.jpg)
Different Architectures
• MulAcore (shared memory) • GPUs • Distributed memory • FPGAs
Alenex 2012 8
![Page 9: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/9.jpg)
Different Programming Approaches • transacAons • futures • nested parallelism • map-‐reduce • CUDA/GPU programming • data parallelism • PRAM • bulk synchronizaAon Alenex 2012 9
![Page 10: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/10.jpg)
• threads • message passing • parallel I/O models • parAAoned global address space • coordinaAon languages • concurrent data structures • events • … Alenex 2012 10
Different Programming Approaches
![Page 11: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/11.jpg)
But…. • How well do these work on standard problems?
• How do they compare? • What kind of algorithms work best? • How easy are they to program?
Alenex 2012 11
![Page 12: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/12.jpg)
Outline
• The challenge with parallel algorithms • The problem based benchmark suite • How they do on modern mulAprocessors
Alenex 2012 12
![Page 13: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/13.jpg)
Problem Based Benchmarks
• Define a set of benchmarks in terms of Input/Output behavior on specific inputs, and use them to compare soluAons.
Alenex 2012 13 Input Output
![Page 14: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/14.jpg)
Problem Based Benchmarks • Judge based on:
– Performance and scalability – Ability to reason about performance – Quality of code – Generality over inputs – Pla_orm independence
Some aspects can be judged qualitaAvely, others aspects will be at the eye of the beholder.
Therefore making code public is very important.
Alenex 2012 14
![Page 15: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/15.jpg)
The PBBS effort
Benchmarks with following characterisAcs – Well known and understood – Concisely described – Implementable in under 1000 lines of code – Broad representaAon of domains – Correctness or quality of output easily measured – Independent of machine type
Alenex 2012 15
![Page 16: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/16.jpg)
Many ExisAng Benchmarks
But none we know of match the spec • Code Based : SPEC, Da Capo, PassMark, Splash-‐2, PARSEC, fluidMark
• Applica4on Specific: Linpack, BioBench, BioParallel, MediaBench, SATLIB, CineBench, MineBench, TCP, ALPBench, Graph 500, DIMACS challenges
• Method Based: Lonestar • Machine analysis: HPC challenge, Java Grande, NAS, Green 500, Graph 500, P-‐Ray, fluidMark
Alenex 2012 16
![Page 17: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/17.jpg)
Status
• About 15 benchmarks defined with supporAng code
• SequenAal implementaAons • MulAcore implementaAons • Will make public in February
Alenex 2012 17
![Page 18: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/18.jpg)
Preliminary Benchmarks I
Alenex 2012 18
Sequences * Comparison SorAng
* Removing Duplicates
* DicAonary
Graphs * Breadth First Search
Graph Separators
* Minimum Spanning Tree
* Maximal Independent Set
Geometry/Graphics
* Delaunay TriangulaAon and Refinement
* Convex Hulls
* Ray Triangle IntersecAon (Ray CasAng)
Micropolygon Rendering
![Page 19: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/19.jpg)
Preliminary Benchmarks II
Alenex 2012 19
Machine Learning
* All Nearest Neighbors
Support Vector Machines
K-‐Means
Text Processing
* Suffix Arrays
Edit Distance
String Search
Science * Nbody force calculaAons
PhylogeneAc tree
Numerical * Sparse Matrix Vector MulAply
Sparse Linear Solve
![Page 20: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/20.jpg)
Each Benchmark Consists of:
• A precise specificaAon of the problem • SpecificaAon of Input/Output file formats • A set of input generators. • A weighAng on the inputs • Code for tesAng the results • Baseline sequenAal code • Baseline parallel code(s)
Alenex 2012 20
![Page 21: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/21.jpg)
Example Input
SorAng: – Random floats (uniform) – Random floats (exponenAal bias) – Almost sorted – Strings generated from trigram probability and randomly permuted
– Structures with float key and 3 addiAonal fields
Alenex 2012 21
![Page 22: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/22.jpg)
Outline
• The challenge with parallel algorithms • The problem based benchmark suite • How they do on modern mulAprocessors
– Using 32-‐core Intel Nehalem – What parallel algorithms work
Alenex 2012 22
![Page 23: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/23.jpg)
Algorithmic Models
• PRAM • BSP • Nested Parallelism with Work and Span
– Compose work by summing – Compose span by taking the max
• Parallel Cache Oblivious Model – Count SequenAal Cache misses – Can be used to bound parallel cache misses
Alenex 2012 23
![Page 24: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/24.jpg)
How do the problems do on a modern mulAcore
Alenex 2012 24
Tseq/T3231.621.611.2109
14.5151711.7171815
0"
4"
8"
12"
16"
20"
24"
28"
32"
Sort"
Duplicate"Removal"
Min"Spanning"Tree"
Max"Independ."Set"
Spanning"Forest"
Breadth"First"Search"
Delaunay"Triang."
Triangle"Ray"Inter."
Nearest"Neighbors"
Sparse"M
xV"
Nbody"
Suffix"Array"
T1/T32"Tseq/T32"
![Page 25: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/25.jpg)
Divide and Conquer
• SorAng : Sample sort • Nearest neighbors : building quad-‐oct trees • Triangle-‐ray intersect : k-‐d trees • N-‐body simulaAon: Callahan-‐Kosaraju
Alenex 2012 25
![Page 26: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/26.jpg)
SorAng : Sample Sort
Alenex 2012 26
A1 A2 A3
Am
m
m
Sort(A1) Sort(A2) Sort(A3)
Sort(Am) m
m S
S1 S2 S3
Sm
Divide using P
Si P MERGE
Sample P sort
![Page 27: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/27.jpg)
SorAng : Sample Sort
Alenex 2012 27
Send to buckets
} Finally, sort buckets. } Depth(n) = O(log2(n)) } Work(n) = O(n log n) } Q1(n; M,B) = O((n/B)(log(M/B)(n/B))
![Page 28: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/28.jpg)
Alenex 2012 28
Tseq/T3231.621.611.2109
14.5151711.7171815
0"
4"
8"
12"
16"
20"
24"
28"
32"
Sort"
Duplicate"Removal"
Min"Spanning"Tree"
Max"Independ."Set"
Spanning"Forest"
Breadth"First"Search"
Delaunay"Triang."
Triangle"Ray"Inter."
Nearest"Neighbors"
Sparse"M
xV"
Nbody"
Suffix"Array"
T1/T32"Tseq/T32"
![Page 29: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/29.jpg)
Sort Performance, More Detail weight STL Sort Sanders Sort Quicksort SampleSort SampleSort
Cores 1 32 32 32 1
Uniform .1 15.8 1.06 4.22 .82 20.2
ExponenAal .1 10.8 .79 2.49 .53 13.8
Almost Sorted .1 3.28 1.11 1.76 .27 5.67
Trigram Strings .2 58.2 4.63 8.6 1.05 30.8
Strings Permuted
.2 82.5 7.08 28.4 1.76 49.3
Structure .3 17.6 2.03 6.73 1.18 26.7
Average 36.4 3.24 10.3 .97 28.0
Alenex 2012 29
All inputs are 100,000,000 long. All code written run on Cilk++ (also tested in Cilk+) All experiments on 32 core Nehalem (4 X x7560)
![Page 30: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/30.jpg)
SpeculaAve ExecuAon
Several efficient sequenAal algorithms are greedy loops that insert/process items one at a Ame, but with dependences: § Maximal independent Set (over verAces) § Maximal Matching (over edges) § Spanning Tree (over edges) § Delaunay TriangulaAon (over points) Alenex 2012 30
![Page 31: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/31.jpg)
Maximal Independent Set
SequenAal algorithm: for each u in V : S[u] = Remainfor each u in V if for all v in N(u), v < u, S[v] = Out then S[u] = In else S[u] = Out
Alenex 2012 31
X
1 2
3
9
5
4
7
8 6
10
x
x
x x
x
x
![Page 32: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/32.jpg)
Maximal Independent Set
SequenAal algorithm: for each u in V : S[u] = Remainfor each u in V if for all v in N(u), v < u, S[v] = Out then S[u] = In else S[u] = Out
Very efficient: most edges not even visited, simple loops About 7x faster than sorAng m edges
Alenex 2012 32
![Page 33: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/33.jpg)
Maximal Independent Set
Same algorithm: with parallel speculaAon for each u in V : S[u] = Remainfor each u in V if for all v in N(u), v < u, S[v] = Out then S[u] = In else S[u] = Out
Alenex 2012 33
X
1 2
3
9
5
4
7
8 6
10
x
x
x x
x
x
![Page 34: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/34.jpg)
Maximal Independent Set
same algorithm: with speculaAon on prefix for each u in V : S[u] = Remainfor each u in V if for all v in N(u), v < u, S[v] = Out then S[u] = In else S[u] = Out
Alenex 2012 34
X
1 2
3
9
5
4
7
8 6
10
?
?
?
1 2 3 4 5 6 7 8 9 10
![Page 35: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/35.jpg)
Maximal Independent Set
same algorithm: with speculaAon on prefix for each u in V : S[u] = Remainfor each u in V if for all v in N(u), v < u, S[v] = Out then S[u] = In else S[u] = Out
Alenex 2012 35
X
1 2
3
9
5
4
7
8 6
10
1 2 3 4 5 6 7 8 9 10
![Page 36: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/36.jpg)
Maximal Independent Set
same algorithm: with speculaAon on prefix for each u in V : S[u] = Remainfor each u in V if for all v in N(u), v < u, S[v] = Out then S[u] = In else S[u] = Out
Alenex 2012 36
X
1 2
3
9
5
4
7
8 6
10
x
x
?
? 1 2 3 4 5 6 7 8 9 10
![Page 37: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/37.jpg)
Maximal Independent Set
same algorithm: with speculaAon on prefix for each u in V : S[u] = Remainfor each u in V if for all v in N(u), v < u, S[v] = Out then S[u] = In else S[u] = Out
Alenex 2012 37
X
1 2
3
9
5
4
7
8 6
10
x
x
?
? 1 2 3 4 5 6 7 8 9 10
![Page 38: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/38.jpg)
Maximal Independent Set
same algorithm: with speculaAon on prefix for each u in V : S[u] = Remainfor each u in V if for all v in N(u), v < u, S[v] = Out then S[u] = In else S[u] = Out
Alenex 2012 38
X
1 2
3
9
5
4
7
8 6
10
x
x
x
x
x
x 1 2 3 4 5 6 7 8 9 10
![Page 39: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/39.jpg)
MIS Parallel Code struct MISStep { bool reserve(int i) { int d = V[i].degree; flag = IN; for (int j = 0; j < d; j++) { int ngh = V[i].Neighbors[j]; if (ngh < i) { if (Fl[ngh] == IN) { flag = OUT; return 1;} else if (Fl[ngh] == LIVE) flag = LIVE; } } return 1; }
bool commit(int i) { return (Fl[i] = flag) != LIVE;}};
void MIS(FlType* Fl, vertex* V, int n, int psize) speculative_for(MISStep(Fl, V), 0, n, psize);}Alenex 2012 39
![Page 40: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/40.jpg)
Maximal Independent Set
Costs: – Span = O(log3 n)
Expected case over all iniAal permutaAons – Work = O(m)
if prefix size = O(n/dmax)
DetermininisAc : – result only depends on iniAal permutaAon of
verAces
Alenex 2012 40
![Page 41: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/41.jpg)
Alenex 2012 41
Tseq/T3231.621.611.2109
14.5151711.7171815
0"
4"
8"
12"
16"
20"
24"
28"
32"
Sort"
Duplicate"Removal"
Min"Spanning"Tree"
Max"Independ."Set"
Spanning"Forest"
Breadth"First"Search"
Delaunay"Triang."
Triangle"Ray"Inter."
Nearest"Neighbors"
Sparse"M
xV"
Nbody"
Suffix"Array"
T1/T32"Tseq/T32"
![Page 42: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/42.jpg)
Spanning Tree
SequenAal algorithm:for each (u,v) in E u’ = find(u) v’ = find(v) if (u’ != v’) union(u’,v’)
Alenex 2012 42
![Page 43: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/43.jpg)
Spanning Tree struct STStep { bool reserve(int i) { u = F.find(E[i].u); v = F.find(E[i].v); if (u == v) return 0; if (u > v) swap(u,v); R[v].reserve(i); return 1;}
bool commit(int i) { if (R[v].check(i)) { F.link(v, u); return 1;} else return 0; }};
void ST(res* R, edge* E, int m, int n, int psize) { disjointSet F(n); speculative_for(STStep(E, F, R), 0, m, psize);}
Alenex 2012 43
![Page 44: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/44.jpg)
Delaunay TriangulaAon/Refinement
• Add points in parallel but detect conflicts
Alenex 2012 44
15
16 16
15
15
15
16
15
15
![Page 45: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/45.jpg)
DicAonary Using hashing:
– Based on generic hash and comparison – Problem: representaAon can depend on ordering. Also on which redundant element is kept.
– SoluAon: Use history independent hash table based on linear probing…representaAon is independent of order of inserAon
– Use write-‐min on collision
Alenex 2012 45
6 7 3 11 9 5 8
7, 11 3 9 8, 5 6
![Page 46: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/46.jpg)
Breadth First Search (BFS)
Goal: generate the same BFS (spanning) tree as the sequenAal Q based algorithm.
Alenex 2012 46
![Page 47: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/47.jpg)
Breadth First Search (BFS)
SequenAal algorithm:
Alenex 2012 47
![Page 48: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/48.jpg)
Breadth First Search (BFS)
Another possible tree:
Alenex 2012 48
![Page 49: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/49.jpg)
Breadth First Search (BFS)
SoluAon: – Maintain FronAer and priority order it – Use writeMin to choose winner.
Alenex 2012 49
1
1
2
1
2
3
1
2
3
4
![Page 50: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/50.jpg)
Delaunay TriangulaAon/Refinement
• Incremental algorithm adds one point at a Ame, but points can be added in parallel if they don’t interact.
• The problem is that the output will depend on the order they are added.
Alenex 2012 50
![Page 51: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/51.jpg)
Delaunay TriangulaAon/Refinement
• Adding points determinisAcally
Alenex 2012 51
![Page 52: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/52.jpg)
Delaunay TriangulaAon/Refinement
• Adding points determinisAcally
Alenex 2012 52
![Page 53: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/53.jpg)
Delaunay TriangulaAon/Refinement
• Adding points determinisAcally
Alenex 2012 53
![Page 54: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/54.jpg)
Performance on 32 Core Intel Nehalem
Alenex 2012 54
Tseq/T3231.621.611.2109
14.5151711.7171815
0"
4"
8"
12"
16"
20"
24"
28"
32"
Sort"
Duplicate"Removal"
Min"Spanning"Tree"
Max"Independ."Set"
Spanning"Forest"
Breadth"First"Search"
Delaunay"Triang."
Triangle"Ray"Inter."
Nearest"Neighbors"
Sparse"M
xV"
Nbody"
Suffix"Array"
T1/T32"Tseq/T32"
![Page 55: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/55.jpg)
Some Conclusions from Experiments
• MulAcores work quite well…but there are some issues with memory bandwidth
• Most problems parallelize well. • Cost models are reasonably accurate • Parallel code does not need to be complicated • Need a mix of parallelizaAon techniques
Alenex 2012 55
![Page 56: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/56.jpg)
Open QuesAons
• How do the benchmarks do on other machines….other models?
• Are there beser sequenAal implementaAons • Are there beser parallel implementaAons • More benchmarks – perhaps ones that don’t parallelize well (e.g. max flow?).
Alenex 2012 56
![Page 57: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(](https://reader034.fdocuments.net/reader034/viewer/2022042023/5e7b2640e0238843bd725453/html5/thumbnails/57.jpg)
Back to the benchmarks
• Need for standardized “problem based” benchmarks for comparing approaches.
• ParAcularly important for parallel algorithms, but also useful for sequenAal algorithms.
• With adequate framework, should be possible for anyone to submit new benchmarks and soluAons.
Alenex 2012 57