Multithreaded Algorithms for Approx and Exact Matching in Graphs
M. Halappanavar (1), A. Azad (2), F. Dobrian (3), J. Feo (1), and A. Pothen (2)
(1) Pacific Northwest National Laboratory
(2) Purdue University
(3) Conviva Inc.
26 January 2011
First ICCS Workshop
Motivation: Irregular applications
• Community thought leaders: Blog Analysis
• Community activities: Facebook - 300 M users
• Connect-the-dots: National Security
[Figure: connect-the-dots graph with node labels Bus, Hayashi, Zaire, Train, Anthrax, Money, Endo]
• People, places, & actions: Semantic Web
• Anomaly detection: Security
• N-x contingency analysis: Power Grids
Challenges
• Problem size
  - Ton of bytes, not ton of flops
• Little data locality
  - Have only parallelism to tolerate latencies
• Low computation to communication ratio
  - Single word access
  - Threads limited by loads and stores
• Synchronization points are simple elements
  - Node, edge, record
• Work tends to be dynamic and imbalanced
  - Let any processor execute any thread
Multithreaded Architectures: Key Architectural Features
• Fast On-Demand Context Switch
• Memory Tag Bits
• Globally Shared Memory
• Latency Tolerance
• Concurrency
Source: Jace Mogill, PNNL
Overview
• Platforms, from multithreading to caching: XMT, Niagara-2, Nehalem, Magny-Cours
• Approx Algorithms: Queue-based; Q + Sorting; Dataflow
• Exact Algorithms
• Input: RMAT-ER; RMAT-G; RMAT-B
Matching
• A matching M is a subset of edges such that no two edges in M are incident on the same vertex
• Maximum matching maximizes some function
  - Number of edges matched (cardinality)
  - Sum or product of edge-weights
[Figure: example graph with edge weights 1000, 1, 1, 1, 1]
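The definition and the two objectives can be illustrated with a short Python sketch. The path graph, its weights, and the helper names below are hypothetical (the slide's figure is not recoverable from the transcript); the point is only that the matching with the most edges and the matching with the largest total weight can differ.

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[int, int]


def is_matching(edges: Iterable[Edge]) -> bool:
    """Return True if no two edges are incident on the same vertex."""
    seen = set()
    for u, v in edges:
        if u == v or u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def total_weight(edges: Iterable[Edge], weights: Dict[Edge, float]) -> float:
    """Sum of edge weights over the given edges."""
    return sum(weights[e] for e in edges)


# Hypothetical path 0 - 1 - 2 - 3 with one very heavy middle edge.
WEIGHTS = {(0, 1): 1, (1, 2): 1000, (2, 3): 1}

by_cardinality = [(0, 1), (2, 3)]  # two edges, total weight 2
by_weight = [(1, 2)]               # one edge, total weight 1000
not_a_matching = [(0, 1), (1, 2)]  # both edges touch vertex 1
```

Here `by_cardinality` maximizes the number of matched edges while `by_weight` maximizes the sum of edge weights, so the objective must be stated before "maximum" is meaningful.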
Applications of matching
• Sparse linear solvers
• Block triangular form
• Graph partitioners
• Bioinformatics
• Web technology
• High speed network switching
• …
Algorithms
• Exact Algorithms:
  - Polynomial time algorithm first due to Edmonds
  - Maximum matching: Hopcroft-Karp
  - Maximum weighted: Hungarian
• (Half) Approx Algorithms:
  - Sorting-based approaches (Global)
  - Search-based approaches (Local)
  - Preis’s algorithm and its variants (Hoepman; Manne and Bisseling)
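As a minimal sketch of the sorting-based (global) family, the classic greedy heuristic processes edges in non-increasing weight order and keeps every edge whose endpoints are both still free; the result is guaranteed to have at least half the maximum weight. The function name and the example graph are hypothetical, and this sequential version is not the talk's multithreaded code.

```python
def greedy_matching(edges):
    """Greedy half-approximation.

    edges: list of (u, v, weight) tuples.
    Returns the matched edges in the order they were picked.
    """
    matched_vertices = set()
    matching = []
    # Global step: one sort over all edges, heaviest first.
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        if u != v and u not in matched_vertices and v not in matched_vertices:
            matching.append((u, v, w))
            matched_vertices.update((u, v))
    return matching


# Hypothetical path 0 - 1 - 2 - 3: greedy takes the middle edge (weight 6),
# while the optimum takes the two outer edges (weight 10).
EDGES = [(0, 1, 5), (1, 2, 6), (2, 3, 5)]
picked = greedy_matching(EDGES)
```

On this input the greedy weight is 6 against an optimum of 10, which is within the factor-of-two bound.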
Pointer-based algorithm (Queue-based)
• Identify locally-dominant edges using pointers
• Implement with queues (queue matched vertices)
• Variant: sorted edge-sets
[Figure: example graph on vertices 1 to 6 with edge weights 35, 25, 20, 10, 15, 5, 20, 15, 5]
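A sequential sketch of the pointer-and-queue idea, in the spirit of the Hoepman and Manne-Bisseling variants named earlier: every vertex points at its heaviest free neighbor, an edge whose endpoints point at each other is locally dominant and gets matched, and matched vertices are queued so that neighbors still pointing at them can re-point. The function name, the tie-breaking rule, and the 6-cycle example are assumptions for illustration; this is not the talk's multithreaded implementation, and the example is not the slide's graph.

```python
from collections import deque


def locally_dominant_matching(adj):
    """adj: symmetric dict {v: {u: weight}}. Returns mate as {v: u}."""
    mate = {}
    candidate = {}

    def edge_key(v, u):
        # Weight first, then endpoint ids, so both endpoints rank ties alike.
        return (adj[v][u], min(u, v), max(u, v))

    def best_free_neighbor(v):
        free = [u for u in adj[v] if u not in mate]
        return max(free, key=lambda u: edge_key(v, u)) if free else None

    queue = deque()
    # Phase 1: every vertex points at its heaviest neighbor.
    for v in adj:
        candidate[v] = best_free_neighbor(v)
    for v in adj:
        u = candidate[v]
        if u is not None and v not in mate and candidate[u] == v:
            mate[v], mate[u] = u, v  # pointers agree: locally dominant edge
            queue.extend((v, u))
    # Phase 2: neighbors that pointed at a matched vertex re-point.
    while queue:
        v = queue.popleft()
        for x in adj[v]:
            if x in mate or candidate[x] != v:
                continue
            candidate[x] = best_free_neighbor(x)
            y = candidate[x]
            if y is not None and candidate[y] == x:
                mate[x], mate[y] = y, x
                queue.extend((x, y))
    return mate


# Hypothetical 6-cycle 1-2-3-4-5-6-1.
GRAPH = {
    1: {2: 35, 6: 5},
    2: {1: 35, 3: 25},
    3: {2: 25, 4: 20},
    4: {3: 20, 5: 10},
    5: {4: 10, 6: 15},
    6: {5: 15, 1: 5},
}
mate = locally_dominant_matching(GRAPH)
```

On this graph the first phase matches edges (1, 2) and (5, 6); vertex 3 then re-points from 2 to 4 and edge (3, 4) is matched in the second phase.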
Pointer-based algorithm (Dataflow)
• Queue headers can hotspot
• Dataflow approach
  - Each node sets signal on its side of heaviest edge to 1
  - Reads companion signal
[Figure: the example graph on vertices 1 to 6 (edge weights 35, 25, 20, 10, 15, 5, 20, 15, 5), shown first with all signals unset (_) and then after each node has set the signal on its heaviest edge to 1]
Dataflow (cont.)
- If companion signal is 1, then set signal of other edges to 0 and stop
- Else set signal on next heaviest edge to 1
[Figure: the same example graph in three further signal states; matched nodes set their other signals to 0 and the remaining nodes advance to their next heaviest edge, until every signal is 0 or 1]
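The signal rules can be simulated sequentially as a rough sketch. Each vertex keeps one signal per incident edge (unset, 1, or 0); reading an unset companion signal is modeled by skipping the vertex for that sweep, which is only a stand-in for however a real multithreaded implementation would block the reading thread. The function name, the tie-breaking rule, and the 6-cycle example are assumptions for illustration and are not taken from the slides.

```python
UNSET = None


def dataflow_matching(adj):
    """adj: symmetric dict {v: {u: weight}}. Returns mate as {v: u}."""

    def edge_key(v, u):
        # Weight first, then endpoint ids, so both endpoints rank ties alike.
        return (adj[v][u], min(u, v), max(u, v))

    # Neighbors of each vertex from heaviest to lightest edge.
    order = {v: sorted(adj[v], key=lambda u: edge_key(v, u), reverse=True)
             for v in adj}
    # signal[v][u] is v's signal on its side of edge (v, u).
    signal = {v: {u: UNSET for u in adj[v]} for v in adj}
    position = {v: 0 for v in adj}
    mate = {}
    active = set()
    for v in adj:
        if order[v]:
            signal[v][order[v][0]] = 1  # signal the heaviest edge
            active.add(v)

    progress = True
    while active and progress:
        progress = False
        for v in sorted(active):
            u = order[v][position[v]]
            companion = signal[u][v]
            if companion is UNSET:
                continue  # companion not written yet: wait for a later sweep
            progress = True
            if companion == 1:
                mate[v] = u  # both sides signalled 1: edge is matched
                for x in adj[v]:
                    if x != u:
                        signal[v][x] = 0  # reject every other edge and stop
                active.discard(v)
            else:
                position[v] += 1  # rejected: move to the next heaviest edge
                if position[v] < len(order[v]):
                    signal[v][order[v][position[v]]] = 1
                else:
                    active.discard(v)  # no edges left: stays unmatched
    return mate


# Hypothetical 6-cycle 1-2-3-4-5-6-1.
GRAPH = {
    1: {2: 35, 6: 5},
    2: {1: 35, 3: 25},
    3: {2: 25, 4: 20},
    4: {3: 20, 5: 10},
    5: {4: 10, 6: 15},
    6: {5: 15, 1: 5},
}
mate = dataflow_matching(GRAPH)
```

Because every vertex walks its edges from heaviest to lightest and ties are broken identically on both sides, a chain of waiting vertices always follows strictly heavier edges, so the simulation cannot deadlock.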
Cray XMT: A block view
• Threadstorm Processor:
  - 500 MHz
  - 128 thread-streams
  - VLIW
• 8 GB/proc
• 3D Torus Interconnect
Cray XMT: Memory
Physically distributed, globally addressable
• 8 GB/proc
• Total = 1 TB (128 processors)
• Byte addressable
• Hardware hashing
• 64-byte granularity
• Worst-case latency is 1000 cycles
• Sustainable 60 Megawords/s per processor
Datasets: Synthetic data with R-MAT
• R-MAT: Recursive MATrix method
• Experiments
  - RMAT-ER (0.25, 0.25, 0.25, 0.25)
  - RMAT-G (0.45, 0.15, 0.15, 0.25)
  - RMAT-B (0.55, 0.15, 0.15, 0.15)
Chakrabarti, D. and Faloutsos, C. 2006. Graph mining: Laws, generators, and algorithms. ACM Comput. Surv. 38, 1 (Jun. 2006), 2.
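A minimal sketch of how R-MAT places one edge: the adjacency matrix is split recursively into four quadrants, and at each level one quadrant is chosen with probabilities (a, b, c, d). The function names and the `scale`, `num_edges`, and `seed` parameters are hypothetical; the sketch adds no noise and keeps duplicate edges and self-loops, and the talk does not say how its generator treats any of these.

```python
import random


def rmat_edge(scale, probs, rng):
    """Pick one (row, col) cell of a 2**scale by 2**scale adjacency matrix."""
    a, b, c, _d = probs
    row = col = 0
    for _ in range(scale):
        r = rng.random()
        row <<= 1
        col <<= 1
        if r < a:
            pass              # top-left quadrant
        elif r < a + b:
            col |= 1          # top-right quadrant
        elif r < a + b + c:
            row |= 1          # bottom-left quadrant
        else:
            row |= 1          # bottom-right quadrant
            col |= 1
    return row, col


def rmat_graph(scale, num_edges, probs, seed=0):
    """Generate num_edges edges; duplicates and self-loops are kept."""
    rng = random.Random(seed)
    return [rmat_edge(scale, probs, rng) for _ in range(num_edges)]


# Quadrant probabilities as listed on the slide.
RMAT_ER = (0.25, 0.25, 0.25, 0.25)
RMAT_G = (0.45, 0.15, 0.15, 0.25)
RMAT_B = (0.55, 0.15, 0.15, 0.15)

sample = rmat_graph(scale=4, num_edges=100, probs=RMAT_B, seed=1)
```

With equal probabilities (RMAT-ER) every cell is equally likely; the more weight goes to the first quadrant (RMAT-G, then RMAT-B), the more the edges concentrate on low-numbered vertices and the more skewed the degree distribution becomes.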
Datasets for experiments
[Figures: degree distribution; clustering coefficient]
Experimental Results
• ½-approx algorithm
• Magny-Cours, Nehalem, Niagara-2, XMT
• RMAT-B
Matching: Cardinality
Graph      Init. (% of final card)   Final (% of |V|)
RMAT-ER    53.14%                    94.12%
RMAT-G     46.33%                    81.70%
RMAT-B     36.06%                    44.24%
Matching: Queue status
[Figure: queue size (log scale, 1.00E+00 to 1.68E+07) versus iteration number (1 to 11) for RMAT-ER, RMAT-G, and RMAT-B, with an exponential trend line for RMAT-G]
Strong scaling: Nehalem & Magny-Cours
Input: RMAT-B. Algorithm: Queue-based.
[Figure, two panels: compute time in seconds versus number of cores. Nehalem: 1 to 8 cores, series for 1 thread/core and 2 threads/core. Magny-Cours: 1 to 48 cores, series for actual and linear scaling.]
Strong scaling: Nehalem & Niagara-2
Input: RMAT-B. Algorithm: Queue-based.
[Figure, two panels: compute time in seconds versus number of cores (1 to 8). Nehalem: series for 1 thread/core and 2 threads/core. Niagara-2: series for 1, 2, 4, and 8 threads/core.]
Strong scaling: Nehalem & XMT
Input: RMAT-B. Algorithm: Queue; Q-Sorted; Dataflow.
[Figure, two panels: compute time in seconds. Nehalem: versus number of cores (1 to 8), series for 1 thread/core and 2 threads/core. XMT: versus number of processors (1 to 128), series for Queue, Q-Sorted, and Dataflow.]
Strong scaling: XMT
Input: RMAT-ER; RMAT-B. Algorithm: Queue; Q-Sorted; Dataflow.
[Figure, two panels (RMAT-ER and RMAT-B): compute time in seconds versus number of processors (1 to 128), series for Queue, Q-Sorted, and Dataflow.]
Exact matching
• Augmentation-based approach
  - Single-path vs. multiple-path
• Hopcroft-Karp algorithm:
  - Breadth-first + depth-first
  - Dynamic: amount and type of parallelism
  - Nested loop structure
• Our approach:
  - Different locking policies (first-visited, last-visited, random)
  - Disjoint forest (merge BFS + DFS)
• Future: Use futures :-)
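For orientation, the single-path augmentation idea can be sketched sequentially for maximum cardinality matching in a bipartite graph (the setting in which Hopcroft-Karp applies). The slides do not state which graph class the exact experiments use, so the bipartite assumption, the function names, and the example are all hypothetical; this textbook depth-first version does not model the locking policies or the disjoint-forest approach listed above.

```python
def max_bipartite_matching(adj_left):
    """adj_left: {left_vertex: [right_vertices]}. Returns {right: left}."""
    match_right = {}

    def try_augment(u, visited):
        # Depth-first search for an augmenting path starting at left vertex u.
        for v in adj_left[u]:
            if v in visited:
                continue
            visited.add(v)
            # v is free, or its current partner can be re-matched elsewhere.
            if v not in match_right or try_augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    # Single-path strategy: one augmenting path search per left vertex.
    for u in adj_left:
        try_augment(u, set())
    return match_right


# Hypothetical bipartite graph: left {a, b, c}, right {x, y, z}.
ADJ = {"a": ["x", "y"], "b": ["x"], "c": ["y", "z"]}
result = max_bipartite_matching(ADJ)
```

In the example, vertex b can only be matched by augmenting along the path b-x-a-y, which moves a from x to y. Multiple-path methods such as Hopcroft-Karp instead find many vertex-disjoint augmenting paths per phase, which is where the dynamic parallelism noted on the slide comes from.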
Summary & conclusion: The trinity
Multithreaded Architectures
• Fast On-Demand Context Switch
• Memory Tag Bits
• Globally Shared Memory
• Latency Tolerance
• Concurrency
http://cass-mt.pnl.gov/
Thank You!