Department of Biomedical Informatics
Dynamic Load Balancing (Repartitioning)
& Matrix Partitioning
Ümit V. Çatalyürek
Associate Professor
Department of Biomedical Informatics
Department of Electrical & Computer Engineering
The Ohio State University
Workshop on Combinatorial Scientific Computing & Petascale Simulations 2008
June 10-13, 2008, Santa Fe, NM
OSU’s CSCAPES Contributions
• Load Balancing
  • Parallel Static Load Balancing
  • Parallel Dynamic Load Balancing
• Parallel Graph Coloring
  • Distance-1 coloring
  • Distance-2 coloring
  • talk by Bozdag Friday morning
• Parallel Matrix Partitioning
• Parallel Matrix Ordering
Umit Catalyurek "Dynamic Load Bal. & Matrix Partitioning"
CSCAPES Workshop, June 10, 2008
Roadmap
• Dynamic Load Balancing
  • Motivation
  • Background
    • Classification of Repartitioning Techniques
    • Graph and Hypergraph Approaches
  • New Hypergraph Model for Dynamic Load Balancing
  • Parallel Multilevel Hypergraph Partitioning with Fixed Vertices
  • Experimental Results & Summary
• Matrix Partitioning
  • 1D Hypergraph-based Methods: Row-wise and Column-wise
  • 2D Hypergraph-based Methods: Fine-grain, Jagged-Like, Checkerboard
  • Experimental Results & Summary
Partitioning and Load Balancing
• Goal: assign data to processors to
  • minimize application runtime
  • maximize utilization of computing resources
• Metrics:
  • minimize processor idle time (balance workloads)
  • keep inter-processor communication costs low
• Impacts performance of a wide range of simulations
[Figures: adaptive mesh refinement; contact detection; particle simulations; linear solvers & preconditioners (Ax = b)]
Dynamic Load Balancing/Repartitioning
• Applications with workload or locality that changes during simulation require dynamic load balancing (a.k.a. repartitioning)
  • Adaptive mesh refinement
  • Particle methods
  • Contact detection
• Repartitioning has additional cost:
  • Moving data from old to new decomposition

$T_{\text{execution}} = \#\text{iter} \times (T_{\text{computation}} + T_{\text{communication}}) + T_{\text{repart}} + T_{\text{migration}}$
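The execution-time formula above can be turned into a quick break-even check. The sketch below is only an illustration: the function name and all timings are made-up assumptions, not measurements from the talk.

```python
def execution_time(n_iter, t_comp, t_comm, t_repart=0.0, t_migration=0.0):
    """Time for n_iter iterations plus optional one-off repartitioning costs."""
    return n_iter * (t_comp + t_comm) + t_repart + t_migration


# Hypothetical per-iteration timings in seconds, chosen only for illustration.
n_iter = 100

# Option A: keep the old (imbalanced) decomposition, no repartitioning cost.
keep_old = execution_time(n_iter, t_comp=1.0, t_comm=0.5)

# Option B: repartition first, pay repartitioning and migration once,
# then run with cheaper iterations.
repartition = execution_time(n_iter, t_comp=0.75, t_comm=0.25,
                             t_repart=4.0, t_migration=8.0)

print(keep_old, repartition)    # 150.0 112.0
print(repartition < keep_old)   # True
```

With only one iteration between repartitions, the same numbers give 13.0 versus 1.5, so the one-off costs would dominate; this is why #iter appears as a scaling factor.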
Classification of Dynamic Load Balancing Approaches
Graph and Hypergraph Partitioning
Graphs
• Community: load balancing (highly successful for PDE problems)
• Model: vertices = computation/data; edge = relationship between computation/data (bi-directional)
• Goal: evenly distribute vertex weight while minimizing weight of cut edges
• Algorithms: Kernighan, Lin, Simon, Hendrickson, Leland, Kumar, Karypis, et al.
• Serial partitioners: Chaco (SNL), Jostle (U. Greenwich), METIS (U. Minn.), Party (U. Paderborn), Scotch (U. Bordeaux)
• Parallel partitioners: ParMETIS (U. Minn.), PJostle (U. Greenwich)

Hypergraphs
• Community: VLSI, recently computational science
• Model: vertices = computation/data; edge = dependency on data elements (multi-way)
• Goal: evenly distribute vertex weight while minimizing cut size
• Algorithms: Kernighan, Schweikert, Fiduccia, Mattheyes, Sanchis, Alpert, Kahng, Hauck, Borriello, Çatalyürek, Aykanat, Karypis, et al.
• Serial partitioners: hMETIS (Karypis), PaToH (Çatalyürek), Mondriaan (Bisseling)
• Parallel partitioners: Zoltan PHG (Sandia), Parkway (Trifunovic)
Impact of Hypergraph Models (Where Graph is not Sufficient)
• Greater expressiveness ⇒ greater applicability
  • Structurally non-symmetric systems
    • circuits, biology
  • Rectangular systems
    • linear programming, least-squares methods
  • Non-homogeneous, highly connected topologies
    • circuits, nanotechnology, databases
  • Multiple models for different granularity partitioning
    • Owner compute, fine-grain, checkerboard/cartesian, Mondriaan
• Accurate communication model ⇒ lower application communication costs
[Figure: vertices Vh, Vi, Vj, Vk, Vl, Vm and nets nh, ni, nk, nl, nm distributed among parts P1 to P4]
[Figure: Mondriaan partitioning, courtesy of Rob Bisseling]
Hypergraph Model
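• $\lambda_i$: number of parts that edge $e_i$ connects
• $\text{Cut} = \sum_{e_i \in E} c_i (\lambda_i - 1)$, where $c_i$ is the cost (weight) of edge $e_i$
• Cut = total communication volume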
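As a small illustration of the cut metric above, the following sketch computes the sum of c_i (λ_i − 1) over all hyperedges. The function name, the list-based hypergraph representation, and the toy data are assumptions made for this example only.

```python
def connectivity_cut(hyperedges, costs, part):
    """Sum of c_i * (lambda_i - 1) over all hyperedges.

    hyperedges: list of vertex-id lists (the pins of each edge)
    costs:      cost c_i of each hyperedge
    part:       part[v] is the part that vertex v is assigned to
    """
    total = 0
    for pins, cost in zip(hyperedges, costs):
        lam = len({part[v] for v in pins})   # number of parts the edge connects
        total += cost * (lam - 1)
    return total


# Toy hypergraph: 4 vertices, 3 hyperedges, 2 parts (illustrative data).
hyperedges = [[0, 1, 2], [2, 3], [0, 3]]
costs = [1, 2, 1]
part = [0, 0, 1, 1]

print(connectivity_cut(hyperedges, costs, part))   # 2
```

Here the first and third edges each span two parts and contribute 1 apiece, while the second edge lies entirely inside part 1 and contributes nothing.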
Hypergraph Repartitioning
• Start with application hypergraph
• Add
  • one partition vertex for each partition
  • migration edges connecting application vertices to their partition vertices
• Weight the hyperedges:
  • Migration edge weight = size of application objects (migration size)
  • Application edge weight = size of communication elements
  • Scale application edge weights by α ≈ number of application communications between repartitions (#iter)
• Perform hypergraph partitioning with partition vertices “fixed”
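The construction listed on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Zoltan implementation: the function names, the list-based representation, and the toy sizes are all hypothetical, and no actual partitioner is called; two candidate assignments are simply compared by their cut.

```python
def connectivity_cut(edges, weights, part):
    """Sum of weight * (number of parts connected - 1) over all hyperedges."""
    return sum(w * (len({part[v] for v in pins}) - 1)
               for pins, w in zip(edges, weights))


def build_repartitioning_hypergraph(app_edges, comm_sizes, object_sizes,
                                    old_part, n_parts, alpha):
    """Augment the application hypergraph with partition vertices and migration edges."""
    n = len(object_sizes)                          # number of application vertices
    edges = [list(pins) for pins in app_edges]
    weights = [alpha * s for s in comm_sizes]      # scaled application edges
    for v in range(n):
        edges.append([v, n + old_part[v]])         # migration edge to the old part's vertex
        weights.append(object_sizes[v])            # weight = migration size
    fixed = {n + p: p for p in range(n_parts)}     # partition vertices stay in their part
    return edges, weights, fixed


# Toy data (illustrative): 4 objects of size 5, two application edges, 2 parts.
edges, weights, fixed = build_repartitioning_hypergraph(
    app_edges=[[0, 1, 2], [2, 3]], comm_sizes=[1, 1],
    object_sizes=[5, 5, 5, 5], old_part=[0, 0, 1, 1], n_parts=2, alpha=10)


def total_cost(new_part):
    """Cut of the augmented hypergraph = alpha * communication + migration volume."""
    full = list(new_part) + [fixed[v] for v in sorted(fixed)]
    return connectivity_cut(edges, weights, full)


print(total_cost([0, 0, 1, 1]))   # 10: nothing migrates, one application edge is cut
print(total_cost([0, 0, 0, 1]))   # 15: vertex 2 migrates (5), the other edge is cut (10)
```

Because the partition vertices are fixed, a migration edge is cut exactly when its application vertex leaves its old part, so a single cut value accounts for both communication and migration.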
Implementation of Hypergraph Repartitioning
• Implemented in Zoltan toolkit
• Based on the parallel multilevel hypergraph partitioner with recursive bisection (IPDPS’06)
• Automatically construct augmented hypergraph
• … with added capability for handling “fixed vertices.”
Experimental Results
• Experiments on
  • OSU-RI cluster
    • 64 compute nodes connected with Infiniband
    • Dual 2.4 GHz AMD Opteron processors with 8 GB RAM
  • Sandia-Thunderbird cluster
    • 4,480 compute nodes connected with Infiniband
    • Dual 3.6 GHz Intel EM64T processors with 6 GB RAM
• Zoltan v3 (alpha) hypergraph partitioner & ParMETIS v3.1 graph partitioner
• Test problems:
  • 2DLipid: density functional theory; 4K x 4K; 5.6M nonzeros
  • Xyce: ASIC Stripped; 680K x 680K; 2.3M nonzeros
  • Cage14: DNA Electrophoresis; 1.5M x 1.5M; 27M nonzeros
[Figures: Xyce ASIC Stripped; Cage Electrophoresis]
Communication Volume
[Plots: 2DLipid, Xyce, Cage14]
• Hypergraph is better
• Zoltan-repart trades communication for migration to minimize total cost
• Scratch methods are comparable for large alpha (#iter)
Dynamic Graph: Partitioning Time on T-bird
[Plots: 2DLipid, Cage14, Xyce]
Summary of Dynamic Load Balancing
• A novel hypergraph model for dynamic load balancing
  • Single hypergraph that incorporates both communication volume in the application and data migration cost
  • Performs better than or comparably to graph-based dynamic load balancing
• A parallel dynamic load balancing tool
  • Essential for peta-scale applications
  • Scales similarly to graph-based tools
• Future Work
  • There is always room for improvement: speed and/or quality
  • Direct k-way refinement
Matrix Partitioning
• Hypergraph Models for Sparse-Matrix Partitioning
  • 1D
    • row-wise
    • column-wise
  • 2D
    • Fine-grain
    • Jagged-like
    • Checkerboard
• Serial Tool: PaToH & Matlab interface
  • Matrix Partitioning
  • Partitioned Matrix Display
1D Partitioning
• M x N matrices with K processors
• Worst case
  • Total Volume = (K-1) x N words or (K-1) x M words
  • Total Number Messages = K x (K-1)
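To make the 1D row-wise case concrete, the sketch below counts the words of x that must be communicated for y = Ax when rows are assigned to parts. It assumes the usual column-net view, in which each x_j is owned by one of the parts that has a nonzero in column j, so column j costs (number of parts touching it − 1) words. The function name and the toy matrix are illustrative assumptions.

```python
def rowwise_comm_volume(nonzeros, row_part):
    """Total words of x sent for y = A x under a row-wise partition.

    nonzeros: iterable of (row, col) positions of the nonzeros of A
    row_part: row_part[i] is the part that owns row i
    Assumes x_j is owned by one of the parts with a nonzero in column j.
    """
    parts_per_col = {}
    for i, j in nonzeros:
        parts_per_col.setdefault(j, set()).add(row_part[i])
    return sum(len(parts) - 1 for parts in parts_per_col.values())


# Toy 4 x 4 sparse matrix on K = 2 processors (illustrative data).
nonzeros = [(0, 0), (0, 1), (1, 1), (1, 3), (2, 1), (2, 2), (3, 2), (3, 3)]
row_part = [0, 0, 1, 1]
K, N = 2, 4

volume = rowwise_comm_volume(nonzeros, row_part)
print(volume)                   # 2
print(volume <= (K - 1) * N)    # True: within the worst-case bound of 4 words
```

Only columns 1 and 3 have nonzeros in rows owned by both parts, so each contributes one word.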
2D Partitioning: Jagged-Like
• M x N matrices with K = P x Q processors
• Worst case
  • Total Volume = (K-P) x N + (Q-1) x M
  • Total Number Messages = K x (K-Q) + K x (Q-1) = K x (K-1)
2D Partitioning: Checkerboard
• M x N matrices with K = P x Q processors
• Worst case
  • Total Volume = (P-1) x N + (Q-1) x M
  • Total Number Messages = P+Q-2
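For a side-by-side view, the sketch below evaluates the worst-case formulas from the 1D, jagged-like, and checkerboard slides for one set of sizes. The formulas are transcribed as written; the function name, the dictionary layout, and the example sizes are assumptions. Two caveats: mapping (K-1) x N to row-wise and (K-1) x M to column-wise is an assumption, since the slide gives both without labels, and the checkerboard message count P+Q-2 appears to be a per-processor figure whereas the other message counts are totals, so those should be compared with care.

```python
def worst_case_bounds(M, N, P, Q):
    """Worst-case volume (words) and message counts, as stated on the slides."""
    K = P * Q
    return {
        # Assumption: (K-1) x N is the row-wise case, (K-1) x M the column-wise case.
        "1D row-wise":    {"volume": (K - 1) * N, "messages": K * (K - 1)},
        "1D column-wise": {"volume": (K - 1) * M, "messages": K * (K - 1)},
        "jagged-like":    {"volume": (K - P) * N + (Q - 1) * M,
                           "messages": K * (K - Q) + K * (Q - 1)},
        # P + Q - 2 is copied from the slide; it looks like a per-processor count.
        "checkerboard":   {"volume": (P - 1) * N + (Q - 1) * M,
                           "messages": P + Q - 2},
    }


# Hypothetical sizes: a 1000 x 1000 matrix on a 4 x 4 processor grid.
for name, bound in worst_case_bounds(M=1000, N=1000, P=4, Q=4).items():
    print(f"{name:15s} volume={bound['volume']:6d} messages={bound['messages']}")
```

For these sizes the 1D and jagged-like volume bounds are both 15000 words, while the checkerboard bound is 6000.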
[Figures: cage5 (three slides)]
Experimental Results
• Tested 1,413 matrices (out of 1,877) from UFL Collection
  • #rows >= 500 and #columns >= 500
  • #non-zeros < 10,000,000
• K-way partitioning for K = 4, 16, 64 and 256
  • If 50 x K >= max {#rows, #columns}
• Partitioning instance = matrix & K
  • For each partitioning instance we run RW, CW, JL, CH, FG methods
• Linux Cluster
  • 64 dual 2.4 GHz Opteron CPUs, 8 GB RAM
Experimental Results: Total Communication Volume
[Plots: performance profiles for all instances (4040) and for square symmetric instances (2231)]
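For readers unfamiliar with performance profiles, the sketch below shows one common way to compute them, assuming the plots follow the usual Dolan-Moré definition: for each method, the fraction of instances on which its result is within a factor τ of the best result among all methods. The function name and the communication volumes are invented for illustration; they are not data from the talk.

```python
def performance_profile(results, taus):
    """Fraction of instances where each method is within a factor tau of the best.

    results: dict mapping method name -> list of values (lower is better),
             one value per partitioning instance, same order for every method
    taus:    list of factors tau >= 1
    """
    methods = list(results)
    n = len(results[methods[0]])
    best = [min(results[m][i] for m in methods) for i in range(n)]
    profile = {}
    for m in methods:
        ratios = [results[m][i] / best[i] for i in range(n)]
        profile[m] = [sum(r <= t for r in ratios) / n for t in taus]
    return profile


# Made-up total communication volumes for two methods on four instances.
results = {"RW": [100, 200, 300, 400], "FG": [80, 200, 450, 200]}
profile = performance_profile(results, taus=[1.0, 1.5, 2.0])

print(profile["RW"])   # [0.5, 0.75, 1.0]
print(profile["FG"])   # [0.75, 1.0, 1.0]
```

A curve that is higher at τ = 1 means the method is best (or tied for best) more often; a curve that reaches 1.0 at a smaller τ means the method is never far from the best.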
Experimental Results: Total Communication Volume
[Plots: square non-symmetric instances (1102); rectangular instances (707): N>M (662), CW better than RW; M>N (45)]
Experimental Results: Total Number of Messages
Experimental Results: Execution Time
Summary of Matrix Partitioning
• Hypergraph models for Matrix Partitioning
  • Well... some are not new but have not been adopted by applications yet. Why? (Information dissemination problem? Tool?)
  • More hypergraph-based methods are being developed!
    • Corner-Model
    • Hybrid Mondriaan with Fine-Grain
• Matlab interface to PaToH for Matrix Partitioning
  • Currently supports: RW, CW, JL, CH, FG
  • Will be available soon
• Work in progress
  • Parallel Matrix Partitioning via Zoltan
Thanks
• Contact Info:
  • [email protected]
  • http://bmi.osu.edu/~umit
• Also:
  • http://www.cs.sandia.gov/Zoltan/
  • http://www.cscapes.org/