Source: graphics.stanford.edu/sss/barth_partitioning.pdf

Partitioning and Partitioning Tools

Tim Barth

NASA Ames Research Center

Moffett Field, California 94035-1000 USA

Graph/Mesh Partitioning

• Why do it?

• The graph bisection problem

• What are the standard heuristic algorithms?

• What tools are available?

Why do it?

• Efficient utilization of distributed computational resources

– Equidistribution of workload among processors (load balancing)

– Minimized time spent in interprocessor communication

∗ Communication takes time, and it is not always possible to hide this latency in data transfer

∗ Cost of communication is often modeled by a linear relationship for n messages, where $m_n$ is the length of message n: $\mathrm{Cost} = \sum_n (\alpha + \beta m_n)$

Figure 1: (a) Mesh partitioning with minimized number of messages, (b) Mesh with minimized message length.
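
The linear cost model can be made concrete with a small sketch. It assumes the usual reading of the two coefficients (α as a fixed per-message startup cost, β as a cost per unit of message length); the function name and the numbers are made up for illustration and are not taken from the slides.

```python
def communication_cost(message_lengths, alpha, beta):
    """Total cost of sending the given messages under the linear model.

    alpha: assumed fixed cost per message (startup/latency term)
    beta:  assumed cost per unit of message length
    """
    return sum(alpha + beta * m for m in message_lengths)


# Same total data volume (1200 units), different message counts.
few_long = communication_cost([600, 600], alpha=100.0, beta=1.0)
many_short = communication_cost([100] * 12, alpha=100.0, beta=1.0)
```

With these illustrative coefficients the per-message term dominates, so the partition that sends fewer messages is cheaper even though the data volume is identical. With a small α relative to β, total message length would matter more, which is the trade-off Figure 1 contrasts.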

Why do it?

• As a strategy for reducing the overall arithmetic complexity of an algorithm

– Overlapping Schwarz methods

– “Divide and Conquer” methods, e.g. nested dissection of a matrix, Schur complement substructuring

– Multiscale methods, e.g. agglomeration multigrid

Why do it?

• Overlapping Schwarz methods

[Figure: norm of the global residual (log scale, $10^{0}$ down to $10^{-15}$) versus Schwarz iterations (0 to 80) for a 2nd-order scheme. Curves: 2 partitions with overlap 1, 2, and 3; 8 partitions with overlap 1, 2, and 3.]

Why do it?

• Overlapping Schwarz methods with subdomain size H, mesh cell size h and overlap δ

Let A be the discretization matrix and $M_{as}$ the additive Schwarz preconditioner. There exists a constant C independent of H and h such that the condition number κ satisfies

$$\kappa(M_{as}^{-1} A) \le C H^{-2} \left( 1 + \left( \frac{H}{\delta} \right)^{2} \right). \quad (1)$$

With a 2-level coarse space correction, there exists a constant C independent of H and h such that

$$\kappa(M_{as}^{-1} A) \le C \left( 1 + \frac{H}{\delta} \right). \quad (2)$$
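
To see what the two bounds imply as the number of subdomains grows, the sketch below evaluates their right-hand sides. The constant C is unknown and generally differs between (1) and (2); it is set to 1 here purely as a placeholder, and the function names are hypothetical.

```python
def one_level_bound(H, delta, C=1.0):
    """Right-hand side of bound (1): C * H**-2 * (1 + (H/delta)**2)."""
    return C * H ** -2 * (1.0 + (H / delta) ** 2)


def two_level_bound(H, delta, C=1.0):
    """Right-hand side of bound (2): C * (1 + H/delta)."""
    return C * (1.0 + H / delta)


# Halve the subdomain size while keeping the overlap ratio H/delta = 4 fixed.
for H in (0.5, 0.25, 0.125):
    delta = H / 4.0
    print(H, one_level_bound(H, delta), two_level_bound(H, delta))
```

At a fixed ratio H/δ the one-level bound grows by a factor of four each time H is halved, while the two-level bound stays constant. That is the motivation for the coarse space correction when many subdomains are used.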

Why do it?

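• Substructuring

$$\begin{bmatrix} A_1 & A_2 \\ A_3 & A_4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}, \qquad A^{-1} = \begin{bmatrix} C_1 & C_2 \\ C_3 & C_4 \end{bmatrix}$$

with $S = A_4 - A_3 A_1^{-1} A_2$, $C_1 = A_1^{-1} + A_1^{-1} A_2 S^{-1} A_3 A_1^{-1}$, $C_2 = -A_1^{-1} A_2 S^{-1}$, $C_3 = -S^{-1} A_3 A_1^{-1}$, $C_4 = S^{-1}$.

$$\kappa(M_{Schur}^{-1} A) = C \left( 1 + \log(H/\delta) \right)$$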
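
The block formulas can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the 3×3 matrix and right-hand side are made-up test values, and the block sizes (2×2 for $A_1$, 1×1 for $A_4$) are chosen only to keep the example small.

```python
import numpy as np

# Small symmetric positive definite test matrix (made-up values),
# split into 2x2 blocks: A1 is 2x2, A4 is 1x1.
A = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.0],
              [2.0, 0.0, 5.0]])
A1, A2 = A[:2, :2], A[:2, 2:]
A3, A4 = A[2:, :2], A[2:, 2:]

A1_inv = np.linalg.inv(A1)
S = A4 - A3 @ A1_inv @ A2            # Schur complement
S_inv = np.linalg.inv(S)

C1 = A1_inv + A1_inv @ A2 @ S_inv @ A3 @ A1_inv
C2 = -A1_inv @ A2 @ S_inv
C3 = -S_inv @ A3 @ A1_inv
C4 = S_inv

A_inv = np.block([[C1, C2], [C3, C4]])

# Solve A x = b by block elimination: reduce to S x2 = b2 - A3 A1^{-1} b1.
b = np.array([1.0, 2.0, 3.0])
b1, b2 = b[:2], b[2:]
x2 = np.linalg.solve(S, b2 - A3 @ A1_inv @ b1)
x1 = A1_inv @ (b1 - A2 @ x2)
x = np.concatenate([x1, x2])
```

In a substructuring solver $A_1$ would hold the subdomain interior unknowns and $A_4$ the interface unknowns, so only the (much smaller) Schur complement system couples the subdomains. Forming explicit inverses as done here is for illustration only.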

Graph Bisection (NP-hard)

Define a partitioning vector $p \in \mathbb{Z}^n$ which 2-colors the vertices of a graph

$$p = [+1, -1, -1, +1, +1, \ldots, +1, -1]^T \quad (3)$$

[Figure: example graph whose vertices are labeled +1 or −1 according to the 2-coloring.]

• Minimize the cut-weight of the weighted graph

• Produce balanced partitions
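
The two objectives can be evaluated directly for a given partitioning vector. This is a small sketch on a hypothetical 6-vertex weighted graph; the function names and the edge-list representation are assumptions made for the example.

```python
def cut_weight(weighted_edges, p):
    """Sum of the weights of edges whose endpoints get different colors."""
    return sum(w for u, v, w in weighted_edges if p[u] != p[v])


def imbalance(p):
    """Absolute difference between the sizes of the two partitions."""
    return abs(sum(p))


# Hypothetical 6-vertex graph: two triangles joined by one light edge (2, 3).
edges = [(0, 1, 2.0), (1, 2, 2.0), (0, 2, 2.0),
         (3, 4, 2.0), (4, 5, 2.0), (3, 5, 2.0),
         (2, 3, 1.0)]

good = [+1, +1, +1, -1, -1, -1]   # cuts only the bridge edge
bad = [+1, -1, +1, -1, +1, -1]    # balanced, but cuts many edges
```

Both vectors are perfectly balanced, yet their cut-weights differ widely. Finding the balanced vector with the smallest cut-weight is the hard part, which is why heuristics are used in practice.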

Heuristic Graph Partitioning

Three commonly used partitioning techniques

• Recursive coordinate bisection

• Recursive Cuthill-McKee

• Recursive spectral bisection

Recursive Coordinate Bisection

• Spatial coordinates are sorted along alternating horizontal and vertical directions

• Divisors are found to balance partitions
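
A minimal sketch of recursive coordinate bisection for 2D points follows. It assumes the divisor is placed at the median of the sorted coordinates, which is one natural way to balance the two halves; the function name and parameters are hypothetical.

```python
def recursive_coordinate_bisection(points, ids=None, depth=0, levels=2):
    """Split points into 2**levels parts by alternating x/y median cuts.

    points: list of (x, y) tuples; returns a list of lists of point indices.
    """
    if ids is None:
        ids = list(range(len(points)))
    if levels == 0 or len(ids) <= 1:
        return [ids]
    axis = depth % 2                      # 0: cut along x, 1: cut along y
    ordered = sorted(ids, key=lambda i: points[i][axis])
    mid = len(ordered) // 2               # divisor at the median balances sizes
    left, right = ordered[:mid], ordered[mid:]
    return (recursive_coordinate_bisection(points, left, depth + 1, levels - 1)
            + recursive_coordinate_bisection(points, right, depth + 1, levels - 1))


# 4 x 4 grid of points, index = 4 * x + y.
grid = [(x, y) for x in range(4) for y in range(4)]
parts = recursive_coordinate_bisection(grid, levels=2)
```

The method uses only vertex coordinates and ignores graph connectivity, so it is cheap but does not explicitly control the cut-weight.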

Graph Ordering Cuthill-McKee

Algorithm: Graph ordering, Cuthill-McKee.

Step 1. Find the vertex with the lowest degree. This is the root vertex.

Step 2. Find all neighboring vertices connected to the root by incident edges. Order them by increasing vertex degree. This forms level 1.

Step 3. Form level k by finding all neighboring vertices of level k − 1 which have not been previously ordered. Order these new vertices by increasing vertex degree.

Step 4. If vertices remain, go to step 3.
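
The steps above can be sketched as follows. The adjacency-dictionary representation, the tie-breaking by vertex label, and the restriction to connected graphs are assumptions made for this example; each whole level is ordered by degree, as the steps state.

```python
def cuthill_mckee(adj):
    """Return (ordering, levels) for a connected graph.

    adj: dict mapping each vertex to a list of its neighbors.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    root = min(adj, key=lambda v: (degree[v], v))      # Step 1, ties by label
    ordered = {root}
    levels = [[root]]
    while len(ordered) < len(adj):                     # Step 4
        # Steps 2 and 3: unordered neighbors of the previous level.
        frontier = {u for v in levels[-1] for u in adj[v] if u not in ordered}
        if not frontier:
            raise ValueError("graph is not connected")
        new_level = sorted(frontier, key=lambda u: (degree[u], u))
        ordered.update(new_level)
        levels.append(new_level)
    ordering = [v for level in levels for v in level]
    return ordering, levels


# Hypothetical 5-vertex graph: a 4-cycle 0-1-3-2 with a pendant vertex 4 on 3.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
ordering, levels = cuthill_mckee(adj)
```

Renumbering the matrix rows and columns by `ordering` places vertices of adjacent levels close together, which is what reduces the bandwidth of the nonzero pattern.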

Graph Ordering Cuthill-McKee

Matrix nonzero pattern

Figure 2: Natural Ordering (left) and Cuthill-McKee ordering (right)

Recursive Cuthill-McKee

• The level structure computed in Cuthill-McKee ordering is utilized

• Divisors are found to balance partitions
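
One way to turn the level structure into a partitioner is sketched below. It is an assumption of this example, not a statement of the author's method, that the divisor is placed at the midpoint of a breadth-first, degree-sorted level ordering and that each half is then reordered and split again.

```python
from collections import deque


def level_ordering(adj, vertices):
    """Breadth-first (level-by-level) ordering of the subgraph on `vertices`."""
    vertices = set(vertices)
    degree = {v: sum(1 for u in adj[v] if u in vertices) for v in vertices}
    order, seen = [], set()
    while len(order) < len(vertices):
        # Root: lowest-degree vertex not yet visited (also handles the case
        # where the subgraph is disconnected).
        root = min(vertices - seen, key=lambda v: (degree[v], v))
        seen.add(root)
        queue = deque([root])
        while queue:
            v = queue.popleft()
            order.append(v)
            fresh = sorted((u for u in adj[v] if u in vertices and u not in seen),
                           key=lambda u: (degree[u], u))
            seen.update(fresh)
            queue.extend(fresh)
    return order


def recursive_level_bisection(adj, vertices=None, levels=1):
    """Split `vertices` into 2**levels parts by halving the level ordering."""
    if vertices is None:
        vertices = list(adj)
    if levels == 0 or len(vertices) <= 1:
        return [sorted(vertices)]
    order = level_ordering(adj, vertices)
    mid = len(order) // 2                 # divisor chosen to balance the halves
    return (recursive_level_bisection(adj, order[:mid], levels - 1)
            + recursive_level_bisection(adj, order[mid:], levels - 1))


# Path graph 0 - 1 - 2 - ... - 7.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
parts = recursive_level_bisection(path, levels=2)
```

Unlike coordinate bisection, this approach needs only graph connectivity, so it also applies when no vertex coordinates are available.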

Recursive Spectral Bisection

Motivated by the observation that the cut-weight of a graph is precisely

$$W_c = \frac{1}{4} p^T L p$$

Algorithm: Spectral Graph Bisection.

Step 1. Calculate the matrix L associated with the Laplacian of the graph.

Step 2. Calculate the eigenvalues and eigenvectors of L.

Step 3. Order the eigenvalues by magnitude, $\lambda_1 \le \lambda_2 \le \lambda_3 \le \ldots \le \lambda_n$.

Step 4. Determine the smallest nonzero eigenvalue, $\lambda_f$, and its associated eigenvector $x_f$ (the Fiedler vector).

Step 5. Sort the elements of the Fiedler vector.

Step 6. Choose a divisor at the median of the sorted list and 2-color the vertices of the graph according to whether the corresponding elements of the Fiedler vector are less than or greater than the median value.
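
The six steps can be sketched with a dense eigensolver. The example assumes NumPy is available and that the graph is connected (so exactly one eigenvalue is zero); the function name and the test graph are hypothetical. The cut-weight identity follows from $p^T L p = \sum_{(u,v)} w_{uv} (p_u - p_v)^2$, where each cut edge contributes $4 w_{uv}$ and each uncut edge contributes zero.

```python
import numpy as np


def spectral_bisection(n, weighted_edges):
    """Return (p, L): a +1/-1 partition vector and the graph Laplacian."""
    L = np.zeros((n, n))
    for u, v, w in weighted_edges:            # Step 1: L = D - W
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    eigvals, eigvecs = np.linalg.eigh(L)      # Steps 2-3: ascending eigenvalues
    # Step 4: for a connected graph only eigvals[0] is (numerically) zero,
    # so column 1 is the Fiedler vector. Connectivity is assumed here.
    fiedler = eigvecs[:, 1]
    order = np.argsort(fiedler)               # Step 5
    p = np.ones(n, dtype=int)                 # Step 6: split at the median
    p[order[: n // 2]] = -1
    return p, L


# Hypothetical graph: two triangles joined by one light edge (2, 3).
edges = [(0, 1, 2.0), (1, 2, 2.0), (0, 2, 2.0),
         (3, 4, 2.0), (4, 5, 2.0), (3, 5, 2.0),
         (2, 3, 1.0)]
p, L = spectral_bisection(6, edges)
cut = 0.25 * p @ L @ p                        # cut-weight via W_c = p^T L p / 4
```

For large meshes a dense eigendecomposition is impractical; an iterative method that computes only the few smallest eigenpairs of the sparse Laplacian would be the more realistic choice.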

Multilevel k-way Partitioning

• Utilize successive k-way graph contraction to coarsen the graph

• Perform high quality partitioning on the coarsened graph

• Prolongate to finer graphs with local interface optimization to improve cut-weight
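
A single coarsen/prolongate cycle can be sketched as below. Contraction by greedily matching vertex pairs along heavy edges is an assumption of this example (the slide does not say how the contraction is done), the coarse partition is simply chosen by hand, and the local interface optimization step is omitted.

```python
def coarsen(n, weighted_edges):
    """One level of contraction: merge matched vertex pairs.

    Returns (coarse_n, coarse_edges, coarse_of) where coarse_of[v] is the
    coarse vertex that fine vertex v was merged into.
    """
    coarse_of = [None] * n
    coarse_n = 0
    # Visit heavier edges first so strongly connected pairs get merged.
    for u, v, w in sorted(weighted_edges, key=lambda e: -e[2]):
        if coarse_of[u] is None and coarse_of[v] is None:
            coarse_of[u] = coarse_of[v] = coarse_n
            coarse_n += 1
    for v in range(n):                       # unmatched vertices stay single
        if coarse_of[v] is None:
            coarse_of[v] = coarse_n
            coarse_n += 1
    merged = {}
    for u, v, w in weighted_edges:           # sum weights of parallel edges
        cu, cv = coarse_of[u], coarse_of[v]
        if cu != cv:
            key = (min(cu, cv), max(cu, cv))
            merged[key] = merged.get(key, 0.0) + w
    coarse_edges = [(a, b, w) for (a, b), w in merged.items()]
    return coarse_n, coarse_edges, coarse_of


def prolongate(coarse_part, coarse_of):
    """Give each fine vertex the partition label of its coarse vertex."""
    return [coarse_part[c] for c in coarse_of]


# Hypothetical graph: two triangles joined by one light edge (2, 3).
edges = [(0, 1, 2.0), (1, 2, 2.0), (0, 2, 2.0),
         (3, 4, 2.0), (4, 5, 2.0), (3, 5, 2.0),
         (2, 3, 1.0)]
coarse_n, coarse_edges, coarse_of = coarsen(6, edges)
coarse_part = [+1, -1, +1, -1]               # hand-picked coarse bisection
fine_part = prolongate(coarse_part, coarse_of)
```

Because edge weights are summed during contraction, the cut-weight of a coarse partition equals the cut-weight of its prolongation, so a good partition of the small graph is a good starting point on the fine graph.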

Metis, ParMetis

• Extremely fast

• Parallel implementation (requires some initial partitioning)

• Supports graphs with vertex or edge weights

• Supports incremental load balancing (repartitioning) with minimized data migration

Zoltan

• Relatively new package under development at Sandia under GPL

• Interfaces with Metis or Jostle

• Documentation suggests that the package will contain most of the commonly needed services for parallel scientific codes: partitioning, repartitioning, data migration, etc.

Partitioning Tools for SSS?

• Domain specific languages?

– Language for finite element methods

– Language for molecular dynamics

– <Insert your favorite problem domain here>

• Partial or full data dependency specification (analogous to scene graph specification in Java3d).

• Automatic tools for performance enhancement

– Use hardware performance statistics (memory access patterns) of previous executions in subsequent compilations

– Runtime data migration
