ExaGraph: Parallel Partitioning and Coloring for Exascale Applications

Seher Acer, Erik Boman, Siva Rajamanickam, Sandia National Labs*
Ian Bogle (RPI/Sandia), George Slota (RPI/Sandia)

ECP Annual Meeting, Houston, Feb. 2020

* Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525.


ExaGraph: Sandia focus

Problems/topics:
• Graph coloring
• Graph partitioning
• Machine learning using graphs (Siva)

Software: We will deliver through existing software
• Trilinos/Zoltan2 (distributed-memory)
• KokkosKernels (shared-memory)

Load Balancing and Partitioning

• Partitioning:
  – Assignment of application data to processors.
    • For example: mesh points, elements, matrix rows.
• Ideal partition:
  – Work (load) is well balanced among processors.
  – Inter-processor communication is kept low.
    • Low communication volume, few messages, etc.

Partition of an unstructured finite element mesh for three processors
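The two goals of an ideal partition can be made concrete with a few metrics. The following is a minimal sketch, not code from Zoltan2 or any other package: the function names, the small 2x3 grid graph, and the choice of unit vertex weights are all assumptions made for illustration.

```python
def edge_cut(edges, part):
    """Number of edges whose endpoints are assigned to different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])


def num_messages(edges, part):
    """Number of distinct part pairs that must exchange data."""
    pairs = {(min(part[u], part[v]), max(part[u], part[v]))
             for u, v in edges if part[u] != part[v]}
    return len(pairs)


def imbalance(part, num_parts):
    """Largest part load divided by the average load (1.0 is perfect balance)."""
    loads = [0] * num_parts
    for p in part.values():
        loads[p] += 1  # unit work per vertex (assumption)
    return max(loads) / (sum(loads) / num_parts)


# Hypothetical 2x3 grid graph:
#   0 - 1 - 2
#   |   |   |
#   3 - 4 - 5
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
# One grid column per processor, three processors in total.
part = {0: 0, 3: 0, 1: 1, 4: 1, 2: 2, 5: 2}

cut = edge_cut(edges, part)
messages = num_messages(edges, part)
balance = imbalance(part, 3)
```

Here the load is perfectly balanced, four of the seven edges are cut, and only neighboring columns need to exchange messages.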

Exascale Graph Partitioning

• Multilevel: Highly successful graph partitioning method
  – Bui & Jones (1993); Hendrickson & Leland (1993); Karypis and Kumar (1995)
  – Construct smaller approximations to the graph.
  – Perform graph partitioning on the coarse graph.
  – Propagate the partition back, refining as needed (typically at each level).
• Software
  – Graphs: (Par-)Metis, Scotch, KaHIP/KaFFPa, …
  – Hypergraphs: PaToH, hMetis, Zoltan/PHG, Mondriaan
• Problems
  – Algorithms are hard to parallelize.
  – Quality and run-time often deteriorate for large numbers of processors (cores).
  – No software for multi-GPU (or similar accelerators)!
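The coarsening and projection steps of the multilevel method can be sketched as follows. This is an illustrative sketch only: it uses greedy heavy-edge matching, one common coarsening heuristic, which the slides do not name, and it omits the refinement step. The function names and the weighted path graph are assumptions.

```python
def coarsen(num_vertices, edges):
    """One coarsening level via greedy heavy-edge matching.

    edges maps (u, v) with u < v to a positive weight.
    Returns (num_coarse_vertices, coarse_edges, fine_to_coarse).
    """
    fine_to_coarse = [-1] * num_vertices
    next_id = 0
    # Visit the heaviest edges first; contract an edge if both endpoints are free.
    for (u, v), _w in sorted(edges.items(), key=lambda kv: -kv[1]):
        if fine_to_coarse[u] == -1 and fine_to_coarse[v] == -1:
            fine_to_coarse[u] = fine_to_coarse[v] = next_id
            next_id += 1
    # Unmatched vertices become singleton coarse vertices.
    for v in range(num_vertices):
        if fine_to_coarse[v] == -1:
            fine_to_coarse[v] = next_id
            next_id += 1
    # Sum the weights of parallel edges; drop edges inside a coarse vertex.
    coarse_edges = {}
    for (u, v), w in edges.items():
        cu, cv = fine_to_coarse[u], fine_to_coarse[v]
        if cu != cv:
            key = (min(cu, cv), max(cu, cv))
            coarse_edges[key] = coarse_edges.get(key, 0) + w
    return next_id, coarse_edges, fine_to_coarse


def project(coarse_part, fine_to_coarse):
    """Propagate a partition of the coarse graph back to the fine graph."""
    return [coarse_part[c] for c in fine_to_coarse]


# Hypothetical weighted path 0 - 1 - 2 - 3 with a light middle edge.
fine_edges = {(0, 1): 3, (1, 2): 1, (2, 3): 3}
n_coarse, coarse_edges, mapping = coarsen(4, fine_edges)

# The coarse graph has two vertices, so its bisection is trivial.
fine_part = project([0, 1], mapping)
```

Contracting the heavy edges first keeps them out of the cut, so the trivial coarse bisection projects back to a partition that cuts only the light middle edge.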

Spectral partitioning

• Algebraic algorithm based on graph Laplacian: $\min \frac{x^T L(G)\, x}{x^T x}$
  – Partitioning: Fiedler (’73), Donath, Hoffman (’73), Pothen, Simon, Liou (’90)
  – Clustering: Hagen, Kahng (’92), Shi, Malik (’00), Ng et al. (’02)
• Pros:
  – Matrix-based, can reuse linear algebra software (e.g. Trilinos)
  – In parallel, quality is almost independent of # processors
  – Well suited to GPU and accelerators
• Cons:
  – Computationally expensive
  – Poor quality in some cases
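A small worked example of spectral bisection may help. The sketch below minimizes the Rayleigh quotient over vectors orthogonal to the constant vector and splits the vertices at the median entry. It uses shifted power iteration purely to stay dependency-free; that choice, the function names, and the six-vertex test graph are assumptions, and a production code would use a proper eigensolver.

```python
import math
import statistics


def fiedler_vector(n, edges, iters=500):
    """Approximate the Laplacian eigenvector of the second-smallest eigenvalue."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    deg = [len(a) for a in adj]
    shift = 2 * max(deg)  # all eigenvalues of L lie in [0, 2 * max degree]
    # Zero-mean start vector, i.e. orthogonal to the all-ones eigenvector.
    x = [i - (n - 1) / 2 for i in range(n)]
    for _ in range(iters):
        # y = (shift * I - L) x, where (L x)_i = deg_i * x_i - sum of neighbor values.
        y = [shift * x[i] - (deg[i] * x[i] - sum(x[j] for j in adj[i]))
             for i in range(n)]
        mean = sum(y) / n
        y = [yi - mean for yi in y]  # project out the constant eigenvector
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return x


def rayleigh_quotient(edges, x):
    """x^T L x / x^T x, using x^T L x = sum over edges of (x_u - x_v)^2."""
    return sum((x[u] - x[v]) ** 2 for u, v in edges) / sum(xi * xi for xi in x)


def spectral_bisection(n, edges):
    """Split vertices at the median entry of the Fiedler vector."""
    x = fiedler_vector(n, edges)
    med = statistics.median(x)
    return [1 if xi > med else 0 for xi in x]


# Hypothetical test graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
part = spectral_bisection(6, edges)
lam2 = rayleigh_quotient(edges, fiedler_vector(6, edges))
```

On this graph the bisection separates the two triangles and cuts only the bridging edge. The start vector must have a component along the Fiedler vector for the iteration to converge to it, which holds here but is not guaranteed in general.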

ExaGraph Partitioning

• SPHYNX: New spectral partitioner
  – Based on Trilinos, uses Kokkos for portability (GPU)
  – Uses Anasazi (LOBPCG) for eigenvector computations
  – Uses Zoltan2 MJ for geometric partitioning
  – Runs on multi-GPU systems

[Figure: Sphynx workflow on a 16-vertex example (v1 to v16): graph G → eigenvectors → multi-jagged → partition Π.]
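The geometric partitioning step takes one coordinate tuple per vertex (in a spectral partitioner, entries of the computed eigenvectors) and cuts the point set into balanced parts. The following is a much simplified two-dimensional sketch of the multi-jagged idea, not the Zoltan2 MJ implementation: it ignores weights and parallelism, and the function name and the grid coordinates are assumptions.

```python
from collections import Counter


def multi_jagged_2d(coords, px, py):
    """Partition 2D points into px * py parts with near-equal counts.

    coords is a list of (x, y) pairs; returns one part id per point.
    """
    n = len(coords)
    part = [0] * n
    # First cut: sort by x and split into px strips of near-equal size.
    by_x = sorted(range(n), key=lambda i: coords[i][0])
    for s in range(px):
        strip = by_x[s * n // px:(s + 1) * n // px]
        # Second cut: every strip chooses its own y cut points, which is
        # what makes the cut lines "jagged" rather than a regular grid.
        strip.sort(key=lambda i: coords[i][1])
        m = len(strip)
        for t in range(py):
            for i in strip[t * m // py:(t + 1) * m // py]:
                part[i] = s * py + t
    return part


# Hypothetical coordinates: 16 points on a 4x4 grid, split into 2 x 2 parts.
coords = [(i % 4, i // 4) for i in range(16)]
part = multi_jagged_2d(coords, 2, 2)
sizes = Counter(part)
```

With px = 2 and py = 1 at every level, applied recursively, this reduces to recursive coordinate bisection, which is the sense in which multi-jagged generalizes RCB.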

Sphynx: Results

SPHYNX on 24 GPUs against ParMETIS (24 MPI ranks) on Summit:
• On ASIC_680k, it is 54x faster with a 1.66x increase in cutsize
• On com-Orkut, it is 27x faster with a 1.95x increase in cutsize
• On cit-Patents, it is 2.3x faster with a 3.79x increase in cutsize

Sphynx: Work in Progress

• Integration into Trilinos/Zoltan2
• Improve performance on meshes
  – Sphynx is currently often slower than ParMETIS (on CPUs)
  – Need multilevel method or scalable preconditioner
  – Promising results using MueLu in LOBPCG

Graph coloring

• Problem:
  – Label vertices in a graph so that no two neighbors have the same color.
  – Many variations
• Applications:
  – Find independent tasks for parallel computing
  – Compress sparse matrices (finite differences & automatic differentiation)
  – Independent sets for aggregation in AMG

Graph Coloring: Problem

• Given a graph $G = (V, E)$
  – with vertices $v \in V$
  – and edges $(v_1, v_2) \in E$, where $v_1, v_2 \in V$
  – Distance-1 graph coloring: assign colors to vertices so that each vertex has a different color from all of its neighbors: $C : V \to \mathbb{N}$ with $C(v_1) \neq C(v_2)$ for all $(v_1, v_2) \in E$
• The number of distinct colors assigned to vertices:
  – Minimizing it is NP-hard, not practical
  – Fast greedy heuristics work well in practice
• Trade-offs:
  – Speed vs. quality
  – Deterministic or non-deterministic (parallel)

Image courtesy of Sariyuce, Saule, Catalyurek, SIAM PP, 2012
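A fast greedy heuristic of the kind mentioned above can be sketched in a few lines. This is a generic first-fit greedy coloring in natural vertex order, shown under the assumption of an adjacency-list input; the function names and the 5-cycle example are illustrative and not taken from any of the packages discussed.

```python
def greedy_coloring(adj):
    """First-fit greedy distance-1 coloring.

    adj[v] lists the neighbors of vertex v; returns one color per vertex.
    """
    color = [-1] * len(adj)
    for v in range(len(adj)):
        # Colors already taken by colored neighbors of v.
        used = {color[u] for u in adj[v] if color[u] != -1}
        c = 0
        while c in used:
            c += 1
        color[v] = c  # smallest color not used by any neighbor
    return color


def is_valid(adj, color):
    """Check C(v1) != C(v2) for every edge (v1, v2)."""
    return all(color[v] != color[u] for v in range(len(adj)) for u in adj[v])


# Hypothetical example: a cycle on five vertices.
cycle = [[1, 4], [0, 2], [1, 3], [2, 4], [3, 0]]
colors = greedy_coloring(cycle)
```

The odd cycle needs three colors, and the greedy pass finds such a coloring here. In general the number of colors depends on the vertex order, which is one face of the speed versus quality trade-off.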

Graph Coloring Software

• ColPack (Pothen et al., Purdue)
  – Serial code (some OpenMP)
  – Wide variety of coloring problems
• Zoltan
  – MPI parallel
  – Dist-1, dist-2, partial dist-2
• KokkosKernels
  – On-node parallel using Kokkos (CPU, GPU)
  – Dist-1 and dist-2
• Zoltan2
  – Greedy method (serial)
  – Coming soon: MPI+Kokkos coloring based on KokkosKernels

ExaGraph Coloring

• Collaboration with RPI (Bogle, Slota)
• Applications: SNL/ATDM (Empire, Sparc)
  – Want coloring of sparse Jacobian matrices (distributed)
  – Must run on both CPU and GPU
• Approach: Use KokkosKernels coloring on-node
  – Speculative method with conflict detection/resolution
  – Works with MPI+Kokkos (multi-GPU systems)
  – Novel communication-avoiding version with multiple ghost layers
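The speculative approach can be illustrated with a sequential simulation. In the sketch below, every vertex in a round picks a color from the same stale snapshot, which mimics threads or ranks coloring concurrently; conflicts are then detected and the higher-numbered endpoint is recolored in the next round. This is not the KokkosKernels or Zoltan2 implementation: the tie-breaking rule, the function names, and the path graph are assumptions.

```python
def speculative_coloring(adj):
    """Simulate speculative coloring with conflict detection and resolution.

    adj[v] lists the neighbors of vertex v.
    Returns (colors, number_of_rounds).
    """
    color = [-1] * len(adj)
    worklist = list(range(len(adj)))
    rounds = 0
    while worklist:
        rounds += 1
        snapshot = color[:]  # all vertices in this round see the same stale colors
        for v in worklist:
            used = {snapshot[u] for u in adj[v] if snapshot[u] != -1}
            c = 0
            while c in used:
                c += 1
            color[v] = c  # speculative: may clash with a neighbor colored this round
        # Conflict detection: for each monochromatic edge, the endpoint with
        # the larger id is scheduled for recoloring (assumed tie-breaking rule).
        worklist = [v for v in worklist
                    if any(u < v and color[u] == color[v] for u in adj[v])]
    return color, rounds


def is_proper(adj, color):
    """True if no edge joins two vertices of the same color."""
    return all(color[v] != color[u] for v in range(len(adj)) for u in adj[v])


# Hypothetical example: the path 0 - 1 - 2 - 3.
path = [[1], [0, 2], [1, 3], [2]]
colors, rounds = speculative_coloring(path)
```

The smallest vertex id in the worklist strictly increases from round to round, so the loop terminates. This simulation is a worst case in that every vertex sees fully stale data; in a distributed setting only boundary vertices can conflict across ranks, and their colors are what must be communicated.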

[Figure: "Audi Graph Color Usage". x-axis: MPI Ranks (1, 2, 4, 8, 16); y-axis: Colors (0 to 60). Series: 1GL, no part; 1GL, vert part; 1GL, edge part; 2GL, no part; 2GL, vert part; 2GL, edge part.]

[Figure: "Audi Graph Coloring Runtimes". x-axis: MPI Ranks; y-axis: Time (seconds). Series: 1GL, no part; 1GL, vert part; 1GL, edge part; 2GL, no part; 2GL, vert part; 2GL, edge part.]

Backup

Zoltan2 Overview

• Zoltan2: Trilinos package. Toolkit of combinatorial algorithms for parallel computing on emerging architectures
  – Supersedes the Zoltan package
• Goals:
  – Provide graph algorithms needed on exascale systems
    • Load-balancing and task placement for supercomputers (hierarchical systems)
    • Node-level coloring for multi-threaded parallelism
    • Highly parallel (eventually GPU) partitioning algorithms
  – Support very large application problem sizes
    • Templated data type for global indices (64-bit integers work)
  – Greater integration with Trilinos’ next-generation solver stack

Zoltan2 Capabilities: Parallel Partitioning

• Provides common models and a uniform interface to a collection of partitioners
  – Easy to try & compare different methods
• Multi-Jagged Geometric Partitioning
  – Fast; scalable; enforces geometric locality of data
  – Uses geometric coordinates; generalizes RCB
• Graph partitioning
  – Interfaces to PT-Scotch (INRIA, France), ParMETIS (U. Minnesota), PuLP and XtraPuLP (SNL, Penn St., RPI)
  – Connectivity-based; explicitly models communication costs (not exact)
  – Widely used, proven track record
• Zoltan partitioning algorithms
  – Supports RCB, RIB, HSFC, hypergraph (PHG)

Image courtesy of Manoj Bhardwaj, SNL.

Coloring and Structural Orthogonality

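Symmetric case:

$$A = \begin{pmatrix} a_{11} & 0 & 0 & 0 & a_{15} \\ 0 & a_{22} & a_{23} & 0 & 0 \\ 0 & a_{32} & a_{33} & a_{34} & a_{35} \\ 0 & 0 & a_{43} & a_{44} & a_{45} \\ a_{51} & 0 & a_{53} & a_{54} & a_{55} \end{pmatrix}$$

[Figure: matrix $A$, its graph $G_a$ on the column vertices $c_1, \dots, c_5$, and the squared graph $G_a^2$.]

Nonsymmetric case:

$$A = \begin{pmatrix} a_{11} & a_{12} & 0 & 0 & a_{15} \\ a_{21} & a_{22} & 0 & 0 & 0 \\ a_{31} & 0 & 0 & a_{34} & 0 \\ 0 & 0 & a_{43} & a_{44} & a_{45} \end{pmatrix}$$

[Figure: matrix $A$, its bipartite graph $G_b$ with row vertices $r_1, \dots, r_4$ and column vertices $c_1, \dots, c_5$, and the graph $G_b^2[V_2]$ on the column vertices.]

Panel labels, left to right: structurally orthogonal partition (matrix), distance-2 coloring (graph), distance-1 coloring (squared graph).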
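The correspondence for the nonsymmetric case can be checked on the 4 x 5 sparsity pattern above. Two columns are structurally orthogonal when no row has a nonzero in both, so grouping columns amounts to a distance-1 coloring of the graph that joins columns sharing a row. The sketch below uses a plain greedy coloring with 0-based indices; the function names are illustrative assumptions, not an API of ColPack, Zoltan, or KokkosKernels.

```python
def color_columns(rows, num_cols):
    """Group columns so that columns in one group share no nonzero row.

    rows[i] is the set of column indices with a nonzero in row i.
    Returns one color (group id) per column.
    """
    # Two columns conflict if some row has nonzeros in both of them.
    conflicts = [set() for _ in range(num_cols)]
    for row in rows:
        for a in row:
            for b in row:
                if a != b:
                    conflicts[a].add(b)
    # Greedy distance-1 coloring of the column conflict graph.
    color = [-1] * num_cols
    for c in range(num_cols):
        used = {color[d] for d in conflicts[c] if color[d] != -1}
        k = 0
        while k in used:
            k += 1
        color[c] = k
    return color


def is_structurally_orthogonal(rows, color):
    """Every row must see each color at most once among its nonzeros."""
    return all(len({color[c] for c in row}) == len(row) for row in rows)


# Sparsity pattern of the nonsymmetric 4 x 5 matrix (0-based column indices).
rows = [{0, 1, 4}, {0, 1}, {0, 3}, {2, 3, 4}]
groups = color_columns(rows, 5)
```

Columns c1, c2, and c5 all have a nonzero in row r1, so at least three groups are needed, and the greedy pass uses exactly three. With finite differences, this would allow the five Jacobian columns to be recovered from three function evaluations instead of five.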