A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG...

25
A Scalable Algebraic Multigrid Implementation for Non-Linear Elasticity Problems Aurel Neic Institute of Biomechanics Medical University Graz September 17, 2014

Transcript of A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG...

Page 1: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

A Scalable Algebraic Multigrid Implementation forNon-Linear Elasticity Problems

Aurel Neic

Institute of Biomechanics

Medical University Graz

September 17, 2014

Page 2: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

Outline

AMG for elasticity problemsIntroductionAdaptation

Parallel ScalingRedistribution of AMG HierarchyHybrid MPI/OpenMP Parallelization

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 2 / 25

Page 3: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

AMG for elasticity problems Introduction

AMG for elasticity problems

Introduction

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 3 / 25

Page 4: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

AMG for elasticity problems Introduction

Introduction

I Solve K u = f using a hierarchy of equation systems

K ` u` = f ` , ` = 0, . . . , L

I Hierarchy is constructed from algebraic operator K

K `+1 = R`K `P`

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 4 / 25

Page 5: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

AMG for elasticity problems Introduction

Exact Two-Grid Method

I Block System [KCC KCF

KFC KFF

] [uCuF

]=

[fCfF

]I Schur decomposition[

KCC KCF

KFC KFF

]=

[I KCFK

−1FF

I

] [SC

KFF

] [I

K−1FF KFC I

]SC = KCC − KCF K−1

FF KFC

I Exact two grid method

P :=

(I

−K−1FF KFC

), R :=

(I −KCFK

−1FF

), G :=

(0 00 K−1

FF

)SC = RKP, M = (I − GK)(I − P S−1

C R K) = 0

uk+1 := (I −M)K−1f + Muk , k ∈ N

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 5 / 25

Page 6: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

AMG for elasticity problems Introduction

Introduction

I Strong connection criterion

|K [i , j ] | > ε |K [i , i ] |

I Construction of Prolongation operator

P :=

(ICCPFC

)

ni := | {j ∈ C : K [i , j ] > ε K [i , i ]} |

PFC [i , j ] :=

{1/ni if |K [i , j ] | > ε |K [i , i ] |0 otherwise

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 6 / 25

Page 7: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

AMG for elasticity problems Adaptation

AMG for elasticity problems

Adaptation

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 7 / 25

Page 8: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

AMG for elasticity problems Adaptation

Adaptation of AMG for Elasticity Problems

I Extension of the available AMG algorithm.

I Auxiliary matrix for coarsening based on ‖K 3×3i,j ‖ matrix norms

I Extended prolongation operator based on auxiliary matirx

I Future adaptations: Vanka smoothers, K-cycle AMG

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 8 / 25

Page 9: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

Parallel Scaling Redistribution of AMG Hierarchy

Parallel Scaling

Redistribution of AMG Hierarchy

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 9 / 25

Page 10: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

Parallel Scaling Redistribution of AMG Hierarchy

Introduction

I Multigrid methods efficient as solvers and preconditioners:Stable convergence w.r.t. grid refinement, spatial resolution

I Commonly used for elliptic PDEs.

I With the rise of multicore computing architectures:Concern about parallel scalability.Baker, Allison H. and Schulz, Martin and Yang, Ulrike M., On the performance of analgebraic multigrid solver on multicore clusters

I Main bottleneck is communication on intermediate grids

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 10 / 25

Page 11: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

Parallel Scaling Redistribution of AMG Hierarchy

Scaling Problems

Compare AMG-PCG with Jacobi-PCG as a limit case.

0.01

0.1

1

10

16 32 64 96 128 256

Tim

e [sec]

Number of Processes

AMG-PCGJacobi-PCG

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 11 / 25

Page 12: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

Parallel Scaling Redistribution of AMG Hierarchy

Scaling Problems

I Fine-GridI Big subdomains, interfaces, message sizesI Sparse communication patternI → Point-To-Point type

I Coarse-GridI Small subdomains, interfaces, message sizesI Few processes, accumulate interface valuesI → Scatter/Gather type

I Intermediate GridsI Small subdomains, interfaces, message sizesI Still most processes, accumulate interface valuesI → All-To-All type

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 12 / 25

Page 13: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

Parallel Scaling Redistribution of AMG Hierarchy

Redistribution of MG hierarchy

I Common solution: Reduce parallelismI Redistribute parallel data on fewer processes

I Open questions:I When to redistribute: Once, multiple times, systematic?I Redistribution pattern: Based an which criteria?

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 13 / 25

Page 14: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

Parallel Scaling Redistribution of AMG Hierarchy

Proposed Method

I Systematic reduction of processes, rate is configurable.

I Redistribution in block-pattern.

Reasoning:I Systematic:

I Logarithmic reduction of processes, reduction of unknowns should be much higherI Robust configuration, doesn’t need fine tuning.

I Block-Pattern:I Should match hardware layoutI Intermediate levels redistributed in local memory.

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 14 / 25

Page 15: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

Parallel Scaling Redistribution of AMG Hierarchy

Block Pattern

Figure: Redistribution pattern for a blocksize of 2. Red: Active Ranks, Blue: Idle Ranks

I Can be computed based on process rank if the assignment to computing units wascontinuous.

I In case of 16 processes per compute node, the first 4 redistributions are in sharedmemory!

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 15 / 25

Page 16: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

Parallel Scaling Redistribution of AMG Hierarchy

AMG Setup

1. Coarsening1.1 Coarse-grid selection1.2 Generation of

Prolongation/Restriction1.3 Triple-matrix-product

2. Redistribution2.1 Active-idle selection2.2 Data transfer2.3 Matrix accumulation

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 16 / 25

Page 17: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

Parallel Scaling Redistribution of AMG Hierarchy

Adapted V-Cycle

Require: K`p , f`p , u

`p , r`p ,P

`p , J

`p ,G , a ∈ G ,Ap, activep ; ` = 1, . . . , L

if ` < L thenif activep [`] = true then

u`p ← 0

u`p ← J`

p (u`p , f`p) {Pre-smooth}

r`p ← f`p − K`p u

`p {Compute residual}

f`+1p ← (P`

p )T r`p {Restrict residual}f`+1a ←

∑i∈G

AaATi f`+1

i {Gather}

end ifmultigrid(` + 1) {Recursion}if activep [`] = true then

u`+1i ← Ai A

Ta u

`+1a i ∈ G {Scatter}

c`p ← P`p u`+1

p {Prolongate correction}u`p ← u`

p + c`p {Add correction}u`p ← J`

p (u`p , f`p) {Post-smooth}

end ifelse

if activep [L] = true then

uLp ← (KL

p)−1 fLp {Direct solve}end if

end if

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 17 / 25

Page 18: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

Parallel Scaling Redistribution of AMG Hierarchy

Benchmarks

0.01

0.1

1

10

16 32 64 96 128 256

Tim

e [sec]

Number of Processes

Block-AMG-PCGDefault-AMG-PCG

Jacobi-PCG

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 18 / 25

Page 19: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

Parallel Scaling Redistribution of AMG Hierarchy

Benchmarks

0.1

1

10

64 128 256 528 1024 2048

Tim

e [sec]

Number of Processes

Block-AMG g=2Block-AMG g=4

AMGJacobi

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 19 / 25

Page 20: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

Parallel Scaling Hybrid MPI/OpenMP Parallelization

Parallel Scaling

Hybrid MPI/OpenMP Parallelization

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 20 / 25

Page 21: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

Parallel Scaling Hybrid MPI/OpenMP Parallelization

Motivation

I Increase the number of OpenMP threads per MPI processI SuperMuc Thin Node: 16 cores, 32 GB RAMI SuperMuc Fat Node: 40 core, 256 GB RAM

I Pin OpenMP threads to computing tasksI Desktop: OpenMPI rankfilesI Server: OpenMPI rankfiles, Custom modules

I Advantage: More computing tasks with less communication

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 21 / 25

Page 22: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

Parallel Scaling Hybrid MPI/OpenMP Parallelization

Motivation

Matrix-Vector benchmarks:

runtime speedup

# cores OMP MPI OMP MPI

1 4.579 4.336 1.00 1.002 2.306 2.161 1.99 2.014 1.347 1.123 3.40 3.866 1.133 1.009 4.04 4.3012 0.565 0.378 8.10 11.48

OpenMP kernels show reduced computational efficiency

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 22 / 25

Page 23: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

Parallel Scaling Hybrid MPI/OpenMP Parallelization

Implementation

I OpenMP parallelization only for computational kernels

I Independent of MPI parallelizationI Scheduler of choice: guided

I Background load should be assumed → dynamic scheduling usually betterI Optimal chunksize unknown or varying → guided scheduling decreases the chunksize

linearly to a defined minimum size

I Optimal number of threads per process is hardware and problem dependent

I Parallelization across multiple CPUs disadvantageous

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 23 / 25

Page 24: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

Parallel Scaling Hybrid MPI/OpenMP Parallelization

Benchmarks

Reduced number of MPI processes is advantageous for a large number of threads.

0.1

1

64 128 256 528 1024 2048

Tim

e [sec]

# Cores

HYB8HYB4

MPI

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 24 / 25

Page 25: A Scalable Algebraic Multigrid Implementation for Non ... · Parallel Scaling Redistribution of AMG Hierarchy Introduction I Multigrid methods e cient as solvers and preconditioners:

Parallel Scaling Hybrid MPI/OpenMP Parallelization

Thank you!

The research was funded by the Austrian Science Fund (FWF)

Aurel Neic (Medical University Graz) AMG for Non-Linear Elasticity Problems September 17, 2014 25 / 25