A Framework for Large-Scale, High-Performance Lattice Boltzmann Simulations in Complex Geometries. C. Godenschwager, M. Bauer, F. Schornbaum, and U. Rüde. Chair for System Simulation, FAU Erlangen-Nürnberg. SIAM PP18, Tokyo, Japan, March 8, 2018

A Framework for Large-Scale, High-Performance Lattice Boltzmann Simulations in Complex Geometries

C. Godenschwager, M. Bauer, F. Schornbaum, and U. Rüde

Chair for System Simulation, FAU Erlangen-Nürnberg

SIAM PP18, Tokyo, Japan

March 8, 2018

A Framework for Large-Scale, High-Performance LB Simulations in Complex Geometries - C. Godenschwager et al.

Outline

• The waLBerla framework

• Simulation setup & results

• Scaling experiments

• Summary & Outlook

The waLBerla framework

• written in C++14

• main focus on CFD simulations based on the lattice Boltzmann method (LBM)…

• …but generally suitable for all kinds of numeric codes working with uniform domain decompositions (multigrid solvers, phase field method, physics engine)

• at its very core designed as an HPC software framework:

  • scales from laptops to current petascale supercomputers

  • largest simulation: 1,835,008 processes (IBM Blue Gene/Q @ Jülich)

  • hybrid parallelization: MPI + OpenMP

  • vectorization of compute kernels

• many modules are available as open source → http://www.walberla.net

Vocal Fold Study (Florian Schornbaum)

Fluid Structure Interaction (Simon Bogner)

Free Surface Flow (Martin Bauer)

Rigid Body Dynamics (Sebastian Eibl, Christian Godenschwager)

Electron Beam Melting (Matthias Markl, Regina Ammer)

Phase Field Simulations (Martin Bauer, Johannes Hötzer)

waLBerla building blocks

• LBM (hydrodynamics)

• pe (rigid body dynamics)

• Static grid refinement

• Dynamic grid refinement

• Multigrid methods

• Free surface flows

• Fluid-solid interaction

• Phase field models

• Thermal LBM

• Complex geometry handling

• Electrokinetics

• Compute kernel generation

The lattice Boltzmann method (TRT)

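$$
f_q(\vec{x} + \vec{c}_q,\, t + 1) = f_q(\vec{x}, t)
  - \lambda_+ \left[ f_q^{\mathrm{eq},+}(\rho, \vec{u}) - f_q^{+}(\vec{x}, t) \right]
  - \lambda_- \left[ f_q^{\mathrm{eq},-}(\rho, \vec{u}) - f_q^{-}(\vec{x}, t) \right]
$$

• Meaning: relax the PDFs $f_q$ linearly towards their equilibrium values

• TRT: the $f_q$ are split into their symmetric (+) and anti-symmetric (−) parts

• $\lambda_+$: determines fluid viscosity

• $\lambda_-$: improves boundary accuracy & stability

density: $\rho = \sum_q f_q$

momentum: $\rho \vec{u} = \sum_q f_q \vec{c}_q$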
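The TRT update rule on this slide can be tried out on a single lattice cell. The following is a minimal sketch, not waLBerla code. It assumes a D1Q3 lattice with the standard second-order equilibrium, uses the usual TRT split f_q^± = (f_q ± f_q̄)/2 with q̄ the direction opposite to q, and omits the streaming step. The signs are taken literally from the slide, so relaxation towards equilibrium requires negative rates; the values in (−2, 0) used here are an assumption, and all function names are hypothetical.

```python
# D1Q3 lattice (assumed for brevity): rest, +x, -x
C = [0, 1, -1]             # lattice velocities c_q
W = [2 / 3, 1 / 6, 1 / 6]  # lattice weights
OPP = [0, 2, 1]            # index of the direction opposite to q


def moments(f):
    """Density rho = sum_q f_q and velocity u from rho*u = sum_q f_q c_q."""
    rho = sum(f)
    u = sum(fq * cq for fq, cq in zip(f, C)) / rho
    return rho, u


def equilibrium(rho, u):
    """Standard second-order equilibrium with c_s^2 = 1/3."""
    return [w * rho * (1 + 3 * c * u + 4.5 * (c * u) ** 2 - 1.5 * u * u)
            for w, c in zip(W, C)]


def trt_collide(f, lam_plus, lam_minus):
    """Collision part of the update rule, with signs as written on the slide."""
    rho, u = moments(f)
    feq = equilibrium(rho, u)
    post = []
    for q in range(len(C)):
        qb = OPP[q]
        # Symmetric and anti-symmetric parts of the PDFs and of the equilibrium.
        f_sym, f_asym = 0.5 * (f[q] + f[qb]), 0.5 * (f[q] - f[qb])
        feq_sym, feq_asym = 0.5 * (feq[q] + feq[qb]), 0.5 * (feq[q] - feq[qb])
        post.append(f[q]
                    - lam_plus * (feq_sym - f_sym)
                    - lam_minus * (feq_asym - f_asym))
    return post


f = [0.60, 0.25, 0.15]  # arbitrary non-equilibrium PDFs
post = trt_collide(f, lam_plus=-1.2, lam_minus=-0.8)
print(moments(f), moments(post))
```

In this sketch the collision leaves density and momentum unchanged for any pair of rates, and choosing both rates equal to −1 moves the PDFs exactly onto the equilibrium.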

LBM in waLBerla

Godenschwager et al. - A framework for hybrid parallel flow simulations with a trillion cells in complex geometries, 2013

• Oldest module of the waLBerla framework

• Highly optimized compute kernels

• Scalability: over 1 trillion (10^12) lattice cells on 1,835,008 processes

geometry given by surface mesh

domain partitioning into blocks

empty blocks are discarded

load balancing

allocation of block data (→ grids)
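The setup steps on this slide (partition the domain into blocks, discard empty blocks, distribute the remaining blocks) can be sketched as follows. This is an illustrative sketch rather than waLBerla's implementation: the geometry is an analytic sphere instead of a surface mesh, a block counts as non-empty if at least one cell center lies inside the geometry, and a round-robin assignment stands in for real load balancing. All names are hypothetical.

```python
from itertools import product


def nonempty_blocks(inside, domain_cells, block_edge):
    """Lower corners of all blocks with at least one cell center inside the geometry."""
    kept = []
    starts = [range(0, n, block_edge) for n in domain_cells]
    for corner in product(*starts):
        cell_ranges = [range(c, min(c + block_edge, n))
                       for c, n in zip(corner, domain_cells)]
        if any(inside(x + 0.5, y + 0.5, z + 0.5)
               for x, y, z in product(*cell_ranges)):
            kept.append(corner)
    return kept


def round_robin(blocks, num_processes):
    """Assign blocks to processes in turn (a stand-in for load balancing)."""
    return {p: blocks[p::num_processes] for p in range(num_processes)}


def in_sphere(x, y, z, center=16.0, radius=10.0):
    """Stand-in geometry: a sphere in the middle of a 32^3 cell domain."""
    return (x - center) ** 2 + (y - center) ** 2 + (z - center) ** 2 <= radius ** 2


blocks = nonempty_blocks(in_sphere, domain_cells=(32, 32, 32), block_edge=8)
assignment = round_robin(blocks, num_processes=4)
print(len(blocks), "of 64 blocks kept")
```

Only the kept blocks would then need grid data, which is why discarding empty blocks pays off for sparse geometries.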

separation of domain partitioning from simulation: the partitioning is stored on disk

file size: kilobytes to a few megabytes

More on octree-based setup, static and dynamic refinement: tomorrow, 2:30 PM, in MS91, "Extreme-Scale Block-Structured Adaptive Mesh Refinement" by Florian Schornbaum

block size → #blocks (dx = 0.2 mm, target: ≤ 200 blocks)

14^3 → 649

18^3 → 413

23^3 → 277

29^3 → 201

37^3 → 149

33^3 → 154

31^3 → 184

30^3 → 190

30^3 → 190
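The slide does not say how the block sizes were chosen. Reading each entry as a cubic block with that edge length in cells, the sequence 14, 18, 23, 29, 37, 33, 31, 30 is consistent with roughly doubling the block volume until the block count drops to the target, followed by bisection between the last failing and the first passing edge length. The sketch below implements that assumed reading. The block counts from the slide serve as a lookup table in place of a real partitioner, and the function name is hypothetical.

```python
# Block counts from the slide: edge length in cells -> number of non-empty blocks.
SLIDE_COUNTS = {14: 649, 18: 413, 23: 277, 29: 201,
                37: 149, 33: 154, 31: 184, 30: 190}


def find_block_edge(count_blocks, start_edge, target):
    """Return (edge, visited) with count_blocks(edge) <= target.

    Assumes the block count does not increase when the edge length grows;
    under that assumption the returned edge is the smallest admissible one.
    """
    visited = []

    def ok(edge):
        visited.append(edge)
        return count_blocks(edge) <= target

    lo, hi = None, start_edge
    # Phase 1: roughly double the block volume until few enough blocks remain.
    while not ok(hi):
        lo = hi
        hi = max(hi + 1, round(hi * 2 ** (1 / 3)))
    # Phase 2: bisect between the last failing and the first passing edge length.
    while lo is not None and hi - lo > 1:
        mid = (lo + hi) // 2
        if ok(mid):
            hi = mid
        else:
            lo = mid
    return hi, visited


edge, visited = find_block_edge(SLIDE_COUNTS.__getitem__, start_edge=14, target=200)
print(edge, visited)
```

In a real setup `count_blocks` would run the domain partitioning for the given block size, which is cheap compared with the simulation itself.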

Domain partitioning of coronary tree dataset (one block per process)

• 512 processes: 485 blocks

• 458,752 processes: 458,184 blocks

Surface Meshes

Scaling experiments

JUQUEEN                              SuperMUC (Phase 1)
-----------------------------------  ------------------------------------
Forschungszentrum Jülich, Germany    LRZ, Garching (Munich), Germany
IBM system                           IBM system
Blue Gene/Q                          Intel Sandy Bridge-EP
28,672 nodes                         9,216 nodes
458,752 cores                        147,456 cores
5.9 Petaflops peak                   3.2 Petaflops peak
448 TB main memory                   288 TB main memory
5D Torus Network                     Non-blocking tree / 4:1 pruned tree

Weak scaling experiments

Strong scaling experiments

Strong scaling JUQUEEN

Strong scaling SuperMUC

Automatic generation of LBM compute kernels (Martin Bauer)

Models / Features

• stencils

• moment-based methods (MRT)

• efficient SRT and TRT implementations

• moment basis construction

• various equilibria

• forcing approaches

• different collision space: cumulant method

• entropic stabilization

• locally varying relaxation rates, e.g. to include turbulence models

• coupling of multiple kernels

Hardware / Optimization

• GPU/CUDA support

• (manual) or guided vectorization (AVX2, AVX512, QPX)

• inner loop splitting to improve prefetching due to lower number of load/store streams

• sparse (list-based) kernels for domains with many boundary cells

• data layout: simple two grid stream-collide, AABB pattern, EsoTwist

too many combinations to provide handcrafted compute kernels for → code generation
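The idea behind the sparse (list-based) kernels named on this slide can be illustrated with a small streaming step. This is a hand-written sketch under stated assumptions, not generated waLBerla code: it uses a periodic 1D grid with a D1Q3 stencil, a pull scheme, and half-way bounce-back at solid cells. Only fluid cells are stored, and a precomputed index list tells each fluid cell where to pull every PDF from. All names are hypothetical.

```python
C = [0, 1, -1]   # D1Q3 velocities: rest, +x, -x
OPP = [0, 2, 1]  # index of the opposite direction


def build_lists(is_fluid):
    """Compact the fluid cells and precompute the pull source of every PDF."""
    n = len(is_fluid)
    fluid = [x for x in range(n) if is_fluid[x]]
    compact = {x: i for i, x in enumerate(fluid)}
    # pull[q][i] = (source direction, source compact index) for fluid cell i
    pull = [[None] * len(fluid) for _ in C]
    for i, x in enumerate(fluid):
        for q, c in enumerate(C):
            src = (x - c) % n
            if is_fluid[src]:
                pull[q][i] = (q, compact[src])  # regular streaming
            else:
                pull[q][i] = (OPP[q], i)        # bounce-back at a solid neighbor
    return fluid, pull


def stream(f, pull):
    """One streaming step; f[q][i] is the PDF of direction q at compact index i."""
    return [[f[sq][si] for sq, si in pull[q]] for q in range(len(C))]


is_fluid = [False, True, True, True, False]
fluid, pull = build_lists(is_fluid)
f = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
g = stream(f, pull)
print(g)
```

The kernel loops over the fluid list only and needs no boundary checks at run time, at the cost of storing one index entry per PDF.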

Thank you!

A special thank you to the LRZ in Garching and the JSC in Jülich for the compute time and their friendly support!