Ultra-Scalable Multiphysics Simulations for
Solidification Processes in Metals (SKAMPY)
HPC Status Conference of the Gauß-Allianz, November 29, 2016, Hamburg
Harald Köstler, Bauer, Schornbaum, Godenschwager, Rüde, Hammer, Wellein, Hötzer, Nestler
Chair for System Simulation, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Outline
Harald Köstler - Chair for System Simulation, FAU Erlangen-Nürnberg, 2016
• waLBerla Framework
• SKAMPY Project
The waLBerla Framework
waLBerla Framework
• widely applicable Lattice-Boltzmann from Erlangen
• HPC software framework, originally developed for CFD simulations with the Lattice Boltzmann Method (LBM)
• evolved into a general framework for algorithms on block-structured grids
• www.walberla.net
Example applications: Vocal Fold Study (Florian Schornbaum), Fluid-Structure Interaction (Simon Bogner), Free Surface Flow
Block-structured Grids
• complex geometry given by surface
• add regular block partitioning
• discard empty blocks
• allocate block data
• load balancing
• Domain Decomposition & Distribution to Processes:
  • regular decomposition into blocks containing uniform grids
  • grid refinement: octree-like decomposition
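The setup steps can be sketched as follows. This is a minimal illustration with hypothetical types and names (not the actual waLBerla API): a regular block partition is laid over the bounding box, and every block that does not overlap the geometry is discarded.

```cpp
#include <functional>
#include <vector>

// Hypothetical block type: coordinates in the regular partition.
struct Block { int x, y, z; };

// Partition an nx x ny x nz bounding box into blocks and keep only the
// blocks for which the geometry predicate reports an overlap.
std::vector<Block> createBlocks(int nx, int ny, int nz,
                                const std::function<bool(int, int, int)>& overlapsGeometry)
{
    std::vector<Block> blocks;
    for (int z = 0; z < nz; ++z)
        for (int y = 0; y < ny; ++y)
            for (int x = 0; x < nx; ++x)
                if (overlapsGeometry(x, y, z))  // discard empty blocks
                    blocks.push_back({x, y, z});
    return blocks;
}
```

Block data (the uniform grids) would only be allocated for the surviving blocks, which are then load-balanced across processes.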
Block-structured Grids
In most cases, if a regular decomposition of a uniform grid is used, exactly one block is assigned to each process.

Forest of octrees: each block contains a uniform grid of the same size → 2:1 balance between neighboring cells on level transitions.
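The 2:1 balance reduces to a simple invariant on the refinement levels of neighboring blocks; a sketch of the idea (hypothetical helper functions, not the actual waLBerla implementation):

```cpp
#include <cstdlib>

// 2:1 balance in the forest of octrees: refinement levels of neighboring
// blocks may differ by at most one, so a level transition always maps one
// coarse cell onto two fine cells per dimension.
bool isTwoToOneBalanced(int levelA, int levelB)
{
    return std::abs(levelA - levelB) <= 1;
}

// Restore the invariant by refining the coarser block until the level
// difference to its finer neighbor is at most one.
int enforceTwoToOne(int coarseLevel, int fineLevel)
{
    while (fineLevel - coarseLevel > 1)
        ++coarseLevel;
    return coarseLevel;
}
```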
• Distributed Memory Parallelization: MPI
  • data exchange on borders between blocks via ghost layers
  • support for overlapping communication and computation
• some advanced models require more complex communication patterns (e.g. free-surface and fluid-structure interaction)
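The ghost-layer idea can be sketched in shared memory for two neighboring 1D blocks (illustrative layout, not the actual waLBerla communication API): the exchange copies the neighbor's outermost interior cell into the local ghost cell, which is exactly the data an MPI message buffer would carry.

```cpp
#include <cstddef>
#include <vector>

// A 1D block storing its interior cells plus one ghost cell at either end.
struct Block1D {
    std::vector<double> cells;  // [left ghost | interior ... | right ghost]
    explicit Block1D(std::size_t interior) : cells(interior + 2, 0.0) {}
    double& leftGhost()           { return cells.front(); }
    double& rightGhost()          { return cells.back(); }
    double  leftInterior()  const { return cells[1]; }
    double  rightInterior() const { return cells[cells.size() - 2]; }
};

// Copy boundary values across the block border, as send/recv would do.
void exchangeGhostLayers(Block1D& left, Block1D& right)
{
    right.leftGhost() = left.rightInterior();  // "send" left -> right
    left.rightGhost() = right.leftInterior();  // "send" right -> left
}
```

In the distributed setting, the overlap mentioned above means the interior cells (which do not touch the ghost layer) can be updated while this exchange is still in flight.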
Hybrid Parallelization
[Figure: message packing on the sender process and unpacking on the receiver process]
(slightly more complicated for non-uniform domain decompositions, but the same general ideas still apply)
SKAMPY Project: Application
Overview
Johannes Hötzer - Institute of Applied Materials – Computational Material Science, KIT, 2016
Application Setting
• ternary eutectic alloys
• directional solidification
• analytically moving temperature gradient
• massively parallel phase-field simulations
• large domain sizes
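An analytically moving temperature gradient in directional solidification is commonly realized as a frozen-temperature approximation; a minimal sketch, where the form T(z, t) = T0 + G·(z − v·t) and the parameter names T0, G, v are assumptions for illustration:

```cpp
// Frozen-temperature approximation (assumed form, illustrative parameters):
// the temperature field is prescribed analytically rather than solved for,
// and the linear gradient G is translated along the growth direction z
// with the pulling velocity v.
double temperature(double z, double t, double T0, double G, double v)
{
    return T0 + G * (z - v * t);
}
```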
Phase-field model
Microstructure prediction for Al-Ag-Cu
Pattern features in an Al-Ag-Cu alloy
Spiral growth in ternary systems
SKAMPY Project: Performance Engineering
Work packages
Single Node Tuning
80× faster than the original version
Intranode Scaling
intranode weak scaling on SuperMUC
Single Node Optimization Summary
Percent Peak on SuperMUC:
• 𝜙-Sweep: 21 %
• μ-Sweep: 27 %
• Complete Program: 25 %

Single Node Optimizations
• replace/remove expensive operations like square roots and divisions
• pre-compute and buffer values where possible
• SIMD intrinsics
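The first two optimizations can be illustrated with a tiny kernel (a sketch, not one of the actual sweep kernels): the division and the square root are loop-invariant, so they are pre-computed once instead of per cell, leaving only multiplications inside the loop.

```cpp
#include <cmath>
#include <vector>

// Hoist the expensive loop-invariant operations out of the loop:
// one division and one sqrt in total instead of one of each per cell.
void scaleCells(std::vector<double>& cells, double h, double coeff)
{
    const double invH      = 1.0 / h;           // replace per-cell division
    const double sqrtCoeff = std::sqrt(coeff);  // buffered invariant
    for (double& c : cells)
        c *= invH * sqrtCoeff;                  // only multiplications here
}
```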
Why not 100% Peak?
• unbalanced number of multiplications and additions
• divisions are counted as 1 FLOP but cost 43 times as much as a multiplication or addition
Scaling
• scaling on SuperMUC up to 32,768 cores
• ghost layer based communication
• communication hiding
Execution-Cache-Memory Model
Julian Hammer, Georg Hager, Gerhard Wellein – RRZE HPC group, FAU Erlangen-Nürnberg, 2016
• Automatic Layer Conditions Model
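The layer condition behind such models can be stated as a rule of thumb (a sketch of the idea, not the automatic tool itself): a stencil update streams its neighbor rows from cache rather than from memory as long as all rows it touches together fit in roughly half the cache.

```cpp
#include <cstddef>

// Rule-of-thumb layer condition for a stencil that touches `touchedRows`
// consecutive grid rows (e.g. 3 for a 2D 5-point stencil): the condition
// holds while those rows occupy at most half the cache.
bool layerConditionHolds(std::size_t rowLengthCells, std::size_t touchedRows,
                         std::size_t bytesPerCell, std::size_t cacheBytes)
{
    return touchedRows * rowLengthCells * bytesPerCell <= cacheBytes / 2;
}
```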
SKAMPY Project: Software Engineering and Parallelization
• Written in C++ with Python extensions
• Hybridly parallelized (MPI + OpenMP)
• No data structures growing with number of processes involved
• Scales from laptop to recent petascale machines
• Parallel I/O
• Portable (Compiler/OS)
• Automated tests / CI servers
• Open Source release planned
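The thread level of the hybrid MPI + OpenMP parallelization can be sketched as follows (illustrative kernel, not waLBerla code): MPI distributes blocks across processes, while inside each process OpenMP threads share the cells of one block. The pragma is simply ignored when compiled without OpenMP support.

```cpp
#include <cstddef>
#include <vector>

// Per-block update whose iterations are split among OpenMP threads;
// across processes, whole blocks would be distributed via MPI.
void relaxBlock(std::vector<double>& cells)
{
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(cells.size()); ++i)
        cells[static_cast<std::size_t>(i)] *= 0.5;  // per-cell work
}
```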
Continuous Integration
[Figure: CI build matrix, including llvm/clang]
Outlook: Selected Work packages I
SA2: Steering & Prototyping Interface
• based on Python
SA3: Adaptivity and Dynamic Load Balancing
Outlook: Selected Work packages II
MC1: Many-Core Data Structures
MC2: Communication Optimization and Multi-scale Data Reduction
• communication strategies for heterogeneous systems
• problem-specific data compression
MC3: Asynchronous Execution and Resilience
Acknowledgements
• Funded by:
  • Bundesministerium für Bildung und Forschung
  • KONWIHR (Bavarian project)
  • DFG SPP 1648/1 – Software for Exascale Computing
• Industry
• Supercomputing centers
http://www.exastencils.org/
Thank you!
Questions?