Transcript of Scalable System for Large Unstructured Mesh Simulation Miguel A. Pasenau, Pooyan Dadvand, Jordi...
- Slide 1
- Scalable System for Large Unstructured Mesh Simulation. Miguel
A. Pasenau, Pooyan Dadvand, Jordi Cotela, Abel Coll and Eugenio
Oñate
- Slide 2
- 29th Nov 2012 / 2 Overview: Introduction; Preparation and
Simulation (More Efficient Partitioning, Parallel Element
Splitting); Post Processing (Results Cache, Merging Many Partitions,
Memory usage, Off-screen mode); Conclusions, Future lines;
Acknowledgements
- Slide 4
- Introduction. Education: Masters in Numerical Methods, training
courses, seminars, etc. Publishers: magazines, books, etc.
Research: PhDs, congresses, projects, etc. One of the
International Centers of Excellence on Simulation-Based Engineering
and Sciences [Glotzer et al., WTEC Panel Report on International
Assessment of Research and Development in Simulation-Based
Engineering and Science. World Technology Evaluation Center
(wtec.org), 2009].
- Slide 5
- Introduction. Simulation: structures
- Slide 6
- Introduction. CFD: Computational Fluid Dynamics
- Slide 7
- Introduction. Geomechanics; industrial forming processes;
electromagnetism; acoustics; bio-medical engineering; coupled
problems; earth sciences
- Slide 8
- Introduction. Simulation workflow: geometry description (provided
by CAD or created using GiD); preparation of analysis data (GiD);
computer analysis; visualization of results (GiD)
- Slide 9
- Introduction. Analysis data generation: read in and correct CAD
data; assignment of boundary conditions; definition of analysis
parameters; assignment of material properties, etc.; generation of
analysis data
- Slide 10
- Introduction. Visualization of numerical results: deformed
shapes, temperature distributions, pressures, etc.; vector and
contour plots, graphs, line diagrams, result surfaces; animated
sequences; particle line flow diagrams
- Slide 12
- Introduction. Goal: run a CFD simulation with 100 million
elements using in-house tools. Hardware: cluster with a master
node (2 x Intel Quad Core E5410, 32 GB RAM); a 3 TB disc with a
dedicated Gigabit link to the master node; 10 nodes with 2 x Intel
Quad Core E5410 and 16 GB RAM each; 2 nodes with 2 x AMD Opteron
Quad Core 2356 and 32 GB RAM each. Total of 96 cores and 224 GB RAM
available. Interconnect: Infiniband 4x DDR, 20 Gbps.
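
A quick check of the stated totals, as a minimal sketch. The figures of 96 cores and 224 GB match the 12 compute nodes alone, so the master node is presumably not counted; that is an inference from the numbers, not something the slide states. The dictionary layout is illustrative only.

```python
# Compute nodes as listed on the slide (2 quad-core CPUs per node).
compute_nodes = [
    {"count": 10, "cores_per_node": 2 * 4, "ram_gb": 16},  # Intel E5410 nodes
    {"count": 2, "cores_per_node": 2 * 4, "ram_gb": 32},   # AMD Opteron 2356 nodes
]
# Master node, listed separately; assumed here to be excluded from the totals.
master = {"cores": 2 * 4, "ram_gb": 32}

total_cores = sum(n["count"] * n["cores_per_node"] for n in compute_nodes)
total_ram_gb = sum(n["count"] * n["ram_gb"] for n in compute_nodes)

print(total_cores, "cores,", total_ram_gb, "GB RAM on the compute nodes")
```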
- Slide 13
- Introduction. Airflow around an F1 car model
- Slide 14
- Introduction. Kratos: multi-physics, open source framework,
parallelized for shared and distributed memory machines. GiD:
geometry handling and data management; first coarse mesh; merging
and post-processing of results
- Slide 15
- Introduction. Workflow diagram: Geometry, Conditions, Materials;
Coarse mesh generation; Partition; Distribution; Communication
plan; part 1, part 2, ..., part n; Refinement; Calculation;
res. 1, res. 2, ..., res. n; Merge; Visualize
- Slide 16
- Overview: Introduction; Preparation and Simulation (More
Efficient Partitioning, Parallel Element Splitting); Post
Processing (Results Cache, Merging Many Partitions, Memory usage,
Off-screen mode); Conclusions, Future lines and Acknowledgements
- Slide 17
- Preparation and simulation. Workflow diagram: Geometry,
Conditions, Materials; Coarse mesh generation; Partition;
Distribution; Communication plan; part 1, part 2, ..., part n;
Refinement; Calculation; res. 1, res. 2, ..., res. n; Merge;
Visualize
- Slide 18
- Meshing. Single workstation: limited memory and time. Three
steps: (1) single node: GiD generates a coarse mesh with 13 million
tetrahedrons; (2) single node: Kratos + Metis divide and
distribute; (3) in parallel: Kratos refines the mesh locally
- Slide 19
- Preparation and simulation. Workflow diagram: Geometry,
Conditions, Materials; Coarse mesh generation; Partition;
Distribution; Communication plan; part 1, part 2, ..., part n;
Refinement; Calculation; res. 1, res. 2, ..., res. n; Merge;
Visualize
- Slide 20
- Efficient partitioning: before. Rank 0 reads the model,
partitions it and sends the partitions to the other ranks
(diagram: Rank 0, Rank 1, Rank 2, Rank 3)
- Slide 22
- Efficient partitioning: before. Requires large memory in node 0.
Uses cluster time for partitioning, which can be done outside the
cluster. Each rerun needs repartitioning. Same working procedure
for OpenMP and MPI runs
- Slide 23
- Efficient partitioning: now. The model is divided and the
partitions are written on another machine. Each rank reads its own
data separately
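
The "now" scheme can be sketched as follows, under loudly stated assumptions: a contiguous block split stands in for Metis, the per-rank file name `model_<rank>.json` and its JSON layout are hypothetical, and the ranks are simulated by plain function calls rather than MPI processes. It is not the Kratos file format.

```python
import json
import os


def block_partition(n_elements, n_parts):
    """Stand-in for a graph partitioner such as Metis: contiguous blocks."""
    base, extra = divmod(n_elements, n_parts)
    parts, start = [], 0
    for rank in range(n_parts):
        size = base + (1 if rank < extra else 0)
        parts.append(list(range(start, start + size)))
        start += size
    return parts


def write_partitions(elements, n_parts, out_dir):
    """Offline step, run once on a machine outside the cluster."""
    for rank, ids in enumerate(block_partition(len(elements), n_parts)):
        path = os.path.join(out_dir, f"model_{rank}.json")  # hypothetical name
        with open(path, "w") as handle:
            json.dump({"rank": rank, "elements": [elements[i] for i in ids]}, handle)


def read_partition(rank, out_dir):
    """Run by each rank: it opens only its own file, never the whole model."""
    with open(os.path.join(out_dir, f"model_{rank}.json")) as handle:
        return json.load(handle)["elements"]
```

With this layout a rerun only repeats `read_partition`, and no single rank ever needs enough memory to hold the full model.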
- Slide 24
- Preparation and simulation. Workflow diagram: Geometry,
Conditions, Materials; Coarse mesh generation; Partition;
Distribution; Communication plan; part 1, part 2, ..., part n;
Refinement; Calculation; res. 1, res. 2, ..., res. n; Merge;
Visualize
- Slide 25
- Local refinement: triangle (diagram: triangle i, j, k with new
edge nodes l, m, n, showing the split patterns into 2, 3 or 4 child
triangles depending on how many edges are refined)
- Slide 26
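- Local refinement: triangle. When two split patterns are possible,
the case is selected according to the node Ids. The decision does
not aim for the best quality, but it is very good for
parallelization (OpenMP, MPI). (Diagram: the two alternative splits
of triangle i, j, k with new nodes l and m into 3 child triangles)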
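
A minimal sketch of such a node-Id rule, not the Kratos implementation. It assumes `edge_nodes` maps a sorted pair of node Ids to the Id of the new node on that edge, and that in the two-edge case the diagonal starts at the lower-Id end of the unsplit edge; both are assumptions made here for illustration. A rule that looks only at global Ids gives the same answer wherever the triangle is evaluated, which is presumably why it suits OpenMP and MPI.

```python
def edge_key(a, b):
    """Orientation-independent key for the edge between nodes a and b."""
    return (a, b) if a < b else (b, a)


def split_triangle(tri, edge_nodes):
    """Split one triangle given the new nodes on its refined edges."""
    verts = list(tri)
    # mids[s] is the new node on edge (verts[s], verts[s + 1]), or None.
    mids = [edge_nodes.get(edge_key(verts[s], verts[(s + 1) % 3])) for s in range(3)]
    n_split = sum(m is not None for m in mids)

    if n_split == 0:
        return [tuple(tri)]
    if n_split == 3:
        a, b, c = verts
        p, q, r = mids
        return [(a, p, r), (p, b, q), (r, q, c), (p, q, r)]
    if n_split == 1:
        s = next(i for i, m in enumerate(mids) if m is not None)
        a, b, c = verts[s], verts[(s + 1) % 3], verts[(s + 2) % 3]
        m = mids[s]
        return [(a, m, c), (m, b, c)]

    # Two refined edges: rotate so that (c, a) is the unsplit edge.
    u = next(i for i, m in enumerate(mids) if m is None)
    c, a, b = verts[u], verts[(u + 1) % 3], verts[(u + 2) % 3]
    p, q = mids[(u + 1) % 3], mids[(u + 2) % 3]  # p on (a, b), q on (b, c)
    corner = (p, b, q)
    # The quadrilateral a, p, q, c has two possible diagonals.
    # Assumed rule: the diagonal starts at the lower Id of a and c.
    if a < c:
        return [corner, (a, p, q), (a, q, c)]
    return [corner, (a, p, c), (p, q, c)]
```

For example, `split_triangle((1, 2, 3), {(1, 2): 10, (2, 3): 11})` and the same call on the oppositely oriented `(1, 3, 2)` both create the diagonal between nodes 1 and 11.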
- Slide 27
- Local refinement: tetrahedron (diagram: father element and its
child elements)
- Slide 28
- Local refinement: examples
- Slide 29
- Local refinement: examples
- Slide 30
- Local refinement: examples
- Slide 31
- Local refinement: uniform. A uniform refinement can be used to
obtain a mesh with 8 times more elements. It does not improve the
geometry representation
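
The factor of 8 follows from splitting every edge at its midpoint: 4 corner tetrahedrons plus an inner octahedron cut into 4. The sketch below shows one standard way to do this; the choice of the inner diagonal (between the midpoints of edges 0-1 and 2-3) is an assumption made here, and the slides do not say which diagonal Kratos picks.

```python
def midpoint(p, q):
    """Midpoint of two 3D points given as tuples."""
    return tuple((a + b) / 2.0 for a, b in zip(p, q))


def signed_volume(a, b, c, d):
    """Signed volume of tetrahedron (a, b, c, d): det[b-a, c-a, d-a] / 6."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return det / 6.0


def split_tetrahedron(v0, v1, v2, v3):
    """Uniform 1-to-8 split of a father tetrahedron into child elements."""
    m01, m02, m03 = midpoint(v0, v1), midpoint(v0, v2), midpoint(v0, v3)
    m12, m13, m23 = midpoint(v1, v2), midpoint(v1, v3), midpoint(v2, v3)
    corners = [
        (v0, m01, m02, m03),
        (m01, v1, m12, m13),
        (m02, m12, v2, m23),
        (m03, m13, m23, v3),
    ]
    # Inner octahedron, cut around the assumed diagonal m01-m23.
    inner = [
        (m01, m23, m02, m03),
        (m01, m23, m03, m13),
        (m01, m23, m13, m12),
        (m01, m23, m12, m02),
    ]
    return corners + inner
```

The new nodes all lie on straight edges of the coarse mesh, which is why the geometry representation does not improve. As a consistency check on the numbers in this talk: if the final 103,671,344-tetrahedron mesh came from one uniform refinement, the coarse mesh would have had 103,671,344 / 8 = 12,958,918 elements, which agrees with the quoted 13 million.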
- Slide 32
- Introduction. Workflow diagram: Geometry, Conditions, Materials;
Coarse mesh generation; Partition; Distribution; Communication
plan; part 1, part 2, ..., part n; Refinement; Calculation;
res. 1, res. 2, ..., res. n; Merge; Visualize
- Slide 33
- Parallel calculation. Calculated using 12 x 8 MPI processes.
Less than 1 day for 400 time steps. About 180 GB memory usage.
Single volume mesh of 103 million tetrahedrons split into 96 files
(each with a mesh portion and its results)
- Slide 34
- Overview: Introduction; Preparation and Simulation (More
Efficient Partitioning, Parallel Element Splitting); Post
Processing (Results Cache, Merging Many Partitions, Memory usage,
Off-screen mode); Conclusions, Future lines and Acknowledgements
- Slide 35
- Post processing. Workflow diagram: Geometry, Conditions,
Materials; Coarse mesh generation; Partition; Distribution;
Communication plan; part 1, part 2, ..., part n; Refinement;
Calculation; res. 1, res. 2, ..., res. n; Merge; Visualize
- Slide 36
- Post-process. Challenges to face: single node; big files (tens or
hundreds of GB); merging lots of files; batch post-processing;
maintaining generality
- Slide 37
- Big files: results cache. Uses a defined memory pool to store
results. Used to cache results stored in files. (Diagram: mesh
information; created results: cuts, extrusions, tcl; temporal
results; user-definable memory pool holding the results from
files: single, multiple, merge)
- Slide 38
- Big files: results cache (diagram). Results cache table: a list
of entries, each with an RC entry and a timestamp. Each result has
an RC info record, which holds the memory footprint and, for each
of file 1 to file n, an offset and a type. Open files table: a
list of entries, each with a file, a handle and a type.
Granularity: one result
- Slide 39
- Big files: results cache. Verifies the results file(s) and gets
each result's position in the file and its memory footprint. The
results of the latest analysis step are kept in memory. Results
are loaded on demand. The oldest results are unloaded if needed.
Touch on use.
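
The policy on this slide (load on demand, unload the oldest, touch on use) is a least-recently-used cache with a byte budget. A minimal sketch follows, assuming a hypothetical `loader` callable that reads one result from its file position; an `OrderedDict` ordering replaces the explicit timestamps of the results cache table, and none of the names below are GiD's.

```python
from collections import OrderedDict


class ResultsCache:
    """LRU cache of results bounded by a memory pool size in bytes."""

    def __init__(self, pool_bytes, loader):
        self.pool_bytes = pool_bytes
        self.loader = loader            # hypothetical: key -> bytes read from file
        self.used_bytes = 0
        self._entries = OrderedDict()   # least recently used entry first

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)       # touch on use
            return self._entries[key]
        data = self.loader(key)                  # loaded on demand
        # Unload the oldest results until the new one fits in the pool.
        while self._entries and self.used_bytes + len(data) > self.pool_bytes:
            _, old = self._entries.popitem(last=False)
            self.used_bytes -= len(old)
        self._entries[key] = data
        self.used_bytes += len(data)
        return data

    def cached_keys(self):
        """Keys currently in memory, oldest first."""
        return list(self._entries)
```

A key such as `("pressure", step)` keeps the granularity at one result per entry, so stepping through an animation only ever holds the most recently used steps in memory.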
- Slide 40
- Big files: results cache. Chinese harbour example: 104 GB
results file; 7.6 million tetrahedrons; 2,292 time steps; 3.16 GB
memory usage (2 GB results cache)
- Slide 42
- Merging many partitions. Before: 2, 4, ..., 10 partitions. Now:
32, 64, 128, ... partitions of a single volume mesh. Postpone any
calculation: skin extraction; finding boundary edges; smoothed
normals; neighbour information; graphical objects creation
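
Postponing a calculation such as skin extraction can be sketched with a lazily computed attribute: merging only appends elements, and the skin is built once, on first use. This is an illustration under assumptions, not GiD's code: the class and method names are hypothetical, and the partitions are assumed to share global node Ids. The skin is taken as the set of faces that belong to exactly one tetrahedron.

```python
from collections import Counter
from functools import cached_property


class MergedMesh:
    """Volume mesh assembled from partitions, with a deferred skin."""

    def __init__(self):
        self.tets = []

    def add_partition(self, tets):
        # Merging is a plain append: no derived data is computed here.
        self.tets.extend(tets)
        # Drop a previously cached skin, if any, since it is now stale.
        self.__dict__.pop("skin", None)

    @cached_property
    def skin(self):
        """Boundary faces, computed on first access only."""
        counts = Counter()
        for a, b, c, d in self.tets:
            for face in ((a, b, c), (a, b, d), (a, c, d), (b, c, d)):
                counts[tuple(sorted(face))] += 1
        return sorted(face for face, n in counts.items() if n == 1)
```

With 96 or 128 partitions this avoids recomputing the skin after every partition is read; it is computed once over the whole merged mesh, and only if something actually draws it.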
- Slide 43
- Merging many partitions. Telescope example, 23,870,544
tetrahedrons. Before, 32 partitions: 24 min 10 s. After, 32
partitions: 4 min 34 s; 128 partitions: 10 min 43 s; single file:
2 min 16 s
- Slide 44
- Merging many partitions
- Slide 45
- Merging many partitions. Racing car example, 103,671,344
tetrahedrons. Before, 96 partitions: > 5 hours. After, 96
partitions: 51 min 21 s; single file: 13 min 25 s
- Slide 46
- Memory usage. Around 12 GB of memory used, with a spike of 15 GB
(MS Windows) or 17.5 GB (Linux), including: volume mesh (103
Mtetras); skin mesh (6 Mtriangs); several surface and cut meshes;
stream line search tree; 2 GB of results cache; animations
- Slide 47
- Pictures
- Slide 48
- Pictures
- Slide 49
- Pictures
- Slide 50
- Batch post-processing: off-screen. GiD with no interaction and
no window. Command line: gid -offscreen [WxH] -b+g
batch_file_to_run. Useful to: launch costly animations in the
background or in a queue; use GiD as a template generator; use GiD
behind a web server (Flash Video animation). Animation window: a
button was added to generate a batch file for off-screen GiD, to
be sent to a batch queue.
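
For queue scripts, the command line above can be assembled programmatically. The helper below is hypothetical and only builds the argument list from the flags shown on the slide; it assumes that `WxH` means a width and a height joined by a literal `x`, and that the executable is reachable as `gid`. The batch file name in the example is made up.

```python
def offscreen_command(batch_file, size=None, executable="gid"):
    """Build the argument list for an off-screen GiD run (not executed here)."""
    args = [executable, "-offscreen"]
    if size is not None:
        width, height = size
        args.append(f"{width}x{height}")  # assumed form of the optional [WxH]
    args += ["-b+g", batch_file]
    return args


# Example: arguments for a hypothetical animation batch file.
command = offscreen_command("animation.bch", size=(1280, 720))
print(" ".join(command))
```

The resulting list can be handed to `subprocess.run` inside a job script once the flags have been checked against the installed GiD version.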
- Slide 51
- Animation
- Slide 52
- Overview: Introduction; Preparation and Simulation (More
Efficient Partitioning, Parallel Element Splitting); Post
Processing (Results Cache, Merging Many Partitions, Memory usage,
Off-screen mode); Conclusions, Future lines and Acknowledgements
- Slide 53
- Conclusions. The implemented improvements helped us to achieve
the milestone: prepare, mesh, calculate and visualize a CFD
simulation with 103 million tetrahedrons. GiD: modest machines
also benefit from these improvements
- Slide 54
- Future lines. Faster tree creation for stream lines (now: ~90 s
creation time, 2-3 s per stream line). Mesh simplification, LOD:
geometry and results criteria. Surface meshes, iso-surfaces, cuts:
faster drawing. Volume meshes: faster cuts and stream lines. Near
real-time. Parallelize other algorithms in GiD: skin and boundary
edges extraction; parallel cuts and stream lines creation
- Slide 55
- Challenges. 10^9 to 10^10 tetrahedrons, 6·10^8 to 6·10^9
triangles. Large workstation with Infiniband to the cluster and 80
GB or 800 GB RAM? Hard disk? Post-process as backend of a web
server in the cluster? Security issues? Post-process embedded in
the solver? Output of both the original mesh and a simplified
one?
- Slide 56
- Acknowledgements. Ministerio de Ciencia e Innovación, E-DAMS
project; European Commission, Real-time project
- Slide 57
- Comments, questions...?
- Slide 58
- Thanks for your attention Scalable System for Large
Unstructured Mesh Simulation