www.cineca.it
OpenFOAM on BG/Q porting and performance
Paride Dagna, SCAI Department, CINECA
SYSTEM OVERVIEW
OpenFOAM: selected application inside the PRACE project
Fermi: PRACE Tier-0 System
Model: IBM Blue Gene/Q
Architecture: 10 BG/Q frames with 2 midplanes each
Front-end Nodes OS: Red Hat EL 6.2
Compute Node Kernel: lightweight Linux-like kernel
Processor Type: IBM PowerA2, 16 cores, 1.6 GHz
Computing Nodes: 10,240
Computing Cores: 163,840
RAM: 16 GB / node
Internal Network: network interface with 11 links -> 5D torus
Disk Space: more than 2 PB of scratch space
Peak Performance: 2.1 PFlop/s
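The quoted peak performance can be cross-checked from the node count, cores per node and clock rate listed on this slide. The sketch below is only a sanity check: the value of 8 floating-point operations per core per cycle (a 4-wide SIMD unit with fused multiply-add) is an assumption and is not stated on the slide.

```python
# Cross-check of the quoted peak performance of Fermi.
# FLOPS_PER_CYCLE is an assumption (4-wide SIMD x fused multiply-add),
# not a number given on the slide.
NODES = 10_240
CORES_PER_NODE = 16
CLOCK_HZ = 1.6e9
FLOPS_PER_CYCLE = 8  # assumed

cores = NODES * CORES_PER_NODE
peak_flops = cores * CLOCK_HZ * FLOPS_PER_CYCLE

print(f"cores: {cores}")
print(f"peak: {peak_flops / 1e15:.2f} PFlop/s")
```

Under that assumption the result agrees with the 163,840 cores and roughly 2.1 PFlop/s given above.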
SYSTEM OVERVIEW
Single Chip Module
Compute card: one chip module, 16 GB DDR3 memory
Compute node (back-end):
• Each compute node comprises 17 cores on a single chip with 16 GB of dedicated physical memory
• Applications run on 16 of the cores, with the 17th core reserved for system software.
• Nearly the full 16 GB of physical memory is dedicated to application usage.
• On each core it is possible to run up to 4 processes/threads, for a total of 64 processes/threads per node
Applications:
• Applications are submitted to the compute nodes by the batch scheduler system
• To run on the compute nodes (back-end), applications must be cross-compiled
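Running more than one process per core divides the node memory among more MPI ranks. As a rough illustration, the sketch below computes an upper bound on the memory per rank for 1, 2 and 4 processes per core, using only the 16 GB and 16 application cores quoted above; the function name is illustrative, and the real figure is slightly lower because only "nearly" the full 16 GB is available to applications.

```python
# Upper bound on the memory available per MPI rank on one compute node,
# for 1, 2 or 4 processes per core. The actual value is slightly lower,
# since part of the 16 GB is used by system software.
NODE_MEMORY_MB = 16 * 1024
APPLICATION_CORES = 16

def memory_per_rank_mb(ranks_per_core):
    ranks_per_node = APPLICATION_CORES * ranks_per_core
    return NODE_MEMORY_MB / ranks_per_node

for ranks_per_core in (1, 2, 4):
    ranks = APPLICATION_CORES * ranks_per_core
    print(f"{ranks:2d} ranks/node -> {memory_per_rank_mb(ranks_per_core):.0f} MB per rank")
```

With 64 ranks per node each rank has at most 256 MB, which may constrain how a large mesh can be decomposed.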
Porting of OpenFOAM on BG/Q
Compiling OpenFOAM for the back-end nodes on BG/Q requires some system-specific changes to the configuration scripts of OpenFOAM and of the third-party packages.
The third-party MPI cannot be used; rules for the BG/Q MPI must be inserted instead.
Environment configuration:
• Configure environment with compilers and zlib using modules
module load bgq-gnu
module load zlib
OpenFOAM configuration scripts and rules:
• The files “bashrc” and “settings.sh” must be changed by inserting the rules for the BG/Q MPI
• The “c” and “c++” files in the wmake/rules folders must be modified for dynamic linking
Scotch library build
• Before running “Allwmake” in the OpenFOAM main folder, some changes need to be made to the compiling and dynamic linking rules in the file “Makefile.inc” contained in the scotch library.
• Cross-compile the “dummysizes” scotch utility and execute it on the back-end to properly build the header files scotch.h and scotchf.h
Compile
• Go to $WM_PROJECT/$WM_PROJECT_VERSION and compile with ./Allwmake
Performance of OpenFOAM on BG/Q
Test cases
• Cavity 3D: isothermal incompressible flow. Solver: icoFoam
• BoxTurb 3D: homogeneous isotropic turbulence in compressible flow. Solver: sonicFoam
• Airfoil, wing section: external aerodynamics. Solver: simpleFoam
• DTMB hull: marine hydrodynamics. Solver: interFoam
Performance of OpenFOAM on BG/Q
Systems
Model: IBM-BlueGene /Q (Fermi)
Processor Type: IBM PowerA2, 1.6 GHz
Computing Node: 16 cores
RAM: 16GB / node; 1GB/core
Internal Network: network interface with 11 links -> 5D torus
Model: Hewlett Packard C7000 (Lagrange)
Processor Type: Intel Xeon Westmere, 2.8 GHz
Computing Node: 12 cores
RAM: 24GB / node; 2GB/core
Internal Network: Infiniband QDR/DDR Voltaire, Fat Tree
Cavity – 3D
Flow : laminar, isothermal, incompressible
Mesh : fully structured 3D
Mesh elements : cubes
Elements: 10,000,000; partition methods: scotch, simple; solver: icoFoam
Elements: 20,000,000; partition methods: scotch, simple; solver: icoFoam
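Because the mesh size is fixed while the core count grows, the work per core shrinks quickly. A minimal sketch of that arithmetic, assuming an even split of cells across cores (the actual decomposition may be less uniform) and using the core counts that appear in the scaling plots:

```python
# Cells per core under an even split of the two cavity meshes.
# Even splitting is an idealisation; real partitions can be unbalanced.
MESH_SIZES = (10_000_000, 20_000_000)
CORE_COUNTS = (64, 128, 256, 512, 1024, 2048, 4096)

def cells_per_core(n_cells, n_cores):
    return n_cells / n_cores

for n_cells in MESH_SIZES:
    for n_cores in CORE_COUNTS:
        print(f"{n_cells:>10d} cells on {n_cores:4d} cores: "
              f"{cells_per_core(n_cells, n_cores):9.0f} cells/core")
```

At 4096 cores the 10,000,000-element mesh leaves only about 2,400 cells per core, so communication and I/O are likely to weigh more heavily relative to computation.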
Cavity – 3D Speed up and Efficiency
Mesh: 10,000,000 elements
Solution saved at final time step
[Plots: efficiency and speed up versus number of cores (64 to 4096), for the simple and scotch partition methods; series: Fermi, Lagrange, Ideal]
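The slides do not define speed up and efficiency explicitly. Since the ideal curve reaches the number of cores, one plausible reading is that the smallest run (64 cores) is taken as the baseline and assigned a speed up equal to its core count. The sketch below works under that assumption; the wall-clock times are hypothetical placeholders, not measurements from the slides.

```python
# Speed up and parallel efficiency relative to the smallest run.
# Assumption: the baseline run on p0 cores is assigned a speed up of p0,
# so that the ideal curve is speedup == cores.
# The wall-clock times below are hypothetical, not measured values.
wall_time_s = {64: 1000.0, 128: 520.0, 256: 280.0, 512: 160.0, 1024: 100.0}

def speedup(times, cores):
    base = min(times)  # smallest core count is the baseline
    return base * times[base] / times[cores]

def efficiency(times, cores):
    return speedup(times, cores) / cores

for cores in sorted(wall_time_s):
    print(f"{cores:5d} cores: speed up {speedup(wall_time_s, cores):7.1f}, "
          f"efficiency {efficiency(wall_time_s, cores):.2f}")
```

With this definition an efficiency of 1.00 corresponds to the "Ideal" series, and values below 1.00 measure how far a run falls short of it.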
Cavity – 3D Speed up and Efficiency
Mesh: 10,000,000 elements
Solution saved every 10 time steps
[Plots: efficiency and speed up versus number of cores (64 to 1024), for the simple and scotch partition methods; series: Fermi, Lagrange, Ideal]
Cavity – 3D – Profiling
[Plot: I/O overhead on simulation time (increment %) versus number of cores (64 to 1024); series: Fermi, Lagrange]
# Cores   Cumulative I/O (GB)   File size per core (MB)
64        13.0                  5.10
128       14.0                  2.50
256       14.0                  1.33
512       15.0                  0.75
1024      22.0                  0.40
Number of iterations: 100
Files per core: 3
MPI_Allreduce average message size per core (B): 8 (1024 cores)
Average message size sent and received per core (KB): 4.6 (1024 cores)
[Figures: MPI and I/O profiling at 512 cores and at 1024 cores]
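The number of files grows with the core count while each file gets smaller. A minimal sketch of that count, assuming "files per core" means that every core writes its own set of 3 files at each saved time step (the slide does not spell this out):

```python
# Number of files created per saved time step, assuming each core writes
# its own set of files. That interpretation of "files per core" is an
# assumption; the value 3 is taken from the slide.
FILES_PER_CORE = 3
CORE_COUNTS = (64, 128, 256, 512, 1024)

def files_per_write(n_cores, files_per_core=FILES_PER_CORE):
    return n_cores * files_per_core

for n_cores in CORE_COUNTS:
    print(f"{n_cores:5d} cores -> {files_per_write(n_cores):5d} files per saved step")
```

Under this reading a 1024-core run produces 3072 small files per saved step, which is the kind of access pattern a file system tends to handle poorly.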
Cavity – 3D Speed up and efficiency
Mesh: 20,000,000 elements
Solution saved at final time step
[Plots: efficiency and speed up versus number of cores (64 to 4096), for the simple and scotch partition methods; series: Fermi, Lagrange, Ideal]
Cavity – 3D Speed up and efficiency
Mesh: 20,000,000 elements
Solution saved every 10 time steps
[Plots: efficiency and speed up versus number of cores (64 to 1024), for the simple and scotch partition methods; series: Fermi, Lagrange, Ideal]
Cavity – 3D – Profiling
# Cores   Cumulative I/O (GB)   File size per core (MB)
64        18.1                  9.46
128       18.1                  4.73
256       18.5                  2.42
512       22.5                  1.27
1024      23.1                  0.63
[Figures: MPI and I/O profiling at 512 cores and at 1024 cores]
[Plot: I/O overhead on simulation time (increment %) versus number of cores (64 to 1024); series: Fermi, Lagrange]
Number of iterations: 100
Files per core: 3
MPI_Allreduce average message size per core (B): 8 (1024 cores)
Average message size sent and received per core (KB): 6.4 (1024 cores)
BoxTurb – 3D
Flow : compressible
Case study : homogeneous, isotropic turbulence
Mesh : uniform 3D
Number of cells : ≈ 17,000,000
Solver : sonicFoam
Partition method : simple
Courtesy of : Matteo Cerminara (INGV), Pisa
BoxTurb – 3D Speed up and efficiency
Solution saved at the final time step
[Plots: efficiency and speed up versus number of cores (64 to 2048), simple partition method; series: Fermi, Lagrange, Ideal]
BoxTurb – 3D Speed up and efficiency
Solution saved every 10 time steps
[Plots: efficiency and speed up versus number of cores (64 to 1024), simple partition method; series: Fermi, Lagrange, Ideal]
BoxTurb – 3D – Profiling
[Plot: I/O overhead on simulation time (increment %) versus number of cores (64 to 1024); series: Fermi, Lagrange]
# Cores   Cumulative I/O (GB)   File size per core (MB)
64        18.4                  4.50
128       18.4                  2.25
256       18.6                  1.14
512       19.6                  0.60
1024      21.2                  0.32
[Figures: MPI and I/O profiling at 512 cores and at 1024 cores]
Number of iterations: 180
Files per core: 4
MPI_Allreduce average message size per core (B): 8 (1024 cores)
Average message size sent and received per core (KB): 9.3 (1024 cores)
Airfoil – wing section
Flow : turbulent, incompressible
Case study : steady state, extruded NACA airfoil
Mesh : fully structured 3D
Number of cells : ≈ 9,000,000
Solver : simpleFoam
Partition methods : simple, scotch
Airfoil – wing section - Speed up and efficiency
Solution saved at the final time step
[Plots: efficiency and speed up versus number of cores (64 to 1024), for the simple and scotch partition methods; series: Fermi, Lagrange, Ideal]
Airfoil – wing section – Profiling
[Figures: MPI profiling at 512 cores for the simple and scotch partition methods, two panels per method]
Airfoil – wing section - Speed up and efficiency
Solution saved every 100 time steps
[Plots: efficiency and speed up versus number of cores (64 to 1024), for the simple and scotch partition methods; series: Fermi, Lagrange, Ideal]
Airfoil – wing section – Profiling
[Figures: MPI and I/O profiling at 512 cores and at 1024 cores]
# Cores   Cumulative I/O (GB)   File size per core (MB)
64        5.6                   1.46
128       5.8                   0.76
256       6.6                   0.43
512       7.9                   0.26
1024      12.0                  0.20
Number of iterations: 1000
Files per core: 6
MPI_Allreduce average message size per core (B): 8 (512 cores)
Average message size sent and received per core (KB): 4.2 (512 cores)
[Plot: percentage (0% to 80%) versus number of cores (64 to 1024), axis labeled "Speed up" in the source; decomposition method: scotch; series: Fermi, Lagrange]
Free surface - dtmb hull – 3D
Flow : turbulent, incompressible
Case study : unsteady, multiphase
Mesh : unstructured 3D
Number of cells : ≈ 5,500,000
Solver : interFoam
Partition methods : simple, scotch
Free surface - dtmb hull – 3D Speed up and efficiency
Solution saved at the final time step
[Plots: efficiency and speed up versus number of cores (32 to 512), for the simple and scotch partition methods; series: Fermi, Lagrange, Ideal]
Free surface - dtmb hull – 3D Speed up and efficiency
Solution saved every 10 time steps
[Plots: efficiency and speed up versus number of cores (32 to 512), for the simple and scotch partition methods; series: Fermi, Lagrange, Ideal]
Free surface - dtmb hull – 3D - Profiling
# Cores   Cumulative I/O (GB)   File size per core (MB)
64        18.4                  4.50
128       18.4                  2.25
256       18.6                  1.14
512       19.6                  0.60
Number of iterations: 100
Files per core: 8
MPI_Allreduce average message size per core (B): 8 (512 cores)
Average message size sent and received per core (KB): 29.4 (512 cores)
[Plot: I/O overhead on simulation time (increment %) versus number of cores (32 to 512); series: Fermi, Lagrange]
[Figures: MPI and I/O profiling at 256 cores and at 512 cores]
Conclusions
OpenFOAM scaling and efficiency on Fermi and on classic HPC systems are comparable. For well-suited case studies, with a good balance between computation, I/O and MPI communication, we could nevertheless benefit from the larger number of cores available on Fermi.
OpenFOAM efficiency and scaling are constrained by poor I/O design and by inter-process communication.
A new I/O scheme based on MPI parallel I/O routines or on available parallel I/O libraries, able to use parallel file system facilities efficiently, should dramatically reduce the I/O overhead.
A multi-threaded hybrid MPI/OpenMP version of the solvers should mitigate the growth of the time spent in MPI routines as the number of cores increases.
Acknowledgements
Bob Danani VLSCI Carlton, Melbourne
Matteo Cerminara INGV
Massimiliano Culpo CINECA
Piero Lanucara CINECA
Andrea Penza CINECA
Francesco Salvadore CINECA
Ivan Spisso CINECA
Questions ?