Damaris: Using Dedicated I/O Cores for Scalable Post-petascale HPC Simulations
Matthieu Dorier ENS Cachan Brittany extension
Advised by Gabriel Antoniu
SRC
Context: HPC simulations on Blue Waters
- INRIA/UIUC Joint Lab for Petascale Computing
- Targeting large-scale simulations of unprecedented accuracy
- Our concern: I/O performance scalability
Motivation: data management in HPC
- Problem:
  - All processes enter their I/O phases at the same time
  - File system contention: lack of scalability
  - High I/O overhead, high performance variability
[Figure labels: 100,000+ processes; ~10,000 processes; ~100 data servers; petabytes of data]
I/O variability: an example
- CM1 tornado simulation: 672 processes sorted by write time
The Damaris approach: dedicated I/O cores
Leave a core, go faster!
- Use the SMP’s intra-node shared memory
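The idea of handing data to a dedicated core can be sketched in a few lines of Python. This is only an analogy under stated assumptions: threads stand in for the cores of one SMP node, a `queue.Queue` stands in for the intra-node shared-memory buffer, and the function name `run_with_dedicated_writer` is invented for illustration. It is not the Damaris API.

```python
import queue
import threading


def run_with_dedicated_writer(n_compute, n_steps):
    """Run n_compute workers that hand their output to one dedicated writer."""
    handoff = queue.Queue()  # assumption: stands in for the shared-memory buffer
    written = []             # assumption: stands in for the file on disk

    def writer():
        # Dedicated "I/O core": the only place where output is performed.
        while True:
            item = handoff.get()
            if item is None:  # sentinel: every compute worker has finished
                break
            written.append(item)

    def compute(rank):
        for step in range(n_steps):
            data = [rank * 1000 + step] * 4  # placeholder for a simulation array
            handoff.put((step, rank, data))  # hand off, then keep computing

    io_thread = threading.Thread(target=writer)
    io_thread.start()
    workers = [threading.Thread(target=compute, args=(r,)) for r in range(n_compute)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    handoff.put(None)  # tell the writer that no more data will arrive
    io_thread.join()
    return sorted(written)


chunks = run_with_dedicated_writer(n_compute=3, n_steps=2)
print(len(chunks))  # one chunk per (step, rank) pair
```

The compute workers never touch the output themselves; they only pay the cost of the hand-off, which is the property the slide summarizes as "Leave a core, go faster!".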
Integration with the CM1 tornado simulation
- Less than an hour to write an I/O backend with Damaris
- The I/O core spends 25% of its time writing → 75% spare time!
How to use the spare time?
- Custom plugin system:
  - Data post-processing, indexing, analysis
- End-to-end scientific process
  - Connect visualization/analysis tools → inline visualization
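A plugin system of this kind might look roughly like the sketch below. Everything here is hypothetical: the `PluginRegistry` class, the `end_of_iteration` event name, and the two example plugins are assumptions made for illustration and do not reproduce the actual Damaris plugin interface. The sketch only shows how indexing and compression could run on the I/O core during its spare time.

```python
import zlib
from collections import defaultdict


class PluginRegistry:
    """Hypothetical registry mapping an event name to plugins run by the I/O core."""

    def __init__(self):
        self._plugins = defaultdict(list)

    def register(self, event, func):
        self._plugins[event].append(func)

    def fire(self, event, name, data):
        # Run every plugin bound to the event and collect results by plugin name.
        return {f.__name__: f(name, data) for f in self._plugins[event]}


def index_minmax(name, data):
    # Indexing plugin: record the value range of a variable.
    values = list(data)
    return (min(values), max(values))


def compress(name, data):
    # Post-processing plugin: return the compression ratio achieved by zlib.
    raw = bytes(data)
    return len(raw) / len(zlib.compress(raw))


registry = PluginRegistry()
registry.register("end_of_iteration", index_minmax)
registry.register("end_of_iteration", compress)

field = bytes([7] * 4096)  # highly compressible placeholder data
results = registry.fire("end_of_iteration", "temperature", field)
print(results["index_minmax"])
```

Because the plugins run on the dedicated core, their cost is hidden from the compute cores as long as they fit in the spare time between write phases.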
Results with the CM1 tornado simulation
- On Grid’5000: French national testbed (24 cores/node, 672 cores), with PVFS, comparison with collective I/O
  - Communication overhead → leaving a core is more efficient
  - No synchronization
  - 6 times higher write throughput
- BluePrint: Power5 Blue Waters interim system at NCSA (16 cores/node, 1024 cores), with GPFS, comparison with file-per-process approach
  - On 64 nodes → 64 files instead of 1024
- Overall benefits
  - Spare time usage
    - Data layout adaptation for subsequent analysis
    - Overhead-free compression (600%)
  - No more I/O jitter
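The file counts quoted for BluePrint follow from simple arithmetic, sketched here. The function names and the approach labels are assumptions made for illustration, as is the premise that each dedicated core writes one file per node per write phase.

```python
def node_count(total_cores, cores_per_node):
    """Number of SMP nodes, assuming the cores divide evenly across nodes."""
    if total_cores % cores_per_node:
        raise ValueError("total_cores must be a multiple of cores_per_node")
    return total_cores // cores_per_node


def files_per_write_phase(total_cores, cores_per_node, approach):
    """Files created per write phase under the two approaches compared."""
    if approach == "file-per-process":
        return total_cores  # every core writes its own file
    if approach == "dedicated-core":
        # Assumption: one file per node, written by that node's dedicated core.
        return node_count(total_cores, cores_per_node)
    raise ValueError(f"unknown approach: {approach}")


# BluePrint figures from the slide: 16 cores/node, 1024 cores.
print(files_per_write_phase(1024, 16, "dedicated-core"))
```

With the BluePrint figures this gives 64 files instead of 1024, matching the slide. The same division applied to the Grid’5000 setup (672 cores at 24 cores/node) gives 28 nodes.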
Conclusion
- Damaris: dedicated I/O core in multicore SMP nodes
  1. Better I/O and global performance
  2. No more variability in write phases
  3. Easy integration and configuration
- Targeting Blue Waters and future post-petascale machines
- Very promising prospects in many directions
  - Integration with other simulations: Enzo (AMR), GTC, …
  - Leverage spare time for efficient inline visualization
  - Data-aware self-configuration, scheduled data movements, multi-simulation coupling
- http://damaris.gforge.inria.fr
Thank you, questions?