Speculative Atomics
Case-study of the GPU Optimization of the Material Point Method for Graphics
Gergely Klar
UCLA Computer Graphics & Vision Laboratory
Motivation
A. Stomakhin, C. Schroeder, L. Chai, J. Teran, A. Selle, A Material Point Method for Snow Simulation, ACM Transactions on Graphics (SIGGRAPH 2013), 32(4), pp. 102:1-102:10, 2013.
Images ©Disney.
MPM Simulations
● Amazing new effects
● Extremely computationally demanding
○ millions of particles
○ 200-million-cell grids
○ up to 30 minutes per frame
GPU to the rescue!
MPM Overview
1. Particles-to-grid
2. Grid operations
3. Particles-from-grid
4. Particle operations
Parallelizing steps
✘ 1. Particles-to-grid --- challenge
✓ 2. Grid operations --- case by case
✓ 3. Particles-from-grid --- trivial
✓ 4. Particle operations --- trivial
What’s wrong with particles-to-grid?
It is a classical scatter operation
● up to 10 particles per cell initially
● 5³ support (region of influence) per particle
⇒ Many race conditions if processed particle-by-particle
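To get a feel for how many writes compete for the same grid node, here is a minimal Python sketch of a 1D toy version of particles-to-grid. The function name `writes_per_node`, the 1D setting, and the default support of 5 nodes per axis are illustrative assumptions, not the implementation from the talk.

```python
from collections import Counter

def writes_per_node(cell_of_particle, support=5):
    """Count how many particle writes hit each node of a 1D grid.

    Each particle in cell c writes to `support` consecutive nodes
    centred on c (a hypothetical 1D stand-in for a 3D support region).
    """
    half = support // 2
    hits = Counter()
    for c in cell_of_particle:
        for node in range(c - half, c + half + 1):
            hits[node] += 1
    return hits

# Ten particles in each of three neighbouring cells (cells 10, 11, 12).
cells = [10] * 10 + [11] * 10 + [12] * 10
hits = writes_per_node(cells)
print(max(hits.values()))  # writes competing for the busiest node
```

In this toy setup, nodes 10 to 12 lie in the support of all thirty particles, so every one of those writes is a potential conflict if the particles are processed concurrently.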
How to do super fast scatter?
1. Use atomic operations!
2. Protect coalesced access!
3. Disperse the particles!
Why not gather?
Classical workaround: turn scatter into gather!
Problems:
● No constraint on particle count per cell
● Overhead
● Need to do it in every step!
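The gather formulation can be sketched as follows, again as a 1D toy with unit weights. The names `bin_particles` and `gather_mass` are hypothetical; the point is only to show where the listed problems come from: the binning step has to be rebuilt whenever particles move, and the innermost loop has no fixed bound.

```python
from collections import defaultdict

def bin_particles(cell_of_particle):
    """Build per-cell particle lists (the extra step gather needs)."""
    bins = defaultdict(list)
    for pid, c in enumerate(cell_of_particle):
        bins[c].append(pid)
    return bins

def gather_mass(num_nodes, cell_of_particle, mass, support=5):
    """Each node sums the particles in the cells within its support.

    1D toy with unit weights; no two nodes write the same output.
    """
    bins = bin_particles(cell_of_particle)
    half = support // 2
    grid = [0.0] * num_nodes
    for node in range(num_nodes):            # would be one thread per node
        for c in range(node - half, node + half + 1):
            for pid in bins.get(c, ()):      # loop length depends on cell occupancy
                grid[node] += mass[pid]
    return grid

grid = gather_mass(8, [3, 3, 4], [1.0, 2.0, 4.0], support=3)
print(grid)
```

Each node owns its output, so no synchronization is needed, but the work per node varies with how crowded the neighbouring cells are.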
Let’s use atomics!
(and I mean atomic instructions)
Pro: Don’t need to worry about race conditions
Con: Can get very slow
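What atomic instructions protect against can be shown with a deterministic toy model in Python. It replays one particular unlucky interleaving (every thread reads before any thread writes) and compares it with updates that complete one at a time. The function names are hypothetical and real GPU scheduling is not modeled.

```python
def racy_add(cell, contributions):
    """Model unsynchronised threads: all read first, then all write."""
    reads = [cell[0] for _ in contributions]      # every thread reads the old value
    for old, value in zip(reads, contributions):  # each writes old + its own value
        cell[0] = old + value
    return cell[0]

def atomic_add(cell, contributions):
    """Model atomic adds: each read-modify-write completes before the next."""
    for value in contributions:
        cell[0] += value
    return cell[0]

racy = racy_add([0.0], [1.0, 2.0, 3.0])      # only the last write survives
exact = atomic_add([0.0], [1.0, 2.0, 3.0])   # all contributions are summed
print(racy, exact)
```

The atomic version pays for its correctness by handling conflicting updates one after another, which is where the "can get very slow" comes from.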
Speculative atomics
Guaranteeing no race conditions is expensive.
Atomic instructions are getting better: they add minimal overhead when there’s no need to block.
Idea: Don’t try to eliminate all race conditions; reduce their likelihood and use atomic instructions to handle the ones that remain.
[Figures: bad case, worst case, best case]
How to get the best case all the time?
Concurrent threads need to process particles that
● are from different cells
● are consecutive in memory
● access consecutive grid cells
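These requirements can be checked on a list of the grid addresses that a group of concurrent threads targets. The sketch below is a toy model under one stated assumption: conflicting atomic updates to the same address are carried out one after another. `serialization_depth` and `is_consecutive` are hypothetical helper names.

```python
from collections import Counter

def serialization_depth(addresses):
    """Largest number of concurrent threads targeting one address.

    Under the toy assumption that conflicting atomic updates are applied
    one after another, this approximates the number of rounds needed.
    """
    return max(Counter(addresses).values())

def is_consecutive(addresses):
    """True if thread i accesses address base + i."""
    return all(b - a == 1 for a, b in zip(addresses, addresses[1:]))

same_cell = [7, 7, 7, 7]        # four threads, all particles from one cell
different_cells = [4, 5, 6, 7]  # four threads, one particle per consecutive cell

print(serialization_depth(same_cell), is_consecutive(same_cell))
print(serialization_depth(different_cells), is_consecutive(different_cells))
```

A group that scores a depth of 1 and accesses consecutive addresses satisfies both the "different cells" and the "consecutive grid cells" requirement.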
Bad idea: Indirect indexing
processParticle(p[particleIndexShuffler(threadIdx)])
Problem: particles processed in parallel are spread out in memory → cannot use coalesced memory access → poor performance
Solution: need to physically shuffle particles
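The cost of indirect indexing can be illustrated by counting how many aligned memory segments a group of 32 threads touches. This is a rough Python model, not a hardware simulation: the segment size of 32 elements and the multiplicative shuffler are assumptions chosen for illustration.

```python
def segments_touched(indices, segment_size=32):
    """Number of distinct aligned memory segments a group of threads reads.

    `segment_size` (in elements) is an illustrative assumption; the real
    transaction size depends on the hardware and the element type.
    """
    return len({i // segment_size for i in indices})

threads = range(32)
direct = [t for t in threads]                  # p[threadIdx]
shuffled = [(t * 97) % 4096 for t in threads]  # p[shuffler(threadIdx)], hypothetical shuffler

print(segments_touched(direct))    # all reads fall into one segment
print(segments_touched(shuffled))  # every read falls into its own segment
```

Physically reordering the particle array gives the dispersed processing order while keeping the `direct` access pattern.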
Blocked layout
[Figure: particles in memory and particles in space]
Blocked layout in action [figure]
Blocked layout
1. Assign a cell ID to each particle according to the grid cell it is in. Cell IDs define groups.
2. Sort particle IDs by cell ID.
3. Rearrange particles by picking the 1st from each group, then the 2nd, etc.
Blocked layout Pseudocode 1/2
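kernelComputeCellID // per particle; pID is the particle handled by this thread
    cellX = floor(x[pID]/dx) // dx is the grid cell size
    cellY = floor(y[pID]/dx)
    cellZ = floor(z[pID]/dx)
    // flatten the 3D cell coordinates into a single integer key
    cellIDs[pID] = cellX +
                   cellY*GRID_WIDTH +
                   cellZ*GRID_WIDTH*GRID_HEIGHT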
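The index arithmetic of this kernel can be checked on the CPU with a short Python sketch. It assumes non-negative positions inside the grid; the inverse function `cell_coords` is not part of the slides and is added here only to verify the flattening.

```python
import math

def cell_id(x, y, z, dx, grid_width, grid_height):
    """Flatten the 3D cell coordinates of a position into one integer."""
    cx = math.floor(x / dx)
    cy = math.floor(y / dx)
    cz = math.floor(z / dx)
    return cx + cy * grid_width + cz * grid_width * grid_height

def cell_coords(cid, grid_width, grid_height):
    """Inverse of the flattening, handy for checking the index arithmetic."""
    cx = cid % grid_width
    cy = (cid // grid_width) % grid_height
    cz = cid // (grid_width * grid_height)
    return cx, cy, cz

# A particle in cell (3, 1, 2) of a grid that is 8 cells wide and 4 cells high.
cid = cell_id(1.75, 0.6, 1.2, dx=0.5, grid_width=8, grid_height=4)
print(cid, cell_coords(cid, grid_width=8, grid_height=4))
```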
Blocked layout Pseudocode 2/2
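sequence(pIDs) // pIDs = 0, 1, 2, ...
sort_by_key(cellIDs, pIDs) // group particle IDs by cell
exclusive_scan_by_key(cellIDs, constant(1), newIDs) // newIDs = rank of each particle within its cell
sort_by_key(newIDs, // 1st of every cell, then 2nd, etc.
            permutation(zip(<all particle data>),
                        pIDs))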
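The same sort, scan-by-key, sort sequence can be traced on the CPU with plain Python lists. This is a sketch for following the data flow, not the GPU code: `blocked_order` is a hypothetical name, and it relies on Python's `sorted` being stable, which keeps cells in ascending order within each rank.

```python
def blocked_order(cell_ids):
    """Return particle IDs in blocked order: 1st of each cell, then 2nd, ...

    Mirrors the sort / scan-by-key / sort sequence with Python's stable sort.
    """
    # Step 1: sort particle IDs by cell ID (sort_by_key).
    p_ids = sorted(range(len(cell_ids)), key=lambda p: cell_ids[p])
    # Step 2: rank within each cell (exclusive scan of 1s, keyed by cell ID).
    ranks = []
    prev_cell, rank = None, 0
    for p in p_ids:
        rank = rank + 1 if cell_ids[p] == prev_cell else 0
        ranks.append(rank)
        prev_cell = cell_ids[p]
    # Step 3: stable sort by rank keeps cells in ascending order inside each rank.
    order = sorted(range(len(p_ids)), key=lambda i: ranks[i])
    return [p_ids[i] for i in order]

cell_ids = [2, 0, 2, 1, 0, 2]           # cell of particle 0, 1, 2, ...
order = blocked_order(cell_ids)
print(order)                             # new particle order
print([cell_ids[p] for p in order])      # cells visited in that order
```

In the result, neighbouring entries come from different cells for as long as each cell still has particles left, which is the property the blocked layout is after.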
Re-blocking
No need to rearrange the particles in every step, but
● The goal of MPM is to simulate deformations and fracturing
● Particles can move around a lot
● Monitor scatter performance; re-block if it drops below a threshold
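One possible shape for such a monitor is sketched below in Python. The policy is an assumption, since the slides do not specify one: the throughput measured right after a re-blocking serves as the baseline, and a re-block is requested once throughput falls below a fixed fraction of it. `ReblockMonitor` and its `threshold` parameter are hypothetical.

```python
class ReblockMonitor:
    """Decide when to re-block from measured scatter throughput.

    `threshold` is the tolerated fraction of the throughput measured
    right after the last re-blocking (an assumed policy).
    """

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.baseline = None

    def should_reblock(self, particles_per_second):
        if self.baseline is None:            # first step after (re-)blocking
            self.baseline = particles_per_second
            return False
        if particles_per_second < self.threshold * self.baseline:
            self.baseline = None             # next measurement sets a new baseline
            return True
        return False

monitor = ReblockMonitor(threshold=0.8)
decisions = [monitor.should_reblock(r) for r in (100.0, 95.0, 79.0, 100.0)]
print(decisions)
```

A relative threshold avoids hard-coding an absolute throughput, which would differ between scenes and GPUs.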
Results
Run times relative to the CPU implementation (bar chart, axis from 0% to 100%; lower is better)
CPU MPM: 100%
GPU Naïve Atomics: 9.8%
GPU Gather: 4.9%
GPU Blocked Layout: 2.2%
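Results

| | CPU | GPU Naïve | GPU Gather | GPU B. Layout |
|---|---|---|---|---|
| CPU | 1x | 10.20x | 20.41x | 45.45x |
| GPU Naïve | | 1x | 2.00x | 4.45x |
| GPU Gather | | | 1x | 2.23x |
| GPU B. Layout | | | | 1x |

Relative speed-ups compared to other methods (column method over row method)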
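The speed-up factors follow from the percentages on the previous Results slide if those are read as run times relative to the CPU. The short Python check below recomputes the table from the four published percentages; the dictionary keys are just labels.

```python
# Relative run times from the Results slide (CPU = 100%).
relative_time = {
    "CPU": 100.0,
    "GPU Naive": 9.8,
    "GPU Gather": 4.9,
    "GPU B. Layout": 2.2,
}

def speedup(baseline, method):
    """How many times faster `method` is than `baseline`."""
    return relative_time[baseline] / relative_time[method]

names = list(relative_time)
for i, base in enumerate(names):
    # Only the upper triangle: each method against itself and the faster ones.
    row = [f"{speedup(base, m):.2f}x" for m in names[i:]]
    print(f"{base:>14}: " + "  ".join(row))
```

For example, the blocked layout's 45.45x over the CPU is 100 / 2.2, and its 2.23x over the gather version is 4.9 / 2.2.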
Summary
● Speculative Atomics:
○ The overhead of unnecessary atomic instructions is lower than the overhead of complicated gather transforms
● Blocked layout:
○ Special arrangement of particles to minimize the number of race conditions
● Not just for MPM:
○ Can be adapted to other particles-and-grids simulations as well
Takeaway
1. Don’t be afraid to use atomic operations!
2. Use coalesced access at all costs!
3. Spread out the particles (but not too far)!
Questions?
Thank you!