Sparse Fluid Simulation in DirectX
Alex Dunn, Dev. Tech., NVIDIA
Agenda
● We want more fluid in games
● Eulerian (grid based) fluid.
● Sparse Eulerian Fluid.
● Feature Level 11.3 Enhancements!
● (Not a talk on fluid dynamics)
Why Do We Need Fluid in Games?
● Replace particle kinematics!
● more realistic == better immersion
● Game mechanics?
● occlusion: smoke grenades, interaction
● dispersion: air ventilation systems; poison, smoke
● Endless opportunities!
Eulerian Simulation #1
My (simple) DX11.0 Eulerian fluid simulation:
● Stages: Inject → Advect → Pressure → Vorticity → Evolve
● Data: 2x Velocity, 2x Pressure, 1x Vorticity
Eulerian Simulation #2
● Inject: add fluid to the simulation.
● Advect: move data at XYZ to (XYZ + Velocity).
● Pressure: calculate localized pressure.
● Vorticity: calculate localized rotational flow.
● Evolve: tick the simulation.
Too Many Volumes Spoil the…
● Fluid isn’t box shaped.
● clipping
● wastage
● Simulated separately.
● authoring
● GPU state
● volume-to-volume interaction
● Tricky to render.
Problem!
● N-order problem
● 64^3 = ~0.25M cells
● 128^3 = ~2M cells
● 256^3 = ~16M cells
● …
● Applies to:
● computational complexity
● memory requirements
[Chart: memory (MB, 0 to 8192) against dimensions (X = Y = Z, 0 to 1024) for a Texture3D with 4x16F texels]
And that’s just 1 texture…
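The curve in the chart follows from voxel count times bytes per voxel. The sketch below is a minimal illustration, assuming "4x16F" means four 16-bit float channels (8 bytes per voxel) and ignoring mips and padding; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed by a dense cubic volume texture with no mips or padding.
// Assumption: "4x16F" = four 16-bit float channels = 8 bytes per voxel.
inline std::uint64_t DenseVolumeBytes(std::uint64_t dim, std::uint64_t bytesPerVoxel = 8)
{
    // D3D11 limits Texture3D dimensions to 2048 per axis.
    assert(dim > 0 && dim <= 2048);
    return dim * dim * dim * bytesPerVoxel;
}

// Convert bytes to mebibytes (the "MB" on the chart's axis), rounding down.
inline std::uint64_t BytesToMiB(std::uint64_t bytes)
{
    return bytes / (1024ull * 1024ull);
}
```

Under these assumptions, BytesToMiB(DenseVolumeBytes(1024)) lands on the top of the chart's memory axis, and a simulation needs several such textures.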
Bricks
● Split simulation space into groups of cells (each known as a brick).
● Simulate each brick independently.
Brick Map
● Need to track which bricks contain fluid
● Texture3D<uint>
● 1 voxel per brick
● 0 Unoccupied
● 1 Occupied
● Could also use packed binary grids [Gruen15], but this requires atomics
Tracking Bricks
● Initialise with emitter
● Expansion (unoccupied → occupied)
● if |V_axis| > |D_brick|, for axis in {x, y, z}
● expand in that axis
● Reduction (occupied → unoccupied)
● inverse of Expansion
● handled automatically
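The expansion test can be sketched per cell as follows. This is one possible reading of the rule, not the talk's code: it assumes D_brick is the distance from the cell to the brick face it is moving towards, and that velocity is expressed in cells per simulation step. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Per-axis expansion request: -1 = negative neighbour, 0 = none, +1 = positive neighbour.
using Expansion = std::array<int, 3>;

// A cell asks for expansion along an axis when its velocity would carry fluid
// past the brick face on that axis within one step (assumed interpretation).
//   cell     - cell coordinate inside the brick, each component in [0, brickDim)
//   velocity - cell velocity in cells per step
//   brickDim - brick edge length in cells
inline Expansion ExpansionForCell(const std::array<int, 3>& cell,
                                  const std::array<float, 3>& velocity,
                                  int brickDim)
{
    assert(brickDim > 0);
    Expansion result = {{0, 0, 0}};
    for (int axis = 0; axis < 3; ++axis)
    {
        assert(cell[axis] >= 0 && cell[axis] < brickDim);
        // Distance (in cells) from the cell centre to the face it moves towards.
        const float toFace = velocity[axis] >= 0.0f
            ? static_cast<float>(brickDim - cell[axis]) - 0.5f
            : static_cast<float>(cell[axis]) + 0.5f;
        if (std::fabs(velocity[axis]) > toFace)
            result[axis] = velocity[axis] > 0.0f ? 1 : -1;
    }
    return result;
}
```

A non-zero component means the neighbouring brick on that side would be marked occupied in the brick map. Reduction needs no counterpart here because bricks that nobody marks simply stay at 0 after the map is cleared.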
Sparse Simulation
Inject
Advect
Pressure
Vorticity
Evolve*
Clear Tiles: reset all tiles to 0 (unoccupied) in brick map.
Fill List: read value from brick map; append brick coordinate to list if occupied.

    Texture3D<uint> g_BrickMapRO;
    AppendStructuredBuffer<uint3> g_ListRW;

    // idx is the brick coordinate handled by this thread.
    if (g_BrickMapRO[idx] != 0)
    {
        g_ListRW.Append(idx);
    }
*Includes expansion
Uncompressed Storage
Allocate everything; forget about unoccupied cells.
Pros:
• simulation is coherent in memory.
• works in DX11.0.
Cons:
• no reduction in memory usage.
Compressed Storage
Similar to a List<Brick>.
Pros:
• good memory consumption.
• works in DX11.0.
Cons:
• allocation strategies.
• indirect lookup.
• “software translation”
• filtering particularly costly
[Diagram: Indirection Table → Physical Memory]
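The indirect lookup can be illustrated on the CPU. The sketch below is an assumed layout, not the talk's implementation: an indirection table holds one slot index per brick, and a packed pool plays the role of physical memory. The type and member names are hypothetical, and the allocation strategy is the simplest possible (append on first write).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::uint32_t kUnmappedBrick = 0xFFFFFFFFu;

struct CompressedVolume
{
    int brickDim;                            // cells per brick edge
    int bricksPerAxis;                       // bricks per volume edge
    std::vector<std::uint32_t> indirection;  // one pool slot per brick, or kUnmappedBrick
    std::vector<float> pool;                 // packed storage for allocated bricks only

    CompressedVolume(int brickDimIn, int bricksPerAxisIn)
        : brickDim(brickDimIn),
          bricksPerAxis(bricksPerAxisIn),
          indirection(static_cast<std::size_t>(bricksPerAxisIn) * bricksPerAxisIn * bricksPerAxisIn,
                      kUnmappedBrick)
    {
        assert(brickDim > 0 && bricksPerAxis > 0);
    }

    std::size_t CellsPerBrick() const
    {
        return static_cast<std::size_t>(brickDim) * brickDim * brickDim;
    }

    // Which indirection-table entry covers cell (x, y, z).
    std::size_t TableIndex(int x, int y, int z) const
    {
        const int bx = x / brickDim, by = y / brickDim, bz = z / brickDim;
        return (static_cast<std::size_t>(bz) * bricksPerAxis + by) * bricksPerAxis + bx;
    }

    // Offset of cell (x, y, z) inside its brick.
    std::size_t CellOffset(int x, int y, int z) const
    {
        const int cx = x % brickDim, cy = y % brickDim, cz = z % brickDim;
        return (static_cast<std::size_t>(cz) * brickDim + cy) * brickDim + cx;
    }

    // Writing into an unmapped brick allocates it at the end of the pool.
    void Write(int x, int y, int z, float value)
    {
        std::uint32_t& slot = indirection[TableIndex(x, y, z)];
        if (slot == kUnmappedBrick)
        {
            slot = static_cast<std::uint32_t>(pool.size() / CellsPerBrick());
            pool.resize(pool.size() + CellsPerBrick(), 0.0f);
        }
        pool[slot * CellsPerBrick() + CellOffset(x, y, z)] = value;
    }

    // "Software translation": table lookup first, then the pool. Unmapped reads return 0.
    float Read(int x, int y, int z) const
    {
        const std::uint32_t slot = indirection[TableIndex(x, y, z)];
        if (slot == kUnmappedBrick)
            return 0.0f;
        return pool[slot * CellsPerBrick() + CellOffset(x, y, z)];
    }
};
```

Every read pays for the table lookup, and a filtered sample near a brick border would need this translation for each contributing cell, which is why filtering is listed as particularly costly.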
Enter: Feature Level 11.3
● Volume Tiled Resources (VTR)!
● Extends 2D functionality in FL11.2
● Must check HW support: (DX11.3 != FL11.3)
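
    #include <d3d11_3.h>

    // Volume tiled resources require Tiled Resources Tier 3 hardware.
    m_UseTiledResources = false;
    ID3D11Device3* pDevice3 = nullptr;
    if (SUCCEEDED(pDevice->QueryInterface(&pDevice3)))
    {
        D3D11_FEATURE_DATA_D3D11_OPTIONS2 support = {};
        if (SUCCEEDED(pDevice3->CheckFeatureSupport(D3D11_FEATURE_D3D11_OPTIONS2, &support, sizeof(support))))
            m_UseTiledResources = support.TiledResourcesTier == D3D11_TILED_RESOURCES_TIER_3;
        pDevice3->Release();
    }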
Tiled Resources #1
Pros:
• only mapped memory is allocated in VRAM
• “hardware translation”
• logically a volume texture
• all samplers supported
• 1 Tile = 64KB (= 1 Brick)
• fast loads
Tiled Resources #2
Gotcha: Tile mappings must be updated from CPU
1 Tile = 64KB (= 1 Brick)
BPP Tile Dimensions
8 64x32x32
16 32x32x32
32 32x32x16
64 32x16x16
128 16x16x16
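Each row of the table multiplies out to exactly 64KB, which the sketch below checks and then uses to count how many tiles cover a volume. The helper names are hypothetical; only the tile shapes come from the table.

```cpp
#include <cassert>
#include <cstdint>

struct TileShape
{
    std::uint32_t width, height, depth; // voxels per tile along each axis
};

// Tile shapes from the table above (bits per pixel -> voxels per 64KB tile).
inline TileShape VolumeTileShape(std::uint32_t bitsPerPixel)
{
    switch (bitsPerPixel)
    {
    case 8:   return TileShape{64, 32, 32};
    case 16:  return TileShape{32, 32, 32};
    case 32:  return TileShape{32, 32, 16};
    case 64:  return TileShape{32, 16, 16};
    case 128: return TileShape{16, 16, 16};
    default:
        assert(false && "bits per pixel not listed in the table");
        return TileShape{1, 1, 1};
    }
}

// Size of one tile in bytes for the given format width.
inline std::uint64_t TileBytes(std::uint32_t bitsPerPixel)
{
    const TileShape s = VolumeTileShape(bitsPerPixel);
    return static_cast<std::uint64_t>(s.width) * s.height * s.depth * (bitsPerPixel / 8);
}

// Number of tiles needed to cover a volume, rounding each axis up to whole tiles.
inline std::uint64_t TilesToCover(std::uint32_t width, std::uint32_t height,
                                  std::uint32_t depth, std::uint32_t bitsPerPixel)
{
    const TileShape s = VolumeTileShape(bitsPerPixel);
    const std::uint64_t tx = (width + s.width - 1) / s.width;
    const std::uint64_t ty = (height + s.height - 1) / s.height;
    const std::uint64_t tz = (depth + s.depth - 1) / s.depth;
    return tx * ty * tz;
}
```

Because a brick is one tile, the brick map for a 64-bit format would have one voxel per 32x16x16 block of cells, so brick dimensions depend on the texture format.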
Latency Resistant Simulation #1
Naïve Approach:
● clamp velocity to Vmax
● CPU Read-back:
● occupied bricks.
● 2 frames of latency!
● extrapolate “probable” tiles.
[Timeline diagram: CPU frames N to N+3 against GPU frames N to N+2, marking "Data Ready" for N, N+1 and N+2, and "Tiles Mapped" for N]
Latency Resistant Simulation #2
Tight Approach:
● CPU Read-back:
● occupied bricks.
● max{|V|} within brick.
● 2 frames of latency!
● extrapolate “probable” tiles.
[Timeline diagram: CPU frames N to N+3 against GPU frames N to N+2, marking "Data Ready" for N, N+1 and N+2, and "Tiles Mapped" for N]
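Extrapolating "probable" tiles can be sketched as a dilation of the occupied bricks by the distance their fastest fluid could travel while the read-back is in flight. This is an assumed formulation, not the talk's implementation; the struct, function and parameter names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <set>
#include <tuple>
#include <vector>

struct BrickInfo
{
    int x, y, z;     // brick coordinate in the brick map
    float maxSpeed;  // max |V| inside the brick, in world units per second
};

using BrickCoord = std::tuple<int, int, int>;

// Grow each occupied brick by the number of bricks its fastest fluid could
// cross during the latency window, clamped to the brick map bounds.
inline std::set<BrickCoord> PredictProbableBricks(const std::vector<BrickInfo>& occupied,
                                                  float brickSize,    // world units per brick edge
                                                  float dt,           // seconds per frame
                                                  int latencyFrames,  // 2 in the slides
                                                  int bricksPerAxis)
{
    assert(brickSize > 0.0f && dt >= 0.0f && latencyFrames >= 0 && bricksPerAxis > 0);
    std::set<BrickCoord> probable;
    for (const BrickInfo& b : occupied)
    {
        const int radius =
            static_cast<int>(std::ceil(b.maxSpeed * dt * latencyFrames / brickSize));
        for (int z = b.z - radius; z <= b.z + radius; ++z)
        for (int y = b.y - radius; y <= b.y + radius; ++y)
        for (int x = b.x - radius; x <= b.x + radius; ++x)
        {
            const bool inside = x >= 0 && y >= 0 && z >= 0 &&
                                x < bricksPerAxis && y < bricksPerAxis && z < bricksPerAxis;
            if (inside)
                probable.insert(BrickCoord(x, y, z));
        }
    }
    return probable;
}
```

Passing the same clamped Vmax as maxSpeed for every brick reproduces the naive approach; reading back the per-brick maximum lets slow regions map fewer tiles.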
Latency Resistant Simulation #3
[Flowchart with CPU and GPU lanes]
● GPU: Sparse Eulerian Simulation → Readback Brick List
● CPU: CPU Readback Ready?
● Yes: Prediction Engine
● No: Emitter Bricks
● CPU: UpdateTileMappings
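The CPU side of the flowchart reduces to one decision per frame. The sketch below is a hedged illustration with hypothetical names: `predict` stands in for the Prediction Engine, and the returned list is what would then be passed on to the tile mapping update, which is not shown.

```cpp
#include <cassert>
#include <vector>

struct BrickCoord
{
    int x, y, z;
};

// Choose which bricks to keep mapped this frame.
//   readback      - brick list from the GPU, or nullptr while the read-back is not ready
//   emitterBricks - bricks touched by emitters, always known on the CPU
//   predict       - caller-supplied callable standing in for the "Prediction Engine"
template <typename Predictor>
std::vector<BrickCoord> SelectBricksToMap(const std::vector<BrickCoord>* readback,
                                          const std::vector<BrickCoord>& emitterBricks,
                                          Predictor predict)
{
    // The brick map is initialised from the emitter, so at least one brick is expected.
    assert(!emitterBricks.empty());
    if (readback == nullptr)
        return emitterBricks;   // "No" branch: only the emitter bricks are known
    return predict(*readback);  // "Yes" branch: extrapolate probable bricks
}
```

Because the fallback never waits on the GPU, the CPU does not stall during the first frames before any read-back has arrived.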
Performance #1
NOTE: Numbers captured on a GeForce GTX980
Sim. Time (ms)
Grid Resolution   Full Grid     Sparse Grid
128               2.3           0.4
256               19.9          1.8
384               64.7          2.7
512               (not shown)   2.9
1024              (not shown)   6.0
Performance #2
NOTE: Numbers captured on a GeForce GTX980
Memory (MB)
Grid Resolution   Full Grid     Sparse Grid
128               80            11
256               640           46
384               2,160         57
512               (not shown)   83
1024              (not shown)   138
Summary
● Fluid simulation in games is justified.
● Fluid is not box shaped!
● One volume is better than many small.
● Un/Compressed storage a viable fallback.
● VTRs great for fluid simulation.
● Other latency-resistant algorithms with tiled resources?