Optimization for DirectX9 Graphics
Ashu Rege
Last Year: Batch, Batch, Batch
• Moral of the story: Small batches BAD
• What is a “batch”?
– Every DrawIndexedPrimitive call is a batch
– All render, texture, shader, ... state is the same
Simple Test App
• Degenerate Triangles (no fill cost)
• Post TnL Cache Vertices (no xform cost)
• Static Data (minimal AGP overhead)
• Fixed (~100 K) Tris/Frame
• Vary Number of Batches
Last Year’s Graph Updated
[Figure: “Measured Performance: Different Batch-Sizes”. x-axis: triangles/batch (ticks 10, 60, 110, 160, 300, 800, 1300, with an axis scale change); y-axis: million triangles/s (ticks 0.0% to 100.0%). Series: 3GHz Pentium 4 with RADEON 9800 XT; 3GHz Pentium 4 with NVIDIA GeForce FX 5950 Ultra.]
This Year: Son Of A Batch
• What makes an app ‘batchy’?
– Too many state changes
• What kinds of state changes?
• Techniques to reduce batches
State Changes
• Analysis of some popular games
• Top State Changes:
– Texture State
– Vertex Shaders and Vertex Shader Constants
– Pixel Shaders and Pixel Shader Constants
Do State Changes Really Matter?
• Cost of state changes
• Comparison with no state changes
• One state change:
– Factor of 4 drop in fps (on average)
• Multiple state changes:
– Another factor of 2-5 drop
How To Sort?
• Seems like an n-dimensional problem
• Should I sort by texture, pixel shader, vertex shader, ... what?
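To make the trade-off concrete, the sketch below counts state changes for the same set of draws under two sort orders. It is only an illustration: the `Draw` record, its integer ids, and the helper names are hypothetical stand-ins for real texture and shader objects, not anything taken from the talk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical draw record: integer ids stand in for real state objects.
struct Draw {
    int texture;
    int pixelShader;
};

struct ChangeCount {
    int texture = 0;
    int pixelShader = 0;
};

// Count how often each kind of state differs between consecutive draws.
ChangeCount countChanges(const std::vector<Draw>& draws) {
    ChangeCount c;
    for (std::size_t i = 1; i < draws.size(); ++i) {
        if (draws[i].texture != draws[i - 1].texture) ++c.texture;
        if (draws[i].pixelShader != draws[i - 1].pixelShader) ++c.pixelShader;
    }
    return c;
}

// Sort with texture as the primary key and pixel shader as the secondary key.
std::vector<Draw> sortByTextureThenShader(std::vector<Draw> draws) {
    std::sort(draws.begin(), draws.end(), [](const Draw& a, const Draw& b) {
        return std::tie(a.texture, a.pixelShader) <
               std::tie(b.texture, b.pixelShader);
    });
    return draws;
}

// Sort with pixel shader as the primary key and texture as the secondary key.
std::vector<Draw> sortByShaderThenTexture(std::vector<Draw> draws) {
    std::sort(draws.begin(), draws.end(), [](const Draw& a, const Draw& b) {
        return std::tie(a.pixelShader, a.texture) <
               std::tie(b.pixelShader, b.texture);
    });
    return draws;
}
```

With 3 textures and 2 pixel shaders in every combination, sorting on one key minimizes changes of that state but leaves the other state changing on almost every draw, which is why no single sort order removes all the batch breaks.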
Texture v. Pixel Shader
[Figure: axes labeled “Different Textures” and “Different Pixel Shaders”]
Collapse One Of The Axes
[Figure: axes labeled “Different Textures” and “Different Pixel Shaders”]
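Texture Atlases
[Figure: Texture A and Texture B, each with corners (0,0), (1,0), (0,1), (1,1), packed side by side into one Texture Atlas with corners (0,0), (1,0), (0,1), (1,1) and the boundary between them running from (0.5,0) to (0.5,1)]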
Basic Idea
• Select batch-breaking textures
• Pack into one or more texture atlases
• Update the uv-coordinates of models
• Convert multiple DIP calls into one
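The uv update step can be sketched as a scale and offset into the region the sub-texture occupies in the atlas. The `UV` and `AtlasRect` types and the function names here are hypothetical, and the sketch assumes the model's original coordinates lie in [0,1].

```cpp
#include <cassert>
#include <vector>

struct UV {
    float u;
    float v;
};

// Hypothetical record of where a sub-texture landed in the atlas,
// expressed in atlas uv space.
struct AtlasRect {
    float u0;     // left edge of the sub-texture in the atlas
    float v0;     // top edge of the sub-texture in the atlas
    float uSize;  // width of the sub-texture in atlas uv units
    float vSize;  // height of the sub-texture in atlas uv units
};

// Map a [0,1] coordinate on the original texture into the atlas.
UV remapToAtlas(UV uv, const AtlasRect& rect) {
    assert(uv.u >= 0.0f && uv.u <= 1.0f);
    assert(uv.v >= 0.0f && uv.v <= 1.0f);
    return {rect.u0 + uv.u * rect.uSize, rect.v0 + uv.v * rect.vSize};
}

// Rewrite every texture coordinate of a model in place.
void remapModel(std::vector<UV>& uvs, const AtlasRect& rect) {
    for (UV& uv : uvs) {
        uv = remapToAtlas(uv, rect);
    }
}
```

For two textures packed side by side, the second texture would use a rect of `{0.5f, 0.0f, 0.5f, 1.0f}`, so its corner (0,0) moves to (0.5,0) and its corner (1,1) stays at (1,1).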
What About Mip-Maps?
• What happens to the lowest 1x1 level?
– Smearing?
• Tool-chain should generate mip-maps before packing
• Use special purpose mip-map filters
What About Lower Levels?
[Figure: an atlas holding 1 16x16 sub-texture and 12 8x8 sub-textures, shown with its 4x4 and 2x2 mip levels; labeled “Smearing”]
Auto-Generation of Mip-Maps
• 2x2 Box filter can also work for power-of-2 textures
– Both atlas and sub-textures in it are pow2
– Textures should not cross pow2 lines
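A small sketch of why the box filter is safe under those conditions: each output texel averages one aligned 2x2 block, so texels from different pow2-aligned sub-textures are not mixed until a sub-texture has shrunk to a single texel. The single-channel `float` image layout and the function name are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Downsample a square, single-channel image by averaging aligned 2x2 blocks.
// `size` is the edge length of `src` in texels and must be even.
std::vector<float> boxDownsample(const std::vector<float>& src, int size) {
    assert(size >= 2 && size % 2 == 0);
    assert(src.size() == static_cast<std::size_t>(size) * size);
    const int half = size / 2;
    std::vector<float> dst(static_cast<std::size_t>(half) * half);
    for (int y = 0; y < half; ++y) {
        for (int x = 0; x < half; ++x) {
            const int sx = 2 * x;
            const int sy = 2 * y;
            dst[y * half + x] = 0.25f * (src[sy * size + sx] +
                                         src[sy * size + sx + 1] +
                                         src[(sy + 1) * size + sx] +
                                         src[(sy + 1) * size + sx + 1]);
        }
    }
    return dst;
}
```

On a 4x4 atlas made of four aligned 2x2 sub-textures, the first downsample keeps each sub-texture's value separate; the second produces one texel that averages all four, which is the smearing the slides go on to discuss.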
Proper Placement For Box Filter
[Figure: 32x32 Atlas marked with ‘4’, ‘8’, and ‘16’ power-of-2 lines]
A 16x16 sub-texture cannot cross any ‘16’ power-of-2 lines.
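One way to express the placement rule in a packing tool is to require that a square pow2 sub-texture sit at an offset that is a multiple of its own size. The function names below are hypothetical, and the check assumes square sub-textures in a square atlas.

```cpp
#include <cassert>

// True when v is a non-zero power of two.
bool isPow2(unsigned v) {
    return v != 0 && (v & (v - 1)) == 0;
}

// A size x size sub-texture at texel offset (x, y) avoids crossing any
// 'size' power-of-2 line exactly when both offsets are multiples of size.
bool placementIsBoxFilterSafe(unsigned x, unsigned y, unsigned size,
                              unsigned atlasSize) {
    if (!isPow2(size) || !isPow2(atlasSize)) return false;
    if (size > atlasSize) return false;
    if (x + size > atlasSize || y + size > atlasSize) return false;
    return x % size == 0 && y % size == 0;
}
```

In a 32x32 atlas, a 16x16 sub-texture at (0,0) or (16,16) passes, while the same sub-texture at (8,0) fails because it straddles the ‘16’ line at x = 16.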
What About Lower Levels?
[Figure: an atlas holding 1 16x16 sub-texture and 12 8x8 sub-textures, shown with its 4x4 and 2x2 mip levels; labeled “Smearing”]
Possible Solutions
• Terminate mip chain to fit smallest sub-texture
– Image Quality and Performance Issues
• Use only sub-textures of same size
– May be inflexible
• But there’s good news...
Cannot Access Lower Levels
• A triangle’s texture coordinates never span across sub-textures
• Worst case: pixel-sized triangle spanning entire sub-texture
• Only “1-texel” level is accessed
– Fill it with valid data
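The worst case can be worked through with the usual log2 rule for mip selection: if an entire sub-texture of edge length s maps onto one pixel, the footprint is about s texels per pixel, so the selected level is about log2(s), which is the level where that sub-texture has shrunk to one texel. This is a simplified model (real hardware filtering and level-of-detail bias differ in detail), and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Approximate mip level for a given texel footprint per pixel,
// using the standard log2 rule clamped at the base level.
float mipLevelForFootprint(float texelsPerPixel) {
    assert(texelsPerPixel > 0.0f);
    return std::max(0.0f, std::log2(texelsPerPixel));
}

// Edge length in texels of a sub-texture at a given mip level.
unsigned subTextureSizeAtLevel(unsigned baseSize, unsigned level) {
    return std::max(1u, baseSize >> level);
}

// Deepest level reached when the whole sub-texture covers a single pixel.
unsigned deepestLevelAccessed(unsigned subTextureSize) {
    const float lod = mipLevelForFootprint(static_cast<float>(subTextureSize));
    return static_cast<unsigned>(std::lround(lod));
}
```

For a 16x16 sub-texture this gives level 4, where the sub-texture is 1 texel; an 8x8 sub-texture in the same atlas bottoms out at level 3, so under this model it does not read the more heavily smeared levels below its own 1-texel level.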
Cannot Access Lower Levels
[Figure: Pixel Sized Quad]
• DirectX raster rules make it unlikely for a smaller quad (or tri) to generate a pixel
Other Issues
• Address modes such as clamp?
– Use ddx, ddy in pixel-shader to emulate modes
• Smearing due to filtering
– Texels on border of sub-textures get smeared
– Aniso can help: smaller footprint
– Do re-mapping of texcoords in pixel shaders
– Pad textures with border texels
DirectX9 Instancing API
• What is it?
– Single draw call to draw multiple instances of the same model
• Why should you care?
– Avoid DIP calls and minimize batching overhead
• What do you need?
– DirectX 9.0c
– VS 3.0/PS 3.0 support
When To Use Instancing
• Many Instances of Same Model
– Forest of trees, particle systems, sprites
• Encode per-instance data in auxiliary stream
– Colors, texture coordinates, per-instance consts
• Not as useful if batching overhead is low
– Fixed overhead to instancing
How Does It Work?
• Vertex stream frequency divider API
• Primary stream is a single copy of the model data
• Secondary stream: per instance data
– pointer is advanced for each rendered instance
[Figure: stream 0 (Vertex Data) and stream 1 (Per instance data) feeding VS_3_0]
Simple Instancing Example
• 100 poly trees
– Stream 0 contains just the one tree model
– Stream 1 contains model WVP transforms
  • Possibly calculated per frame based on the instances in the view
– Vertex Shader is the same as normal, except you use the matrix from the vertex stream instead of the matrix from VS constants
• If you are drawing 10k trees that’s a lot of draw call savings!
– You could manipulate the VB and pre-transform verts, but it’s often tricky, and you are replicating a lot of data
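What the vertex shader sees under the frequency divider can be emulated on the CPU: stream 0 is replayed in full for every instance, while the stream 1 pointer advances once per instance. In Direct3D 9 the divider itself is configured with `IDirect3DDevice9::SetStreamSourceFreq`, using `D3DSTREAMSOURCE_INDEXEDDATA` on the geometry stream and `D3DSTREAMSOURCE_INSTANCEDATA` on the per-instance stream. The types and function below are hypothetical, and a translation stands in for the full WVP matrix to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 {
    float x;
    float y;
    float z;
};

// Hypothetical per-instance record. The slide stores a full WVP matrix;
// a translation is used here only to keep the example small.
struct InstanceData {
    Vec3 offset;
};

// Emulate one instanced draw: every model vertex is processed once per
// instance, paired with that instance's data.
std::vector<Vec3> drawInstanced(const std::vector<Vec3>& modelVerts,
                                const std::vector<InstanceData>& instances) {
    std::vector<Vec3> out;
    out.reserve(modelVerts.size() * instances.size());
    for (const InstanceData& inst : instances) {  // stream 1: one step per instance
        for (const Vec3& v : modelVerts) {        // stream 0: replayed each time
            out.push_back({v.x + inst.offset.x,
                           v.y + inst.offset.y,
                           v.z + inst.offset.z});
        }
    }
    return out;
}
```

The model data is stored once however many instances are drawn, which is the contrast with pre-transforming vertices into a large vertex buffer, where every instance carries its own copy.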
Some Test Results
[Figure: “Instancing versus Single DIP calls”. x-axis: Batch Size (ticks 0 to 2500); y-axis: FPS. Series: Instancing; No Instancing. 1 million diffuse shaded polys in each run.]
Test Summary
• Big win for small batch sizes
• Fixed overhead for instancing
• Cross-over point changes depending on CPU and GPU, engine overhead etc.
More Information
• White paper and tools soon for texture atlases on www.nvidia.com/developer
• “Profiling Your DirectX Application” in NVIDIA sponsored session on Wed.
Questions?
• Contact: [email protected]