Z-Buffer Optimizations

43
Z-Buffer Optimizations Patrick Cozzi University of Pennsylvania CIS 565 - Fall 2013

description

Z-Buffer Optimizations. Patrick Cozzi University of Pennsylvania CIS 565 - Fall 2013. Announcements. Student work on course website Twitter Eric Haines’ talk. Announcements. Project 4 demos today Project 5 Due tomorrow Demos on Monday Project 6 – Deferred shading - PowerPoint PPT Presentation

Transcript of Z-Buffer Optimizations

Page 1: Z-Buffer Optimizations

Z-Buffer Optimizations

Patrick CozziUniversity of PennsylvaniaCIS 565 - Fall 2013

Page 2: Z-Buffer Optimizations

Announcements

Student work on course website Twitter Eric Haines’ talk

2

Page 3: Z-Buffer Optimizations

Announcements

Project 4 demos today Project 5

Due tomorrowDemos on Monday

Project 6 – Deferred shadingReleased Friday. Due next Friday 11/15

HackathonNext Saturday 11/16, 6pm-12am, SIG lab

3

Page 4: Z-Buffer Optimizations

Overview

Hardware: Early-Z Software: Front-to-Back Sorting Hardware: Double-Speed Z-Only Software: Early-Z Pass Software: Deferred Shading Hardware: Buffer Compression Hardware: Fast Clear Hardware: Z-Cull Future: Programmable Culling Unit

Page 5: Z-Buffer Optimizations

Z-Buffer

Also called Depth BufferFragment vs PixelAlternatives: Painter’s, Ray Casting, etc.

Page 6: Z-Buffer Optimizations

Z-Buffer History

“Brute-force approach”“Ridiculously expensive”

Sutherland, Sproull, and, Schumacker, “A Characterization of Ten Hidden-Surface Algorithms”, 1974

Page 7: Z-Buffer Optimizations

Z-Buffer Quiz10 triangles cover a pixel. Rendering these in random order with a Z-buffer, what is the average number of times the pixel’s z-value is written?

See Subtle Tools Slides: erich.realtimerendering.com

Page 8: Z-Buffer Optimizations

Z-Buffer Quiz

1st triangle writes depth2nd triangle has 1/2 chance of writing depth3rd triangle has 1/3 chance of writing depth

1 + 1/2 + 1/3 + …+ 1/10 = 2.9289…

See Subtle Tools Slides: erich.realtimerendering.com

Page 9: Z-Buffer Optimizations

Z-Buffer Quiz

See Subtle Tools Slides: erich.realtimerendering.com

Harmonic Series

# Triangles # Depth Writes1 14 2.0811 3.0231 4.0383 5

12,367 10

Page 10: Z-Buffer Optimizations

Z-Test in the Pipeline

When is the Z-Test?

FragmentShader

FragmentShader

Z-Test

Z-Test

or

Page 11: Z-Buffer Optimizations

Early-Z

Avoid expensive fragment shaders

FragmentShader

Z-Test

Page 12: Z-Buffer Optimizations

Early-Z

Automatically enabled on GeForce (8?) unless1

Fragment shader discards or write depthDepth writes and alpha-test2 are enabled

Fine-grained as opposed to Z-CullATI: “Top of the Pipe Z Reject”

FragmentShader

Z-Test

1 See NVIDIA GPU Programming Guide for exact details2 Alpha-test is deprecated in GL 3

Page 13: Z-Buffer Optimizations

Front-to-Back Sorting

Utilize Early-Z for opaque objectsOld hardware still has less z-buffer writesCPU overhead. Need efficient sorting

Bucket SortOctree

Conflicts with state sorting1

0 - 0.25 0.25 – 0.5 0.5 – 0.75 0.75 - 1

01

12

[1] For example, see http://home.comcast.net/~tom_forsyth/blog.wiki.html#%5B%5BRenderstate%20change%20costs%5D%5D

Page 14: Z-Buffer Optimizations

Double Speed Z-Only

GeForce FX and later render at double speed when writing only depth or stencilEnabled when

Color writes are disabledFragment shader discards or write depthAlpha-test is disabled

See NVIDIA GPU Programming Guide for exact details

Page 15: Z-Buffer Optimizations

Early-Z PassSoftware technique to utilize Early-Z and Double Speed Z-OnlyTwo passes

Render depth only. “Lay down depth”Double Speed Z-Only

Render with full shaders and no depthEarly-Z (and Z-Cull)

Page 16: Z-Buffer Optimizations

Early-Z PassOptimizations

Depth passCoarse sort front-to-backOnly render major occluders

Color passSort by stateRender depth just for non-occluders

Page 17: Z-Buffer Optimizations

Deferred Shading

Similar to Early-Z Pass1st Pass: Visibility tests2nd Pass: Shading

Different than Early-Z PassGeometry is only transformed once

Page 18: Z-Buffer Optimizations

Deferred Shading

1st PassRender geometry into G-Buffers:

Images from Tabula Rasa. See Resources.

Diffuse Normals Depth

Page 19: Z-Buffer Optimizations

Deferred Shading

2nd PassShading == post processing effectsRender full screen quads that read from G-Buffers

Geometry is no longer needed

Page 20: Z-Buffer Optimizations

Deferred Shading

Light Accumulation Result

Image from Tabula Rasa. See Resources.

Page 21: Z-Buffer Optimizations

Deferred Shading

Eliminates shading fragments that fail Z-TestIncreases video memory requirementHow does it affect bandwidth?

Page 22: Z-Buffer Optimizations

Buffer Compression

Reduce depth buffer bandwidthGenerally does not reduce memory usage of actual depth bufferSame architecture applies to other buffers, e.g. color and stencil

Page 23: Z-Buffer Optimizations

Buffer Compression

Tile Table: Status for nxn tile of depths, e.g. n=8[state, zmin, zmax]state is either compressed, uncompressed, or cleared

0.10.50.50.1

0.5 0.5 0.10.8 0.80.8 0.8

0.50.5

0.5 0.5 0.1

[uncompressed, 0.1, 0.8]

Page 24: Z-Buffer Optimizations

Buffer Compression

TileTable

Decompress Compress

Compressed Z-Buffer

Rasterizerupdatedz-values

updated z-max

nxn uncompressed z values[zmin, zmax]

Page 25: Z-Buffer Optimizations

Buffer Compression

Depth Buffer WriteRasterizer modifies copy of uncompressed tileTile is lossless compressed (if possible) and sent to actual depth buffer

Update Tile Tablezmin and zmax

status: compressed or decompressed

Page 26: Z-Buffer Optimizations

Buffer Compression

Depth Buffer ReadTile Status

Uncompressed: Send tileCompressed: Decompress and send tileCleared: See Fast Clear

Page 27: Z-Buffer Optimizations

Buffer Compression

ATI: Writing depth interferes with compressionRender those objects last

Minimize far/near ratioImproves Zmin, Zmax precision

Page 28: Z-Buffer Optimizations

Fast Clear

Doesn’t touch depth bufferglClear sets state of each tile to clearedWhen the rasterizer reads a cleared buffer

A tile filled with GL_DEPTH_CLEAR_VALUE is sentDepth buffer is not accessed

Page 29: Z-Buffer Optimizations

Fast Clear

Use glClearNot full screen quadsNot the skyboxNo "one frame positive, one frame negative“ trickClear stencil together with depth – they are stored in the same buffer

Page 30: Z-Buffer Optimizations

Z-Cull

Cull blocks of fragments before shadingCoarse-grained as opposed to Early-ZAlso called Hierarchical Z

Page 31: Z-Buffer Optimizations

Z-CullZmax-CullingRasterizer fetches zmax for each tile it processesCompute ztriangle

min for a triangleCulled if ztriangle

min > zmax

FragmentShader

Z-Cull

Ztrianglemin > tile’s zmax

ztrianglemin

Page 32: Z-Buffer Optimizations

Z-CullZmin-Culling

Support different depth testsAvoid depth buffer readsIf triangle is in front of tile, depth tests for each pixel is unnecessary

FragmentShader

Z-Cull

Ztrianglemax < tile’s zmin

ztrianglemax

Page 33: Z-Buffer Optimizations

Z-Cull

Automatically enabled on GeForce (6?) cards unlessglClear isn’t usedFragment shader writes depth (or discards?)Direction of depth test is changed

ATI: avoid = and != depth compares on old cardsATI: avoid stencil fail and stencil depth fail operationsLess efficient when depth varies a lot within a few pixels

See NVIDIA GPU Programming Guide for exact details

Page 34: Z-Buffer Optimizations

ATI HyperZ

HyperZ = Early Z + Z Compression + Fast Z clear + Z-Cull

See ATI's Depth-in-depth

Page 35: Z-Buffer Optimizations

Programmable Culling Unit

Cull before fragment shader even if the shader writes depth or discardsRun part of shader over an entire tile to determine lower bound z value

Hasselgren and Akenine-Möller, “PCU: The Programmable Culling Unit,” 2007

Page 36: Z-Buffer Optimizations

Summary

What was once “ridiculously expensive” is now the primary visible surface algorithm for rasterization

Page 37: Z-Buffer Optimizations

Resources

www.realtimerendering.com

Sections 7.9.2 and 18.3

Page 38: Z-Buffer Optimizations

Resources

developer.nvidia.com/object/gpu_programming_guide.html

GeForce 8 Guide: sections 3.4.9, 3.6, and 4.8GeForce 7 Guide: section 3.6

Page 39: Z-Buffer Optimizations

Resources

http://developer.amd.com/media/gpu_assets/Depth_in-depth.pdf

Depth In-depth

Page 40: Z-Buffer Optimizations

Resources

http://www.graphicshardware.org/previous/www_2000/presentations/ATIHot3D.pdf

ATI Radeon HyperZ TechnologySteve Morein

Page 41: Z-Buffer Optimizations

Resources

http://ati.amd.com/developer/dx9/ATI-DX9_Optimization.pdf

Performance Optimization Techniques for ATI Graphics Hardware with DirectX® 9.0

Guennadi Riguer

Sections 6.5 and 8

Page 42: Z-Buffer Optimizations

Resources

developer.nvidia.com/object/gpu_gems_home.html

Chapter 28: Graphics Pipeline Performance

Page 43: Z-Buffer Optimizations

Resources

developer.nvidia.com/object/gpu-gems-3.html

Chapter 19: Deferred Shading in Tabula Rasa