Beyond Programmable Shading: In...

276
Beyond Programmable Shading: In Action SIGGRAPH 2008 5/19/2008 Course Organizers: Aaron Lefohn, Intel Mike Houston, AMD Course Speakers: Aaron Lefohn Intel Mike Houston AMD David Luebke NVIDIA Jon Olick Id Software Fabio Pellacini Dartmouth College Speaker Contact Info: Aaron Lefohn 2700 156 th Ave NE, Suite 300 Bellevue, WA 98007 [email protected] Mike Houston 4555 Great America Parkway Suite 501 Santa Clara, CA 95054 [email protected] David Luebke 1912 Lynchburg Dr. Charlottesville, VA 22903 [email protected] Jon Olick 3819 Towne Crossing #222 Mesquite, TX 75150 [email protected] Fabio Pellacini 6211 Sudikoff Lab Hanover, NH 03755 [email protected]

Transcript of Beyond Programmable Shading: In...

Page 1: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

SIGGRAPH 2008

5/19/2008

Course Organizers: Aaron Lefohn, Intel Mike Houston, AMD

Course Speakers:

Aaron Lefohn Intel Mike Houston AMD David Luebke NVIDIA Jon Olick Id Software Fabio Pellacini Dartmouth College

Speaker Contact Info:

Aaron Lefohn 2700 156th Ave NE, Suite 300 Bellevue, WA 98007 [email protected]

Mike Houston 4555 Great America Parkway Suite 501 Santa Clara, CA 95054 [email protected]

David Luebke 1912 Lynchburg Dr. Charlottesville, VA 22903 [email protected]

Jon Olick 3819 Towne Crossing #222 Mesquite, TX 75150 [email protected]

Fabio Pellacini 6211 Sudikoff Lab Hanover, NH 03755 [email protected]

Page 2: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Course Description:

This second course in a series will demonstrate case studies of combining traditional rendering API usage with advanced parallel computation from game developers, researchers, and graphics hardware vendors. There are strong indications that the future of interactive graphics programming is a model more flexible than today’s OpenGL/Direct3D pipelines. As such, graphics developers need to have a basic understanding of how to combine emerging parallel programming techniques and more flexible graphics processors with the traditional interactive rendering pipeline. Each case study includes a live demo and discusses the mix of parallel programming constructs used, details of the graphics algorithm, and how the rendering pipeline and computation interact to achieve the technical goals. Intended Audience: We are targeting researchers and engineers interested in investigating advanced graphics techniques using parallel programming techniques on many-core GPU and CPU architectures, as well as graphics and game developers interested in integrating these techniques into their applications. Prerequisites: Attendees are expected to have experience with a modern graphics API (OpenGL or Direct3D), including basic experience with shaders, textures, and framebuffers and/or background with parallel programming languages. Some background with parallel programming on CPUs or GPUs is useful but not required as an overview of will be provided in the course. Attendees are strongly encouraged to attend the first course in this series, “Beyond Programmable Shading: Fundamentals.” Level of difficulty: Advanced Special presentation requirements: Several speakers will bring their own demo machines for use in the course. Speakers: - Aaron Lefohn, Intel - Mike Houston, AMD - David Luebke, NVIDIA - Jon Olick, Id Software - Fabio Pellacini, Dartmouth College

Page 3: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

“Beyond Programmable Shading: In Action” - Introduction: Mike Houston: 15 min

o Brief recap of course 1 From programmable shading to fully programmable graphics List of programming environment possibilities Recap of benefits over using the traditional graphics pipeline Teaser pictures for case studies

(NOTE: All case studies will discuss the graphics algorithms, the mix of parallel computation being used to achieve the effect(s), as well as performance and of course pretty pictures)

- Research Case Study

o Interactive Cinematic Lighting: Fabio Pellacini: 30 min The fake: pre-computing your way to interactive global illumination

on a GPU The real deal: no-pre-computation global illumination in 5 seconds

with CPUs + GPU What it will take to do this 500 times faster

- Games Case Studies

o Current Generation Parallelism in Games: Jon Olick (Sony/Id): 30 min Offloading work from the GPU to many CPU cores Triangle culling, progressive meshes, displacement maps Geometry compression / decompression: indices, vertices Tightly-coupled CPU-GPU synchronization techniques

o Next Generation Parallelism in Games: Jon Olick (Id): 30 min Why ray casting? What are the advantages over rasterization? Does it add end-user value? Risks? Efficient implementations, control flow, data structures

- Break, 15 minutes - Graphics Hardware Vendor Case Studies

o Each of the hardware vendor talks will give a case study covering several advanced graphics algorithms that require a mix of data- and/or task-parallel computation as well as the traditional graphics pipeline. Example topics may include, but are not limited to:

Interactive global illumination Building/using irregular data structures during interactive rendering Combining real-time physics and rendering Combining ray tracing and rasterization

o Each case study will describe the graphics algorithms, the data structures used/built, the parallel algorithms and programming tools used, as well as give live demos.

Page 4: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

o The details of the case studies cannot be revealed at this time due to intellectual property limitations; however, each of the vendors and speakers has a strong reputation of delivering cutting-edge content and algorithm design to the SIGGRAPH community.

NVIDIA case study: 30 min AMD case study: 30 min Intel case study: 30 min

- Conclusions: Lefohn + All Speakers: 15 min

o Summary of state-of-the-world o Guesses at the future o Open Q & A with all speakers

Speaker Biographies

Aaron Lefohn, Ph.D.

Aaron Lefohn is a Senior Graphics Architect at Intel on the Larrabee project. Previously, he designed parallel programming models for graphics as a Principal Engineer at Neoptica, a computer graphics startup that was acquired by Intel in October 2007. Aaron's Ph.D. in Computer Science from the University of California Davis focused on data structure abstractions for graphics processors and data-parallel algorithms for rendering. From 2003 - 2006, he was a researcher and graphics software engineer at Pixar Animation Studios, focusing on interactive rendering tools for artists and GPU acceleration of RenderMan. Aaron was formerly a theoretical chemist and was an NSF graduate fellow in computer science.

Aaron Lefohn 2700 156th Ave NE, Suite 300 Bellevue, WA 98007 425-881-4891 [email protected]

Mike Houston, Ph.D.

Mike Houston is a System Architect in the Advanced Technology Development group at AMD in Santa Clara working in architecture design and programming models for parallel architectures. He received his Ph.D. in Computer Science from Stanford University in 2008 focusing on research in programming models, algorithms, and runtime systems for parallel architectures including GPUs, Cell, multi-core, and clusters. His dissertation includes the Sequoia runtime system, a system for programming hierarchical memory machines. He received his BS in

Page 5: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Computer Science from UCSD in 2001 and is a recipient of the Intel Graduate Fellowship.

Mike Houston 4555 Great America Parkway Suite 501 Santa Clara, CA 95054 408-572-6010 [email protected],

David Luebke, Ph.D.

David Luebke is a Research Scientist at NVIDIA Corporation, which he joined after eight years on the faculty of the University of Virginia. He has a Ph.D. in Computer Science from the University of North Carolina and a B.S. in Chemistry from the Colorado College. Luebke's research interests are GPU computing and realistic real-time computer graphics. Recent projects include advanced reflectance and illumination models for real-time rendering, image-based acquisition of real-world environments, temperature-aware graphics architecture, and scientific computation on GPUs. Past projects include leading the book "Level of Detail for 3D Graphics" and the Virtual Monticello museum exhibit at the New Orleans Museum of Art.

David Luebke 1912 Lynchburg Dr. Charlottesville, VA 22903 434-409-1892 [email protected]

Jon Olick

Jon Olick has been working on games and games technology professionally for the past 9 years, helping to create multi-million selling games such as Medal of Honor: Allied Assault. Currently, Jon is a Programmer and Engineer at id Software. Previously, on the ICE team (a technology group based at Naughty Dog) he played key architecture roles in the design and development of many Sony first and third party tools and technologies for PlayStation 3. This work has already been used in game titles such as Uncharted: Drakes Fortune, Resistance: Fall of Man, Heavenly Sword, NBA 07, Warhawk, MotorStorm and MLB 2007. He is a published author in the book GPU Gems 2. Jon Olick 3819 Towne Crossing #222 Mesquite, TX 75150

Page 6: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

918-407-3744 [email protected]

Fabio Pellacini, Ph.D.

Fabio Pellacini is an assistant professor in computer science at Dartmouth College. His research focuses on algorithms for interactive, high-quality rendering of complex environments and for artist-friendly material and lighting design to support more effective content creation. Prior to joining academia, Pellacini worked at Pixar Animation Studios on lighting algorithms, where he received credits on various movie productions. Pellacini received his Laurea degree in physics from the University of Parma (Italy), and his M.S. and Ph.D. in computer science from Cornell University. Fabio Pellacini 6211 Sudikoff Lab Hanover, NH 03755 603-646-8710 [email protected]

Page 7: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Beyond Programmable Shading:Beyond Programmable Shading: In ActionIn Action

Mike HoustonMike HoustonAMDAMD

Page 8: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

2Beyond Programmable Shading: In Action

Disclaimer about these Course NotesDisclaimer about these Course Notes

• The material in this course is bleeding edge– Unfortunately, that means we can’t share most of

the details with you until SIGGRAPH 2008– Several talks are missing from the submitted notes– The talks that are included may change substantially

• To address this inconvenience– We will post all course notes/slides on a permanent

web page, available the first day of SIGGRAPH 2008

Page 9: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

3Beyond Programmable Shading: In Action

Review from course 1 (Fundamentals)Review from course 1 (Fundamentals)

Future interactive rendering techniques

will be an inseparable mix of data- and task-parallel algorithms

and graphics pipelines

Page 10: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

4Beyond Programmable Shading: In Action

How do we write new interactive 3D rendering algorithms?

Page 11: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

5Beyond Programmable Shading: In Action

FixedFixed--Function Graphics PipelineFunction Graphics Pipeline

• Writing new rendering algorithms means– Tricks with stencil buffer, depth buffer, blending, …

– Examples• Shadow volumes• Hidden line removal• …

Page 12: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

6Beyond Programmable Shading: In Action

Programmable ShadingProgrammable Shading

• Writing new rendering algorithms means– Tricks with stencil buffer, depth buffer, blending, …– Plus: Writing shaders

– Examples• Parallax mapping • Shadow-mapped spot light• …

Page 13: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

7Beyond Programmable Shading: In Action

Beyond Programmable ShadingBeyond Programmable Shading

• Writing new rendering algorithms means– Tricks with stencil buffer, depth buffer, blending, …– Plus: Writing shaders– Plus: Writing data- and task-parallel algorithms

• Analyze results of rendering pipeline• Create data structures used in rendering pipeline

– Examples• Dynamic summed area table• Dynamic quadtree adaptive shadow map• Dynamic ambient occlusion• …

Page 14: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

8Beyond Programmable Shading: In Action

“Fast Summed-Area Table Generation and its Applications,” Hensley et al., Eurographics 2005

“Resolution Matched Shadow Maps,” Lefohn et al., ACM Transactions on Graphics 2007

“Dynamic Ambient Occlusion and Indirect Lighting,” Bunnell, GPU Gems II, 2005

Page 15: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

9Beyond Programmable Shading: In Action

Beyond Programmable ShadingBeyond Programmable Shading

• Writing new rendering algorithms means– Tricks with stencil buffer, depth buffer, blending, …– Plus: Writing shaders– Plus: Writing data- and task-parallel algorithms

• Analyze results of rendering pipeline• Create data structures used in rendering pipeline

– Plus: Extending, modifying, or creating graphics pipelines

– Examples• PlayStation 3 developers creating hybrid Cell/GPU graphics

pipelines– See upcoming talk from Jon Olick (Id Software)

• Active area of research

Page 16: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

10Beyond Programmable Shading: In Action

Why Why ““Beyond Programmable Shading?Beyond Programmable Shading?””

• Short answer:– The parallel processors in your desktop machine or

game console are now flexible and powerful enough to execute both• User-defined parallel programs and • Graphics pipelines

• …All within 1/30th of a second

Page 17: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

11Beyond Programmable Shading: In Action

This CourseThis Course

• Case studies from game developers, academics, and industry

Page 18: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

12Beyond Programmable Shading: In Action

This CourseThis Course

• Show-casing new interactive rendering algorithms that result in more realistic imagery than is possible using only the pre- defined DX/OpenGL graphics pipeline by

• Combining task-, data-, and/or graphics pipeline parallelism,

• Analyzing intermediate data produced by graphics pipeline,

• Building and using complex data structures every frame, or

• Modifying/extending the graphics pipelines

Page 19: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

13Beyond Programmable Shading: In Action

Speakers (in order of appearance)Speakers (in order of appearance)

• Mike Houston, AMD• Fabio Pellacini, Dartmouth College • Jon Olick, Id Software• Dave Luebke, NVIDIA • Aaron Lefohn, Intel

Page 20: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

14Beyond Programmable Shading: In Action

ScheduleSchedule• Intro 1:45 – 2:00 Houston• Research Case Study

– Interactive Cinematic Lighting 2:00 – 2:30 Pellacini

• Games Case Studies – Current Generation Parallelism in Games 2:30 – 3:00 Olick

– Next Generation Parallelism in Games 3:00 – 3:30 Olick

<Break> 3:30 – 3:45

• Graphics Hardware Vendor Case Studies – NVIDIA 3:45 – 4:15 Luebke– AMD 4:15 – 4:45 Houston– Intel 4:45 – 5:15 Lefohn

• Q & A 5:15+

Page 21: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

interactive cinematic lighting

fabiopellacini

joint work with M. Hasan (Cornell), K. Bala (Cornell), K. Vidimce et al. (Pixar)

Page 22: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

disclaimer: will skip details

ask questions if interested

Page 23: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

digital lighting design

user repeatedly set lighting parameters

feedback: run the rendering algorithm

Page 24: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

digital lighting design

user repeatedly set lighting parameters

feedback: run the rendering algorithm

goal: interactive feedback

Page 25: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

digital lighting design

user repeatedly set lighting parameters

feedback: run the rendering algorithm

goal: intuitive interfaces

Page 26: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

“I have started to work in this field more than 15 years ago. I do not understand why, while computers get faster, it takes us more time than before to make an image”

- jc kalache, dp, pixar

Page 27: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

cinematic rendering

• high geometric complexity105 smooth primitives

• high shading complexity103shaders with 105 ops with 5 GBs textures

• high quality antialiasingno artifacts with depth-of-field, motion blur

• complex lightingindirect illumination, subsurface scattering,environment lighting, ambient occlusion

Page 28: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

cinematic rendering

• high geometric complexity105 smooth primitives

• high shading complexity103shaders with 105 ops with 5 GBs textures

• high quality antialiasingno artifacts with depth-of-field, motion blur

• complex lightingindirect illumination, subsurface scattering,environment lighting, ambient occlusion

Page 29: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

goal: interactive feedback

in complex environments with high accuracy

Page 30: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

cinematic rendering

• high geometric complexity105 smooth primitives

• high shading complexity103shaders with 105 ops with 5 GBs textures

• high quality antialiasingno artifacts with depth-of-field, motion blur

• complex lightingindirect illumination, subsurface scattering,environment lighting, ambient occlusion

Page 31: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

direct indirect

first indirect bounce other indirect bouncesdirect bounce

Page 32: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

[image courtesy of B. Walter]

Page 33: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

complex lighting as many-lights

lighting simulation converted to “virtual” point lights

[image courtesy of B. Walter]

Page 34: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

= Σ

= Σ

= Σ

Page 35: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

~1M

pix

els

~100k lights

Page 36: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel
Page 37: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

~ 0.1 s

~ 30 m

real

tim

eof

flin

e

computing one column

[SIGGRAPH 2005b][SIGGRAPH 2005a]

lpicsimages

Page 38: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

method:

cache

approx.

data-parallel hardware

lpicsimages

Page 39: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

results: artist at work

one column at a time

lpicsimages

Page 40: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

motion blur and depth of field

[Regan-Kelley et al., SIGGRAPH 2007]

lightspeedimages

Page 41: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

impact: interactivity for cinematic lighting changes

artists workflow

Page 42: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

lessons learned

• interactive cinematic lighting is possible• interactive algorithms mostly match offline• hard to match: shader as content is bad!• render-dependent look definition

Page 43: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

10 min 13 min 20 min

brute force too slow

Page 44: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

direct on virtual lights

indirect illumination

cache

[SIGGRAPH 2006]

Page 45: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

method:

precompute (sparsely) matrix compress by signal processing data-parallel sparse multiply

Page 46: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel
Page 47: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

3D virtual lights arrange into image wavelet transform

method: impose a wavelet basis, by clustering, then lossily compress

coherence → sparse light vector

Page 48: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

method: impose same wavelet basis for rowsconstruct sparse images for each columns

coherence → sparse matrixaf

ter w

avel

et x

form

orig

inal

col

umns

lit b

y w

avel

et li

ghts

lit b

y or

igin

al li

ghts

Page 49: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

results

precomputation: 1.6 h

11.4 – 18.7 fps

107k polys

Page 50: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

results

precomputation: 2.5 h

8.5 – 25.8 fps

2.1M polys

Page 51: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

impact: highest quality interactive illumination for

lighting design to date

Page 52: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

lessons learned

• interactive global illumination is possible• interactive algorithms do not match offline• operations: rasterize, shade, other kernels• other kernels are CPU or (painfully) GPU

Page 53: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

required improvement:

dynamic environments with no precomputation

Page 54: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

900

pixe

ls

700 lights

insight:

[SIGGRAPH 2007]

Page 55: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

similarcolumns

intuition: coherence → low rank

Page 56: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel
Page 57: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

renderrepresentatives

final image asweighted sum

Page 58: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

clustercolumns

renderrepresentatives

final image asweighted sum

method:cluster similar columns render representatives

computed sum

Page 59: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

unknownmatrix

render rowson GPU

assemblerows intoreducedmatrix

clusterreducedcolumns

renderrepresentative

columnson GPU

final image asweighted sum ofrepresentative

columns

method:render a few rows,

cluster reduced columns

Page 60: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

sum of column pairs

norms of reduced columns

dist of reduced columns

cost of all

clusters

cluster metric

clustering as unbiased importance sampling

Page 61: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

results

300 rows / 900 cols: ~ 17 s reference: ~ 17 m

2.1 M polys / 100 k lights

Page 62: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

results

complex geometry and arbitrary materials/lights

Page 63: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

comparison to state of the art

> 30x for same quality

insight: data parallel queries

stoc

hast

ic p

oint

sam

plin

g on

CPU

stoc

hast

ic “

cohe

rent

”sa

mpl

ing

on G

PU

Page 64: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

impact: fastest algorithm for previewing accurate lighting

Page 65: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

next: geometric scalability (cities, forests, crowds in secs)

Page 66: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

take home message

• shaders are not content (renderer lock)• complex lighting does not map to GL/DX• GL/DX are great for other applications• for us, GL/DX complexity is not useful• for us, GL/DX constraints’ stifle innovation

Page 67: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

questions?

Page 68: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Current Generation Parallelism Current Generation Parallelism In GamesIn Games

Jon Jon OlickOlickid Softwareid Software

Presenter
Presentation Notes
Hi, my name is Jon Olick. I’d first like to talk about the history of parallelism with regards to games. Then take a segue into what games are currently doing with parallelism on the Cell Processor, and follow up with our plans for the next generation.
Page 69: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Brief History of ParallelismBrief History of Parallelism

• 1 Processor– The good old days.– Why parallelize? Just wait a little and your programs

will get faster.

Presenter
Presentation Notes
The good old days where computers doubled in performance every 18 months stopped in about 2003 or 2004. So before then, why parallelize your programs? Just wait until the next faster sequential processor to come out. Unfortunately this is no longer the case. Had this trend continued, you would have computers about 3 or 4 times faster than there are today.
Page 70: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Brief History of ParallelismBrief History of Parallelism

• 2 to 3 Processors– Logical splitting of game process into pipelined

pieces.• Game• Rendering• Sound• Loading/Decompression

Presenter
Presentation Notes
So now parallelism is the way to make your programs run faster. For games this first step is a pretty straight forward logical splitting of the game process into pipelined pieces. The first typical step is to have a game and rendering thread. Next comes sound and in some cases loading and decompression. At this point though, the game is pipelining for parallelism.
Page 71: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Brief History of ParallelismBrief History of Parallelism

• About 6 to 8 Processors– The transition to a job scheduling type architecture– 1st order parallelism

• Game• Rendering• Sound• Physics• Collision• Loading/Decompression• Etc…

Presenter
Presentation Notes
The pipelining continues for the most part until about 6 processors or so. The 4 or 5 processors range is kind of like a no-mans land of parallelism. The overhead of doing job based parallelism doesn’t really outweigh the cost in most cases. At 6 processors though, the transition to a more data parallel approach really takes root. I like to call this 1st order parallelism.
Page 72: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Brief History of ParallelismBrief History of Parallelism

• About 8 to 16 Processors– End of CPU history.– Enter 1998 in GPU history.

• Approx # of processors as average parallel scalar operations.

– 2nd order parallelism– Jobs which create and manage the resources of

other jobs.• GPU Command Processor (DMA engine)

Presenter
Presentation Notes
At 8 to 16 processors we have reached the end of CPU history, but we can enter the year 1998 of GPU history and approximate the number of equivalent CPU processors by the number of average parallel scalar operations. At this point a transition to 2nd order parallelism takes place. Which means making a job which creates and manages other jobs. In a GPU, the manager job is the command processor which spawns pixel jobs.
Page 73: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Brief History of ParallelismBrief History of Parallelism

• About 16+ processors– 3rd order parallelism– Jobs which create and manage the resources of

other jobs which create and manage the resources of other jobs• GPU Vertex Processors

Presenter
Presentation Notes
At 16 or more processors we enter the realm of current day GPUs and 3rd order parallelism. GPUs entered this realm when they added vertex transformation to their processing pipeline. So we have a command buffer which specifies groups of work to do which is spawns off vertex jobs which spawns off pixel jobs.
Page 74: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Brief History of ParallelismBrief History of Parallelism

Presenter
Presentation Notes
So here we have a chart showing the order of parallelism of GPUs with respect to the approximate number of equivalent CPU processors. As you can see GPUs quickly reached 3 orders of parallelism which is the most a GPU would need for typical rasterization. CPUs may follow a similar path with regards to orders of parallelism, but probably to different ends and with a more generic spin on it. This outcome may be likely if the CPU would be used to do graphics or graphics related tasks.
Page 75: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Current State of ParallelismCurrent State of Parallelism

• Desktop Processors– Intel Core 2 Quad - 4 processors - 3.2 ghz - 51.2

Gflops (102.4 Gflops theoretical)• Soon to be 8 core?

• Multimedia Processors– Cell Processor - 8 processors - 3.2 ghz - 134.4 Gflops

(192.0 Gflops theoretical)• 1 main, 7 co-processors

• Graphics Accelerators– 8800 GTX - 1.35 ghz - 345.6 Gflops (518.43 Gflops

theoretical)• 128 stream processors

Presenter
Presentation Notes
So this is the current state of parallelism. We have desktop processors at 4 cores which are easy to program and easy to debug. We have multimedia processors which are hard to program and hard to debug, but you get roughly twice the processing power there to possibly take advantage of. At the other end of the spectrum, we have graphics accelerators. Which are easy to program and decent to debug, but has some fixed function restrictions that limits its usefulness.
Page 76: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

THE CELL PROCESSORTHE CELL PROCESSOR

Presenter
Presentation Notes
For the next 20 minutes or so I’m going to focus on the middle ground. The Cell Processor.
Page 77: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

PLAYSTATIONPLAYSTATION®®3 Cell Processor Overview3 Cell Processor Overview

• Game• Animation• Geometry Processing• Post Processing• Occlusion Rasterization• Sorting• Collision Detection• Fourier Transform• (De)Compression• Not going to cover all of these…

Presenter
Presentation Notes
This is a list of the various things people are doing with the Cell. They can be divided into two main groups.
Page 78: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

PLAYSTATIONPLAYSTATION®®3 Cell Processor Overview3 Cell Processor Overview

• Parallelize ordinarily sequential CPU processing

• Assist in what is typically considered GPU processing

Presenter
Presentation Notes
Jobs that parallelize ordinarily sequential tasks and jobs that assist the GPU in its processing.
Page 79: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

PLAYSTATIONPLAYSTATION®®3 Cell Processor Overview3 Cell Processor Overview

• Fitting code and data in the 256k local co- processor memory

• Best solutions are ones that don't treat the 256k local store as a typical on demand caching architecture– Scattered reads bad, sequential reads good

• Software Pipelining• Only 16 bytes aligned reads/writes• Synchronization

Presenter
Presentation Notes
The primary programming challenges when writing code for the Cell Processor is first breaking up your workloads so that they fit in the 256k SPU local store. This is instead of treating the local store as an on demand cache. Second, due to the high latency of the instructions, software pipelining can play a big role in the performance of a low level kernel. Third, any reads or writes to the local store must be aligned to 16 bytes. Finally and most importantly, synchronization between the various processors of the system.
Page 80: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

MD6 ANIMATION PROCESSINGMD6 ANIMATION PROCESSING

Presenter
Presentation Notes
First, highlighting some typically sequential work, is the animation processing.
Page 81: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

GameLogic

MD6 Animation ProcessingMD6 Animation Processing

Presenter
Presentation Notes
In a typical game you have your game logic.
Page 82: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Blending TreeGeneration

GameLogic

MD6 Animation ProcessingMD6 Animation Processing

Presenter
Presentation Notes
Which creates animation blending trees.
Page 83: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Low Level OperationList Generation

Blending TreeGeneration

GameLogic

MD6 Animation ProcessingMD6 Animation Processing

Presenter
Presentation Notes
And from those trees come an blending operation list
Page 84: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Low Level Operation Execution

Low Level OperationList Generation

Blending TreeGeneration

GameLogic

MD6 Animation ProcessingMD6 Animation Processing

Presenter
Presentation Notes
And from that list comes the actual execution of the data.
Page 85: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Low Level Operation Execution

Low Level OperationList Generation

Blending TreeGeneration

GameLogic

Serial

Parallel

MD6 Animation ProcessingMD6 Animation Processing

Presenter
Presentation Notes
So what we are doing is making everything but the game logic work in parallel here.
Page 86: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

• Additive Blending• Subtractive Blending• Animation Algebra

– Blend Equations• Animation blending trees in the form of an equation.• Example equation:

– (animA + animB) – animC

MD6 Animation ProcessingMD6 Animation Processing

Presenter
Presentation Notes
Some of the features of the animation tree processing are additive and subtractive blending. These operations create an algebra of operations which we express in the form of an equation so that our artists can work with them easier.
Page 87: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Partial Animation BlendingPartial Animation Blending

• Generalized play an animation only on the face, torso, etc…

• One weight per joint per animation• Compute alpha for slerp via following

equation:– For each joint

• Let w0 = weight of joint in animation A• Let w1 = weight of joint in animation B• If(w1 > w0)

– Let alpha = (alpha * w1) / w0

• Else– Let alpha = ((w1 – w0) + alpha * w0) / w1

Presenter
Presentation Notes
There is also support for abstract partial animation blending where you have a weight per joint in each animation. This is for playing an animation only on the face or the torso of a character, however it works seamlessly with the rest of the animation system without the need for specific configuration of the blending process.
Page 88: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Varying parameter treatmentVarying parameter treatment

Presenter
Presentation Notes
To generate the data we sample the joints of an animations at regular intervals.
Page 89: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Varying parameter treatmentVarying parameter treatment

Presenter
Presentation Notes
We then remove any samples which can be approximated through interpolation.
Page 90: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Varying parameter treatmentVarying parameter treatment

16k 16k 16k

Presenter
Presentation Notes
And finally split up this information into pieces small enough to fit in the local store of a SPU.
Page 91: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

MD6 MD6 Animation WebsAnimation Webs

• Separates Thinking from Representation– Game Object says what it wants to look like.– Animation Webs take care of the rest.

• Unstructured graph– Each node has a blend tree

• Designed with simplicity in mind– Animators should animate, not fiddle with nodes. – Extract as much information as possible directly

from the animation data.

Presenter
Presentation Notes
To generate those blend trees there is animation webs. This has the great property of separating the thinking of the AI from the display of the AI. The AI simply specifies a goal and the webs are responsible for making it look good getting there. It is a unstructured graph where each node in the graph is a blending equation. The primary design goal is to be simple as we want animators to be animating and not fiddling with nodes. To this end, the webs extract as much information as possible from the animation data itself. So just very quickly, lets show the power of iteration within this framework.
Page 92: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

MD6 MD6 Animation WebsAnimation Webs

Presenter
Presentation Notes
We start with a single node.
Page 93: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

MD6 MD6 Animation WebsAnimation Webs

= “play standAnim”

Presenter
Presentation Notes
Which has a very simple blend equation which just plays an animation.
Page 94: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

MD6 MD6 Animation WebsAnimation Webs

You Are Here

Desired State = Stand•

Blend Equation = “play standAnim”

Presenter
Presentation Notes
For this example, the red node is where you are at. The desired state is information from the game logic which says where we want to go. Lets add some more nodes to make this more interesting.
Page 95: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

MD6 MD6 Animation WebsAnimation Webs

Desired State = Stand•

Blend Equation = “play standAnim”

Presenter
Presentation Notes
walk
Page 96: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

MD6 MD6 Animation WebsAnimation Webs

Desired State = Stand•

Blend Equation = “play standAnim”

Presenter
Presentation Notes
And run.
Page 97: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

MD6 MD6 Animation WebsAnimation Webs

Desired State = WalkWalk•

Blend Equation = “play standAnim”

Presenter
Presentation Notes
Now if we change the destination to the walk node, the web will start moving the character through the animation web to the destination via path finding.
Page 98: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

MD6 MD6 Animation WebsAnimation Webs

Desired State = Walk•

Blend Equation = “blend standAnim and walkAnim”

Presenter
Presentation Notes
However, we cannot immediately play the walk animation without first transitioning to it. When we are traversing this edge of the graph the blending equations from both nodes are combined to make the transition smooth.
Page 99: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

MD6 MD6 Animation WebsAnimation Webs

Desired State = Walk•

Blend Equation = “play walkAnim”

Presenter
Presentation Notes
And now we have reached our destination. So the real power in this system comes when we start adding additional nodes, which the game logic code knows nothing about.
Page 100: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

MD6 MD6 Animation WebsAnimation Webs

Desired State = Stand•

Blend Equation = “play standAnim”

Presenter
Presentation Notes
Say we wanted to add a transition animation between stand and walk to make things look more realistic. Lets take the next step here and add two different animations. One for if the direction he wants to run is in front of him and another one if the direction he wants to run is behind him.
Page 101: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

MD6 MD6 Animation WebsAnimation Webs

Desired State = Stand•

Blend Equation = “play standAnim”

Presenter
Presentation Notes
The directional information is extracted from the animation and the webs will automatically choose the correct path no matter the situation.
Page 102: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

MD6 MD6 Animation WebsAnimation Webs

Desired Traversal = Stand to Walk•

Desired Direction = ~5 degrees

Presenter
Presentation Notes
So going from a standing pose to running directly forward.
Page 103: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

MD6 MD6 Animation WebsAnimation Webs

Desired Traversal = Stand to Walk•

Desired Direction = ~175 degrees

Presenter
Presentation Notes
And going from a standing pose to running in the opposite direction he is facing.
Page 104: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

MD6 MD6 Animation WebsAnimation Webs

Desired State = Run

Presenter
Presentation Notes
Another typical example is an AI running to cover behind a table or a low wall.
Page 105: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

MD6 MD6 Animation WebsAnimation Webs

Desired Traversal = Run to Cover••

Mechanical Behavior Mechanical Behavior –– Not believableNot believable

Presenter
Presentation Notes
Typically this would involve running to the location. Stopping. Crouching. And finally playing the cover animation. This however looks very mechanical and could use some improvement.
Page 106: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

MD6 MD6 Animation WebsAnimation Webs

Desired State = Run

Presenter
Presentation Notes
We just add another node to handle this situation much more realistically. Realistically though, a single animation cannot possibly fit all stopping distances.
Page 107: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

MD6 MD6 Animation WebsAnimation Webs

Desired State = Run

Presenter
Presentation Notes
So you can add multiple nodes with different animations that support many different distances. The best path through the web will always be taken automatically by examining the travel distance in the animation itself.
Page 108: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

MD6 Animation WebsMD6 Animation Webs

• Automatically picks best animation for search criteria

• Time warps animation to perfectly match criteria

• Works in parallel with game code

Presenter
Presentation Notes
So to wrap this up. Optimal paths through the web are generated based on your search criteria, and the animations can be time warped to exactly match the desired motion.
Page 109: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

GEOMETRY PROCESSINGGEOMETRY PROCESSING

Presenter
Presentation Notes
To assist the GPU we have a geometry processing system which has a runtime and a tools component...
Page 110: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Two modes of usageTwo modes of usage

• Primary mode– Use offline tools– Partition into vertex sets– Use indexed triangles– All features of pipeline can be used

SPU

Presenter
Presentation Notes
There are two different ways to use the system.   The first way is to integrate the offline tools into your pipeline. This splits the game’s geometry into small chunks, called “vertex sets”. The size of a vertex set is small enough to fit on an SPU – typically between 500 to 1500 vertexes. The geometry tools output indexed triangles, because they have the highest RSX performance. The vertex sets which pass through the tools support all of our features. <slide> The geometry pipeline processes these vertex sets as discrete entities, using one or more SPUs.
Page 111: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Two modes of usage (cont)Two modes of usage (cont)

• Secondary mode– Data generated by other tools– Formats other than indexed triangles– Non-partitioned objects– Subset of pipeline features can be used

SPU

Presenter
Presentation Notes
The second way is to use objects that are not prepared with offline tools. These are either non-indexed or indexed objects. Objects without indexes occur when creating geometry on the fly. <slide> If the object is too large to fit on an SPU, we use a streaming like technique. With this, the pipeline reads the object’s data in SPU sized pieces, processes the data, and then reassembles those pieces on output. With streaming, all the vertex operations of the geometry pipeline, such as skinning and blend shapes can be performed, but the triangle operations such as triangle culling cannot. Nevertheless, this method is very useful, as it allows you to get the geometry system working first, and integrate the tools later.
Page 112: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

SPU Geometry Pipeline StagesSPU Geometry Pipeline Stages

Index Decompress

Blend Shapes

Skinning

Triangle Culling

Compression

Vertex Decompress

Output

SPU Pipeline

Progressive Mesh

Presenter
Presentation Notes
Here are the stages of the geometry processing pipeline.
Page 113: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Vertex DecompressionVertex Decompression

Index Decompress

Blend Shapes

Skinning

Triangle Culling

Compression

Vertex Decompress

Output

SPU Pipeline

Progressive Mesh

Presenter
Presentation Notes
The first stage is decompression of vertexes.
Page 114: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Vertex AttributesVertex Attributes

Unique VertexArray 0Unique VertexArray 0

Instance VertexArray 1Instance VertexArray 1

Presenter
Presentation Notes
This takes in one or two vertex arrays in a variety of formats, each potentially interleaving data of a few different types,
Page 115: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Vertex DecompressionVertex Decompression

Float TablesFloat TablesUnique VertexArray 0Unique VertexArray 0

Instance VertexArray 1Instance VertexArray 1

Presenter
Presentation Notes
and then converts them into separate tables of floating point values. The processing pipeline supports all native RSX formats as well as some compressed formats.
Page 116: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

24bit Unit Vector24bit Unit Vector

• Smallest 2 compression– Two smallest components with 10 bits each

• Encoded from –sqrt(2)/2 to +sqrt(2)/2– Largest component reconstructed via

• Largest = sqrt(1 – smallestA2 – smallestB2) • One additional bit for sign of largest component.

Presenter
Presentation Notes
For normals and tangents, there is a 24 bit unit vector format that increases the precision per component
Page 117: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

24bit Unit Vector24bit Unit Vector

• Smallest 2 compression– Two smallest components with 10 bits each

• Encoded from –sqrt(2)/2 to +sqrt(2)/2– Largest component reconstructed via

• Largest = sqrt(1 – smallestA2 – smallestB2) • One additional bit for sign of largest component.

• One more bit to represent W as +1 or -1– For constructing bi-normal from normal and tangent.

Presenter
Presentation Notes
and supports a handedness bit for defining the tangent frame.
Page 118: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

NN--bit Fixed Point bit Fixed Point with integer offsetswith integer offsets

• Simple n.x fixed point values– Per-segment integer offset

• Bit count may vary from attribute to attribute

Presenter
Presentation Notes
For positions and texture coordinate like attributes, there is a fixed point format with an integer offset, where the bit count can vary from attribute to attribute. These techniques usually remove about a quarter of the vertex data.
Page 119: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Index DecompressionIndex Decompression

Index Decompress

Blend Shapes

Skinning

Triangle Culling

Compression

Vertex Decompress

Output

SPU Pipeline

Progressive Mesh

Presenter
Presentation Notes
The second stage of the pipeline decompresses index tables. As I mentioned earlier, the pipeline supports indexed triangles as the primary format, because indexed triangles have a higher performance than the other triangle formats such as indexed triangle strips.
Page 120: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Index Table ConstructionIndex Table Construction

• Index table is created by a vertex cache optimizer– Based on K-cache algorithm

• Number of vertex program outputs affects Vertex Cache size.

• Four vertex mini cache most important optimization factor

Presenter
Presentation Notes
But, when using indexed triangles, the order of indexes has a significant impact on performance. The assignment of indexes to vertexes and the ordering of triangles in a model typically happen with a tools-side vertex cache optimizer. There is a four vertex mini-cache that the RSX uses in primitive assembly – this four vertex LRU is usually the most important factor to consider when optimizing index data.
Page 121: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Strip ExampleStrip Example

3 new vertices

3

Page 122: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Strip ExampleStrip Example

3

1

1 new vertex

Page 123: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Strip ExampleStrip Example

3

11

1 new vertex

Page 124: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Strip ExampleStrip Example

3

11

1

1 new vertex…

2 vertices + 1 per triangle in total

Page 125: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Free Form ExampleFree Form Example

3 new vertices

3

Page 126: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Free Form ExampleFree Form Example

1 new vertex

3

1

Page 127: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Free Form ExampleFree Form Example

1 new vertex

3

1

1

Page 128: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Free Form ExampleFree Form Example

1 new vertex

3

1

1

1

Page 129: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Free Form ExampleFree Form Example

1 new vertex

3

1

1

1

1

Page 130: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Free Form ExampleFree Form Example

0 new vertices!

3

1

1

1

1

0

Page 131: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Free Form ExampleFree Form Example

1 new vertex

3

1

1

1

1

0

1

Page 132: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Free Form ExampleFree Form Example

1 new vertex

3

1

1

1

1

0

1

1

Page 133: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Free Form ExampleFree Form Example

1 new vertex

3

1

1

1

1

0

1

1

1

Page 134: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Free Form ExampleFree Form Example

0 new vertices…

3

1

1

1

1

0

1

1

1

0

2 vertices + 3 per 4 triangles in total

Page 135: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Index Cache OptimizerIndex Cache Optimizer

• Our vertex cache optimizer produces very regular index data

Presenter
Presentation Notes
The index cache optimizer we use produces very regular index data.
Page 136: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Index DecompressionIndex Decompression

• Provided vertex cache optimizer produces very regular index data

Presenter
Presentation Notes
the resulting triangles are typically only simple combinations of the previous triangle and a single new vertex.
Page 137: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Index DecompressionIndex Decompression

0

1

2

Triangle Indexes

0 21

Presenter
Presentation Notes
Also, indexes can be rotated within a triangle without affecting its appearance. Effectively, these are all the same triangle.
Page 138: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Index DecompressionIndex Decompression

2

0

1

Triangle Indexes

2 10

Page 139: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Index DecompressionIndex Decompression

Presenter
Presentation Notes
Here is the same index buffer, but after rotating the triangles to common patterns.
Page 140: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Index DecompressionIndex Decompression

Presenter
Presentation Notes
There are very few resulting patterns that need to be represented, so the compression ratio can be quite high.
Page 141: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Index DecompressionIndex Decompression

85% compression6.5 : 1

Presenter
Presentation Notes
We’re typically removing about 85% of the index data in this fashion.
Page 142: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Blend ShapesBlend Shapes

Index Decompress

Blend Shapes

Skinning

Triangle Culling

Compression

Vertex Decompress

Output

SPU Pipeline

Progressive Mesh

Presenter
Presentation Notes
The pipeline also supports blend shapes, which is to say, additive vertex blending. Arbitrary numbers of blend shapes can be combined to form a character, and since quite often most vertexes in a blend shape match those of the base mesh, vertex data compression ratios can get very high.  
Page 143: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Blend Shapes in MLB 08: The ShowBlend Shapes in MLB 08: The Show

Presenter
Presentation Notes
Here is an example of blend shape for cloth simulation in MLB 08, The Show.
Page 144: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

SkinningSkinning

Index Decompress

Blend Shapes

Skinning

Triangle Culling

Compression

Vertex Decompress

Output

SPU Pipeline

Progressive Mesh

Presenter
Presentation Notes
Next in the pipeline is four bone skinning, I believe many teams are currently performing skinning on the SPUs rather than the RSX, and there are two large benefits of doing this. First, you pull
Page 145: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Skinning on Skinning on SPUsSPUs

void SkinVs(float4 inPosition : ATTR0, float4 weights : ATTR3,float4 matrixIndex : ATTR4,

out float4 position : POSITION, uniform float4 joints[72], uniform float4x4 modelViewProj)

{position = 0;

for (int i = 0; i < 4; i++) {

float idx = matrixIndex[i];

float3x4 joint = float3x4(joints[idx+0], joints[idx+1], joints[idx+2]);

position += weights[i] * mul(joint, inPosition);}

position = mul(modelViewProj, position);}

Presenter
Presentation Notes
this much code from your vertex program. Also, you can save any time that the RSX takes to read in the skinning weights and indexes.
Page 146: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Skinning on SPUsSkinning on SPUs

30% Performance Improvement

Presenter
Presentation Notes
Shifting this work to the SPUs typically results in about a 30% speed boost on the RSX,
Page 147: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Skinning on SPUsSkinning on SPUs

30% Performance Improvement

Shadow map generation.... 70%!

Presenter
Presentation Notes
but in situations where the graphical processing is simpler, such as shadow map generation, we’ve seen performance gains as high as 70%.
Page 148: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Discrete Progressive MeshDiscrete Progressive Mesh

• Smoothly reduces the triangle count as a model moves into the distance

• With discrete progressive mesh, the LOD calculation is done once for an entire object

Index Decompress

Blend Shapes

Skinning

Triangle Culling

Compression

Vertex Decompress

Output

SPU Pipeline

Progressive Mesh

Presenter
Presentation Notes
The SPU geometry pipeline contains two progressive mesh algorithms, both of which use smooth transformations to reduce the triangle count of an object as it moves into the distance. We chose this kind of smooth progressive mesh algorithm so the LOD transitions can occur up close to the camera. The first of these is discrete progressive mesh. It is called “discrete” because it is used on discrete objects, such as characters or standalone objects within a scene. These discrete objects morph smoothly, as an entire object, from one LOD to the next. Here is a short description of the algorithm.
Page 149: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

At an LOD there are two types of At an LOD there are two types of vertexesvertexes

LOD = 0.0

Parent Vertex

Child Vertex

Presenter
Presentation Notes
First, every LOD of a vertex set contains two types of vertexes – parent vertexes and child vertexes. Within a LOD, parent vertexes don’t move.
Page 150: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

As the LOD level decreases, the As the LOD level decreases, the children children ““slideslide”” towards their parentstowards their parents

LOD = 0.2

Parent Vertex

Child Vertex

Presenter
Presentation Notes
However, as the LOD starts to decrease, the child vertexes start to “slide” towards their parents.
Page 151: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

The children continue to move towards The children continue to move towards their parentstheir parents

LOD = 0.7

Parent Vertex

Child Vertex

Presenter
Presentation Notes
As the LOD continues to decrease, the children move even closer to their parents until finally
Page 152: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

At the next integral LOD, all child At the next integral LOD, all child vertexes disappear as do the trianglesvertexes disappear as do the triangles

LOD = 1.0

Parent Vertex

Child Vertex

Presenter
Presentation Notes
the next integral LOD is reached where the child vertexes and triangles are removed.
Page 153: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Continuous Progressive MeshContinuous Progressive Mesh

• Like discrete progressive mesh, child vertexes move smoothly toward their parents

• However, the LOD is calculated for each vertex instead of just once for the object

Index Decompress

Blend Shapes

Skinning

Triangle Culling

Compression

Vertex Decompress

Output

SPU Pipeline

Progressive Mesh

Presenter
Presentation Notes
Continuous progressive mesh is useful for ground terrain or any other object that covers a large distance. Like the discrete version, it reduces triangle counts smoothly through progressive approximation. However, with continuous progressive mesh, the object can span many different lod levels at the same time. So we must calculate the LOD level for each vertex.
Page 154: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Vertex set about to undergo continuous Vertex set about to undergo continuous progressive meshprogressive mesh

Parent Vertex

Child Vertex, LOD 1

Child Vertex, LOD 0

Presenter
Presentation Notes
Here is the same mesh from the previous example, but this time it is going to undergo continuous progressive mesh.
Page 155: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

A single vertex set can straddle several A single vertex set can straddle several LOD rangesLOD ranges

LOD = 1.0

Parent Vertex

Child Vertex, LOD 1

Child Vertex, LOD 0

LOD = 0.0

Presenter
Presentation Notes
As you can see, an object can straddle many LOD ranges at once, with some vertexes in LOD 0, some in LOD 1, and even some in LOD 2. But this time,
Page 156: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Vertexes move depending on their Vertexes move depending on their distancedistance

Parent Vertex

Child Vertex, LOD 1

Child Vertex, LOD 0

LOD = 1.0

LOD = 0.0

Presenter
Presentation Notes
each vertex moves based its own individual distance from the camera.
Page 157: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Triangle CullingTriangle Culling

Index Decompress

Blend Shapes

Skinning

Triangle Culling

Compression

Vertex Decompress

Output

SPU Pipeline

Progressive Mesh

Presenter
Presentation Notes
Triangle culling is supported in the pipeline.
Page 158: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Up to Up to 70%70% of triangles do not contribute to of triangles do not contribute to final image.final image.

Presenter
Presentation Notes
60 to 70 percent of triangles typically don’t result in any renderable area.
Page 159: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Off Screen TrianglesOff Screen Triangles

Presenter
Presentation Notes
Some are offscreen,
Page 160: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Back Facing TrianglesBack Facing Triangles

Presenter
Presentation Notes
some don’t face the camera,
Page 161: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Zero Area TrianglesZero Area Triangles

Presenter
Presentation Notes
some are degenerate,
Page 162: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Zero Area TrianglesZero Area Triangles

Presenter
Presentation Notes
And if we zoom in on this box here…
Page 163: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

No Pixel TrianglesNo Pixel Triangles

Presenter
Presentation Notes
We will find triangles which don't hit a pixel center.
Page 164: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Triangle CullingTriangle Culling

Presenter
Presentation Notes
The triangle cull stage of the SPU pipeline checks each triangle and produces a new index table, containing only those triangles that pass these tests.
Page 165: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Multisampling adds some complicationsMultisampling adds some complications……

Presenter
Presentation Notes
Multisampling adds some complications,
Page 166: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

CulledCulled

Presenter
Presentation Notes
but even then it is possible to do a good job of the pixel center test.
Page 167: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Triangle CullingTriangle Culling

10% to 20%Performance Improvement

Presenter
Presentation Notes
Overall, the performance improvements in a balanced scene are about 10 to 20 percent --
Page 168: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Compression for OutputCompression for Output

Index Decompress

Blend Shapes

Skinning

Triangle Culling

Compression

Vertex Decompress

Output

SPU Pipeline

Progressive Mesh

Presenter
Presentation Notes
The final stage of the pipeline prepares the data that the RSX will use.
Page 169: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Float TablesFloat Tables

Presenter
Presentation Notes
The geometry pipeline processes the attributes as independent tables of floats,
Page 170: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

When done, the vertex attributes are When done, the vertex attributes are compressed into one output streamcompressed into one output stream

OutputVertex ArrayOutputVertex Array

Float TablesFloat Tables

Presenter
Presentation Notes
in this stage the attribute data is converted into RSX friendly formats and interleaved into one table for output.
Page 171: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Output Buffering SchemesOutput Buffering Schemes

• Vertex and index data constructed by the SPUs is output from SPU local store

• Holes in the RSX command buffer are patched with pointers to the vertex and index data as well as the draw commands

Index Decompress

Blend Shapes

Skinning

Triangle Culling

Compression

Vertex Decompress

Output

SPU Pipeline

Progressive Mesh

Presenter
Presentation Notes
The last and final stage in the pipeline is output and synchronization.
Page 172: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Double BufferDouble Buffer

• Each buffer stores vertex and index data for an entire frame

• SPUs atomically access a mutex which is used to allocate memory from a buffer

• Easy synchronization with the RSX™ once a frame

• Uses lots of memory

VertexandIndexDataforFrame 0

VertexandIndexDataforFrame 0

VertexandIndexDataforFrame 1

VertexandIndexDataforFrame 1

Presenter
Presentation Notes
The simplest of these output buffering schemes is the double buffer. In this scheme there are two large buffers in main memory, and each buffer stores all of the vertex and index data created by the geometry processing pipeline on the SPUs for an entire frame. An SPU allocates some space by atomically accessing and updating a mutex. This inter SPU synchronization for allocation is fine for 8 processors, but breaks down when you have more than that.
Page 173: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

It is possible to completely fill a bufferIt is possible to completely fill a buffer

• Can use a callback to allocate new memory (which you may not have)

• Don’t draw geometry that doesn’t fit (difficult to pick which geometry not to draw)

SPUSPUData

VertexandIndexData

VertexandIndexData

Presenter
Presentation Notes
Another problem with outputting a variable amount of data into a fixed size buffer is that you can run out of space. But that’s not all,
Page 174: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Double buffering adds a frame of lagDouble buffering adds a frame of lag

Build Jobs on PPU Build Jobs on PPU

Process Jobs on SPU Process Jobs on SPU

Render on RSX™ Render on RSX™ Scan OutScan Out

Build Jobs on PPU Build Jobs on PPU

Process Jobs on SPU Process Jobs on SPU

Render on RSX™ Render on RSX™ Scan OutScan Out

Build Jobs on PPU Build Jobs on PPU

Process Jobs on SPU Process Jobs on SPU

Render on RSX™ Render on RSX™ Scan OutScan Out

Presenter
Presentation Notes
it also adds a frame of lag to your rendering pipeline making user input less responsive. So lets start fixing some of these draw backs.
Page 175: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Single BufferingSingle Buffering

• Uses only half the memory!• Still possible to completely

fill the bufferVertexandIndexDataforSingleFrame

VertexandIndexDataforSingleFrame

Presenter
Presentation Notes
Switching to a single buffering scheme solves the frame lag problem
Page 176: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Single Buffering has a shorter pipelineSingle Buffering has a shorter pipeline

• Vertex and index data is created just-in-time for the RSX™

• Draw commands are inserted into the command buffer while the RSX™ is rendering

• Requires tight SPU↔RSX™

synchronization

Build Jobs on PPU Build Jobs on PPU

SPU Processing/ RSX™ Rendering SPU Processing/ RSX™ Rendering Scan OutScan Out

Build Jobs on PPU Build Jobs on PPU

SPU Processing/ RSX™ Rendering SPU Processing/ RSX™ Rendering Scan OutScan Out

Build Jobs on PPU Build Jobs on PPU

SPU Processing/ RSX™ Rendering SPU Processing/ RSX™ Rendering Scan OutScan Out

Presenter
Presentation Notes
But this now requires tight synchronization with the RSX
Page 177: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

SPUSPU↔↔RSXRSX™™ Synchronization Synchronization Using Local StallsUsing Local Stalls

CommandBuffer

Draw 17Draw 17

Local StallLocal Stall

OtherOther

Local StallLocal Stall

Local StallLocal Stall

Local StallLocal Stall

Local StallLocal Stall

Local StallLocal Stall

Place local stalls in the command buffer where necessaryRSX™ will stop processing at a local stall until it is overwritten by new commandsSPUs will generally stay ahead of the RSX™, so stalls rarely occur

Local StallLocal Stall

Presenter
Presentation Notes
This is solved by placing local stalls in the RSX command buffer which stop the RSX
Page 178: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

SPU will overwrite local stalls when it SPU will overwrite local stalls when it outputs a set of new commandsoutputs a set of new commands

No SPU↔SPU synchronization required!

SPUSPU

CommandBuffer

Draw 17Draw 17

PutPointer

Local StallLocal Stall

OtherOther

Local StallLocal Stall

Local StallLocal Stall

Local StallLocal Stall

Local StallLocal Stall

Local Stall

Local StallNew Commands New Commands

Presenter
Presentation Notes
until the local stall is overwritten by the SPU. So next up is the running out of memory
Page 179: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Ring BuffersRing Buffers

• Small memory footprint• Will not run out of memory• Can stall the SPUs if buffers

become full• Objects need to be

processed in the same order the RSX™ renders them to prevent deadlock

Vertex and Index

Vertex and Index

DataData

End ofFree Area

Start ofFree Area

Presenter
Presentation Notes
Using a ring buffer will give us a small memory footprint. We can also never run out of memory now. However, this adds a complication where you have to balance the size of the ring buffer to avoid stalls and the RSX now has to report how much memory it has consumed to the SPUs.
Page 180: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

RSXRSX™™ writes a semaphore once a writes a semaphore once a chunk of data has been consumedchunk of data has been consumed

• A command to write a semaphore needs to be added to the command buffer after all commands that use the data– The value of the semaphore to be written is the new

end of free area pointer

Data 19Data 19

Data 6

Data 14Data 14

Start ofFree Area

Draw 6Draw 6

Draw 5Draw 5

Draw 7Draw 7

SemaphoreSemaphore

SemaphoreSemaphore

SemaphoreSemaphore

CurrentRSX™Execution

New End of Free Area

Presenter
Presentation Notes
Luckily the RSX has support for writing semaphores which we can use here to report back the used memory pointer. Now lets get rid of that inter SPU synchronization so they don’t fight over the ring buffer.
Page 181: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Each SPU has its own bufferEach SPU has its own buffer

Data 23Data 23

Data 17Data 17

Data 10Data 10

Data 22Data 22

Data 11Data 11

Data 16Data 16

Data 19Data 19

Data 6Data 6

Data 14Data 14 Data 20Data 20

Data 7Data 7

Data 15Data 15

Data 21Data 21

Data 12Data 12

Data 9Data 9

Data 18Data 18

Data 8Data 8

Data 13Data 13

SPU 0SPU 0

SPU 1SPU 1

SPU 2SPU 2

SPU 3SPU 3

SPU 4SPU 4

SPU 5SPU 5

Buffer 0 Buffer 1

Buffer 3Buffer 2

Buffer 4 Buffer 5

Presenter
Presentation Notes
The fastest synchronization is no synchronization. So lets give each processor its own ring buffer.
Page 182: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Geometry PerformanceGeometry Performance

Presenter
Presentation Notes
Performance of the geometry pipeline is quite good. Cycle counts vary, based on which particular features of the pipeline are used, as well as the particular model being displayed, here are the cycle counts per triangle for one of our test cases.
Page 183: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Software Pipelined C with SPU IntrinsicsSoftware Pipelined C with SPU Intrinsics

do{

m1 = in1;in1 = si_lqx(pIn1, offset);m2 = in2;in2 = si_lqx(pIn2, offset);m3 = in3;in3 = si_lqx(pIn3, offset);temp2 = si_selb(m3, m1, mask_0X00);si_stqx(out1, pOut1, offset);temp3 = si_selb(m2, m1, mask_00X0);si_stqx(out2, pOut2, offset);temp1 = si_selb(m1, m2, mask_0X00);si_stqx(out3, pOut3, offset);offset = si_ai(offset, 0x30);out2 = si_shufb(m2, temp2, qs_bCaD);out1 = si_selb(temp1, m3, mask_00X0);out3 = si_shufb(m3, temp3, qs_caBD);

} while(si_to_int(offset) != 0);

Presenter
Presentation Notes
We wrote the individual pipeline routines using SPU intrinsics. At first party, intrinsics are more popular than hand scheduled assembly for a variety of reasons. They are easier to modify and maintain, and performance with intrinsics is quite good,
Page 184: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Software Pipelined C with SPU IntrinsicsSoftware Pipelined C with SPU Intrinsics

Up to 20x faster

than naive C/C++

do{

m1 = in1;in1 = si_lqx(pIn1, offset);m2 = in2;in2 = si_lqx(pIn2, offset);m3 = in3;in3 = si_lqx(pIn3, offset);temp2 = si_selb(m3, m1, mask_0X00);si_stqx(out1, pOut1, offset);temp3 = si_selb(m2, m1, mask_00X0);si_stqx(out2, pOut2, offset);temp1 = si_selb(m1, m2, mask_0X00);si_stqx(out3, pOut3, offset);offset = si_ai(offset, 0x30);out2 = si_shufb(m2, temp2, qs_bCaD);out1 = si_selb(temp1, m3, mask_00X0);out3 = si_shufb(m3, temp3, qs_caBD);

} while(si_to_int(offset) != 0);

Presenter
Presentation Notes
Loops tend to be about 20x faster than the naive C code from which they are derived. Because the compiler does the scheduling, they have the additional advantage that they can be much more flexible than hand scheduled assembly.
Page 185: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

1 SPU

Presenter
Presentation Notes
In general, with a single SPU
Page 186: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

1 SPU

800,000+ Triangles Per Frame

at 60 Frames per Second

Presenter
Presentation Notes
you can process more than three quarters of a million triangles per frame at sixty frames per second –
Page 187: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

1 SPU

800,000+Triangles Per Frame

at 60 Frames per Second

60% of which are culled!

Presenter
Presentation Notes
while hopefully culling well over 60% of them.
Page 188: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Next Generation ParallelismNext Generation Parallelism In GamesIn Games

Jon Jon OlickOlickid Softwareid Software

Presenter
Presentation Notes
-- 25 minutes to here – So lets move on to the next few years.
Page 189: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

GAME ENTITY PROCESSINGGAME ENTITY PROCESSING

Presenter
Presentation Notes
Entity Processing is by far the most sequential part of the game code.
Page 190: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Game Entity ProcessingGame Entity Processing

• Current Generation– Serial Processing of entities in a giant for loop.

for(int i = 0; i < numEntities; ++i) {entity[i]->Think();

}

Presenter
Presentation Notes
Currently, this is typically a straight forward loop through each game entity to compute its thinking process for the frame.
Page 191: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Game Entity ProcessingGame Entity Processing

• Current Generation– Serial Processing of entities in a giant for loop.

• Next Generation– Parallelism via Double Buffering

Presenter
Presentation Notes
So, for our next generation engine, were planning on parallelizing the thinking process through double buffering the game data. This forces certain rules on game entity processing.
Page 192: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Game Entity ProcessingGame Entity Processing

• Current Generation– Serial Processing of entities in a giant for loop.

• Next Generation– Parallelism via Double Buffering– Each entity can only read from previous frame’s

results

Presenter
Presentation Notes
Each entity can only read from the previous frame’s results.
Page 193: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Game Entity ProcessingGame Entity Processing

• Current Generation– Serial Processing of entities in a giant for loop.

• Next Generation– Parallelism via Double Buffering– Each entity can only read from previous frame’s

results– Each entity can only write to itself– Every entity runs in parallel with each other with no

dependency stalls.

Presenter
Presentation Notes
Each entity can only write to itself. This allows each entity to run in parallel with the other. It also gives other benefits…
Page 194: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Game Entity ProcessingGame Entity Processing

• Record the progress of the game and replay it to debug.

• Single thread and randomize processing of entities to help find bugs.

• Can protect memory so that bad accesses cause exceptions to enforce double buffering rules.

Presenter
Presentation Notes
You can implicitly record the progress of the game, frame by frame and replay any recorded frame to debug it. To help find bugs, you can switch back to a single big for loop and randomize the processing order. To enforce the reading and writing rules you can protect memory.
Page 195: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

RAY CASTINGRAY CASTING

Presenter
Presentation Notes
So for our next generation rendering pipeline, were planning on focusing on ray casting.
Page 196: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Why Ray Casting?Why Ray Casting?

• A good question…

Presenter
Presentation Notes
So first of all... Why ray casting? That’s a very good question.
Page 197: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Presenter
Presentation Notes
Back in Quake 1, if you had to make a decision between having an additional processor or having a GPU, which would you choose?
Page 198: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

• Back in Quake 1– If you had to make a decision between an additional

CPU and a Graphics Card which would you choose?

Presenter
Presentation Notes
Why is this any different today?
Page 199: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

• Back in Quake 1– If you had to make a decision between an additional

CPU and a Graphics Card which would you choose?– Why is this any different today?

Presenter
Presentation Notes
So, thats a little misleading. Due to the fact that early rasterization hardware could process a pixel per cycle, a single additional CPU wouldn't match the improved performance of the GPU. So it really comes down to price. You get faster hardware that can do the same thing for less money. I do believe that this is still true today and for the foreseeable future.��However, as hardware becomes more programmable for more fancy shading, uses other than rasterization become possible. Although using the hardware for things other than rasterization will be slower, simply because the hardware is not designed for it, once a proof of concept is out there showing the value of ray casting, specific hardware that really accelerates it can then become a viable money making strategy. When that happens the hardware will be built with that in mind, perhaps by adding some assisting fixed function hardware. At this point though, ray casting technology is far too infant to determine any common patterns worthy of hardware abstraction. Its too risky.
Page 200: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

• Back in Quake 1– If you had to make a decision between an additional

CPU and a Graphics Card which would you choose?– Why is this any different today?– Its not.

Presenter
Presentation Notes
So the answer is of course, is that its not.
Page 201: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Why Ray Casting?Why Ray Casting?

• What value does it provide to game developers?– Scene complexity no longer significantly affects

performance or memory

Presenter
Presentation Notes
As is evident in the last half of this presentation, much of the work in making games goes into optimizing for performance, fitting everything in memory, and most importantly content generation. But, what if you never had to worry about triangle counts, so that the model the artists create in Z-brush are the final models? What if you never had to worry about rendering performance because it was fairly even no matter how complex the view is. And what if you never had to worry about the memory footprint of your textures or geometry as it is streamed in from disk at the highest level of detail you can perceive. While all these things are possible with rasterization, they are far easier with ray casting. So the answer is, shorter and cheaper development, with higher quality games.
Page 202: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Why Ray Casting?Why Ray Casting?

• What value does it provide to end users?

Presenter
Presentation Notes
We are making a bet that it will provide good value. However, until it is realized there is still the risk that it won't pay off. I do believe though that it will pay off in spades. Here is why. What we have seen in Rage with unique texturing on every surface, and the improved quality level from that, we can also see a few areas of improvement which would increase the realism factor there.
Page 203: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

• Screen shot from RAGE goes here

Presenter
Presentation Notes
As we can see from this screen shot, while it looks great.
Page 204: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

• Screen shot from RAGE goes here• (highlight flat regions)

Presenter
Presentation Notes
If we take a closer look at the oblique surfaces we can clearly see that everything is flat. Even though the textures and normal maps say differently, it is clearly a flat surface. With ray casting, we will be able to, like a pop-up book, make the apparent geometry become real geometry. So there is some real value that can be added here to practically every surface. The content production pipeline doesn't really have to change very much either from a unique texturing pipeline. You continue to stamp detail on surfaces where the stamps are created from high poly models.�
Page 205: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Why Ray Casting?Why Ray Casting?

Presenter
Presentation Notes
Just briefly, the speedup of a program using multiple processors in parallel computing is limited by the sequential fraction of the program. For example, if 95% of the program can be parallelized, the theoretical maximum speedup using parallel computing would be 20 times faster as shown in the diagram, no matter how many processors are used.��So why is ray casting better for Amdahl's Law? Ray casting is inherently much more easily parallelizable than rasterization. Lets take a trip to the future of rasterization to show a likely course of how this will play out.
Page 206: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Current State of RasterizationCurrent State of Rasterization

Vertex Processing

Triangle Setup

Fragment Processing

Command BufferVertex

Processing

Fragment Processin

g

Presenter
Presentation Notes
In this diagram the triangle setup is by far the biggest sequential element of rasterization. And this problem is only going to get worse as games will use smaller and more triangles. If you remember from earlier, a lot of the performance benefit of the SPUs were gained in assisting this part of the GPU.
Page 207: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Future of RasterizationFuture of Rasterization

Presenter
Presentation Notes
�Lets take a look at how hardware vendors may overcome this obstacle.
Page 208: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Future of RasterizationFuture of Rasterization

Presenter
Presentation Notes
If we subdivide the screen into multiple regions,
Page 209: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Future of RasterizationFuture of Rasterization

Presenter
Presentation Notes
Then we can process those triangles in each region in parallel with the other regions. There are two diminishing returns problems with this. First
Page 210: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Future of RasterizationFuture of Rasterization

Presenter
Presentation Notes
Any overlapping triangles will have to be setup twice, so as the screen gets split more finely.
Page 211: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Future of RasterizationFuture of Rasterization

Page 212: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Future of RasterizationFuture of Rasterization

Presenter
Presentation Notes
So does re-setup of triangles also increase. However, I don’t think this is going to be a huge problem,
Page 213: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Future of RasterizationFuture of Rasterization

Presenter
Presentation Notes
because the amount screen splitting will likely naturally follow the reduction of size of triangles and not precede it by much. There are two primary ways that the hardware vendors could implement this.
Page 214: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Future of RasterizationFuture of Rasterization

Vertex Processing

Triangle Setup

Fragment Processing

Command BufferVertex

Processing

Fragment Processing

Triangle Sorting

Presenter
Presentation Notes
One way is the addition of a triangle sorting unit, which now becomes the new sequential bottleneck. The other way,
Page 215: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Future of RasterizationFuture of Rasterization

Vertex Processing

Triangle Setup

Fragment Processing

Command Buffer

Multiple Cores

Presenter
Presentation Notes
Is by duplicating the entire rendering pipeline to multiple cores. However, both of those approaches have two fundamental problems. The first is that the parallel regions on the screen have size limitations set by the minimum size of a fragment block, which cannot easily be broken. Although GPUs are many generations away from hitting that limit, it will still eventually be a problem. The other issue with this approach is the source of the problem.
Page 216: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Future of Future of RasterizationRasterization x2x2

GPU Triangle Setup

Presenter
Presentation Notes
Which is linear access to the triangle index buffer which the GPU is forced to read one triangle at a time until all triangles are iterated
Page 217: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Future of Future of RasterizationRasterization x2x2

150

GPU Triangle Setup

Presenter
Presentation Notes
So how do we get random access to this data?
Page 218: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Future of Future of RasterizationRasterization x2x2

151

GPU Triangle Setup

Page 219: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Future of Future of RasterizationRasterization x2x2

152

GPU Triangle Setup

Page 220: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Future of Future of RasterizationRasterization x2x2

153

GPU Triangle Setup

Page 221: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Future of Future of RasterizationRasterization x2x2

154

GPU Triangle Setup

Presenter
Presentation Notes
One answer is a pre-processed or a dynamically updated spatial data structure which would convert the problem into a binary search. Where for each pixel the data structure would to be intersected with a ray to gather the input for rendering that pixel. But wait, that’s a whole lot like ray-casting isn't it? So it seems like destiny that drives us towards a image based approach. What’s interesting is that if the control flow of this tree structure can be standardized and abstracted slightly, then a ray casting approach may be able to be piggy backed on this technology. So this may be an interesting win-win approach for the hardware vendors.
Page 222: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Why Why voxelsvoxels, and not triangles?, and not triangles?

• Unifying Primitive• No Near Phase Collision• Solves Two Problems in One

– Unique Texturing & Unique Geometry

• Good for CUDA / Larabee

Presenter
Presentation Notes
�Using voxels for everything is a kind of great unifying primitive. Voxels can enumerate any surface, from triangles to fractals, with a specific degree of accuracy. However, even if you did use triangles, you still need a broad phase collision structure. So why not just throw away the triangles and just use the broad phase collision as your only phase and save a ton of work in the process? Also, if you used triangles and unique texturing you would have to solve the texturing problem separately. As a side benefit, it also turns out that only using voxels makes NVidia's CUDA much more efficient due to the consistent program counter in your inner loop. This should also be the case for Intel's Larabee.
Page 223: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Why is the control flow efficient?Why is the control flow efficient?

Presenter
Presentation Notes
If we have a few rays here to trace against that sphere over there. We can see that as they traverse through the world they generally follow the same data up until the last few recursions. But,
Page 224: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Why is the control flow efficient?Why is the control flow efficient?

Presenter
Presentation Notes
what if the camera was really far away so the ray end points were far apart, the data accesses would become much more scattered in memory, wouldn't they?�
Page 225: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Why is the control flow efficient?Why is the control flow efficient?

Presenter
Presentation Notes
If we stop traversing the structure when the size of a voxel is less than a pixel, much of that data coherency is maintained and very little detail is lost.
Page 226: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Why is the control flow efficient?Why is the control flow efficient?

Presenter
Presentation Notes
So through this there is a geometry mip mapping of sorts.
Page 227: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

VoxelVoxel MipMip Mapping Mapping –– Thin WallsThin Walls

Presenter
Presentation Notes
There are artifacts with mip mapping the geometry that occur when you have two walls very close to each other.
Page 228: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Voxel Mip Mapping Voxel Mip Mapping –– Thin WallsThin Walls

Page 229: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Voxel Mip Mapping Voxel Mip Mapping –– Thin WallsThin Walls

Page 230: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Voxel Mip Mapping Voxel Mip Mapping –– Thin WallsThin Walls

Presenter
Presentation Notes
You can have this incorrect bleed through effect which can be mitigated in various ways, but just saying "don't do that" to the artists should be sufficient. �
Page 231: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Caveats of RayCaveats of Ray--Tracing?Tracing?

• “Primary rays cache, secondary rays thrash”™– Importance sampling to the rescue!

• Ray Tracing != Ray Casting

Presenter
Presentation Notes
Although primary rays have this really nice access pattern, secondary and tertiary rays don't always have coherent patterns. Luckily importance sampling comes in to help here. However, for our next generation renderer, we will only be considering primary rays and baking most of the secondary information in.
Page 232: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Results May VaryResults May Vary

Presenter
Presentation Notes
�The following implementation details are highly speculative. It is meant to detail a particular implementation path, but not necessarily the one we will eventually end up using.
Page 233: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Sparse Voxel OctSparse Voxel Oct--treestrees

• Oct-trees as collection of maximal blocks.– Related to run-length encoding.

• Possibly variable splitting planes in future.– Guided by artist placed hint planes.

Presenter
Presentation Notes
�The sparse voxel oct-tree is the data structure of choice for the initial implementation due to its regularity and simplicity. In the future having movable splitting planes would help with the bleed through artifacts of the geometry mip mapping. The artists could place hint planes which could guide the splitting plane choice for oct-tree generation.�
Page 234: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Data StructureData Structure

• Disk Caching with Virtual and Physical Pages– Start out with a single virtual page.– Render some voxels into the tree until page capacity

is reached.– Split page into 8 sub-pages and attempt to add the

overflow voxel again.– Store out virtual pages to disk.– Load/Unload each page’s levels as necessary at

runtime.

Presenter
Presentation Notes
These oct-trees are going to be very large, on the order of tens of gigs. So a paging system is necessary to fit it all in memory. This turns out to be quite simple though. The pages are organized in a oct-tree and each page is an oct-tree. Start out with a single page and add voxels to it until it reaches capacity, then split the page into 8 sub-pages and attempt to add the overflow voxel again. You will probably want to store the different levels of the oct-trees together and only load as many levels of a page as you need for efficient runtime memory use.
Page 235: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Data StructureData Structure

• Page capacity can be based on...– CUDA's shared memory size– Cell's SPU local store size– Optimum disk streaming performance– Minimum physical page memory

Presenter
Presentation Notes
The capacity can be derived from many different parameters, such as the ones listed here.
Page 236: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Data Structure Data Structure –– Page FragmentationPage Fragmentation

• Traverse indexing oct-tree– Write out pages according to optimal layout (breadth

first, depth first, etc...)

Presenter
Presentation Notes
If the pages are not of a constant size, the on disk cache will fragment as pages get added and deleted over time. Solving this is a simple process of copying the pages and removing any empty space in the file.
Page 237: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Data Structure Data Structure -- Page OptimizationPage Optimization

• Execution time proportional to number of nodes.

• Number of nodes can be reduced through translation.

• Translating by 2n doesn’t affect any oct-tree level smaller than 2n

Presenter
Presentation Notes
Most algorithms that execute on a oct-tree representation instead of an array representation have an execution time proportional to the number of blocks in the volume rather than the number of voxels. So oct-trees are somewhat like dimension reducing devices. This also reflects that the amount of space occupied by an oct-tree is extremely sensitive to where it is positioned. It can be shown that translating the oct-tree by two to the power of any number doesn’t affect any nodes smaller than that. Through this property we can develop the following method to optimize an oct-tree.
Page 238: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Data Structure Data Structure -- Page OptimizationPage Optimization

• Create scratch page with enlarged region – 2n+1 x 2n+1 x 2n+1

• Apply successive translations of magnitude power of 2 in the x, y, & z directions and keep a count of the number of nodes required.

• Store off total translation so that the ray casting can be adjusted appropriately.

• O(4n)– n is the number of levels in the oct-tree

Presenter
Presentation Notes
Here is the algorithm to optimize a page. It exhaustively tests all possible meaningful translations. To help hit home just how important this is,
Page 239: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Data Structure Data Structure -- Page OptimizationPage Optimization

172

Presenter
Presentation Notes
Here is a diagram showing a typical case where translation of the data here will really help. If this oct-tree is shifted to the left or right by one,
Page 240: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Data Structure Data Structure -- Page OptimizationPage Optimization

173

Presenter
Presentation Notes
then it consumes nearly half the total number of nodes. But we can take this one step further here.
Page 241: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Data Structure Data Structure -- Page OptimizationPage Optimization

• Minimize outside nodes for faster tracing

Presenter
Presentation Notes
If we know the difference between the outside and the inside of an object, we can alternatively tune for the minimum number of outside nodes, thereby improving tracing performance.
Page 242: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Data Structure Data Structure -- Page OptimizationPage Optimization

175

Presenter
Presentation Notes
In this example, the outside nodes are white, and the inside nodes are grey. Shooting some rays here
Page 243: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Data Structure Data Structure -- Page OptimizationPage Optimization

176

Presenter
Presentation Notes
we can see that casting a ray along this oblique angle will have a costly performance hit. However,
Page 244: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Data Structure Data Structure -- Page OptimizationPage Optimization

177

Presenter
Presentation Notes
If we translate the data one unit to the left, the performance cost is significantly reduced.
Page 245: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Data StructureData Structure

• Different structures for runtime and storage.

Presenter
Presentation Notes
If we don’t have intelligent storage formats, the offline storage costs here will be huge. Probably on the order of hundreds of gigs or terabytes for a full game. As such, the runtime and disk data structures will be slightly different. The runtime will be optimized for performance, whereas the disk format will be optimized for space
Page 246: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Runtime Data StructureRuntime Data Structure

• child pointers : 32• diffuse rgb : 3• specular scale/power : 1• normal xy : 2• total : 38 bytes per node

Presenter
Presentation Notes
This is the current runtime data structure. If we wanted to increase performance some more, we could add parent and neighbor pointers to the structure. However, at this point the additional data is not necessary. Also, it is possible to compute the normal from the neighbor voxels, however it is a bit expensive to perform at runtime requiring up to 26 additional traversals of the tree or storing neighbor pointers for perfect accuracy or you could approximate the normal by looking at the immediate siblings doing only a single upwards traversal, so it is probably better at this point to just spend the extra data and pre-compute it. If we want high dynamic range lighting, just add an additional exponent component to make it an RGBE format.
Page 247: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Storage Data StructureStorage Data Structure

• children bit mask : 1• diffuse rgb : 3• specular scale/power : 1• normal xy : 2• total : 7 bytes per node

Presenter
Presentation Notes
For the storage data structure, the child pointers are now a simple bit mask where the ordering of the nodes is assumed by the compression functions.�
Page 248: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Data CompressionData Compression

• Compressing child bits• Compressing Colors

Presenter
Presentation Notes
�So how do you compress the gigs or possibly terabytes of data? Compressing this data is a simple 3D extension of some compression technology for mega-textures that I was working on, and it breaks down into two separate problems. Problem number one is to compress the children bit masks.
Page 249: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Compressing Child BitsCompressing Child Bits

• (diagram showing statistical distribution of child bytes)

Presenter
Presentation Notes
Here you can see that if we plot out the statistical distribution of the children bit masks, we can see that they have definite patterns that we can take advantage of. But lets take this one step further. If we separate out the bits by level of the oct-tree the distribution is quite distinct at each level allowing further compression.�
Page 250: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Compressing Child BitsCompressing Child Bits

• Split by oct-tree level. • Arithmetic Compression• (diagram showing statistical distribution of

child bytes at each oct-tree level)

Presenter
Presentation Notes
So, for each level we can gather the bits and then perform arithmetic compression on the pieces.
Page 251: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Compressing Color DataCompressing Color Data

• (Diagram showing each node of the tree as a delta between itself and its parent)

Presenter
Presentation Notes
�Compressing the color data is a process very similar to Wavelet Compression if it were applied to three dimensional data. You take each level of the oct-tree and subtract the value from the higher level. So each octant's color data becomes a delta of its parent.  �
Page 252: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Compressing Color DataCompressing Color Data

• Split by oct-tree level.• Quantization • Arithmetic Compression

Presenter
Presentation Notes
Each color delta is separated according to its tree level. The Quantization of the color delta data is fairly straight forward. Quantizing small values to the statistically most common small values. The resulting data would then be piped through an arithmetic compressor to get the final data. The 2D analogue of this technique with some additional bells and whistles not mentioned here produce a very impressive 24 to 1 compression ratio on average, with the worst case noisy sections compressing at a 12 to 1 compression ratio. These ratios are on par and sometimes exceed HD Photo or JPEG 2000. A conservative estimate here is that the 3D data will compress at least half as well. Which puts us at an average of 12 to 1 compression ratio. �
Page 253: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Data Structure SizeData Structure Size

• 1.15 bits of positional data per voxel• Cost savings improves as triangle size

decreases.• 72 bits equivalent per triangle in oct-tree

– (for next generation)

• 160 bits per triangle in traditional format– x,z,y,s,t all 32-bits– 2.2:1 compression ratio

• 80 bits per triangle in compressed format– x,y,z,s,t all 16-bits– 1.1:1 compression ratio

Presenter
Presentation Notes
1 + 1/8 + 1/64 + 1/(8*8*8) + 1/(8*8*8*8) + 1/(8*8*8*8*8) + 1/(8*8*8*8*8*8) + 1/(8*8*8*8*8*8*8) + 1/(8*8*8*8*8*8*8*8) 1.142857134342193603515625 Current Generation 2000 / 2 * 1.15 / 8 143.75 bits per triangle Next Generation 2000 / 32 * 1.15 / 8 // Generational Skip so 32 instead of 8 8.984375 bits per triangle
Page 254: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Generating the dataGenerating the data

• Every surface can enumerate into voxels.• 3D Scan Conversion• Volume Projection• Subdivision

Presenter
Presentation Notes
When generating the voxel oct-tree it is important to note that every primitive can be broken down into discrete voxels. Triangles can be converted to these voxels through various different methods of 3D scan conversion. The first of which is a extension of 2D scan conversion into 3D. �
Page 255: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

3D Scan Conversion3D Scan Conversion

Z=0

Z=1

Z=2

Z=3

Z=4

=

Presenter
Presentation Notes
Here each third dimension slice of the triangle creates a 2D polygon on that slice which would need to be filled using a traditional polygon filling algorithm.�
Page 256: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

3D Scan Conversion3D Scan Conversion

Presenter
Presentation Notes
Another approach is to create a bounding box around the triangle and for each voxel, test to see if it collides with the triangle. Then at each voxel using barycentric coordinates to reconstruct the other properties such as texture coordinates.
Page 257: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

3D Scan Conversion3D Scan Conversion

Presenter
Presentation Notes
Care must be taken with this approach as the center of a voxel can lie outside the borders of the triangle. �
Page 258: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

3D Scan Conversion3D Scan Conversion

Presenter
Presentation Notes
Yet another approach is to subdivide the triangle and ensure that there are no cracks by simply comparing the voxel space location and making sure they are all touching, if not subdivide again. This approach has problems with jittered sample locations which are not regular along the surface of the triangle, but it really darn easy to implement.
Page 259: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

3D Scan Conversion3D Scan Conversion

• Only 8 connectivity required.• Flood fill world and remove unnecessary

voxels.

Presenter
Presentation Notes
Some of the previous approaches also create water tight voxel representations. This is unnecessary for rendering and only 8 connectivity is required. To fix this using a 3D voxel filling routine to mark and sweep out any unnecessary voxels will clean that data right up.
Page 260: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Generating the dataGenerating the data

• Generate geometry mip-maps• Geometry mips consume 15% extra data

Presenter
Presentation Notes
To generate the geometry mip maps, recursively average all the 8 children together to generate the parent's voxel data.
Page 261: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Generating the dataGenerating the data

• Save off the un-lit data.

Presenter
Presentation Notes
Now that we have complete voxel data, save this off so we can recompute the lighting later without re-rendering the surfaces.�
Page 262: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Generating the dataGenerating the data

• Perform ray-tracing to light the voxels.

Presenter
Presentation Notes
The next step is to perform ray-tracing to light the voxels. This can take a while to do your global illumination, photon mapping, etc..., so you might as well go grab some coffee, take a nap, or whatever. In the long term, distributing the build will help here.�
Page 263: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Using the dataUsing the data

• For each pixel on the screen– Shoot out a ray into the oct-tree and record the

color (and depth?)

Presenter
Presentation Notes
To use the data, for each pixel on the screen, shoot out a ray into the oct-tree and record the color.  If we want to mix ray casting with standard rasterization, we also need to output a depth buffer.
Page 264: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Need More Detail?Need More Detail?

• Bottom up?• Top Down?• Etc…

Page 265: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Rendering Skinned CharactersRendering Skinned Characters

• Ray cast against triangle mesh• Transform to base pose• Trace with local oct-tree

Presenter
Presentation Notes
So how do you render skinned characters using ray casting? You would ray-cast against a triangle mesh and transform the ray origin and direction back to the original base mesh where you would then collide with its local oct-tree.�If you don't require perfect shadowing in your game, it would probably be faster to use a hybrid approach and render the characters and its shadows using a traditional rasterization method similar to deferred rendering.
Page 266: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Abstracting Skinned CharactersAbstracting Skinned Characters

• Moving world geometry without oct-tree re- organization cost.

• Enables instancing.

Presenter
Presentation Notes
We can combine this type of refractive skinning and standard rasterization to the world at large to achieve moving geometry without the oct-tree re-organization cost.
Page 267: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Hybrid RenderingHybrid Rendering

• Render a coarse hull of the geometry into a z-buffer.– Automatically calculate from voxel geometry.

• Post-process the depth-buffer to start the ray casting at the point specified by the depth-buffer instead of at the ray origin.– Possibly transforming it as well.

Page 268: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Hybrid RenderingHybrid Rendering

• Skips most of the traversal process. – 2x to 4x expected speed improvement

• Allows dynamic geometry• Prevents second and third order ray tracing

– No z-buffer pre-pass to transform ray start for dynamic geometry

Page 269: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Hybrid Rendering Hybrid Rendering with Atmospheric Effectswith Atmospheric Effects• Atmospheric effects in combination with

rasterization possible.– Rasterize coarse hull into depth buffer– Rasterize triangles into color and alternate depth

buffer– At ray-collision time, keep track of atmospheric

collisions along the way.– If hit rasterized depth,

• stop and modify existing color

– Else • modify voxel color.

Page 270: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Adaptive SubAdaptive Sub--SamplingSampling

• After rendering the scene, perform a Sobel edge filter over the frame buffer to figure out where additional rays would improve the quality of the image.

• Cast additional rays.• Repeat until 16 ms.

Page 271: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Adaptive SubAdaptive Sub--Sampling ProblemsSampling Problems

• Inherently always sampling the most divergent parts of the scene

• Can manage performance hit by sampling highly aliased to less aliased in chunks

Page 272: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Infinite Surface DetailInfinite Surface Detail

• Oct-tree node's recursively point back in on themselves to create an infinite amount of detail

• Create detail types octree sub-segments to simulate rough, smooth, porous, sharp edges, etc.. – Recursive trees are entirely in cache, so very fast

calculation.– Requires delta compressed colors to be in the

runtime format for correct shadowing• Not a huge computation burden, but more bandwidth

required

Page 273: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

How much time to innovate?How much time to innovate?

• 1 year tools• 3 months runtime

Page 274: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Expected Runtime PerformanceExpected Runtime Performance

• 33% of the time rendering characters / etc• 66% of the time rendering world• Ray-casting the world must complete in

~20ms for 30 FPS• Theoretically possible on today's technology

(GeForce 8800 Series)– Possible on 7800 with additional memory

requirements.

Page 275: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

How would this affect a platform launch?How would this affect a platform launch?

• Generational skip in geometric complexity• Next gen platforms 4 times better at least• Completely plausible at 720p and 60 FPS w/

antialaising– or 1080p and 60 FPS

Page 276: Beyond Programmable Shading: In Actionwebstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph08/class/... · Each case study includes a live demo and discusses the mix of parallel

Beyond Programmable Shading: In Action

Questions

Jon Olick ([email protected])id Software