Advances in High-Performance GPU Ray Tracing for Physics-Based Simulation

Christiaan Gribble & Lee A. Butler

GPU Technology Conference

21 March 2013


Introductions

Christiaan Gribble, SURVICE Engineering

Lee A. Butler, US Army Research Laboratory

Alexis Naveros, SURVICE Engineering

Mark Butkiewicz, SURVICE Engineering

SURVICE Engineering

• Support DoD community

• Focus on combat systems

– Safety

– Survivability

– Effectiveness

• 400+ employees

• 10 locations nationally

US Army Research Laboratory

• US Army RDECOM

– Corporate laboratory

– 2000 civilian employees

• Directorates

– SLAD

– Army Research Office

– Many others

• Still on the TOP500 supercomputer list

Agenda

• Application domains

• Technical motivation

• Rayforce GPU ray tracing engine

• Cognition-Driven Simulation

• Visual Simulation Laboratory


Application domains

• Ballistic penetration

• Radio frequency propagation

• Thermal radiative transport

• High-energy particle transport


Technical motivation

Optical rendering vs. non-optical rendering

Technical motivation

Interval computation vs. interval generation

• Difficult or impossible

– Negative epsilon hacks

– Missed/repeated hits

• Performance impacts

– Traversal restart

– Operational overhead
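Once a multi-hit query has returned every hit in t-order, interval generation reduces to pairing entries with exits; a first-hit-only tracer would instead have to restart a ray from each hit, offset by an epsilon, which is where the missed/repeated hits and restart costs come from. The sketch below is a minimal illustration under stated assumptions: closed meshes whose hits can be classified as entering or exiting (e.g., by the sign of the ray direction dotted with the face normal). The `Hit` and `Interval` types and both functions are hypothetical, not part of Rayforce.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical hit record returned by a multi-hit query (not Rayforce's API).
struct Hit {
    float t;        // distance along the ray
    bool entering;  // assumed: true when dot(rayDir, faceNormal) < 0
};

// Hypothetical: a segment [tIn, tOut] of the ray that lies inside a solid.
struct Interval {
    float tIn, tOut;
};

// Pair entry/exit hits, in t-order, into solid intervals. A repeated entry or
// an exit with no matching entry (e.g. from bad geometry) is skipped.
std::vector<Interval> makeIntervals(std::vector<Hit> hits) {
    std::sort(hits.begin(), hits.end(),
              [](const Hit& a, const Hit& b) { return a.t < b.t; });
    std::vector<Interval> intervals;
    bool inside = false;
    float tIn = 0.0f;
    for (const Hit& h : hits) {
        if (h.entering && !inside) {
            inside = true;
            tIn = h.t;
        } else if (!h.entering && inside) {
            inside = false;
            intervals.push_back({tIn, h.t});
        }
    }
    return intervals;  // an unclosed final entry is dropped
}

// Total length of the ray inside solids, e.g. line-of-sight thickness.
float totalThickness(const std::vector<Interval>& intervals) {
    float sum = 0.0f;
    for (const Interval& iv : intervals) {
        assert(iv.tOut >= iv.tIn);
        sum += iv.tOut - iv.tIn;
    }
    return sum;
}
```

For two plates crossed at t in [2, 3] and [5, 7.5], the hits may arrive in any order; `makeIntervals` returns both segments and `totalThickness` returns 3.5.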


Rayforce

• Programmable ray tracing engine

• Designed for NVIDIA GPUs

• High performance

– Modern techniques

– Novel acceleration structure

– Multiple traversal algorithms


State-of-the-art ray tracing

• Leverages modern techniques

– Ray packets

– Frustum tracing

• Exploits hardware features

– SIMD processing (v2.1)

– Architecture-specific optimizations

Proven techniques bolster high performance
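Ray packets amortize traversal cost: a node is fetched once and tested against every ray in the packet, and the packet descends into the node if any ray needs it. The CPU-side sketch below shows that decision using the standard slab test; the types and function names are hypothetical, and for brevity it assumes no direction component is exactly zero. Frustum tracing replaces the per-ray loop with a single conservative test of the packet's bounding frustum against the box.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical ray with a precomputed reciprocal direction for the slab test.
struct Ray {
    Vec3 origin;
    Vec3 invDir;  // 1 / direction, per component (assumed nonzero)
};

struct AABB { Vec3 lo, hi; };

// Clip [tNear, tFar] against one pair of axis-aligned slab planes.
static void clipSlab(float o, float inv, float lo, float hi,
                     float& tNear, float& tFar) {
    float t1 = (lo - o) * inv;
    float t2 = (hi - o) * inv;
    tNear = std::max(tNear, std::min(t1, t2));
    tFar = std::min(tFar, std::max(t1, t2));
}

// Slab test: does the ray overlap the box anywhere in [0, tMax]?
bool rayHitsBox(const Ray& r, const AABB& b, float tMax) {
    float tNear = 0.0f, tFar = tMax;
    clipSlab(r.origin.x, r.invDir.x, b.lo.x, b.hi.x, tNear, tFar);
    clipSlab(r.origin.y, r.invDir.y, b.lo.y, b.hi.y, tNear, tFar);
    clipSlab(r.origin.z, r.invDir.z, b.lo.z, b.hi.z, tNear, tFar);
    return tNear <= tFar;
}

// Packet decision: visit the node once for the whole packet if at least one
// ray overlaps it, so the node fetch is shared by all rays in the packet.
bool packetVisitsNode(const std::vector<Ray>& packet, const AABB& node, float tMax) {
    assert(!packet.empty());
    return std::any_of(packet.begin(), packet.end(),
                       [&](const Ray& r) { return rayHitsBox(r, node, tMax); });
}
```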


Acceleration structure

• kd-tree

• Binary Space Partitioning tree

• Regular grid

• Bounding Volume Hierarchy


Graph-based spatial indexing

• Efficient

– Uses memory very carefully

– Improves cache performance

– Reduces memory bandwidth

• Flexible

– Several traversal algorithms

– Minimal overhead

– User-configurable pipelines

• Scalable

– Handles complex scenes

– Performance depends only on complexity along a ray

Traversal algorithms

• First-hit

– Nearest intersected primitive?

– Visibility/bounce rays

• Any-hit

– Is any primitive intersected?

– Shadow/ambient occlusion rays

• Multi-hit

– Which primitives are intersected?

– Transparency & non-optical rendering
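The three query types differ only in when traversal may stop and what it returns. The brute-force sketch below uses a flat triangle list with no acceleration structure and the well-known Möller–Trumbore ray/triangle test; the types and function names are illustrative assumptions, not Rayforce's interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <optional>
#include <vector>

struct Vec3 { float x, y, z; };
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Tri { Vec3 v0, v1, v2; };  // hypothetical triangle type
struct Ray { Vec3 o, d; };        // hypothetical ray: origin, direction

// Möller–Trumbore ray/triangle test; returns t for hits in front of the origin.
std::optional<float> intersect(const Tri& tri, const Ray& r) {
    const float eps = 1e-7f;
    Vec3 e1 = sub(tri.v1, tri.v0), e2 = sub(tri.v2, tri.v0);
    Vec3 p = cross(r.d, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < eps) return std::nullopt;  // ray parallel to triangle
    float inv = 1.0f / det;
    Vec3 s = sub(r.o, tri.v0);
    float u = dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return std::nullopt;
    Vec3 q = cross(s, e1);
    float v = dot(r.d, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return std::nullopt;
    float t = dot(e2, q) * inv;
    if (t <= eps) return std::nullopt;  // hit is behind the origin
    return t;
}

// First-hit: nearest intersection (visibility, bounce rays).
std::optional<float> firstHit(const std::vector<Tri>& tris, const Ray& r) {
    std::optional<float> best;
    for (const Tri& tri : tris)
        if (auto t = intersect(tri, r); t && (!best || *t < *best)) best = t;
    return best;
}

// Any-hit: stop at the first intersection found (shadow, AO rays).
bool anyHit(const std::vector<Tri>& tris, const Ray& r) {
    for (const Tri& tri : tris)
        if (intersect(tri, r)) return true;
    return false;
}

// Multi-hit: every intersection, ordered by t (transparency, non-optical).
std::vector<float> multiHit(const std::vector<Tri>& tris, const Ray& r) {
    std::vector<float> ts;
    for (const Tri& tri : tris)
        if (auto t = intersect(tri, r)) ts.push_back(*t);
    std::sort(ts.begin(), ts.end());
    return ts;
}
```

For three stacked triangles at z = 3, 1, 2 and a ray from the origin along +z, `firstHit` returns 1, `anyHit` returns true, and `multiHit` returns {1, 2, 3}.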

Performance – tests

Coherent workloads

• vis – first-hit visibility

– N · V shading

• x-ray – all multi-hit intersections

– alpha blending

Incoherent workloads

• ao – first-hit visibility

– 32 AO rays/intersection

• kajiya – first-hit visibility

– shadows + 2 diffuse bounces
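Once traversal has produced hits, the two coherent workloads reduce to simple per-ray arithmetic. The sketch below is illustrative only. It assumes unit-length vectors for N · V shading (taking the absolute value so back faces are also shaded, which is an assumption), and for x-ray it assumes one gray value and a constant opacity per hit. The function names and the early-termination threshold are hypothetical. Because multi-hit traversal delivers hits in t-order, front-to-back blending needs no extra sort.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical "vis"-style shading: |N · V| for unit normal N and unit view
// vector V (absolute value is an assumption, so back faces are shaded too).
float shadeNdotV(Vec3 n, Vec3 v) { return std::fabs(dot(n, v)); }

// Hypothetical "x-ray"-style front-to-back alpha blending of multi-hit
// results. grays must be ordered by increasing t; alpha is a fixed opacity.
float compositeFrontToBack(const std::vector<float>& grays, float alpha,
                           float minTransmittance = 1e-3f) {
    assert(alpha >= 0.0f && alpha <= 1.0f);
    float color = 0.0f;
    float transmittance = 1.0f;  // fraction of light not yet absorbed
    for (float g : grays) {
        color += transmittance * alpha * g;
        transmittance *= 1.0f - alpha;
        if (transmittance < minTransmittance) break;  // later hits invisible
    }
    return color;
}
```

With alpha = 0.5, two white hits blend to 0.5 + 0.25 = 0.75; with alpha = 1, only the first hit contributes.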


Performance – scenes

Images rendered at 1024x768 pixels on an NVIDIA GeForce GTX 690

• ktank – 1M tris

• conference – 282K tris

• san miguel – 10M tris

Performance – results

Chart: throughput in Mrps (millions of rays per second) for the vis, x-ray, ao, and kajiya tests, grouped into coherent and incoherent workloads
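Mrps is the number of rays traced divided by the elapsed time, scaled by 10^6. The worked example below uses the 1024x768 test images. It assumes one primary ray per pixel and that every AO ray is counted; the slides do not say exactly which rays Rayforce counts. For ao, the upper bound is reached only if every primary ray hits geometry and spawns all 32 AO rays.

$$
\text{Mrps} = \frac{N_{\text{rays}}}{10^{6}\, t}, \qquad
N_{\text{vis}} = 1024 \times 768 = 786{,}432, \qquad
N_{\text{ao}} \le 786{,}432 \times (1 + 32) = 25{,}952{,}256
$$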

Just for Fun …

Chart: vis throughput in Mrps

• Rendered at 1920x1080 instead of 1024x768

• Single hit

• No color, Lambertian only

Multi-hit traversal

• Which primitives are intersected?

– One or more, & possibly all

– Ordered by t-value along ray

• Core operation in Rayforce

– Avoids negative epsilon hacks

– Alleviates traversal restart

• Critical to interval generation

– Handles bad geometry gracefully

– Enables early exit

• Applications

– Physically based simulation

– Order-independent transparency

– …

Naïve multi-hit

function TRAVERSE(root, ray)
    INITIALIZE(hitList)
    node ← root
    while VALID(node) do
        if !EMPTY(node) then
            for tri in node do
                if INTERSECT(tri, ray) then
                    hitData ← (t-value, u, v, …)
                    ADD(hitList, hitData)
                end if
            end for
        end if
        node ← NEXT(node)
    end while
    for hitData in hitList do
        if !USERHIT(ray, hitData) then
            goto fini
        end if
    end for
    label fini:
    USEREND(ray)
end function

Simple & effective, but potentially slow

• Find all hits – traverse every node and add every intersection to hitList

• Process desired hits – pass hits to USERHIT until it returns false; exiting here saves only callback work, because traversal has already finished
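A minimal C++ sketch of the naïve algorithm follows. It makes several assumptions. Nodes are plain lists of primitive ids visited in NEXT() order. The hypothetical callbacks `intersect`, `userHit`, and `userEnd` stand in for INTERSECT, USERHIT, and USEREND. Hits are sorted once after collection, whereas ADD may instead insert in order. This is not Rayforce's CUDA code.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical hit record: primitive id and distance along the ray.
struct Hit {
    int prim;
    float t;
};

// Hypothetical node: primitive ids; nodes are assumed to be in NEXT() order.
using Node = std::vector<int>;

// Naive multi-hit: find all hits first, then hand them to the user callback
// in t-order until it returns false ("goto fini"), then run per-ray cleanup.
void traverseNaive(const std::vector<Node>& nodes,
                   const std::function<std::optional<float>(int)>& intersect,
                   const std::function<bool(const Hit&)>& userHit,
                   const std::function<void()>& userEnd) {
    assert(intersect && userHit && userEnd);
    std::vector<Hit> hitList;
    for (const Node& node : nodes)         // while VALID(node)
        for (int prim : node)              // for tri in node
            if (auto t = intersect(prim))  // INTERSECT(tri, ray)
                hitList.push_back({prim, *t});
    std::stable_sort(hitList.begin(), hitList.end(),
                     [](const Hit& a, const Hit& b) { return a.t < b.t; });
    for (const Hit& h : hitList)
        if (!userHit(h)) break;  // early exit skips only callback work
    userEnd();                   // USEREND(ray)
}
```

Every node is intersected before the first callback runs, so a USERHIT that stops after one hit still pays for the full traversal.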

Rayforce multi-hit

function TRAVERSE(root, ray)
    node ← root
    while VALID(node) do
        if !EMPTY(node) then
            SET(flags, INIT)
            while TRUE do
                INITIALIZE(hitList)
                for tri in node do
                    if !DONE(hitMask, tri) then
                        if INTERSECT(tri, ray) then
                            hitData ← (t-value, u, v, …)
                            if ADD(hitList, hitData) then
                                SET(flags, REPEAT)
                            end if
                        end if
                    end if
                end for
                if GET(flags) == (INIT | REPEAT) then
                    INITIALIZE(hitMask)
                    UNSET(flags, INIT)
                end if
                for hitData in hitList do
                    if !USERHIT(ray, hitData) then
                        goto fini
                    end if
                    if GET(flags) == REPEAT then
                        DONE(hitMask, hitData, TRUE)
                    end if
                end for
                if GET(flags) != REPEAT then
                    break
                end if
                UNSET(flags, REPEAT)
            end while
        end if
        node ← NEXT(node)
    end while
    label fini:
    USEREND(ray)
end function

Gains efficiency with early exit

• Find some hits – each pass over a node collects hits into hitList; when ADD returns true (for example, because hitList is full), REPEAT is set, hitMask is initialized on the first such pass, and the node is revisited while skipping primitives already marked DONE

• Early exit – USERHIT can end traversal before the remaining hits, or later nodes, are processed

• Per-ray cleanup – USEREND runs once when traversal ends
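A minimal C++ sketch of the buffered idea follows. It is written under the assumption that ADD returns true when a hit had to be dropped from a bounded, t-sorted hitList. The other assumptions are as follows. Nodes are lists of primitive ids visited front to back. Hits are assumed to fall within the node being processed; real traversal must handle primitives that straddle node boundaries. The hypothetical callbacks `intersect`, `userHit`, and `userEnd` stand in for INTERSECT, USERHIT, and USEREND. The hit mask is allocated lazily, mirroring the INIT flag. This is an illustration, not Rayforce's CUDA implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical hit record: global primitive id, slot within node, distance.
struct Hit {
    int prim;
    std::size_t slot;
    float t;
};

using Node = std::vector<int>;  // hypothetical: primitive ids, in NEXT() order

// Bounded hit buffer kept sorted by t. add() returns true when a hit had to
// be dropped, i.e. the node must be revisited (REPEAT) to recover it.
struct HitList {
    std::size_t capacity;
    std::vector<Hit> hits;
    bool add(const Hit& h) {
        auto pos = std::upper_bound(hits.begin(), hits.end(), h,
            [](const Hit& a, const Hit& b) { return a.t < b.t; });
        hits.insert(pos, h);
        if (hits.size() <= capacity) return false;
        hits.pop_back();  // evict the farthest hit
        return true;
    }
};

// Buffered multi-hit: each node yields batches of at most `capacity` nearest
// hits, so the user can stop early and later nodes are never intersected.
void traverseBuffered(const std::vector<Node>& nodes, std::size_t capacity,
                      const std::function<std::optional<float>(int)>& intersect,
                      const std::function<bool(const Hit&)>& userHit,
                      const std::function<void()>& userEnd) {
    assert(capacity > 0 && intersect && userHit && userEnd);
    for (const Node& node : nodes) {
        std::vector<bool> done;  // hitMask, allocated on first overflow (INIT)
        while (true) {
            HitList list{capacity, {}};
            bool repeat = false;
            for (std::size_t i = 0; i < node.size(); ++i) {
                if (!done.empty() && done[i]) continue;  // DONE(hitMask, tri)
                if (auto t = intersect(node[i]))
                    if (list.add({node[i], i, *t})) repeat = true;
            }
            if (repeat && done.empty()) done.assign(node.size(), false);
            for (const Hit& h : list.hits) {
                if (!userHit(h)) { userEnd(); return; }  // goto fini
                if (repeat) done[h.slot] = true;
            }
            if (!repeat) break;  // every hit in this node has been delivered
        }
    }
    userEnd();  // USEREND(ray)
}
```

With a capacity of 2, hits are still delivered in t-order within each node, and a USERHIT that returns false during the first node means the second node's primitives are never tested. The design trades repeated intersection tests in overflowing nodes for a small, fixed-size hit buffer, which suits per-thread GPU storage.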

Early Exit Buys Performance

Chart: Rayforce multi-hit vs. the naïve algorithm on ktank, conf, and san miguel; per-scene gains shown: +39.05%, +91.00%, +104.01%

Rayforce multi-hit outperforms the naïve algorithm by 1.8x on average
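If the 1.8x summary is the arithmetic mean of the three per-scene gains, the chart values support it directly. This is an assumption, because the slide does not state how the gains were combined.

$$
\frac{1}{3}\left(1.3905 + 1.9100 + 2.0401\right) = \frac{5.3406}{3} \approx 1.78 \approx 1.8\times
$$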

Rayforce

• Battle-tested techniques

• Novel acceleration structure

• Multi-hit ray traversal

• Hand-tuned for CUDA

Demonstrated high performance GPU ray tracing

Panels: first-hit, any-hit, multi-hit

Demonstration: NVIDIA Quadro 3000M

240 Fermi CUDA cores @ 900 MHz


Public LGPL v2.0 release of Rayforce now available!

Cognition-Driven Simulation

• Perform visualization during simulation

– As a by-product of computation

– As computation progresses

• Key advantages

– Enables exploration & steering

– Drives understanding & confidence

– User cognition must be managed: too fast, and details are missed; too slow, and users disengage

• Managed computation

– Focus on the most interesting features

– Avoid uninteresting parts of the parameter space

Visual Simulation Laboratory

• A cross-platform, open-source application framework

– Qt, OpenSceneGraph, & other technologies

• The foundation used for several CDS simulation applications


Public LGPL v2.0 release of VSL now available!

Get the software

Rayforce

Rayforce Website:

http://rayforce.net

Source code:

http://sourceforge.net/projects/rayforce

VSL

VSL Website:

http://vissimlab.org

Source code:

http://sourceforge.net/projects/vissimlab