DRPU: A Programmable Hardware Architecture for Real-Time Ray Tracing of Coherent Dynamic Scenes Sven...

  • Slide 1

DRPU: A Programmable Hardware Architecture for Real-Time Ray Tracing of Coherent Dynamic Scenes
Sven Woop
Computer Graphics Lab
Saarland University

  • Slide 2

Overview
Motivation: Why Ray Tracing?
Previous Work
DRPU Architecture
FPGA Prototype
ASIC Performance Estimates
Conclusion & Future Work

  • Slide 3

Why not Rasterization...
Primitive Operation: Rasterize Isolated Triangles
  Perfect for dynamic scenes
  Very simple operation (good for HW)
  Parallel processing of triangles and fragments (good for HW)
  No global access to the scene
  All Interesting Visual Effects Need 2+ Triangles (Shadows, Reflection, Global Illumination, ...)
  Approximations via multiple pass approaches have many issues
Difficult to Use Algorithm
Very Fast Hardware Implementations

  • Slides 4-10

... but Ray Tracing?
Primitive Operation: Trace a Ray
  O(log n) traversal operation
  Demand driven
  Global Access to Scene
  Automatic combination of effects (orthogonal shaders)
  Recursive evaluation
  Physical Light Simulation
  Embarrassingly parallel (good for HW)
Accurate and realistic images
Easy to use Algorithm
Low Performance

  • Slide 11

Previous Work
Ray Tracers for Static Scenes
  CPU based: [OpenRT], [MLRT SIGGRAPH05]
  GPU based: Purcell (Grids) [SIGGRAPH02], Foley et al. (KD Trees) [GH05], Stefan Popov (Stackless KD Tree traversal) [EG07]
  Custom Hardware: ART-VPS (AR350 Chip for offline rendering), Schmittler (SaarCOR) [GH04], Woop (RPU) [SIGGRAPH05]
Ray Tracers for Dynamic Scenes
  CPU based: Wald (Grids) [SIGGRAPH06], Wald (AABVHs) [TOG / Tech. Rep. 2006], Wächter and Keller (BIH) [EG06], Johannes Günther (Motion Decomposition) [EG06]
  Custom Hardware: Woop (B-KD Trees) [GH06], Woop (DRPU-ASIC) [RT06]

  • Slide 12

Why isn't everybody using Ray Tracing?
Low Performance
  High computational complexity
  1 million pixels (minimal)
  30 frames per second (minimal)
  10 rays per pixel (minimal)
  At least 300 million rays per second
  24 billion traversal steps (80 trav. steps per ray)
  240 billion instructions (10 instructions per traversal step)
  0.5 trillion (5E11) cycles (instruction dependencies)
Limited Support for Dynamic Scenes
  Due to the need for spatial index structures (costly rebuild, O(n log n))
  But most graphics applications are highly dynamic (e.g. computer games)

  • Slide 13

... and what can be done?
Hardware Implementation (DRPU)
  High performance through dedicated hardware units
  A high-end ASIC implementation would provide enough performance for computer games using RT (about 200 million rays/s)
Algorithmic Changes
  B-KD Trees as spatial index structure
  Supports most kinds of dynamic scenes

  • Slide 14

DRPU Architecture
[Diagram label: vertices from memory]
Task Parallelism
Optimized Hardware Units

  • Slide 15

DRPU Architecture
Rendering Units
Synchronous execution of packets of 4 rays
  Memory bandwidth reduction (combining)
  Sharing of HW (e.g. caches)
Highly multi-threaded
  Higher hardware usage
First level caches
  Memory bandwidth reduction
  Memory latency reduction

  • Slide 16

DRPU Hardware Architecture
[Diagram label: vertices from memory]

  • Slide 17

DRPU Architecture
Programmable Shading Processor
  Fully programmable
  In-order execution
  4-component SIMD operations
  Similar instruction set to GPUs, but:
    Efficient recursion
    Flexible memory access
Programming Model
  Material shading
  Ray generation tasks
  Calls Ray Casting Units to cast rays

  • Slide 18

DRPU Architecture
Programmable Shading Unit
Ray Casting Units
  Find closest intersection of a ray with the scene
  High-performance traversal and intersection
  Implement the atomic trace instruction of the Shading Processor
  SP can continue scheduling instructions that do not depend on the intersection result

  • Slide 19

DRPU Architecture
Programmable Shading Unit
Ray Casting Units
Traversal Processor
  B-KD Tree approach

  • Slide 20

Definition of B-KD Trees
B-KD Tree (Bounded KD-Tree)
  Binary Tree
  1D bounding intervals (or slabs) for each child
  Leaf nodes point to a single primitive
  Bounding Volume Hierarchy (subdivides geometry)

  • Slide 21

B-KD Tree Semantics
B-KD Tree (Bounded KD-Tree)
  Each node T can be assigned a box B(T)
[Diagram label: B(T)]

  • Slide 22

B-KD Tree Semantics
B-KD Tree (Bounded KD-Tree)
  H_right(min_1) = { (x, y, z) | x >= min_1 }
[Diagram label: B(T)]

  • Slide 23

B-KD Tree Semantics
B-KD Tree (Bounded KD-Tree)
  H_right(min_1) = { (x, y, z) | x >= min_1 }
  H_left(max_1) = { (x, y, z) | x