Post on 31-Mar-2015

GPU-based Hierarchical Computations for View Independent Visibility

Rhushabh Goradia, Prekshu Ajmera, Sharat Chandran

Sixth Indian Conference on Computer Vision, Graphics & Image Processing, 2008

http://www.cse.iitb.ac.in/~{rhushabh, prekshu, sharat}

Problem Statement

ICVGIP, 2008 Rhushabh Goradia, Prekshu Ajmera, Sharat Chandran

Parallel, fast computation of view-independent visibility in a complex scene represented as a point model, using a GPU

Result Video

[Video]

Challenges

O(N^3) time complexity
N^2 point-pairs
N occluders considered for every pair

No surface information

[Figure: points A and B in a polygonal model and in a point model]

[GKSD 07], Visibility Map for Global Illumination in Point Clouds, GRAPHITE 2007
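The cubic cost can be made concrete with a brute-force sketch. This is not the method from the talk. Because a point model carries no surface information, the occlusion test below assumes that each point blocks a small sphere of hypothetical radius `radius`; the function names `segment_blocked` and `brute_force_visibility` are illustrative only.

```python
import math
from itertools import combinations


def segment_blocked(p, q, c, radius):
    """Return True if the segment p-q passes within `radius` of point c."""
    d = [q[i] - p[i] for i in range(3)]
    length_sq = sum(x * x for x in d)
    if length_sq == 0.0:
        return False
    # Parameter of the point on the segment closest to c, clamped to [0, 1].
    t = sum((c[i] - p[i]) * d[i] for i in range(3)) / length_sq
    t = max(0.0, min(1.0, t))
    closest = [p[i] + t * d[i] for i in range(3)]
    return math.dist(closest, c) < radius


def brute_force_visibility(points, radius=0.1):
    """Test all point-pairs against all other points as occluders: O(N^3)."""
    visible = set()
    for i, j in combinations(range(len(points)), 2):
        blocked = any(
            segment_blocked(points[i], points[j], points[k], radius)
            for k in range(len(points))
            if k != i and k != j
        )
        if not blocked:
            visible.add((i, j))
    return visible


# Point 1 lies on the segment between points 0 and 2 and occludes that pair.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(sorted(brute_force_visibility(points)))
```

The two nested levels (pairs, then occluders per pair) are exactly the N^2 and N factors listed on the slide.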

Building the Octree Hierarchy
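A minimal sketch of one common way to build an octree over a point set, by recursive subdivision. The stopping rule (`max_points`, `max_level`) and all names here are assumptions for illustration; the slides do not state the criteria used by the authors.

```python
class OctreeNode:
    def __init__(self, center, half_size, level):
        self.center = center        # cube center (x, y, z)
        self.half_size = half_size  # half of the cube's edge length
        self.level = level          # 0 at the root
        self.points = []            # filled only at leaves
        self.children = []          # empty for leaves


def build_octree(points, center, half_size, max_points=2, max_level=8, level=0):
    """Subdivide until a node holds few points or the depth limit is reached."""
    node = OctreeNode(center, half_size, level)
    if len(points) <= max_points or level == max_level:
        node.points = list(points)
        return node
    # Bucket each point into one of 8 octants by comparing with the center.
    buckets = {}
    for p in points:
        key = tuple(p[i] >= center[i] for i in range(3))
        buckets.setdefault(key, []).append(p)
    quarter = half_size / 2.0
    for key, bucket in buckets.items():  # only non-empty octants get a child
        child_center = tuple(
            center[i] + (quarter if key[i] else -quarter) for i in range(3)
        )
        node.children.append(
            build_octree(bucket, child_center, quarter,
                         max_points, max_level, level + 1)
        )
    return node


def leaves(node):
    """Collect all leaf nodes below (and including) `node`."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


pts = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9), (0.1, 0.9, 0.1), (0.9, 0.1, 0.9)]
root = build_octree(pts, (0.5, 0.5, 0.5), 0.5, max_points=1)
print(len(leaves(root)))
```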

Visibility between Point Clusters

Discussion: Visibility between Point Clusters

O(M^2 log M) time complexity (for an octree with M leaves)

M^2 log M << N^3

M << N

… but still not fast enough!
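To give a feel for the gap between the two bounds, the sketch below plugs in numbers. N = 135,000 is the point count quoted for the Bunny scene in the timing results; M = 4,096 leaves is a purely hypothetical value, and the base-2 logarithm is an assumption that only changes a constant factor.

```python
import math


def naive_cost(n):
    """Operation count for the O(N^3) point-pair approach."""
    return n ** 3


def hierarchical_cost(m):
    """Operation count for the O(M^2 log M) cluster approach (log base 2 assumed)."""
    return m * m * math.log2(m)


n = 135_000   # points in the scene
m = 4_096     # hypothetical number of octree leaves
ratio = naive_cost(n) / hierarchical_cost(m)
print(f"N^3 = {naive_cost(n):.3e}, M^2 log M = {hierarchical_cost(m):.3e}, "
      f"ratio = {ratio:.1e}")
```

These are asymptotic operation counts, not timings, so the ratio only indicates the order of magnitude of the saving.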

Contributions

Problem Statement & Contributions:

Parallel, fast computation of view-independent visibility in a complex scene represented as a point model, using a GPU

Achieved up to 19x speed-up
Visibility problem is highly parallel
No compromise on quality
No dynamic memory allocation and recursion

Application to Global Illumination

Point-modeled Cornell Room and the Stanford Bunny

Can be extended to the Digital Heritage Project
http://research.microsoft.com

Plan

Hierarchical Visibility Map (V-map) Construction on CPU

Parallel V-map Construction on GPU
Strategy 1 – Multiple threads per node
Strategy 2 – One thread per node
Strategy 3 – Multiple threads per node-pair

Results

Conclusion

What is a V-map ?

The V-map for a tree is a collection of visibility links for every node in the tree

The visibility link for any node N is a set L of nodes

Every point in any node in L is guaranteed to be visible from every point in N

[Figure: node N and its visibility link, a list L containing nodes V1, V2 and V3]
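The definition above maps naturally onto a dictionary from each node to its list L. The sketch below is one possible representation, not the authors' data structure; the class name `VMap`, its methods, and the choice to store links symmetrically (visibility here being mutual) are assumptions.

```python
from collections import defaultdict


class VMap:
    """Maps each node id to its visibility link (a set of node ids)."""

    def __init__(self):
        self.links = defaultdict(set)

    def add_link(self, a, b):
        # Mutual visibility: if b is visible from a, then a is visible from b.
        self.links[a].add(b)
        self.links[b].add(a)

    def visibility_link(self, node):
        # .get avoids creating an empty entry for nodes that have no links.
        return self.links.get(node, set())


vmap = VMap()
for v in ("V1", "V2", "V3"):
    vmap.add_link("N", v)
print(sorted(vmap.visibility_link("N")))
```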

With respect to a node at any level, another node is

-- Completely Visible

-- Completely Invisible

-- Partially Visible

[Figure: octree levels from Level 0 (Root) through Level 1 and Level 2 to Level 3 (Leaf)]

CPU based V-map Construction

[Figure: octree from Level 0 (Root) down to Level L, with nodes A and B and their leaves a1, a2 and b1]

Compute visibility between leaf-pairs: (a1, b1), (a2, b1), ...

Look up the leaf-pair results for nodes A and B

All leaf-pairs are visible
Complete Visibility
New Visibility Link

Only some leaf-pairs are visible
Partial Visibility
New Visibility Link

Uses dynamic memory allocation

Uses recursion on the tree
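The steps above can be sketched as a recursive routine. This is an unoptimized illustration under stated assumptions, not the authors' implementation: `leaf_visible` is a hypothetical predicate standing in for the leaf-level visibility test, partial visibility is handled by descending to the children, and the growing `links` list stands in for the dynamic memory allocation mentioned on the slide.

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def leaves(node):
    if not node.children:
        return [node]
    return [leaf for c in node.children for leaf in leaves(c)]


def build_links(a, b, leaf_visible, links):
    """Classify the node pair (a, b) and record visibility links.

    Returns "complete", "partial", or "invisible".
    """
    seen = [leaf_visible(x.name, y.name) for x in leaves(a) for y in leaves(b)]
    if all(seen):
        links.append((a.name, b.name))  # one link covers every leaf-pair
        return "complete"
    if not any(seen):
        return "invisible"
    # Partial visibility: recurse so that links are placed lower in the tree.
    for x in (a.children or [a]):
        for y in (b.children or [b]):
            build_links(x, y, leaf_visible, links)
    return "partial"


A = Node("A", [Node("a1"), Node("a2")])
B = Node("B", [Node("b1")])
visible_pairs = {("a1", "b1")}          # a2 cannot see b1 in this toy scene
links = []
status = build_links(A, B, lambda x, y: (x, y) in visible_pairs, links)
print(status, links)
```

Both the recursion and the list that grows as links are found are features that the GPU strategies later in the talk have to work around.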

Strategy 1: Multiple Threads per Node

[Figure: nodes A and N; interaction list I0, I1, I2, I3; threads T0, T1, T2, T3]

Computation by a Single Thread

[Figure: one thread computes visibility between node A and interaction-list entry I0, from Level 0 (Root) down to the leaves a1 and b1 at Level L]

Discussion:

Degree of parallelism limited by the size of the node's interaction list

Serious limitation: no support for recursion and dynamic memory allocation

Strategy 2: One Thread per Node

[Figure: at Level L, thread T0 handles node A with interaction list I0, I1, I2, I3; thread T1 handles node N with interaction list I0, I1]

Discussion:

Degree of parallelism limited by the number of nodes per level

Performance increases as we go down the octree

Serious limitation: no support for recursion and dynamic memory allocation

Strategy 3: Multiple Threads per Node-Pair

[Figure: octree from Level 0 (Root) down to Level L, with nodes A and B and their leaves a1, a2 and b1]

Compute visibility between leaves in parallel

Store the results in a boolean array

Look up from the boolean array

All leaf-pairs are visible
Complete Visibility
New Visibility Link

Partial Visibility
New Visibility Link
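The two phases above can be sketched as follows. The first function fills a flat boolean array with one independent entry per leaf-pair, which is the part the talk assigns to GPU threads; here a plain loop stands in for the kernel launch. The second function classifies a node pair using look-ups only. The array layout, the function names and the `leaf_visible` predicate are assumptions for illustration.

```python
def leaf_pair_table(leaf_names, leaf_visible):
    """Phase 1: fill a flat boolean array, one entry per ordered leaf-pair.

    Each entry depends on nothing but its own index, so every iteration
    could run as a separate thread; idx plays the role of a thread id.
    """
    n = len(leaf_names)
    table = [False] * (n * n)
    for idx in range(n * n):
        i, j = divmod(idx, n)
        table[idx] = leaf_visible(leaf_names[i], leaf_names[j])
    return table


def classify(leaves_a, leaves_b, table, n):
    """Phase 2: classify a node pair by look-ups only, with no visibility tests."""
    hits = [table[i * n + j] for i in leaves_a for j in leaves_b]
    if all(hits):
        return "complete"
    return "partial" if any(hits) else "invisible"


names = ["a1", "a2", "b1", "b2"]
seen = {("a1", "b1"), ("a1", "b2")}     # toy scene: only a1 sees b1 and b2
table = leaf_pair_table(names, lambda x, y: (x, y) in seen or (y, x) in seen)
print(classify([0], [2, 3], table, len(names)))
print(classify([0, 1], [2, 3], table, len(names)))
```

Because the table is a fixed-size array filled without recursion, the parallel phase needs neither of the two features flagged as missing on the GPU; the tree traversal that consumes the table can stay on the CPU.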

CPU v/s GPU

[Figure: node pair A, B under the Root, shown once for each algorithm]

CPU Algorithm: Compute Visibility

GPU Algorithm: Compute Visibility in parallel, then look up from the boolean array

Strategy 3: Multiple Threads per Node-Pair

Discussion:

Degree of parallelism is high, as hundreds or thousands of threads run concurrently most of the time

Recursion and dynamic memory allocation happen on the CPU

Performance is best at octree levels near the root

Achieves up to 19x speed-up

No change in quality

Results: Visibility Validation

Results: Timings

Bunny in the Cornell Room (1,35,000 points): max speed-up 19X

Buddha in the Cornell Room (2,30,000 points): max speed-up 14X

Ganesha & Satyawati in the Cornell Room (3,50,000 points): max speed-up 17X

Results: Application to Global Illumination

Results: Quality Comparison

[Figure: CPU and GPU results side by side]

Accurate up to 5 decimal places

Discussion: Implementation Details

Optimal Thread and Block Size
Each block (16 x 16) contained 256 threads
Every thread-block grid contained no fewer than 64 blocks

Optimal Octree Heights
Every thread works on a leaf-pair; multiple threads are independent
No shared memory synchronization required
With 16 multiprocessors, 64 thread-blocks and 256 threads per block, we need 16384 leaf-pairs

Asynchronous Computations

Loop Unrolling
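The leaf-pair figure on this slide follows from the launch configuration: 64 blocks of 256 threads, one leaf-pair per thread. The sketch below reproduces that arithmetic and then, under two assumptions that are not from the slides (leaf-pairs counted as M x M ordered pairs, and a full octree with 8^h leaves at height h), estimates how many leaves and what height that implies.

```python
import math

THREADS_PER_BLOCK = 16 * 16   # 256, from the slide
MIN_BLOCKS = 64               # from the slide


def min_leaf_pairs():
    """One leaf-pair per thread, so pairs = blocks * threads per block."""
    return MIN_BLOCKS * THREADS_PER_BLOCK


def min_leaves(pairs):
    """Smallest M with M * M >= pairs (assumes ordered leaf-pairs)."""
    return math.isqrt(pairs - 1) + 1


def min_full_octree_height(leaf_count):
    """Smallest h with 8 ** h >= leaf_count (assumes a full octree)."""
    h = 0
    while 8 ** h < leaf_count:
        h += 1
    return h


pairs = min_leaf_pairs()
print(pairs, min_leaves(pairs), min_full_octree_height(min_leaves(pairs)))
```

A real point-model octree is usually far from full, so the height estimate is only a lower bound for intuition.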

Final Note

A parallel implementation of the view-independent mutual point-pair visibility problem on a modern GPU (G80/G92 architecture) was presented

Several strategies to achieve the desired parallelism were introduced, and the most suitable one was chosen

Speed-ups of up to 19X were reported while maintaining the quality of results

Blocks of 256 threads achieved optimal GPU performance

Good octree heights are essential for high speed-ups and accurate visibility results

With V-map construction treated as a "pre-processing" step, photo-realistic global illumination rendering of complex point models has been shown

The parallel V-map construction can offer intuition for solving other problems that involve computing relationships amongst data at various levels in a hierarchy

Thank You

Fractal: Mandel Zoom - Satellite Antenna, Mandelbrot set
