A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics...
![Page 1: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/1.jpg)
A Hardware Processing Unit For Point Sets
S. Heinzle, G. Guennebaud, M. Botsch, M. Gross
Graphics Hardware 2008
![Page 2: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/2.jpg)
Motivation
• Point-based graphics established
• Powerful algorithms
  – Representation
  – Processing
  – Manipulation
  – Rendering
• Decomposition
  – Get neighborhood
  – Operate on neighbors
![Page 3: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/3.jpg)
Motivation
• GPUs not suited for getting neighborhood
  – SIMD
  – Incoherent branching
  – Dynamic data structures slow
  – Recursive calls not supported
• CPUs
  – Small number of FPUs
  – Inflexible memory caches
Courtesy of NVIDIA
Courtesy of Intel
![Page 4: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/4.jpg)
Contributions
• Hardware architecture for point sets
  – Neighbor search module
  – Novel advanced caching mechanism
  – Reconfigurable processing module
  – Programmability using FPGA compiler
• FPGA prototype and measurements
• Small & Lean
  – Integration into multi-core CPU/GPU possible
![Page 5: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/5.jpg)
Outline
• Related Work
• Spatial Searching and Caching
• Architecture and Prototype
• Results
• Conclusion
![Page 6: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/6.jpg)
Related Work
Kd-Tree [Bentley 75]
kNN on GPUs [Ma and McCool 02]
Kd-Tree Hardware [Woop et al. 05] [Woop et al. 06]
Kd-Tree on GPUs [Popov et al. 07]
![Page 7: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/7.jpg)
Related Work
Adaptive SPH Fluid Simulation [Adams et al. 07]
Linear Moving Least Squares [Adamson and Alexa 04]
Algebraic Moving Least Squares [Guennebaud and Gross 07]
![Page 8: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/8.jpg)
Linear Moving Least Squares
• Implicit surface defined by a set of points
![Page 9: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/9.jpg)
Linear Moving Least Squares
• Implicit surface defined by a set of points
[Figure label: x]
![Page 10: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/10.jpg)
Linear Moving Least Squares
[Figure labels: x, p_i, n_i]
![Page 11: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/11.jpg)
Linear Moving Least Squares
• Iterative projections onto plane
[Figure label: x]
![Page 12: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/12.jpg)
Linear Moving Least Squares
• Iterative projections onto plane
[Figure labels: x, x’]
![Page 13: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/13.jpg)
Linear Moving Least Squares
• Iterative projections onto plane
[Figure labels: x, x’’]
![Page 14: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/14.jpg)
Linear Moving Least Squares
• Iterative projections onto plane
[Figure labels: x, x’’’]
![Page 15: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/15.jpg)
Linear Moving Least Squares
• Surface defined by points projecting onto themselves
[Figure label: x]
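The iterative projection described on these slides can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses one common formulation of linear MLS in which the local plane passes through the weighted average of the neighboring points and has the weighted average normal. The function name `mls_project`, the Gaussian weight, and the bandwidth parameter `h` are hypothetical choices made for this example.

```python
import math

def mls_project(x, points, normals, h=1.0, iterations=10, tol=1e-9):
    """Iteratively project x onto a plane fitted to oriented neighbor points.

    points/normals are the neighborhood of x (for example a kNN result).
    The Gaussian kernel and bandwidth h are assumptions of this sketch.
    """
    x = list(x)
    dim = len(x)
    for _ in range(iterations):
        # Gaussian weights centred at the current position.
        w = [math.exp(-sum((xc - pc) ** 2 for xc, pc in zip(x, p)) / (h * h))
             for p in points]
        total = sum(w)
        # Weighted average position a(x) and normalized average normal n(x).
        a = [sum(wi * p[d] for wi, p in zip(w, points)) / total
             for d in range(dim)]
        n = [sum(wi * nrm[d] for wi, nrm in zip(w, normals))
             for d in range(dim)]
        length = math.sqrt(sum(c * c for c in n))
        n = [c / length for c in n]
        # Signed distance from x to the plane through a(x) with normal n(x).
        f = sum(nc * (xc - ac) for nc, xc, ac in zip(n, x, a))
        # Move x onto that plane; the plane is refitted in the next pass.
        x = [xc - f * nc for xc, nc in zip(x, n)]
        if abs(f) < tol:
            break  # x (almost) projects onto itself: it lies on the surface
    return tuple(x)
```

A point is on the surface when a further projection step no longer moves it, which is the stopping test above. The sketch assumes the query stays close enough to the neighbors that the weights do not all underflow to zero.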
![Page 16: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/16.jpg)
![Page 17: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/17.jpg)
Spatial Search
• Spatial search: kNN and NN
  – Common in most point operations
  – Based on kd-tree
• Example NN:
![Page 18: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/18.jpg)
Spatial Search
• kNN search similar to NN search:
  – Start with infinite radius
  – Sort leaf points into priority queue
  – Shrink radius with every point sorted
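The three steps above can be sketched in software as follows. This is a minimal sketch, not the hardware design: the tree layout (nested tuples), the helper names `build_kdtree` and `knn`, and the `leaf_size` parameter are assumptions made for this example, and recursion stands in for the explicit traversal stacks used on chip.

```python
import heapq

def build_kdtree(points, depth=0, leaf_size=4):
    """Return a nested-tuple kd-tree; leaves hold small point lists."""
    if len(points) <= leaf_size:
        return ("leaf", list(points))
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return ("node", axis, pts[mid][axis],
            build_kdtree(pts[:mid], depth + 1, leaf_size),
            build_kdtree(pts[mid:], depth + 1, leaf_size))

def knn(tree, query, k):
    """Return the k nearest points to query, closest first."""
    heap = []  # max-heap via negated squared distances: (-d2, point)

    def radius2():
        # Infinite until k candidates are known, then the k-th best distance.
        return float("inf") if len(heap) < k else -heap[0][0]

    def visit(node):
        if node[0] == "leaf":
            for p in node[1]:
                d2 = sum((a - b) ** 2 for a, b in zip(p, query))
                if d2 < radius2():
                    heapq.heappush(heap, (-d2, p))
                    if len(heap) > k:
                        heapq.heappop(heap)  # drop farthest; radius shrinks
            return
        _, axis, split, left, right = node
        diff = query[axis] - split
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near)
        if diff * diff < radius2():  # far side may still hold closer points
            visit(far)

    visit(tree)
    return [p for _, p in sorted(heap, key=lambda e: -e[0])]
```

Visiting the near child first makes the search radius shrink early, so more far subtrees can be skipped by the distance test against the splitting plane.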
![Page 19: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/19.jpg)
Coherent Neighbor Cache (NN)
• Find neighbors in slightly bigger radius
• Re-use result for spatially close query
Re-use if
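The re-use condition itself is not preserved in this transcript. A minimal sketch of the idea, assuming a fixed-radius neighbor query: search once with an enlarged radius `radius + margin`, and re-use the result for any later query whose distance to the cached query is at most `margin`, because its search ball then lies inside the cached ball. The class `RangeCache`, the `margin` parameter, and the brute-force search standing in for the kd-tree are all hypothetical.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class RangeCache:
    """Single-entry cache for fixed-radius neighbor queries."""

    def __init__(self, points, radius, margin):
        self.points = points
        self.radius = radius   # radius r the caller asks for
        self.margin = margin   # extra search radius (hypothetical parameter)
        self.center = None     # query point of the cached search
        self.cached = []       # neighbors within radius + margin of center
        self.hits = 0

    def query(self, q):
        if self.center is not None and dist(q, self.center) <= self.margin:
            self.hits += 1     # ball(q, r) lies inside the cached ball
        else:
            # Miss: brute-force search stands in for the kd-tree traversal.
            big = self.radius + self.margin
            self.center = q
            self.cached = [p for p in self.points if dist(p, q) <= big]
        # Filter the cached superset down to the exact radius around q.
        return [p for p in self.cached if dist(p, q) <= self.radius]
```

The result is exact on a hit: by the triangle inequality, every point within `radius` of the new query is within `radius + margin` of the cached one. A larger margin gives more hits but a larger list to filter.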
![Page 20: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/20.jpg)
Coherent Neighbor Cache (kNN, exact)
• Find (k+1) neighbors
• Re-use result for spatially close query
Re-use if
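The slide's re-use condition is not preserved in this transcript. One sufficient condition follows from the triangle inequality: if the new query is at distance e from the cached query, and the cached k-th and (k+1)-th neighbor distances are r_k and r_{k+1}, then the k nearest neighbors are unchanged whenever 2e <= r_{k+1} - r_k. The sketch below uses that derived condition, which may differ from the one shown on the slide. The class `KnnCache` and the brute-force search are hypothetical stand-ins.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_brute(points, q, k):
    # Brute-force search stands in for the kd-tree traversal.
    return sorted(points, key=lambda p: dist(p, q))[:k]

class KnnCache:
    """Single-entry exact kNN cache; needs at least k + 1 points."""

    def __init__(self, points, k):
        self.points = points
        self.k = k
        self.center = None
        self.hits = 0

    def query(self, q):
        if self.center is not None:
            e = dist(q, self.center)
            # Cached neighbors are at most r_k + e away from q, all other
            # points at least r_k1 - e, so the set is unchanged if this holds.
            if 2.0 * e <= self.r_k1 - self.r_k:
                self.hits += 1
                return sorted(self.neighbors, key=lambda p: dist(p, q))
        nbrs = knn_brute(self.points, q, self.k + 1)  # one extra neighbor
        self.center = q
        self.r_k = dist(nbrs[self.k - 1], q)
        self.r_k1 = dist(nbrs[self.k], q)
        self.neighbors = nbrs[:self.k]
        return list(self.neighbors)
```

The extra (k+1)-th neighbor is what makes the test possible: its distance bounds how far the query may move before a point outside the cached set could become one of the k nearest.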
![Page 21: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/21.jpg)
Coherent Neighbor Cache (kNN, approximation)
• Approximation error
  – Enlarge radius
Re-use if
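The slide's formula is not preserved in this transcript. As a hedged illustration of how an allowed approximation error enlarges the re-use region, the derivation below gives one sufficient condition obtained from the triangle inequality. The symbols are assumptions of this sketch: q is the cached query with neighbor set N_k(q), q' the new query, e their distance, r_k and r_{k+1} the cached k-th and (k+1)-th neighbor distances, and ε the tolerated relative error on the k-th neighbor distance. The condition on the slide may be stated differently.

```latex
% e = \|q' - q\|, with r_k and r_{k+1} measured from the cached query q
\begin{align*}
\max_{p \in N_k(q)} \|p - q'\| &\le r_k + e
  && \text{(cached neighbors)} \\
\min_{p \notin N_k(q)} \|p - q'\| &\ge r_{k+1} - e
  && \text{(all other points)} \\
r_k + e \le r_{k+1} - e
  &\iff 2e \le r_{k+1} - r_k
  && \text{(exact re-use)} \\
r_k + e \le (1+\varepsilon)(r_{k+1} - e)
  &\iff e \le \frac{(1+\varepsilon)\, r_{k+1} - r_k}{2 + \varepsilon}
  && \text{(re-use with relative error at most } \varepsilon\text{)}
\end{align*}
```

For ε = 0 the last line reduces to the exact condition, and any ε > 0 strictly enlarges the set of queries that can be served from the cache.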
![Page 22: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/22.jpg)
![Page 23: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/23.jpg)
The Architecture
[Figure: architecture diagram; label: Host]
![Page 24: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/24.jpg)
Coherent Neighbor Cache
• Eight cached neighborhoods
• Problem: parallel queries in kd-tree module
  – Interleave spatially similar queries
![Page 25: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/25.jpg)
Kd-Tree Traversal
![Page 26: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/26.jpg)
Node Recurse
• Kd-tree structure on chip
• 16 threads
• Pipelining and multi-threading
![Page 27: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/27.jpg)
Stacks
• 16 stacks
• Parallel read/write
• Bounded in depth
• 6 bytes per thread per recursion
![Page 28: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/28.jpg)
Leaf
• 16 parallel priority queues (1-cycle ops)
• Queues store pointers and distances
• Bandwidth bottleneck
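A software analogue of one such queue is a fixed-capacity sorted list of (distance, pointer) pairs. This is a minimal sketch with hypothetical names (`BoundedQueue`, `insert`, `radius`). It models only the behaviour of the queue, not the single-cycle timing, which this transcript does not describe in detail.

```python
import bisect

class BoundedQueue:
    """Keeps the `capacity` smallest (distance, pointer) pairs, sorted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []  # ascending by distance; last entry is the farthest

    def insert(self, distance, pointer):
        """Try to insert a candidate; return True if it was kept."""
        if len(self.items) == self.capacity and distance >= self.items[-1][0]:
            return False  # not closer than the current farthest entry
        bisect.insort(self.items, (distance, pointer))
        if len(self.items) > self.capacity:
            self.items.pop()  # evict the farthest entry
        return True

    def radius(self):
        """Current search radius: infinite until the queue is full."""
        if len(self.items) < self.capacity:
            return float("inf")
        return self.items[-1][0]
```

Storing pointers rather than full points keeps each entry small; the point data is fetched only once the final neighbor list is known.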
![Page 29: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/29.jpg)
Processing Module
• Multithreaded quad-port bank of 16 registers
• 128 threads
• Programmability using FPGA-technology
![Page 30: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/30.jpg)
Further Data
• Implemented on two FPGAs
  – 64 bit DDR DRAM
  – Interconnection: no overhead
• Resource usage (registers and LUTs)
  – Virtex 2 Pro 100 (kNN): 26% registers, 38% LUTs
  – Virtex 2 Pro 70 (MLS): 47% registers, 52% LUTs
• Clock frequency: 75 MHz
![Page 31: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/31.jpg)
![Page 32: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/32.jpg)
Applications
• Tested on various applications
• PCI interface of prototype slow
[Weyrich et al. 04]
[Adams et al. 07]
![Page 33: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/33.jpg)
Results kNN
[Figure: kNN performance, number of queries versus number of neighbors, relative to the FPGA prototype (x1). Labels in three groups: CUDA x4, CPU x1.5; CUDA x2.4, CPU x1.4, CUDA w/o sort x4.0; CUDA x1.6, CPU x1.1, CUDA w/o sort x3.1. Clock labels: 75 MHz, 1200 MHz, 2200 MHz. ASIC estimate at 500 MHz: x6.6.]
![Page 34: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/34.jpg)
Results kNN
[Figure: same kNN plot as on the previous slide]
• Small hardware footprint
• FPGA slightly slower
• Realistic clock frequency
  – Prototype faster than CPU/GPU
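The ASIC label in the plot is consistent with simple clock scaling. The slides give only the factor, so treating throughput as proportional to clock frequency is an assumption of this check, not a statement of how the authors derived it:

```latex
\[
\frac{f_{\text{ASIC}}}{f_{\text{FPGA}}}
  = \frac{500\ \text{MHz}}{75\ \text{MHz}}
  \approx 6.67
\]
```

This matches the "x6.6" label and exceeds the largest CUDA factor shown in the plot (x4), which is how the claim that the prototype would be faster than CPU and GPU at a realistic clock frequency can be read.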
![Page 35: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/35.jpg)
Results MLS
[Figure: MLS performance, number of queries versus number of neighbors, relative to the FPGA prototype: FPGA x1, MLS CPU x0.4, MLS CUDA x3.8. Clock labels: 75 MHz, 1200 MHz, 2200 MHz.]
• FPGA faster than CPU
• kNN bottleneck
  – FPGA
  – GPU
![Page 36: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/36.jpg)
Coherent Neighbor Cache
[Figure: number of queries versus level of coherence for CPU (approximation 0.1), FPGA (exact), and FPGA (approximation 0.1)]
![Page 37: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/37.jpg)
Results Approximation Error (MLS projection)
[Figure: MLS error versus approximation, compared with no approximation]
![Page 38: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/38.jpg)
Results Approximation Error (MLS projection)
[Figure: cache hits versus approximation]
![Page 39: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/39.jpg)
Approximation Error (visual)
![Page 40: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/40.jpg)
Approximation Error (visual)
Coherent Neighbor Cache:
• Not optimal for exact queries
• Approximate queries
  – Can be tolerated in most cases
  – Greatly increases performance
  – Even for small approximations
![Page 41: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/41.jpg)
![Page 42: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/42.jpg)
Conclusion
• Novel hardware architecture for
  – Nearest-neighbor searches
  – Generic meshless processing operators
• Cache exploiting spatial coherence
• Good performance considering resources
• Possible GPU integration
![Page 43: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/43.jpg)
Future Work
• Programmable data structure
  – Support different data structures
  – Programmability in data structure
  – Construction on-chip
• ‘Real’ programmability in point processing module
![Page 44: A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.](https://reader036.fdocuments.net/reader036/viewer/2022062621/551be8bd550346c3588b619d/html5/thumbnails/44.jpg)