Universitat Autònoma de Barcelona, refbase.cvc.uab.es/files/HME2016.pdf
CPU: Intel Core i7-5930K
GPU: NVIDIA Titan X
Drive PX: NVIDIA Tegra X1
Image Size: 1280x480
Disparity: 128
¹ single-thread  ² single-socket
GPU: NVIDIA Titan X
D. Hernandez-Juarez¹, A. Chacon¹, A. Espinosa¹, D. Vazquez², J. C. Moure¹, A. M. Lopez²
Universitat Autònoma de Barcelona
¹ Computer Architecture & Operating Systems Dpt.  ² Computer Vision Center
Real-time 3D Reconstruction for Autonomous Driving via Semi-Global Matching
[Figure: census codification of one pixel as a 62-bit string, e.g. 0 … 0 1 0 0 0 0]
Disparity Computation
Aggregate costs and estimate disparity from minimum value
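The aggregation and minimum search can be sketched for a single pixel as below. This is a minimal Python sketch, not the poster's GPU code: the function names are hypothetical, the per-direction costs are made-up sample values, and ties are broken toward the lowest disparity as an assumption.

```python
def aggregate_pixel(costs_per_direction):
    """Sum the smoothed costs of all directions with equal weight.

    costs_per_direction: one list per direction, each indexed by disparity d.
    """
    return [sum(column) for column in zip(*costs_per_direction)]


def best_disparity(aggregated):
    """Return the disparity with the minimum aggregated cost.

    Ties go to the lowest disparity (an assumption of this sketch).
    """
    return min(range(len(aggregated)), key=aggregated.__getitem__)


# Hypothetical smoothed costs for 3 directions and 3 disparities.
paths = [[4, 2, 7], [3, 1, 9], [5, 6, 2]]
total = aggregate_pixel(paths)
print(total, best_disparity(total))
```

Running the same two steps for every pixel yields the disparity image.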
Robust and dense computation of depth information from stereo-camera systems is a computationally demanding requirement for real-time autonomous driving. Semi-Global Matching (SGM) [1] approximates the results of computationally heavy global algorithms at a lower computational complexity, which makes it a good candidate for a real-time implementation. SGM minimizes energy along several 1D paths across the image. The aim of this work is to provide a real-time system that produces reliable results on energy-efficient hardware. Our design runs at 104.62 FPS on an NVIDIA Titan X GPU and at 6.7 FPS on an NVIDIA Drive PX, a promising result for real-time platforms.
Abstract
Cost Computation
References:
[1] H. Hirschmüller, "Stereo Processing by Semiglobal Matching and Mutual Information," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 328–341, 2008.
Acknowledgements: This work is supported by the Spanish MICINN projects TRA2014-57088-C2-1-R and TIN2014-53234-C2-1-R, by the Secretaria d'Universitats i Recerca del Departament d'Economia i Coneixement de la Generalitat de Catalunya (2014-SGR-1506), and by DGT project SPIP2014-01352. Our research is also kindly supported by NVIDIA Corporation in the form of different GPU hardware.
Infer depth from left & right images using triangulation
Cost Computation
Compute 1D smoothing costs in different directions (DP)
Matching Cost
Census Transform
Hamming(0 … 1 0 0 1 0 1, 0 … 0 0 0 0 0 1) = # of different bits = 2
1D Smoothing function: Dynamic Programming
$L_r$: smoothed cost in direction r; p: pixel position; d: disparity; $P_1$, $P_2$: penalties for small and large disparity changes
Platform            FPS      Speed Up   FPS / Watt
CPU¹ SIMD           2.3      1          0.02
GPU Naive           25.4     10.98      0.10
GPU Optimized       104.62   45.49      0.42
NVIDIA Drive PX²    6.7      2.90       0.67
% Time           3.15 %    3.67 %     68.38 %    22.60 %    2.20 %
% Instructions   4.37 %    3.34 %     68.62 %    21.77 %    1.90 %
Bandwidth        19 GB/s   234 GB/s   177 GB/s   288 GB/s   12 GB/s
Disparity, d: point (x, y) in $I_L$ corresponds to point (x - d, y) in $I_R$
[Figure: stereo geometry, showing the epipolar line and the optical centers of the left and right cameras]
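For a rectified stereo pair, triangulation reduces to the standard relation Z = f · B / d between depth and disparity. The poster does not spell this out, so the following is only a minimal Python sketch: the function name, the focal length f (in pixels), the baseline B (in metres) and the sample values are all hypothetical.

```python
def depth_from_disparity(d, focal_px, baseline_m):
    """Depth in metres of a point with disparity d (in pixels).

    Assumes rectified cameras; focal_px and baseline_m are calibration
    values that are not given on the poster.
    """
    if d <= 0:
        # Zero disparity corresponds to a point at infinity.
        return float("inf")
    return focal_px * baseline_m / d


# Hypothetical calibration: f = 1000 px, B = 0.5 m.
print(depth_from_disparity(50, 1000.0, 0.5))  # 10.0
```

Larger disparities map to closer points, which is why sub-pixel accuracy matters most for distant objects.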
9x7 window from image (I):
  0   2   3   9  84  18  11   7   6
  2   9   2   2  88   0   0   0   0
  0   0   1  11  91  14   5   2   0
  2   7  10  81  80  85   3   0   0
  0   2  10 101  99  95  12   2   0
  3   9  93  90  88  86  87   8   2
  4  80  86  84  82  80  81  82   9

After thresholding against the central pixel (C):
  0   0   0   0   1   0   0   0   0
  0   0   0   0   1   0   0   0   0
  0   0   0   0   1   0   0   0   0
  0   0   0   1   C   1   0   0   0
  0   0   0   1   1   1   0   0   0
  0   0   1   1   1   1   1   0   0
  0   1   1   1   1   1   1   1   0
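The thresholding of the 9x7 window above can be reproduced with a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the bit order used when packing the 62 bits into one integer (row by row, first pixel in the most significant bit) is an assumption.

```python
# 9x7 window (7 rows, 9 columns) copied from the example above.
WINDOW = [
    [0, 2, 3, 9, 84, 18, 11, 7, 6],
    [2, 9, 2, 2, 88, 0, 0, 0, 0],
    [0, 0, 1, 11, 91, 14, 5, 2, 0],
    [2, 7, 10, 81, 80, 85, 3, 0, 0],
    [0, 2, 10, 101, 99, 95, 12, 2, 0],
    [3, 9, 93, 90, 88, 86, 87, 8, 2],
    [4, 80, 86, 84, 82, 80, 81, 82, 9],
]


def census_bits(window):
    """Compare every non-central pixel with the central value (1 if >=)."""
    rows, cols = len(window), len(window[0])
    cr, cc = rows // 2, cols // 2
    center = window[cr][cc]
    return [
        1 if window[r][c] >= center else 0
        for r in range(rows)
        for c in range(cols)
        if (r, c) != (cr, cc)
    ]


def census_code(window):
    """Pack the bits into one integer (bit order is an assumption)."""
    code = 0
    for bit in census_bits(window):
        code = (code << 1) | bit
    return code


bits = census_bits(WINDOW)
print(len(bits), sum(bits))  # 62 20
```

The 62 bits fit in a single 64-bit word, which keeps the later Hamming distance cheap.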
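$$\mathrm{Census}(I(x,y)) = \begin{cases} 1, & \text{if } I(x,y) \ge \text{central value} \\ 0, & \text{if } I(x,y) < \text{central value} \end{cases}$$
$$\mathrm{Cost}(x,y,d) = \text{Hamming Distance}\big(\mathrm{Census}(I_L(x,y)),\ \mathrm{Census}(I_R(x-d,y))\big)$$
Implementation: Hamming(a, b) = bitcount(a xor b)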
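A minimal Python sketch of the matching cost follows the bitcount(a xor b) formulation. The function names and the layout of the census images (2D lists of integer census codes, indexed [y][x]) are assumptions made for illustration.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits: bitcount(a xor b)."""
    return bin(a ^ b).count("1")


def matching_cost(census_left, census_right, x, y, d):
    """Cost(x, y, d): compare I_L at (x, y) with I_R at (x - d, y)."""
    if x - d < 0:
        # Outside the right image; how to treat this case is not
        # specified on the poster, so the sketch simply refuses it.
        raise IndexError("x - d falls outside the right image")
    return hamming(census_left[y][x], census_right[y][x - d])


# The two bit strings from the Hamming example on the poster.
print(hamming(0b100101, 0b000001))  # 2
```

On hardware with a population-count instruction, each cost evaluation is an xor followed by a single bit count.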
Left Image ($I_L$) Right Image ($I_R$)
[Figure: the per-direction smoothed costs are added (+ ··· +) and the result gives the Disparity Image]
Aggregate Costs
Compute Cost for each pixel (x, y) in $I_L$ and disparity d in $I_R$
$$L_r(p,d) = \mathrm{Cost}(p,d) + \min \left\{ \begin{array}{l} L_r(p-1,\,d) \\ L_r(p-1,\,d+1) + P_1 \\ L_r(p-1,\,d-1) + P_1 \\ \min_i L_r(p-1,\,i) + P_2 \end{array} \right.$$
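The recurrence can be evaluated along one path with a simple dynamic-programming loop. The Python sketch below is illustrative only: the function name is hypothetical, the first pixel of the path is initialised with its raw cost (an assumption), neighbours outside the disparity range are skipped (an assumption), and the penalty values in the example are made up.

```python
def smooth_path(cost, p1, p2):
    """Smoothed costs L_r along one 1D path.

    cost[p][d] is the matching cost at path position p and disparity d.
    Returns a list of the same shape holding L_r(p, d).
    """
    n_disp = len(cost[0])
    # First pixel has no predecessor: L_r = Cost (assumption).
    smoothed = [list(cost[0])]
    for p in range(1, len(cost)):
        prev = smoothed[-1]
        prev_min = min(prev)  # min over i of L_r(p - 1, i)
        row = []
        for d in range(n_disp):
            candidates = [prev[d], prev_min + p2]
            if d + 1 < n_disp:
                candidates.append(prev[d + 1] + p1)
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            row.append(cost[p][d] + min(candidates))
        smoothed.append(row)
    return smoothed


# Two path positions, three disparities, hypothetical penalties.
print(smooth_path([[0, 5, 9], [9, 5, 0]], p1=1, p2=4))
# [[0, 5, 9], [9, 6, 4]]
```

The sketch follows the recurrence as written above, without any additional normalisation term. Each position depends only on the previous one, so a path needs just the previous row of costs in fast storage.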
Cost Function
Census thresholding Census codification
All path costs contribute equally
Median Filter
Remove outliers
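A median filter replaces each disparity with the median of its neighbourhood, which suppresses isolated outliers. The poster gives no window size or border policy, so this Python sketch assumes a 3x3 window (radius 1) and leaves border pixels unchanged; the function name is hypothetical.

```python
def median_filter(img, radius=1):
    """Median-filter a 2D list; borders are copied unchanged (assumption)."""
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(radius, height - radius):
        for x in range(radius, width - radius):
            window = sorted(
                img[yy][xx]
                for yy in range(y - radius, y + radius + 1)
                for xx in range(x - radius, x + radius + 1)
            )
            # Odd window size, so the middle element is the median.
            out[y][x] = window[len(window) // 2]
    return out


# A single outlier (99) in an otherwise constant disparity patch.
patch = [[10, 10, 10], [10, 99, 10], [10, 10, 10]]
print(median_filter(patch))
```

In the example the outlier at the centre is replaced by 10, the median of its 3x3 neighbourhood.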
Compute Disparity
The image is transformed to help the cost function
- Stencil pattern in 8 directions
- 1 kernel per direction
- 1 warp per plane
- Use of register storage
- Shuffle for intra-warp communication
[Figure: $L_r(p, d)$ is computed from $L_r(p - 1)$ at disparities d - 1, d and d + 1]
Disparity Image
Filtered Image
Conclusions:
- Semi-global matching can be used for real-time 3D reconstruction.
- New strategies are needed to reach real-time performance on the NVIDIA Drive PX.
- The NVIDIA Drive PX has 1.57x better energy efficiency than high-end GPUs.
$I_L$, $I_R$ → Census Transform → Matching Cost → Cost Computation → Disparity Computation → Median Filter
Find minimum disparity
Pipeline
Problem: 3D Reconstruction
Census Transform
Matching Cost: Hamming Distance
Results
GPU Implementation Details