Universitat Autònoma de Barcelonarefbase.cvc.uab.es/files/HME2016.pdfPowerPoint Presentation Author...

1
CPU: Intel Core i7-5930K GPU: NVIDIA Titan X Drive PX: NVIDIA Tegra X1 Image Size: 1280x480 Disparity: 128 1 single-thread 2 single-socket GPU: NVIDIA Titan X Xxxx Xxxx Xxxx Xxxxx D. Hernandez-Juarez 1 , A. Chacon 1 , A. Espinosa 1 , D. Vazquez 2 , J. C. Moure 1 , A. M. Lopez 2 Universitat Autònoma de Barcelona 1 Computer Architecture & Operating Systems Dpt. 2 Computer Vision Center Real-time 3D Reconstruction for Autonomous Driving via Semi-Global Matching 0 0 1 0 0 0 0 62 bits Disparity Computation Aggregate costs and estimate disparity from minimum value Robust and dense computation of depth information from stereo-camera systems is a computationally demanding requirement for real-time autonomous driving. Semi-Global Matching (SGM) [1] approximates heavy-computation global algorithms results but with lower computational complexity, therefore it is a good candidate for a real-time implementation. SGM minimizes energy along several 1D paths across the image. The aim of this work is to provide a real-time system producing reliable results on energy-efficient hardware. Our design runs on a NVIDIA Titan X GPU at 104.62 FPS and on a NVIDIA Drive PX at 6.7 FPS, promising for real-time platforms. Abstract Cost Computation References: [1] H. Hirschmüller, “Stereo Processing by Semiglobal Matching and Mutual Information,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 328–341, 2008. Acknowledgements: This work is supported by the Spanish MICINN projects TRA2014-57088-C2-1-R, TIN2014-53234-C2-1-R and by the Secretaria d'Universitats i Recerca del Departament d'Economia i Coneixement de la Generalitat de Catalunya (2014-SGR-1506) and DGT project SPIP2014-01352. Our research is also kindly supported by NVIDIA Corporation in the form of different GPU hardware. Infer depth from left & right images using triangulation Cost Computation Compute 1D smoothing costs in different directions (DP) Matching Cost Census Transform 0 1 0 0 1 0 1 , 0 0 0 0 0 0 1 2 Hamming( ) = # of different bits 1D Smoothing function: Dynamic Programming : smoothed cost in direction r, p: pixel position, d: disparity P 1 , P 2 : penalties for small and high disparity changes FPS Speed Up FPS / Watt CPU 1 SIMD 2.3 1 0.02 GPU Naive 25.4 10.98 0.10 GPU Optimized 104.62 45.49 0.42 NVIDIA Drive PX 2 6.7 2.90 0.67 % Time 3.15 % 3.67 % 68.38 % 22.60 % 2.20 % % Instructions 4.37 % 3.34 % 68.62 % 21.77 % 1.90 % Bandwidth 19 GB/s 234 GB/s 177 GB/s 288 GB/s 12 GB/s Disparity, d : point (x, y) in corresponds to point (x-d, y) in Epipolar line Left camera Optical center Right camera Optical center 0 2 3 9 84 18 11 7 6 2 9 2 2 88 0 0 0 0 0 0 1 11 91 14 5 2 0 2 7 10 81 80 85 3 0 0 0 2 10 101 99 95 12 2 0 3 9 93 90 88 86 87 8 2 4 80 86 84 82 80 81 82 9 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 C 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 0 9x7 window from image (I) , = 1, , 0, , < ,, = Hamming Distance ( , , ( , )) Implementation: Hamming(a, b) = bitcount ( a xor b ) Left Image ( ) Rigth Image ( ) +···+ Disparity Image = Aggregate Costs Compute Cost for each pixel (x,y) in and disparity d in Y (x,y) f (x,y) , = , + min (1, ) 1, +1 + 1 1, 1 + 1 1, + 2 Cost Function Census thresholding Census codification Cost Cost Cost ···· All path costs contribute equally Cost Cost Median Filter Remove outliers Compute Disparity The image is transformed to help the cost function ,, Stencil pattern in 8 directions 1 kernel per direction 1 warp per plane Use of registers storage Shuffle for intra-warp communication d (1) (, ) d-1 d+1 Disparity Image Filtered Image Conclusions: - Semi-global matching can be used for real-time 3D reconstruction. - Need new strategies to get real-time performance for NVIDIA Drive PX. - NVIDIA Drive PX has 1.57x better energetic efficiency than high-end GPUs. Census Transform Cost Computation Disparity Computation Matching Cost Median Filter Census Transform Find minimum disparity Pipeline Problem: 3D Reconstruction Census Transform Matching Cost: Hamming Distance Results GPU Implementation Details

Transcript of Universitat Autònoma de Barcelonarefbase.cvc.uab.es/files/HME2016.pdfPowerPoint Presentation Author...

Page 1: Universitat Autònoma de Barcelonarefbase.cvc.uab.es/files/HME2016.pdfPowerPoint Presentation Author Kieran Fairnington Created Date 1/25/2016 7:34:42 PM ...

CPU: Intel Core i7-5930K GPU: NVIDIA Titan X Drive PX: NVIDIA Tegra X1 Image Size: 1280x480 Disparity: 128 1 single-thread 2 single-socket

GPU: NVIDIA Titan X

Xxxx Xxxx Xxxx

Xxxxx D. Hernandez-Juarez1, A. Chacon1, A. Espinosa1, D. Vazquez2, J. C. Moure1, A. M. Lopez2 Universitat Autònoma de Barcelona

1Computer Architecture & Operating Systems Dpt. 2Computer Vision Center

Real-time 3D Reconstruction for Autonomous Driving via Semi-Global Matching

0 … 0 1 0 0 0 0

62 bits

Disparity Computation

Aggregate costs and estimate disparity from minimum value

Robust and dense computation of depth information from stereo-camera systems is a computationally demanding requirement for real-time autonomous driving. Semi-Global Matching (SGM) [1] approximates heavy-computation global algorithms results but with lower computational complexity, therefore it is a good candidate for a real-time implementation. SGM minimizes energy along several 1D paths across the image. The aim of this work is to provide a real-time system producing reliable results on energy-efficient hardware. Our design runs on a NVIDIA Titan X GPU at 104.62 FPS and on a NVIDIA Drive PX at 6.7 FPS, promising for real-time platforms.

Abstract

Cost Computation

References: [1] H. Hirschmüller, “Stereo Processing by Semiglobal Matching and Mutual Information,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 328–341, 2008. Acknowledgements: This work is supported by the Spanish MICINN projects TRA2014-57088-C2-1-R, TIN2014-53234-C2-1-R and by the Secretaria d'Universitats i Recerca del Departament d'Economia i Coneixement de la Generalitat de Catalunya (2014-SGR-1506) and DGT project SPIP2014-01352. Our research is also kindly supported by NVIDIA Corporation in the form of different GPU hardware.

Infer depth from left & right images using triangulation

Cost Computation

Compute 1D smoothing costs in different directions (DP)

Matching Cost

Census Transform

0 … 1 0 0 1 0 1 , 0 … 0 0 0 0 0 1 2 Hamming( ) = # of different bits

1D Smoothing function: Dynamic Programming

𝐿𝑟: smoothed cost in direction r, p: pixel position, d: disparity P1, P2 : penalties for small and high disparity changes

FPS Speed Up FPS / Watt CPU1 SIMD 2.3 1 0.02

GPU Naive 25.4 10.98 0.10

GPU Optimized 104.62 45.49 0.42

NVIDIA Drive PX2 6.7 2.90 0.67

% Time 3.15 % 3.67 % 68.38 % 22.60 % 2.20 % % Instructions 4.37 % 3.34 % 68.62 % 21.77 % 1.90 %

Bandwidth 19 GB/s 234 GB/s 177 GB/s 288 GB/s 12 GB/s

Disparity, d : point (x, y) in 𝐼𝐿 corresponds to

point (x-d, y) in 𝐼𝑅

Epipolar line

Left camera Optical center

Right camera Optical center

0 2 3 9 84 18 11 7 6 2 9 2 2 88 0 0 0 0 0 0 1 11 91 14 5 2 0 2 7 10 81 80 85 3 0 0 0 2 10 101 99 95 12 2 0 3 9 93 90 88 86 87 8 2 4 80 86 84 82 80 81 82 9

0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 C 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 0

9x7 window from image (I)

𝐶𝐶𝐶𝐶𝐶𝐶 𝐼 𝑥,𝑦 = �1, 𝑖𝑖 𝐼 𝑥,𝑦 ≥ 𝑐𝐶𝐶𝑐𝑐𝑐𝑐 𝑣𝑐𝑐𝐶𝐶0, 𝑖𝑖 𝐼 𝑥,𝑦 < 𝑐𝐶𝐶𝑐𝑐𝑐𝑐 𝑣𝑐𝑐𝐶𝐶

𝐶𝑜𝐶𝑐𝑥,𝑦,𝑑 = Hamming Distance 𝐶𝐶𝐶𝐶𝐶𝐶( 𝐼𝐿 𝑥,𝑦 ,𝐶𝐶𝐶𝐶𝐶𝐶( 𝐼𝑅 𝑥 − 𝑑,𝑦 ))

Implementation: Hamming(a, b) = bitcount ( a xor b )

Left Image (𝐼𝐿) Rigth Image (𝐼𝑅)

𝑑

𝑑𝑑 +···+

Disparity Image

=

Aggregate Costs

Compute Cost for each pixel (x,y) in 𝐼𝐿 and disparity d in 𝐼𝑅

Y

(x,y)

f

(x,y)

𝐿𝑟 𝑝,𝑑 = 𝐶𝑜𝐶𝑐 𝑝,𝑑 + min

𝐿𝑟(𝑝 − 1,𝑑)𝐿𝑟 𝑝 − 1,𝑑 + 1 + 𝑃1𝐿𝑟 𝑝 − 1,𝑑 − 1 + 𝑃1𝑚𝑖𝐶𝑖𝐿𝑟 𝑝 − 1, 𝑖 + 𝑃2

Cost Function

Census thresholding Census codification

Cost

Cost

Cost

····

All path costs contribute equally

Cost Cost

Median Filter

Remove outliers

Compute Disparity

The image is transformed to help the cost function

𝐶𝑜𝐶𝑐 𝑥,𝑦,𝑑

𝐼𝐿

𝐼𝑅

Stencil pattern in 8 directions 1 kernel per direction

1 warp per plane Use of registers storage

Shuffle for intra-warp communication

d

𝐿𝑟(𝑝 − 1) 𝐿𝑟(𝑝,𝑑)

d-1

d+1

𝐶𝐿

𝐶𝑅

𝐶𝐿

𝐶𝑅

Disparity Image

Filtered Image

Conclusions: - Semi-global matching can be used for real-time 3D reconstruction. - Need new strategies to get real-time performance for NVIDIA Drive PX. - NVIDIA Drive PX has 1.57x better energetic efficiency than high-end GPUs.

Census Transform 𝐼𝐿 Cost

Computation Disparity

Computation Matching

Cost Median Filter

Census Transform 𝐼𝑅

Find minimum disparity

Pipeline Problem: 3D Reconstruction

Census Transform

Matching Cost: Hamming Distance

Results

GPU Implementation Details