Towards Real-Time Stereoscopic Image Rectification for 3D ... · warping distortion must be...

1
Stereo Vision systems are used in order to obtain Three Dimensional instantaneous depth information from the world, a human’s binocular vision is such an example. In order to merge both left and right views and extract the depth from a scene, a human’s eyes must be aligned. The same principle applies when 3D stereo images are generated using a pair of images taken from different viewpoints of the scene. In order to ensure that corresponding objects are aligned in both left and right images, a technique called Stereo Rectification is utilised. Planar homographies (H and H') are applied to warp both images accordingly (see Fig. 1). The Epipolar Geometry between both images is analysed using matched corresponding feature points, H and H‘ can then be derived from the Fundamental Matrix (F). In order to realistically recreate the depth illusion inherent in stereoscopic 3D visualisation, the warping distortion must be minimised during the rectification process. This work presents a distortion minimisation strategy that is significantly more computationally efficient than the previous state of the art method found in the literature. Of specific interest is the synthesis of stereoscopic 3D video content from a monocular video sequence, which can also be relevant to Augmented Reality applications. Towards Real-Time Stereoscopic Image Rectification for 3D Visualization Tony Marrero Barroso Aubrey K. Dunne John Mallon Paul F. Whelan CIPA (Centre for Image Processing and Analysis) Dublin City University, Ireland www.cipa.dcu.ie Introduction a a' b b' c c' d' d Figure 1: The Stereo Rectification Process. Left hand diagram shows the initial triangulation scene. The second image illustrates the image rectification results, all corresponding features become horizontally aligned. A B C D The implemented stereo rectification approach minimises horizontal distortions on the rectified image pairs by analysing the Jacobian (J) of the pixel’s transformation across all local neighbourhoods of the image. The method attempts to make the singular values (σ 1 ,σ 2 ) of these Jacobians as close to one as possible, and thus reduce distortions by minimising local scale re-sampling effects. This in turn ensures that the aspect ratio and orthogonality of the rectified images remain near the ideal values of 1 and 90° respectively. To find the appropriate variables of the rectifying homographies, the following cost function is minimised. The state of the art technique minimises this cost function using the unconstrained Nelder-Mead simplex search method. Our approach facilitates the use of more efficient non-linear optimisation techniques by the development of analytical constraints that relate the cost function to the unknown variables. These techniques can then avail of full analytical derivatives to optimise the search. Three standard minimisation algorithms (Gradient Descent , Gauss-Newton and Levenberg- Marquardt ) were implemented in C to test the performance and robustness of the developed mathematical constraints. A rigorous comparison was made between our implementations and the results obtained using the Nelder-Mead method of Numerical Recipes in C. To put the results in perspective, the popular L-M minimisation method of Minpack was also compared. The tests compared the efficiency of the minimisation along with the accuracy. Distortion Minimisation Results This research was supported by the Irish Research Council for Science, Engineering and Technology (IRCSET), funded by the National Development Plan. Pipeline Stereo Image Pair Input SURF Feature Extraction SURF Feature Matching Epipolar Geometry Calculation Distortion Minimisation Rectification 3D Visualisation H HRobustness Performance ( ) ( ) ( ) [ ] = + = N i a a f 1 2 2 2 1 12 11 1 1 ˆ , ˆ σ σ Algorithm Accuracy Error (%) Cost Function Evaluations Time (ms) Gradient Descent (G-D) 6.125 7 0.065 Gauss-Newton (G-N) 3.625 17 0.113 Levenberg-Marquardt (L-M) 4.920 16 0.110 Minpack’s L-M 4.180 54 0.141 Nelder-Mead 7.133 60 0.785 Table 1: Comparison of the average results of all the minimisation tests performed. Figure 3: Accuracy of Convergence; plots comparing the accuracy of the Nelder-Mead method with our three implemented minimisation algorithms. With the developed mathematical constraints, these three algorithms were able to converge more consistently and accurately than the Nelder-Mead search method under varying conditions. Figure 4: Time of Convergence; plots comparing the computational efficiency of the Nelder-Mead method with our three implemented minimisation algorithms. The Nelder-Mead search method had a significantly inferior performance under most of the tested conditions. The G-D method in some conditions was even up to 1200% times faster. Figure 5: Stereoscopic 3D Anaglyph image without distortion minimisation. The left and right images cannot be merged effectively due to the horizontal shearing distortions. Figure 6: Stereoscopic 3D Anaglyph image with distortion minimisation. All objects are also aligned, but the improvement in the 3D visualisation that occurs after the distortion minimisation is considerable. Acknowledgements OpenCV is used to extract the SURF (Speeded Up Robust Features) points from the left and right images The points are initially matched using a naive nearest neighbour classifier and many mismatches occur. The Fundamental Matrix is calculated using RANSAC, outliers are removed and only the best point matches are used. From the F matrix, rectifying homographies are found and applied. These are the results if distortion minimisation is not considered. After the distortion minimisation both the aspect ratio and orthogonality improve. These error metrics can have a significant impact on the quality of the visualisation. Figure 2: Image sequence of our implemented rectification pipeline.

Transcript of Towards Real-Time Stereoscopic Image Rectification for 3D ... · warping distortion must be...

Page 1: Towards Real-Time Stereoscopic Image Rectification for 3D ... · warping distortion must be minimised during the rectification process. • This work presents a distortion minimisation

• Stereo Vision systems are used in order to obtain Three Dimensional instantaneous depthinformation from the world, a human’s binocular vision is such an example.

• In order to merge both left and right views and extract the depth from a scene, a human’s eyes mustbe aligned. The same principle applies when 3D stereo images are generated using a pair ofimages taken from different viewpoints of the scene.

• In order to ensure that corresponding objects are aligned in both left and right images, a techniquecalled Stereo Rectification is utilised. Planar homographies (H and H') are applied to warp bothimages accordingly (see Fig. 1).

• The Epipolar Geometry between both images is analysed using matched corresponding featurepoints, H and H‘ can then be derived from the Fundamental Matrix (F).

• In order to realistically recreate the depth illusion inherent in stereoscopic 3D visualisation, thewarping distortion must be minimised during the rectification process.

• This work presents a distortion minimisation strategy that is significantly more computationallyefficient than the previous state of the art method found in the literature.

• Of specific interest is the synthesis of stereoscopic 3D video content from a monocular videosequence, which can also be relevant to Augmented Reality applications.

Towards Real-Time Stereoscopic Image Rectification for 3D VisualizationTony Marrero Barroso Aubrey K. Dunne John Mallon Paul F. Whelan

CIPA (Centre for Image Processing and Analysis)Dublin City University, Ireland

www.cipa.dcu.ie

Introduction

a a'

b b'c c'

d'd

Figure 1: The Stereo Rectification Process. Left hand diagram shows the initial triangulation scene. The secondimage illustrates the image rectification results, all corresponding features become horizontally aligned.

A

BC

D

• The implemented stereo rectification approach minimises horizontal distortions on therectified image pairs by analysing the Jacobian (J) of the pixel’s transformation across alllocal neighbourhoods of the image.

• The method attempts to make the singular values (σ1,σ2) of these Jacobians as close to oneas possible, and thus reduce distortions by minimising local scale re-sampling effects. This inturn ensures that the aspect ratio and orthogonality of the rectified images remain near theideal values of 1 and 90° respectively.

• To find the appropriate variables of the rectifying homographies, the following cost function isminimised. The state of the art technique minimises this cost function using theunconstrained Nelder-Mead simplex search method.

• Our approach facilitates the use of more efficient non-linear optimisation techniques by thedevelopment of analytical constraints that relate the cost function to the unknown variables.These techniques can then avail of full analytical derivatives to optimise the search.

• Three standard minimisation algorithms (Gradient Descent, Gauss-Newton and Levenberg-Marquardt) were implemented in C to test the performance and robustness of the developedmathematical constraints. A rigorous comparison was made between our implementationsand the results obtained using the Nelder-Mead method of Numerical Recipes in C.

• To put the results in perspective, the popular L-M minimisation method of Minpack was alsocompared. The tests compared the efficiency of the minimisation along with the accuracy.

Distortion Minimisation

Results

This research was supported by the Irish Research Council for Science, Engineering andTechnology (IRCSET), funded by the National Development Plan.

Pipeline

Stereo Image Pair

Input

SURF Feature

Extraction

SURF Feature

Matching

EpipolarGeometry

Calculation

Distortion Minimisation

Rectification

3D Visualisation

H

H′

Robustness Performance

( ) ( ) ( )[ ]∑=

−+−=N

iaaf

1

22

211211 11ˆ,ˆ σσ

Algorithm Accuracy Error (%)

Cost Function Evaluations

Time (ms)

Gradient Descent (G-D) 6.125 7 0.065Gauss-Newton (G-N) 3.625 17 0.113Levenberg-Marquardt (L-M) 4.920 16 0.110Minpack’s L-M 4.180 54 0.141Nelder-Mead 7.133 60 0.785

Table 1: Comparison of the average results of all the minimisation tests performed.

Figure 3: Accuracy of Convergence; plots comparing the accuracy of the Nelder-Mead method with our threeimplemented minimisation algorithms. With the developed mathematical constraints, these three algorithms were ableto converge more consistently and accurately than the Nelder-Mead search method under varying conditions.

Figure 4: Time of Convergence; plots comparing the computational efficiency of the Nelder-Mead method with ourthree implemented minimisation algorithms. The Nelder-Mead search method had a significantly inferior performanceunder most of the tested conditions. The G-D method in some conditions was even up to 1200% times faster.

Figure 5: Stereoscopic 3D Anaglyph image withoutdistortion minimisation. The left and right images cannotbe merged effectively due to the horizontal shearingdistortions.

Figure 6: Stereoscopic 3D Anaglyph image withdistortion minimisation. All objects are also aligned, butthe improvement in the 3D visualisation that occursafter the distortion minimisation is considerable.

Acknowledgements

OpenCV is used toextract the SURF(Speeded UpRobust Features)points from the leftand right images

The points areinitially matchedusing a naivenearest neighbourclassifier and manymismatches occur.

The FundamentalMatrix is calculatedusing RANSAC,outliers areremoved and onlythe best pointmatches are used.

From the F matrix,rectifyinghomographies arefound and applied.These are theresults if distortionminimisation is notconsidered.

After the distortionminimisation boththe aspect ratioand orthogonalityimprove. Theseerror metrics canhave a significantimpact on thequality of thevisualisation.

Figure 2: Image sequence of our implemented rectification pipeline.