Stereo Vision John Morris Vision Research in CITR.


Transcript of Stereo Vision John Morris Vision Research in CITR.

Page 1

Stereo Vision

John Morris

Vision Research in CITR

Page 2

Basics

• A single image has no depth information
− Humans infer depth from 'clues' in the scene, but
− these clues are ambiguous

• Stereo vision systems take two images of a scene from different viewpoints
− usually referred to as the left and right images

• The left and right images are slightly different

• Disparity is the displacement of corresponding points from one image to the other

• From the disparity, we can calculate depth

Page 3

Stereo Vision - Basics

Two cameras: Left and Right
Optical centres: OL and OR
The virtual image plane is the projection of the actual image plane through the optical centre
Baseline, b, is the separation between the optical centres
Scene point, P, is imaged at pL and pR

Example: pL = 9, pR = 3, so disparity d = pL − pR = 6

Disparity is the amount by which the two images of P are displaced relative to each other

Depth, z = bf / (pd), where p = pixel width
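A quick numeric check of this depth formula. The function name and the camera parameters (baseline, focal length, pixel width) are illustrative assumptions, not values from the slides:

```python
def depth_from_disparity(baseline_m, focal_len_m, pixel_width_m, disparity_px):
    """Depth z = b*f / (p*d) for the canonical (parallel-axis) stereo rig."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive: zero disparity means infinite depth")
    return (baseline_m * focal_len_m) / (pixel_width_m * disparity_px)

# Slide example: pL = 9, pR = 3 gives d = 6 pixels; with an assumed 10 cm
# baseline, 8 mm focal length and 10 um pixels:
z = depth_from_disparity(0.10, 0.008, 10e-6, 6)
print(round(z, 3))  # 13.333 (metres)
```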

Page 4

Motivation - Applications

• Stereo vision has many applications
− Aerial mapping
− Forensics: crime scenes, traffic accidents
− Mining: mine-face measurement
− Civil engineering: structure monitoring
− Collision avoidance
  − Real-time performance needed
  − Depth accuracy critical
− Manufacturing
  − Process control
  − Process monitoring
− General photogrammetry
  − Any non-contact measurement

Page 5

Motivation - Advantages

Example

• Collision avoidance
− Why stereo?
− RADAR keeps airplanes from colliding
− SONAR
  − keeps soccer-playing robots from fouling each other
  − guides your automatic vacuum cleaner
− Active methods are fine for 'sparse' environments
  − Airplane density isn't too large
  − Only 5 robots per team
  − Only one vacuum cleaner

Page 6

Motivation - Advantages

• Collision avoidance
− What about Seoul (Bangkok, London, New York, …) traffic?
− How many vehicles can rely upon active methods?
  − The reflected pulse is many dB below the probe pulse!
  − What fraction of other vehicles can use the same active method before even the most sophisticated detectors get confused (and car insurance becomes unaffordable)?
− Sonar, in particular, is also subject to considerable environmental noise
− Passive methods (sensor only) are the only 'safe' solution
  − In fact, with stereo, one technique for resolving problems may be assisted by environmental noise!

Page 7

Stereo Vision

• Goal
− Recovery of 3D scene structure
− using two or more images,
− each acquired from a different viewpoint in space
− using multiple cameras or one moving camera
− The term binocular vision is used when two cameras are employed

Stereophotogrammetry
Using stereo vision systems to measure properties (dimensions here) of a scene

Page 8

Stereo Vision - Terminology

Fixation point: point of intersection of the optical axes of the two cameras

Baseline: distance between the camera optical centres

Epipolar plane: plane passing through the optical centres and a point in the scene

Epipolar line: intersection of the epipolar plane with the image plane

Conjugate pair (corresponding points, matching points): a point in the scene that is visible to both cameras (binocularly visible) is projected to a pair of points, one in each image

Disparity: distance between corresponding points when the two images are superimposed

Disparity map: the disparities of all points form the disparity map; this is the usual output from a stereo matching algorithm, often displayed as an image

Page 9

Stereo Vision

• Camera configuration

• Parallel optical axes

Page 10

Stereo Vision

• Camera configuration

• Verging optical axes

Note that if the cameras are aligned so that the scanlines of both cameras lie in the epipolar planes, then matching pixels must lie on the same scanline in both images. This is the epipolar constraint.

Page 11

Triangulation

Basic Principle

− Any visible point in the scene must lie on the straight line that passes through the optical centre (centre of projection) and the projection (image) of the point on the image plane

− Binocular stereo vision calculates the position of a scene point by finding the intersection of the two lines passing through the optical centres and the projection of the point in each image

Page 12

Stereo Vision

• Two problems

− Correspondence problem.

− Reconstruction problem.

• Correspondence problem

− Finding pairs of matched points in each image that are projections of the same scene point

− Triangulation depends on solution of the correspondence problem

Page 13

Stereo Vision

• Correspondence problem

− Ambiguous correspondence between points in the two images may lead to several different consistent interpretations of the scene

− Problem is fundamentally ill-posed

[Figure: the actual scene points, and the additional possible scene points implied by ambiguous correspondences]

Page 14

Reconstruction

− Having found the corresponding points, we can compute the disparity map

− Disparity maps are commonly expressed in pixels, i.e. the number of pixels between corresponding points in the two images

− Disparity map can be converted to a 3D map of the scene if the geometry of the imaging system is known

− Critical parameters: Baseline, camera focal length, pixel size
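A sketch of this disparity-to-depth conversion for a whole map, assuming the canonical parallel-axis geometry; the parameter values are illustrative:

```python
import numpy as np

def disparity_to_depth(disp_px, baseline_m, focal_len_m, pixel_width_m):
    """Convert a disparity map (pixels) to a depth map (metres) via Z = b*f/(d*p).

    Pixels with zero disparity (unmatched, or infinitely distant) map to inf.
    """
    d = np.asarray(disp_px, dtype=np.float64)
    with np.errstate(divide="ignore"):
        return (baseline_m * focal_len_m) / (d * pixel_width_m)

disp = np.array([[6.0, 3.0], [0.0, 12.0]])   # a toy 2x2 disparity map
depth = disparity_to_depth(disp, 0.10, 0.008, 10e-6)
```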

Page 15

Reconstruction

• Determining depth
− In a coordinate space based on the optical centre of the left camera:
− A scene point, P = ( Xl , Yl , Zl ), is projected onto the image plane at ( xl , yl ), where
   xl = f Xl / Zl,   yl = f Yl / Zl
− Similarly, in a coordinate space based on the optical centre of the right camera:
− A scene point, P = ( Xr , Yr , Zr ), is projected onto the image plane at ( xr , yr ), where
   xr = f Xr / Zr,   yr = f Yr / Zr

Page 16

Reconstruction

• Determining depth

− To recover the position of P from its projections, pl and pr :
− The cameras are related by a rotation, R, and a translation, T:
   Pr = R ( Pl − T )
− Canonical configuration:
− Parallel camera optical axes, so Zr = Zl = Z and Xr = Xl − T, giving
   Z = f T / d
where
   d = xl − xr
is the disparity: the difference in position between the corresponding points in the two images, commonly measured in pixels

Page 17

Reconstruction

• Recovering depth
   Z = f T / d, where T is the baseline
If the disparity, d′, is measured in pixels, then
   d = xl − xr = d′ p
where p is the width of a pixel in the image plane, so we have
   Z = T f / (d′ p)
Note the reciprocal relationship between disparity and depth! This is particularly relevant when considering the accuracy of stereo photogrammetry.
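The reciprocal relationship can be made concrete by computing the depth gap between consecutive integer disparities. The rig parameters below are assumed for illustration:

```python
# Depth per integer disparity level, Z = T*f / (d'*p), for an assumed rig:
T, f, p = 0.12, 0.006, 7.4e-6   # baseline (m), focal length (m), pixel width (m)

def depth(d_px):
    """Depth for a disparity of d_px pixels."""
    return T * f / (d_px * p)

# Because Z ~ 1/d', the depth gap between adjacent disparity levels grows
# roughly as Z^2: a one-pixel disparity error matters far more for distant points.
for d in (60, 30, 10, 5):
    step = depth(d) - depth(d + 1)
    print(f"d'={d:3d}  Z={depth(d):8.3f} m  one-pixel depth step={step:7.4f} m")
```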

Page 18

Stereo Camera Configuration

• Standard Case

Two cameras with parallel optical axes:
b      baseline (camera separation)
       camera angular FoV
Dsens  sensor width
n      number of pixels
p      pixel width
f      focal length
a      object extent
D      distance to object

Page 19

Stereo Camera Configuration

• Canonical configuration
− Two cameras with parallel optical axes

• Rays are drawn through each pixel in the image
• Ray intersections represent points imaged onto the centre of each pixel
• Points along these lines have the same L-R displacement (disparity)

but

• An object must fit into the Common Field of View
• Clearly, depth resolution increases as the object gets closer to the cameras
• Distance, z = bf / (pd), where f is the focal length, p the pixel size and d the disparity

Page 20

Stereo Vision

• Configuration parameters

− Intrinsic parameters

− Characterize the transformation from image plane coordinates to pixel coordinates in each camera

− Parameters intrinsic to each camera

− Extrinsic parameters (R, T)

− Describe the relative position and orientation of the two cameras

− They can be determined from the extrinsic parameters of each camera

Page 21

Page 22

Correspondence Problem

• Why is the correspondence problem difficult?

− Some points in each image will have no corresponding points in the other image
  − They are not binocularly visible, or
  − they are only monocularly visible (these two are equivalent!)
− The cameras have different fields of view
− Occlusions may be present
− A stereo system must be able to determine which parts should not be matched

Page 23

Correspondence Problem

• Methods for establishing correspondences

− Two issues

− How to select candidate matches?

− How to determine the goodness of a match?

− Two main classes of correspondence (matching) algorithm:

− Correlation-based
  − Attempt to establish a correspondence by matching image intensities, usually over a window of pixels in each image
  − Produce dense disparity maps
    − A distance is found for all binocularly visible (BV) image points,
    − except occluded, monocularly visible (MV) points

− Feature-based
  − Attempt to establish a correspondence by matching a sparse set of image features, usually edges
  − The disparity map is sparse
    − The number of points is related to the number of image features identified

Page 24

Correlation-Based Methods

• Match image sub-windows in the two images using image correlation
− the oldest technique for finding correspondences between image pixels

• Scene points must have the same intensity in each image
− Assumes:
a) All objects are perfect Lambertian scatterers,
   i.e. the reflected intensity does not depend on angle: objects scatter light uniformly in all directions
   • Informally: matte surfaces only
b) Fronto-planar surfaces
   − the (visible) surfaces of all objects are perpendicular to the camera optical axes

Page 25

Correlation-Based Methods

Page 26

Correlation-Based Methods

• Usually, we normalize c(d) by dividing it by the standard deviations of both Il and Ir (normalized cross-correlation, c(d) ∈ [0, 1]):

c(d) = Σk,l [ Il(i+k, j+l) − Īl ] [ Ir(i+k−d, j+l) − Īr ] / (σl σr)

where Īl and Īr are the average pixel values in the left and right windows.

• An alternative similarity measure is the sum of squared differences (SSD):

c(d) = Σk,l [ Il(i+k, j+l) − Ir(i+k−d, j+l) ]²

• Experiment shows that the simpler sum of absolute differences (SAD) is just as good:

c(d) = Σk,l | Il(i+k, j+l) − Ir(i+k−d, j+l) |
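The SAD measure above can be sketched as a minimal winner-take-all matcher that searches along each scanline (exploiting the epipolar constraint). The window size and disparity range are illustrative assumptions:

```python
import numpy as np

def sad_disparity(left, right, max_disp=16, half_win=2):
    """Winner-take-all SAD matching along epipolar (scan)lines.

    left, right: rectified 2D arrays of the same shape.
    Returns an integer disparity map; border pixels stay 0.
    """
    L = left.astype(np.float32)
    R = right.astype(np.float32)
    h, w = L.shape
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half_win, h - half_win):
        for x in range(half_win + max_disp, w - half_win):
            ref = L[y-half_win:y+half_win+1, x-half_win:x+half_win+1]
            best, best_d = None, 0
            for d in range(max_disp + 1):        # candidate in right image is at x - d
                cand = R[y-half_win:y+half_win+1, x-d-half_win:x-d+half_win+1]
                cost = np.abs(ref - cand).sum()  # sum of absolute differences
                if best is None or cost < best:
                    best, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

On a textured pair where the right image is the left shifted by a constant number of pixels, interior pixels recover exactly that disparity.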

Page 27

Correlation-Based Methods

• Problem:
− The two cameras have slightly different viewpoints
− Electronic gain ('contrast') and dark noise ('offset') differ slightly
  − different maximum and minimum intensities
  − simple intensity matching fails
− Slightly different scattered intensities
  − scene objects are not perfect Lambertian scatterers

Page 28

Correlation-Based Methods

• Improvements
− Do not use image intensity values; use intensity gradients instead!
− One scheme calculates thresholded signed gradient magnitudes at each pixel:
  − compute the gradient magnitude at each pixel in the two images, without smoothing
  − map the gradient magnitude values into three values: −1, 0, 1 (by thresholding the gradient magnitude)
− More sensitive correlations are produced this way

+ several dozen more: see Scharstein & Szeliski, 2001 and the Middlebury web pages for a review

Many matching functions can be used, with varying success!!
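The gradient-thresholding idea might be sketched as follows; the threshold value and the use of a simple horizontal central difference are my assumptions, not details from the slides:

```python
import numpy as np

def thresholded_gradient(img, thresh=8.0):
    """Map each pixel's horizontal gradient to -1, 0, or +1.

    Matching ternary gradient signs is more robust to gain/offset
    differences between the cameras than matching raw intensities.
    Central differences, no smoothing; border columns stay 0.
    """
    I = img.astype(np.float32)
    g = np.zeros_like(I)
    g[:, 1:-1] = (I[:, 2:] - I[:, :-2]) / 2.0   # horizontal central difference
    out = np.zeros_like(I, dtype=np.int8)
    out[g > thresh] = 1
    out[g < -thresh] = -1
    return out
```

A constant intensity offset between the two cameras cancels entirely in the gradient, so the ternary maps are unaffected by it.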

Page 29

Correlation-Based Methods

• Comments

− The success of correlation-based methods depends on whether the image window in one image exhibits a distinctive structure that occurs infrequently in the search region of the other image.

− How to choose the size of the window, W?

− too small a window
  − may not capture enough image structure and
  − may be too sensitive to noise, giving many false matches
− too large a window
  − makes matching less sensitive to noise (desired) but also
  − decreases precision (blurs the disparity map)
− An adaptive search window has been proposed

Page 30

Correlation-Based Methods

Input and ground truth

3×3 window: too noisy!

7×7 window: sharp edges are blurred!

Adaptive window: sharp edges and less noise

Page 31

Correlation-Based Methods

Page 32

Correlation-Based Methods

• Comments

− How to choose the size and location of R(pl)?

− if the distance of the fixation point from the cameras is much larger than the baseline, the location of R(pl) can be chosen to be the same as the location of pl

− the size of R(pl) can be estimated from the maximum range of distances we expect to find in the scene

− we will see that the search region can always be reduced to a line

Page 33

Feature-Based Methods

• Main idea

− Look for a feature in one image that matches a feature in the other.

− Typical features used are:

− edge points

− line segments

− corners (junctions)

Page 34

Feature-Based Methods

• A set of features is used for matching
− a line feature descriptor, for example, could contain:
  − length, l
  − orientation, θ
  − coordinates of the midpoint, m
  − average intensity along the line, i

• Similarity measures are based on matching feature descriptors, e.g.

S = 1 / ( w0 |ll − lr| + w1 |θl − θr| + w2 |ml − mr| + w3 |il − ir| )

where w0, ..., w3 are weights (determining the weights that yield the best matches is a nontrivial task).
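A weighted descriptor distance of this kind could be sketched like this; the inverse-distance form, the example descriptors and the unit weights are illustrative assumptions:

```python
import math

def line_similarity(fl, fr, w=(1.0, 1.0, 1.0, 1.0)):
    """Similarity between two line-feature descriptors.

    Each descriptor is (length, orientation, midpoint(x, y), avg_intensity).
    Higher is more similar; the weights w0..w3 trade off the four terms.
    """
    w0, w1, w2, w3 = w
    dist = (w0 * abs(fl[0] - fr[0])          # length difference
            + w1 * abs(fl[1] - fr[1])        # orientation difference
            + w2 * math.dist(fl[2], fr[2])   # midpoint displacement
            + w3 * abs(fl[3] - fr[3]))       # intensity difference
    return 1.0 / (1.0 + dist)                # distance 0 -> similarity 1

# A candidate in the right image is matched to the left feature that
# maximizes this similarity.
a = (40.0, 0.50, (100.0, 60.0), 128.0)
b = (41.0, 0.52, (96.0, 60.0), 126.0)   # nearly the same line, shifted left
c = (10.0, 1.60, (30.0, 20.0), 40.0)    # a very different line
print(line_similarity(a, b) > line_similarity(a, c))  # True
```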

Page 35

Feature-Based Methods

Page 36

Correlation vs. feature-based approaches

• Correlation methods
− Easier to implement
− Provide a dense disparity map (useful for reconstructing surfaces)
− Need textured images to work well (many false matches otherwise)
− Do not work well when the viewpoints are very different, due to
  − change in illumination direction
    − objects are not perfect (Lambertian) scatterers
  − foreshortening
    − perspective problem: surfaces are not fronto-planar

• Feature-based methods
− Suitable when good features can be extracted from the scene
− Faster than correlation-based methods
− Provide sparse disparity maps
  − OK for applications like visual navigation
− Relatively insensitive to illumination changes

Page 37

Other correspondence algorithms

• Dynamic programming (Gimel'Farb)
− Finds a 'path' through an image which provides the least-cost match
− Can allow for occlusions (Birchfield and Tomasi)
− Generally provides better results than area-based correlation
− Faster than correlation

• Graph Cut (Zabih et al.)
− Seems to provide the best results
− Very slow

• Concurrent Stereo Matching
− Examines all possible matches in parallel (Delmas, Gimel'Farb, Morris; work in progress)
− Uses a model of image noise instead of arbitrary weights in cost functions
− Suitable for real-time parallel hardware implementation

Page 38

Other correspondence algorithms

… and many more!!

• See the Middlebury Stereo page for examples and performance comparisons:
  vision.middlebury.edu/stereo/