What is Visual Odometry - UCSB (lbmedia.ece.ucsb.edu/.../Lecture6-visual-odometry.pdf)
What is Visual Odometry
• Visual odometry was originally used mainly in robotics, to solve the problem of autonomous localization and navigation of a mobile robot in an unknown environment.
• Its core function is to analyze the captured image sequence and determine from it the camera's current position and orientation. From the camera pose at every frame, the trajectory of the whole system can be obtained.
baseline
Epipolar plane
Epipoles
Epipolar lines
The epipolar geometry
C,C’,x,x’ and X are coplanar
The epipolar geometry
All points on the plane π project onto l and l′
The epipolar geometry
Family of planes π and lines l and l′
Intersection in e and e’
The epipolar geometry
epipoles e,e’
= intersection of baseline with image plane
= projection of projection center in other image
an epipolar plane = plane containing baseline (1-D family)
an epipolar line = intersection of epipolar plane with image
(always come in corresponding pairs)
Example: converging cameras
Example: motion parallel with image plane
Example: forward motion
(figure: epipoles e and e′)
Essential Matrix and Fundamental Matrix
The relationship between corresponding points in the left and right images of the same scene, captured by two cameras, can be expressed through the Essential matrix or the Fundamental matrix.
Matrix form of cross product

$$a \times b = \begin{bmatrix} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} a_2 b_3 - a_3 b_2 \\ a_3 b_1 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{bmatrix} = [a]_\times\, b$$

with $a \cdot (a \times b) = 0$ and $b \cdot (a \times b) = 0$.
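The matrix form above can be checked in a few lines of numpy (the `skew` helper name is ours, not from the slides):

```python
import numpy as np

def skew(a):
    """Return the 3x3 skew-symmetric matrix [a]_x such that skew(a) @ b == a x b."""
    return np.array([[0.0,   -a[2],  a[1]],
                     [a[2],   0.0,  -a[0]],
                     [-a[1],  a[0],  0.0]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# The matrix form reproduces the cross product exactly.
assert np.allclose(skew(a) @ b, np.cross(a, b))
# a x b is orthogonal to both a and b, as stated above.
assert np.isclose(a @ (skew(a) @ b), 0.0)
assert np.isclose(b @ (skew(a) @ b), 0.0)
```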
Calibrated Camera

With $p = (u, v, 1)^\top$ and $p' = (u', v', 1)^\top$ in normalized camera coordinates, the coplanarity of the two rays and the baseline gives

$$p'^\top \left[\, t \,\right]_\times (R\, p) = 0$$

$$p'^\top E\, p = 0 \quad \text{with} \quad E = [t]_\times R \quad \text{(Essential matrix)}$$
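A quick numerical check of the epipolar constraint, with a synthetic rotation, translation, and 3D point; the coordinate convention $P_2 = R P_1 + t$ is assumed here:

```python
import numpy as np

def skew(a):
    return np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0.0]])

# Synthetic calibrated setup: camera 1 at the origin, camera 2 rotated and translated.
th = 0.1
R = np.array([[np.cos(th), -np.sin(th), 0],
              [np.sin(th),  np.cos(th), 0],
              [0, 0, 1.0]])
t = np.array([1.0, 0.2, 0.0])
E = skew(t) @ R                      # essential matrix E = [t]_x R

# One 3D point expressed in each camera frame (convention: P2 = R @ P1 + t).
P1 = np.array([0.5, -0.3, 4.0])
P2 = R @ P1 + t

p  = P1 / P1[2]                      # normalized image point in camera 1
pp = P2 / P2[2]                      # normalized image point in camera 2

# Epipolar constraint p'^T E p = 0 holds exactly for noise-free data.
assert abs(pp @ E @ p) < 1e-10
```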
Uncalibrated Camera

$\hat p'^\top E\, \hat p = 0$, where $\hat p$ and $\hat p'$ are the camera-coordinate points corresponding to the pixel-coordinate points $p$ and $p'$:

$$\hat p = M_{int}^{-1}\, p, \qquad \hat p' = M'^{-1}_{int}\, p'$$

Substituting:

$$p'^\top F\, p = 0 \quad \text{with} \quad F = M'^{-\top}_{int}\, E\, M_{int}^{-1} \quad \text{(Fundamental matrix)}$$
Properties of the fundamental and
essential matrix
• The matrix is 3 × 3
• Transpose: if F is the fundamental matrix of the camera pair (P, P′),
then Fᵀ is the fundamental matrix of the pair (P′, P)
• Epipolar lines: thinking of p and p′ as points in the projective
plane, F p is a projective line in the right image.
That is, l′ = F p and l = Fᵀ p′
• The essential matrix encodes the extrinsic parameters only; the fundamental matrix also depends on the intrinsics
Least squares approach

Minimize $\sum_{i=1}^{n} \left( p_i'^\top F\, p_i \right)^2$ under the constraint $\|F\| = 1$.
We have a homogeneous system A f = 0.
The least-squares solution is the singular vector corresponding to the smallest singular value of A,
i.e. the last column of V in the SVD A = U D Vᵀ.
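A minimal sketch of this linear (eight-point style) estimate of F from exact synthetic correspondences; the camera values are illustrative, and Hartley's point normalization and the rank-2 enforcement are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic cameras with known geometry (illustrative values).
K = np.array([[500.0, 0, 320], [0, 500, 240], [0, 0, 1]])
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
t = np.array([[1.0], [0.1], [0.1]])
M1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = K @ np.hstack([R, t])

# 20 exact correspondences from random 3D points in front of both cameras.
X = np.vstack([rng.uniform(-1, 1, (2, 20)), rng.uniform(2, 4, (1, 20)), np.ones((1, 20))])
x1 = M1 @ X; x1 /= x1[2]
x2 = M2 @ X; x2 /= x2[2]

# One row of A per match, from the bilinear constraint p'^T F p = 0.
A = np.array([[u2*u1, u2*v1, u2, v2*u1, v2*v1, v2, u1, v1, 1.0]
              for (u1, v1), (u2, v2) in zip(x1[:2].T, x2[:2].T)])

# Least-squares solution of A f = 0: last row of V^T in the SVD of A.
_, _, Vt = np.linalg.svd(A)
F = Vt[-1].reshape(3, 3)

# Every correspondence should (nearly) satisfy the epipolar constraint.
residuals = [abs(x2[:, i] @ F @ x1[:, i]) for i in range(20)]
assert max(residuals) < 1e-4
```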
3D Reconstruction
• Stereo: we know the viewing geometry (extrinsic parameters) and the intrinsic parameters: Find correspondences exploiting epipolar geometry, then reconstruct
• Structure from motion (with calibrated cameras): Find correspondences, then estimate extrinsic parameters (rotation and direction of translation), then reconstruct.
• Uncalibrated cameras: Find correspondences,
Compute projection matrices (up to a projective transformation), then reconstruct up to a projective transformation.
Point reconstruction
$$x = M X, \qquad x' = M' X$$
Geometric error
Reconstruct matches in projective frame
by minimizing the reprojection error
Non-iterative optimal solution
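Point reconstruction from two views can be sketched with linear (DLT) triangulation; each view's equation $x = M X$ contributes two rows to a homogeneous system (the helper name `triangulate` and the camera values are ours):

```python
import numpy as np

def triangulate(M1, M2, x1, x2):
    """Linear (DLT) triangulation: x = M X yields two equations per view."""
    u1, v1 = x1; u2, v2 = x2
    A = np.array([u1 * M1[2] - M1[0],
                  v1 * M1[2] - M1[1],
                  u2 * M2[2] - M2[0],
                  v2 * M2[2] - M2[1]])
    _, _, Vt = np.linalg.svd(A)     # null vector of A
    X = Vt[-1]
    return X / X[3]                  # dehomogenize

# Synthetic check: known 3D point, two cameras, exact projections.
K = np.array([[500.0, 0, 320], [0, 500, 240], [0, 0, 1]])
M1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])

X_true = np.array([0.3, -0.2, 5.0, 1.0])
x1 = M1 @ X_true; x1 = x1[:2] / x1[2]
x2 = M2 @ X_true; x2 = x2[:2] / x2[2]

assert np.allclose(triangulate(M1, M2, x1, x2), X_true, atol=1e-6)
```

Note this minimizes an algebraic error, not the geometric reprojection error mentioned above.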
Reconstruction for intrinsically
calibrated cameras
• Compute the essential matrix E using normalized points.
• Select M = [I | 0], M′ = [R | T]; then E = [T]×R
• Find T and R using SVD of E
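Recovering T and R from the SVD of E can be sketched as follows. This yields four (R, T) candidates; in practice the correct one is chosen by requiring triangulated points to lie in front of both cameras, which is omitted here:

```python
import numpy as np

def skew(a):
    return np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0.0]])

# Ground-truth motion (assumed values for this check; ||T|| chosen as 1,
# since only the direction of translation is recoverable).
ang = 0.3
R_true = np.array([[np.cos(ang), 0, np.sin(ang)],
                   [0, 1, 0],
                   [-np.sin(ang), 0, np.cos(ang)]])
t_true = np.array([0.6, 0.8, 0.0])
E = skew(t_true) @ R_true

# SVD-based decomposition into four candidates.
U, _, Vt = np.linalg.svd(E)
W = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1.0]])
t_hat = U[:, 2]                       # null direction of E^T
candidates = []
for R_cand in (U @ W @ Vt, U @ W.T @ Vt):
    if np.linalg.det(R_cand) < 0:     # ensure a proper rotation
        R_cand = -R_cand
    for t_cand in (t_hat, -t_hat):
        candidates.append((R_cand, t_cand))

def matches(R, t):
    Ec = skew(t) @ R                  # candidate reproduces E up to sign/scale?
    Ec, En = Ec / np.linalg.norm(Ec), E / np.linalg.norm(E)
    return np.allclose(Ec, En) or np.allclose(Ec, -En)

assert any(matches(R, t) for R, t in candidates)
assert any(np.allclose(R, R_true) and np.allclose(t, t_true) for R, t in candidates)
```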
Reconstruction from uncalibrated
cameras

Reconstruction problem:
given $x_i \leftrightarrow x'_i$, compute $M$, $M'$ and $X_i$ such that
$x_i = M X_i$ and $x'_i = M' X_i$ for all $i$;
without additional information this is possible
only up to a projective ambiguity.
Projective Reconstruction
Theorem
• Assume we determine matching points xi and xi’. Then we can compute a unique Fundamental matrix F.
• The camera matrices M, M’ cannot be recovered uniquely
• Thus the reconstruction (Xi) is not unique
• There exists a projective transformation H such that
$$\tilde X_i = H X_i, \qquad \tilde M = M H^{-1}, \qquad \tilde M' = M' H^{-1}, \qquad i = 1, 2, \ldots$$
Reconstruction ambiguity:
projective

$$x_i = M X_i = \left( M H_P^{-1} \right) \left( H_P X_i \right)$$
Structure from Motion
or
Simultaneous Localization and Mapping (SLAM)
or
Visual Odometry
Camera calibration
• Determine camera parameters from known
3D points or calibration object(s)
1. internal or intrinsic parameters such as
focal length, optical center, aspect ratio:
what kind of camera?
2. external or extrinsic (pose)
parameters:
where is the camera?
• How can we do this?
Coordinate Systems
• World Coordinate System: It’s a known
reference coordinate system with respect to
which we calibrate the camera.
• Camera Coordinate System: It’s a
coordinate system with its origin at the
optical center of the camera.
• Pixel Coordinate System
Camera: Geometry Involved
• Mathematical Definition: A camera is a mapping
between a 3D world (object space) and a 2D
image.
• Calibration: The objective of calibration is to
calculate the intrinsic and/or extrinsic parameters
of a camera given a set of images taken using the
camera.
Camera Models
• Perspective: $x' = f\,\dfrac{x}{z}, \qquad y' = f\,\dfrac{y}{z}$
• Orthographic: $x' = x, \qquad y' = y$
Note: We will deal only with perspective projection
Intrinsic Parameters
• Let $(x, y, z)$ be the coordinates of a point in 3D. Its projection $(u, v)$ on
the image plane is given by:

Ideal camera, square pixels: $u = f\,\dfrac{x}{z}, \qquad v = f\,\dfrac{y}{z}$

Ideal camera, rectangular pixels: $u = f k\,\dfrac{x}{z}, \qquad v = f l\,\dfrac{y}{z}$

Displaced center: $u = f k\,\dfrac{x}{z} + u_0, \qquad v = f l\,\dfrac{y}{z} + v_0$

Rotated coordinate axes: $u = \alpha\,\dfrac{x}{z} - \alpha\cot(\theta)\,\dfrac{y}{z} + u_0, \qquad v = \dfrac{\beta}{\sin(\theta)}\,\dfrac{y}{z} + v_0$
Intrinsic Parameters
In matrix form, $z\,p = K P$ with $p = (u, v, 1)^\top$, $P = (x, y, z, 1)^\top$ and

$$K = \begin{bmatrix} \alpha & -\alpha\cot(\theta) & u_0 & 0 \\ 0 & \beta / \sin(\theta) & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

Hence, the 5 intrinsic parameters of a camera are: $\alpha,\ \beta,\ \theta,\ u_0,\ v_0$.
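The intrinsic projection z p = K P can be sketched numerically; the parameter values below are illustrative only:

```python
import numpy as np

# The 5 intrinsic parameters (illustrative values).
alpha, beta = 800.0, 820.0           # focal lengths in pixel units
theta = np.pi / 2                    # skew angle; pi/2 means perpendicular axes
u0, v0 = 320.0, 240.0                # principal point

K = np.array([[alpha, -alpha / np.tan(theta), u0, 0],
              [0.0,    beta / np.sin(theta),  v0, 0],
              [0.0,    0.0,                   1,  0]])

# z * p = K * P for P = (x, y, z, 1)^T in camera coordinates.
P = np.array([0.5, -0.25, 2.0, 1.0])
zp = K @ P
u, v = zp[:2] / zp[2]

# With theta = pi/2 this reduces to u = alpha*x/z + u0, v = beta*y/z + v0.
assert np.isclose(u, alpha * 0.5 / 2.0 + u0)
assert np.isclose(v, beta * -0.25 / 2.0 + v0)
```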
Extrinsic Parameters
• The camera frame (C) can be different from the world frame (W).

$$P_c = R\, P_w + t$$

where $R$ is a 3×3 rotation matrix and $t$ a 3×1 translation vector; in homogeneous coordinates

$$\begin{bmatrix} P_c \\ 1 \end{bmatrix} = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} \begin{bmatrix} P_w \\ 1 \end{bmatrix}$$

with $P_c = (x_c, y_c, z_c)^\top$, $P_w = (x_w, y_w, z_w)^\top$, rows $r_1^\top, r_2^\top, r_3^\top$ of $R$, and $t = (t_x, t_y, t_z)^\top$.

Hence, we have 6 extrinsic parameters.
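The world-to-camera transform and its equivalent homogeneous 4×4 form can be sketched as follows (rotation and translation values are illustrative):

```python
import numpy as np

# Rigid world-to-camera transform P_c = R P_w + t.
ang = np.pi / 2
R = np.array([[np.cos(ang), -np.sin(ang), 0],
              [np.sin(ang),  np.cos(ang), 0],
              [0, 0, 1.0]])            # 90-degree rotation about the z axis
t = np.array([1.0, 2.0, 3.0])

P_w = np.array([1.0, 0.0, 0.0])
P_c = R @ P_w + t

# Equivalent homogeneous form [R t; 0 1] applied to (P_w, 1).
T = np.eye(4)
T[:3, :3] = R
T[:3, 3] = t
P_c_h = T @ np.append(P_w, 1.0)

assert np.allclose(P_c, P_c_h[:3])
assert np.allclose(P_c, [1.0, 3.0, 3.0])   # (1,0,0) rotates to (0,1,0), then + t
```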
Perspective Projection Matrix
• M is a 3×4 matrix. Taking into consideration both the intrinsic
and extrinsic parameters:

$$M = K\,[R\ \ t] = \begin{bmatrix} \alpha\, r_1^\top - \alpha\cot(\theta)\, r_2^\top + u_0\, r_3^\top & \alpha\, t_x - \alpha\cot(\theta)\, t_y + u_0\, t_z \\ \dfrac{\beta}{\sin(\theta)}\, r_2^\top + v_0\, r_3^\top & \dfrac{\beta}{\sin(\theta)}\, t_y + v_0\, t_z \\ r_3^\top & t_z \end{bmatrix}$$

$z\,p = M P$, hence

$$u = \dfrac{m_1 \cdot P}{m_3 \cdot P}, \qquad v = \dfrac{m_2 \cdot P}{m_3 \cdot P}$$

where $m_1, m_2, m_3$ are the rows of $M$.
Rotation and Translation
Camera matrix
• Fold intrinsic calibration matrix K and
extrinsic pose parameters (R,t) together into
a camera matrix
• M = K [R | t ]
• (fixing the overall scale, e.g. putting a 1 in the lower right-hand corner, leaves 11 d.o.f.)
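Folding K, R, and t into one matrix and projecting with the row formulas above can be sketched as (camera values illustrative):

```python
import numpy as np

# Full camera matrix M = K [R | t].
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
R = np.eye(3)                        # camera axes aligned with the world frame
t = np.array([[0.0], [0.0], [5.0]])  # world origin 5 units in front of the camera
M = K @ np.hstack([R, t])            # 3x4, defined up to scale (11 d.o.f.)

# Project a world point: z p = M P, then divide by the third coordinate.
P = np.array([1.0, 0.5, 0.0, 1.0])   # homogeneous world point
m1, m2, m3 = M
u = (m1 @ P) / (m3 @ P)
v = (m2 @ P) / (m3 @ P)

assert np.isclose(u, 320 + 800 * 1.0 / 5.0)   # u0 + f*x/z
assert np.isclose(v, 240 + 800 * 0.5 / 5.0)   # v0 + f*y/z
```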
Camera matrix calibration
• Directly estimate 11 unknowns in the M
matrix using known 3D points (Xi,Yi,Zi) and
measured feature positions (ui,vi)
Camera matrix calibration
• Linear regression:
– Bring denominator over, solve set of (over-
determined) linear equations. How?
– Least squares (pseudo-inverse)
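The "bring the denominator over" step can be sketched as a DLT-style calibration: each known 3D/2D pair yields two rows of a homogeneous system in the 12 entries of M, solved by SVD. The ground-truth camera below is ours, used only to verify the recovery:

```python
import numpy as np

rng = np.random.default_rng(1)

# Ground-truth camera (what calibration should recover, up to scale).
K = np.array([[700.0, 0, 300], [0, 700, 200], [0, 0, 1]])
M_true = K @ np.hstack([np.eye(3), np.array([[0.2], [-0.1], [4.0]])])

# Known 3D points (non-coplanar) and their measured projections.
P = np.vstack([rng.uniform(-1, 1, size=(3, 10)), np.ones((1, 10))])
x = M_true @ P
u, v = x[0] / x[2], x[1] / x[2]

# u*(m3.P) = m1.P and v*(m3.P) = m2.P  ->  homogeneous system A m = 0.
rows = []
for i in range(10):
    rows.append(np.concatenate([P[:, i], np.zeros(4), -u[i] * P[:, i]]))
    rows.append(np.concatenate([np.zeros(4), P[:, i], -v[i] * P[:, i]]))
A = np.array(rows)

_, _, Vt = np.linalg.svd(A)
M_est = Vt[-1].reshape(3, 4)

# M is recovered only up to scale: normalize both before comparing.
M_est /= M_est[2, 3]
M_ref = M_true / M_true[2, 3]
assert np.allclose(M_est, M_ref, atol=1e-6)
```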
Projective structure from motion
• Given: m images of n fixed 3D points
• xij = Pi Xj , i = 1,… , m, j = 1, … , n
• Problem: estimate m projection matrices Pi and n 3D points Xj from the mn corresponding points xij
(figure: 3D points Xj observed by cameras P1, P2, P3 as image points x1j, x2j, x3j)
Slides from Lana Lazebnik
Bundle adjustment
• Non-linear method for refining structure and motion
• Minimizing the reprojection error

$$E(P, X) = \sum_{i=1}^{m} \sum_{j=1}^{n} D\!\left( x_{ij},\, P_i X_j \right)^2$$

(figure: measured points x1j, x2j, x3j vs. reprojections P1Xj, P2Xj, P3Xj for cameras P1, P2, P3)
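The objective being minimized can be sketched directly (the function name `reprojection_error` and the data layout are ours; a real bundle adjuster would feed these residuals to a non-linear least-squares solver):

```python
import numpy as np

def reprojection_error(cameras, points, observations):
    """E(P, X) = sum_ij D(x_ij, P_i X_j)^2, the bundle adjustment objective.

    cameras:      list of 3x4 projection matrices P_i
    points:       4xn array of homogeneous 3D points X_j
    observations: dict (i, j) -> measured 2D point x_ij
    """
    total = 0.0
    for (i, j), x_meas in observations.items():
        proj = cameras[i] @ points[:, j]
        proj = proj[:2] / proj[2]              # P_i X_j in inhomogeneous pixels
        total += np.sum((x_meas - proj) ** 2)  # squared image distance D(.,.)^2
    return total

# Perfect data has zero error; perturbing one point makes it positive.
K = np.array([[500.0, 0, 320], [0, 500, 240], [0, 0, 1]])
cams = [K @ np.hstack([np.eye(3), np.array([[dx], [0.0], [0.0]])]) for dx in (0.0, -1.0)]
X = np.array([[0.0, 0.5], [0.0, -0.5], [4.0, 5.0], [1.0, 1.0]])  # two points
obs = {}
for i, Pm in enumerate(cams):
    for j in range(2):
        x = Pm @ X[:, j]
        obs[(i, j)] = x[:2] / x[2]

assert np.isclose(reprojection_error(cams, X, obs), 0.0)
X_noisy = X.copy(); X_noisy[0, 0] += 0.1
assert reprojection_error(cams, X_noisy, obs) > 0
```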
Existing attempts : SLAM
• SLAM: Simultaneous Localization and Mapping
can use many different types of sensors to acquire the observation data used in building the
map, such as laser rangefinders, sonar sensors, and cameras.
– Well-established in robotics (using a rich array of sensors)
– Demonstrated with a single hand-held camera by Davison in 2003 (Mono-
SLAM).
– Mono-SLAM was applied to
an AR system in 2004.
Existing attempts : Model based tracking
• Model-based tracking is
– More robust
– More accurate
– Proposed by Lepetit et al.
at ISMAR 2003
Frame by Frame SLAM
• Why? Is SLAM fundamentally harder?
(per-frame timeline: find features → update camera pose and entire map (many DOF) → draw graphics)
Frame by Frame SLAM • SLAM
– Updating entire map every frame is so
expensive!!!
– Needs “sparse map of high-quality features”
- A. Davison
• Proposed approach
– Use dense map (of low quality features)
– Don’t update the map every frame : Keyframes
– Split the tracking and mapping into two
threads
Parallel Tracking and Mapping
• Proposed method: split the tracking and mapping into two threads
(per-frame timeline, tracking thread #1: find features → update camera pose only (simple & easy) → draw graphics; mapping thread #2: update map)
Parallel Tracking and Mapping
Tracking thread:
• Responsible for estimating the camera
pose and rendering the augmented
graphics
• Must run at 30 Hz
• Make as robust and accurate as
possible
Mapping thread:
• Responsible for providing the
map
• Can take lots of time per key
frame
• Make as rich and accurate as
possible
Tracking thread • Overall flow
Pre-process frame → coarse stage (project points, measure points, update camera pose) → fine stage (project points, measure points, update camera pose) → draw graphics, using points from the map
Pre-process frame
• Build four pyramid levels: 640×480, 320×240, 160×120, 80×60
• Detect FAST corners
– E. Rosten et al. (ECCV 2006)
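A pyramid of those four levels can be sketched with 2×2 block-mean downsampling (our simplification; the per-level FAST corner detection is omitted):

```python
import numpy as np

def build_pyramid(img, levels=4):
    """Halve the image repeatedly (2x2 block mean) to get levels
    640x480, 320x240, 160x120, 80x60."""
    pyramid = [img]
    for _ in range(levels - 1):
        h, w = pyramid[-1].shape
        half = pyramid[-1][: h // 2 * 2, : w // 2 * 2]       # crop to even size
        half = half.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        pyramid.append(half)
    return pyramid

img = np.random.default_rng(0).uniform(0, 255, size=(480, 640))
pyr = build_pyramid(img)

assert [p.shape for p in pyr] == [(480, 640), (240, 320), (120, 160), (60, 80)]
```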
Project Points
• Use motion model to update camera pose
– Constant velocity model
$$V_t = (P_t - P_{t-1}) / \Delta t$$
$$P_{t+1} = P_t + \Delta t'\, V_t$$
(figure: previous poses $P_{t-1}$ and $P_t$ separated by $\Delta t$, estimated current pose $P_{t+1}$ after $\Delta t'$)
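The constant-velocity prediction can be sketched on the camera position alone (the full system works on the 6-DOF pose in SE(3); a plain vector suffices to show the idea, and the function name is ours):

```python
import numpy as np

def predict(P_prev, P_curr, dt, dt_next):
    """Constant-velocity motion model."""
    V = (P_curr - P_prev) / dt        # V_t = (P_t - P_{t-1}) / dt
    return P_curr + dt_next * V       # P_{t+1} = P_t + dt' * V_t

P_tm1 = np.array([0.0, 0.0, 0.0])
P_t   = np.array([0.1, 0.0, 0.2])
P_pred = predict(P_tm1, P_t, dt=1/30, dt_next=1/30)

# Equal time steps simply extrapolate the last displacement.
assert np.allclose(P_pred, [0.2, 0.0, 0.4])
```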
Project Points
• Choose subset to measure
– ~ 50 biggest features for coarse stage
– 1000 randomly selected for fine stage
(figure: pyramid levels 640×480, 320×240, 160×120, 80×60; ~50 features for the coarse stage, 1000 for the fine stage)
Measure Points
• Generate an 8×8 matching template (warped
from the source key-frame in the map)
• Search a fixed radius around the projected
position
– Use zero-mean SSD
– Only search at FAST corner points
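The zero-mean SSD score can be sketched directly; subtracting each patch's own mean makes the match invariant to uniform brightness changes (the function name is ours):

```python
import numpy as np

def zssd(patch, template):
    """Zero-mean sum of squared differences between two equally-sized patches."""
    a = patch - patch.mean()
    b = template - template.mean()
    return np.sum((a - b) ** 2)

rng = np.random.default_rng(0)
template = rng.uniform(0, 255, size=(8, 8))   # 8x8 matching template

# Identical patch matches perfectly; a uniform brightness offset does not hurt.
assert np.isclose(zssd(template, template), 0.0)
assert np.isclose(zssd(template + 40.0, template), 0.0)
# An unrelated patch scores much worse.
assert zssd(rng.uniform(0, 255, size=(8, 8)), template) > 1.0
```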
Update camera pose • 6-DOF problem
– Obtained by SfM (three-point algorithm)
Mapping thread • Overall flow
Stereo Initialization
Wait for new key frame
Add new map points
Optimize map
Map maintenance
Tracker
Stereo Initialization
• Use five-point-pose algorithm
– D. Nistér et al., 2006
• Requires a pair of frames and feature correspondences
• Provides initial map
• User input required:
– Two clicks for two key-frames
– Smooth motion for feature correspondence
Wait for new key frame
• Key frames are only added if :
– There is a sufficient baseline to the other key frame
– Tracking quality is good
– A keyframe stores the 4-level pyramid images and their corners
• When a key frame is added :
– The mapping thread stops whatever it is doing
– All points in the map are measured in the keyframe
– New map points are found and added to the map
Add new map points • Want as many map points as possible
• Check all maximal FAST corners in the key
frame:
– Check score
– Check if already in map
• Epipolar search in a neighboring key frame
• Triangulate matches and add to map
• Repeat in four image pyramid levels
Optimize map
• Use batch SFM method: Bundle Adjustment
• Adjusts map point positions and key frame
poses
• Minimize reprojection error of all points in all
keyframes (or use only last N key frames)
Map maintenance
• When camera is not exploring, mapping thread
has idle time
• Data association in bundle adjustment is
reversible
• Re-attempt outlier measurements
• Try to measure new map features in all old key
frames
Comparison to EKF-SLAM • More Accurate
• More robust
• Faster tracking
(figure: SLAM-based AR vs. proposed AR)
System and Results • Environment
– Desktop PC (Intel Core 2 Duo, 2.66 GHz)
– OS: Linux
– Language: C++
• Tracking speed (per frame):
Key frame preparation   2.2 ms
Feature projection      3.5 ms
Patch search            9.8 ms
Iterative pose update   3.7 ms
Total                  19.2 ms
System and Results • Mapping scalability and speed
– Practical limit: 150 key frames, 6000 points
– Bundle adjustment timing:
Key frames                2-49     50-99    100-149
Local bundle adjustment   170 ms   270 ms   440 ms
Global bundle adjustment  380 ms   1.7 s    6.9 s
Demonstration
Remaining problems
• Outlier management
• Still brittle in some scenarios
– Repeated texture
– Passive stereo initialization
• Occlusion problem
• Relocalization problem