Structure from Motion - PUC-Riovisao/Structure from Motion.pdf · from the mn correspondences p ij....

Structure from Motion

Raul Queiroz Feitosa

21-Jun-17 Structure from Motion 2

Objectives

State the Structure from Motion (SFM) problem,

Introduce SFM techniques for small relief (affine), and

Introduce SFM techniques for large relief (projective).


Contents

Introduction

Affine SFM

Projective SFM


Structure from motion

Automatic recovery of camera motion and scene

structure from two or more images.

also called automatic camera tracking or matchmoving.


Structure from motion

click here to

watch the

video

http://www.youtube.com/watch?v=i7ierVkXYa8




Applications of SFM

Computer vision:

multiple-view shape reconstruction,

novel view synthesis,

autonomous vehicle navigation.

…

Film production:

insertion of computer-generated imagery (CGI) into live-

action backgrounds,

…


Stereopsis × SFM

Stereopsis

Intrinsic camera parameters known (calibrated

cameras), and

Extrinsic camera parameters known (camera pose

relative to a fixed world coordinate system known)


Camera parameters are (partially) unknown and

possibly change over time.


SFM problem formulation

Given the images pij seen by m cameras of n fixed

3D points Pj (homogeneous)

estimate the m projection matrices Mi and the 3D

points Pj from the mn correspondences pij.



Contents:

Introduction

Affine SFM

Projective SFM


Affine approximation If scene's relief is small compared with distance zj to the camera

the perspective projection may be approximated by affine

models.

increasing focal length / distance from camera

center at

infinite


Affine model

For affine cameras (see chapter on camera models), the

previous equation takes the (nonhomogeneous) form

where Mi is the affine projection matrix, for i=1,…, m,

j=1,…,n.

Dropping i and j from the notation above, yields

iji

j

iij bPP

p

M

1

2×1 2×3 3×1 2×1

bPp

2

1

232221

131211

b

b

z

y

x

aaa

aaa

v

u

2×3 3×1 2×1


Affine model

Thus

involves 2mn equations in 8m+3n unknowns.

Hence, for a sufficient number of views (m) and a sufficient number of points (n), camera motion (Mi) and

scene structure (Pi) may be recovered.

iji

j

iij bPP

p

M

1

2×1 2×4 3×1 2×3 3×1 2×1


Affine ambiguity

Affine Ambiguity

If are solutions to the SFM affine problem, so are , where

and Q is an arbitrary affine transformation matrix, -

that is, it can be written as

,

where C is a nonsingular 3x3 matrix and d R3.

ji P and Mji P and M

1

1 and 1 jj

ii QQPP

MM

1

with 1

11

1

TTQQ

00

dd CCC


Affine ambiguity

Affine Ambiguity

The affine SFM problem can only be defined up to an

affine transformation ambiguity.

Taking into account the 12 parameters of the general

affine transformation, a solution may exist as long as

2mn ≥ 8m+3n-12

Thus, for 2 views, we need 4 point correspondences.

# of Mi

# of Pj


Offset to the center of mass

Translate the origin of the image coordinate system to the

center of mass pi0 of the observed points (pij → pij - pi0 ),

by doing

For simplicity, the origin of the world coordinate system

is moved to the centroid of the 3D points (Pj → Pj - P0 )

This yields

n

k

n

k

kjiikiiji

n

k

ikijijnnn 1 11

111PPbPbPppp

jiij Pp


The measurement matrixx

Let’s create a 2m × n data (measurement) matrix D:

The measurement matrix D = P must have rank 3!

P

D

n

mmnmm

n

n

PPP

ppp

ppp

ppp

21

2

1

21

22221

11211

cam

eras

(2 m

)

points (n)

2m×n 2m×3 3×n


Tomasi-Kanade factorization

Applying SVD to matrix D D=UWVT

Since rank(D)=3 (noise free)

U= W= VT=

Therefore

D= U3 W3 VT

3 = P

In practice, due to noise, inaccuracy in location of the interest points, and the mere fact that cameras are not affine, the matrix D is full rank.

U3 Un-3 W3 0

0 0

VT3.

VTn-3

2m×n 2m×3 3×3 3×n


Tomasi-Kanade factorization algorithm

1. Compute the singular value decomposition.

D=UWVT

2. Construct the matrices U3, V3 and W3 formed by the three leftmost columns of the matrices U and V (the ones

corresponding to the largest singular values), and the corresponding 3×3 sub matrix of W.

3. Define

0= U3 (W3)½

P0 = (W3)½

VT

3

the 2m×3 matrix is an estimate of the camera motion, and the 3×n matrix is an estimate of the scene structure.

It can be proven that W3 is the closest rank-3 approximation of D.


Recalling the affine ambiguity

The decomposition

0= U3 (W3)½

P0 = (W3)½

VT

3

is not unique.

For any 3×3 non singular matrix C

= 0 C and P = C-1 P0

is also a solution (affine ambiguity).

21-Jun-17 Geometric Camera Models 20

Affine camera models

Summary of affine mappings

p=M WP , where

Note that in all cases the 3rd row is =(0 0 0 1)!

1

2

1

0

y

T

x

T

t

t

r

r

M

1

2

1

0

y

T

x

T

t

t

r

r

M

1

2

1

0

y

T

x

T

t

t

r

r

M

11

0sin

0cot

2

1

0

y

T

x

T

t

t

r

r

M

orthographic - 5 dof scaled orthographic – 6 dof

weak-perspective - 7 dof general affine - 8 dof


Eliminating the affine ambiguity

From a previous slide

the image coordinates pij are the projections of Pij, on the vectors aT

i1 and aTi2 , the rows of i.

Consider scaled orthographic projection: image

axes are perpendicular.

jiij Pp

2211

21

2211

21 0

ˆˆˆˆ

0ˆˆ

i

TT

ii

TT

i

i

TT

i

i

T

ii

T

i

i

T

i

aaaa

aa

aaaa

aa

CCCC

CC


Eliminating the affine ambiguity Scaled orthographic: image axes are perpendicular

This translates into 2m quadratic equations in the elements of C

Solve for C

Update and P → = 0C and P = C-1 P0

Note that the solution is defined up to an arbitrary rotation.

Common practice: → first camera frame ≡ world frame.

p

P i1a

2ˆ

ia

2211

21

2211

21 0

ˆˆˆˆ

0ˆˆ

i

TT

ii

TT

i

i

TT

i

i

T

ii

T

i

i

T

i

aaaa

aa

aaaa

aa

CCCC

CC


Tomasi-Kanade algorithm summary

Given: m images and n features pij

For each image i, center the feature coordinates

Construct a 2m × n measurement matrix D:

Column j has the projection of point Pj in all views

Row i has one coordinate of the projections of all n points in image i

Factorize D :

Compute SVD: D = U W VT

Create U3 by taking the first 3 columns of U

Create V3 by taking the first 3 columns of V

Create W3 by taking the upper left 3 × 3 block of W

Create the motion and shape matrices:

0 = U3W3½ and P0 = W3

½ V3T (or 0 = U3 and P0 = W3V3

T)

Eliminate affine ambiguity.

Source: M. Hebert


Reconstruction results

Figures from

C. Tomasi and T. Kanade.

Shape and motion from

image streams under

orthography: A

factorization method.

International Journal of

Computer Vision, 9(2):137-

154, November 1992.

http://www.eecs.berkeley.edu/~yang/courses/cs294-6/papers/TomasiC_Shape and motion from image streams under orthography.pdf





Contents

Introduction

Affine SFM

Projective SFM


Projective model

For projective cameras we have

for i=1,…,m and j=1,…,n where mTi1, m

Ti2, m

Ti3, are the

rows of a 3 ×4 matrix Mi and Pj denotes the homo-

geneous coordinate vector of the point Pj in some world

coordinate system.

or 1

13

2

3

1

ji

ji

ij

ji

ji

ij

ji

ij

ij

ij

ij

v

u

zv

u

Pm

Pm

Pm

Pm

Pp M


Projective ambiguity

According to Theorem 1 (see chapter on Camera Model) Mi is an arbitrary rank-3 3×4 matrix.

Hence, if Mi and Pj are solutions for the equation

above, so are

M’i =Mi Q and P’j =Q-1 Pj ,

where Q is an arbitrary projective transformation matrix,

i.e., any non singular 4×4 matrix.

or 1

13

2

3

1

ji

ji

ij

ji

ji

ij

ji

ij

ij

ij

ij

v

u

zv

u

Pm

Pm

Pm

Pm

Pp M


Projective ambiguity

The 4×4 matrix Q is defined up to a scale, since multiplying it

by a nonzero scalar amounts to applying inverse scaling to Mi

and Pj.

Thus, the projective SFM problem can only be defined up to a

projective transformation ambiguity.

Taking into account the projective ambiguity the problem has a

finite number of solutions as soon as

2mn ≥ 11m+3n-15

For two views, we need seven point correspondences.

# of Mi

# of Pj

Bilinear projection SFM

As in algebraic reconstruction, we do

So, we look for a solution that minimizes

This can be done by alternating steps where Pj are kept constant (estimated) while Mi are estimated

(kept constant).


zij pij = Mi Pj pij Mi Pj = 0

2

ij

jiijE Pp M


Bundle Adjustment

Given estimates for the matrices Mi (i=1,…,m) and vectors

Pj (j=1,…,n) , refine them by using non linear least square

to minimize the global reprojection error E:

Expensive, but encompasses a physically significant error

measure.

As initial guesses use Pj from Affine SFM e and from them compute Mi.

ji ji

ji

ij

ji

ji

ij vumn

E,

2

3

2

2

3

11

Pm

Pm

Pm

Pm


Eliminating the projective ambiguity

Let us assume that Mi (i=1,…,m) and vectors Pj (j=1,…,n)

have been estimated.

If Mi and Pj are a reconstruction in a particular Euclidean

coordinate system, there exist a 4×4 matrix Q such that

The next slides show two methods of computing the Euclidean upgrade matrix Q, when (some of) the intrinsic

parameters are known.

iiii PP1ˆ and ˆ QQMM

ˆ ˆ



Method 1 (calibrated cameras):

Mi is defined up to a scale, so

where ρi accounts for the scale and Ki is the calibration

matrix.

Writing Q=(Q3 q4), where Q3 is 4×3 matrix and q4 is

a vector in R4 , yields

If Ki is known (calibrated), can

be treated as a scaled rotation matrix, Thus ….

)( iiiii t KM

3 iiii KQM

ˆ

ˆ

3

1

iiii QMK



Method 1 (cont.): … the rows (j=1,2,3) of are perpendicular vectors,

so

where Q3 is defined up to a scale. To determine it uniquely we assume that the world coordinate system coincides with the first camera’s frame, that means, =[Id 0].=

.

T

ijm

0

0

0

0

0

33332332

23321331

1333

3332

2331

i

TT

ii

TT

i

i

TT

ii

TT

i

i

TT

i

i

TT

i

i

TT

i

QQQQ

QQQQ

QQ

QQ

QQ

mmmm

mmmm

mm

mm

mm

ii MK 1

ii MK 1



Method 1 (cont.):

Given m images, we obtain 5(m-1) quadratic equations in the coefficients of Q that can be solved using non

linear least squares.

The vector q4 of matrix Q

Q=(Q3 q4),

can be determined by assuming (arbitrarily) that the

origin of world coordinate system and the origin of the

first camera’s frame coincide.



Method 2:

The previous equations are linear in the 10 coefficients of the symmetric matrix =Q3Q3

T (linear least

squares)

To enforce the rank 3 of , we do

=UΛUT,

where Λ is the diagonal matrix with the eigenvalues of and U is the orthogonal matrix formed by its

eigenvectors.



Method 2 (cont.):

To enforce rank 3 of we do

Q=U3Λ3½

where U3 is formed by the 3 columns of U associated to

the 3 largest eigenvalues of and Λ3 is the

corresponding sub matrix of Λ.

=

Vector q4 can be computed as in the previous method.



These methods can be adapted to the case where only some

of the intrinsic camera parameters are known.

Since i are orthogonal matrices

each image provides a set of constraints between the entries of i and .

2 T

iii

T

ii KKMM



Assuming, for example that the center of the image is known for

each camera, we can take u0=v0=0 and

Each zero entry of provides 2 independent linear equations

in the 10 coefficients of the 4×4 matrix . With m ≥ 5 images, these

parameters can be estimated via linear least squares.

What if zero skew can be assumed?

T

i iKK

0

0

32

31

i

T

i

i

T

i

mm

mm

100

0sin

1

sin

cos

0sin

cos

sin

1

2

2

2

2

2

2

2

2

i

i

i

iii

i

iii

i

i

T

i i

KK


Readings

C. Tomasi and T. Kanade. Shape and motion from image streams under

orthography: A factorization method. International Journal of Computer

Vision, 9(2):137-154, November 1992.

F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. Segmenting,

Modeling, and Matching Video Clips Containing Multiple Moving Objects.

PAMI 2007.

Muhamad, S. and Hebert, M. (2000), Iterative Projective Reconstruction

from Multiple Views, Proc. IEEE Conference on Computer Vision and

Pattern Recognition, 2000, pp. II, 430-437.

Computer Vision - A modern approach 2nd ED , by D. Forsyth and J.

Ponce, 2012, chapter 8.



http://www-cvr.ai.uiuc.edu/ponce_grp/publication/paper/pami06.pdf

http://www-cvr.ai.uiuc.edu/ponce_grp/publication/paper/pami06.pdf


Next Topic

Kalman Filter

Structure from Motion - PUC-Riovisao/Structure from Motion.pdf · from the mn correspondences p ij....

Documents

Transcript of Structure from Motion - PUC-Riovisao/Structure from Motion.pdf · from the mn correspondences p ij....