Structure from Motion - PUC-Riovisao/Structure from Motion.pdf · from the mn correspondences p ij....
Transcript of Structure from Motion - PUC-Riovisao/Structure from Motion.pdf · from the mn correspondences p ij....
Structure from Motion
Raul Queiroz Feitosa
21-Jun-17 Structure from Motion 2
Objectives
State the Structure from Motion (SFM) problem,
Introduce SFM techniques for small relief (affine), and
Introduce SFM techniques for large relief (projective).
21-Jun-17 Structure from Motion 3
Contents
Introduction
Affine SFM
Projective SFM
21-Jun-17 Structure from Motion 4
Structure from motion
Automatic recovery of camera motion and scene
structure from two or more images.
also called automatic camera tracking or matchmoving.
21-Jun-17 Structure from Motion 5
Structure from motion
click here to
watch the
video
21-Jun-17 Structure from Motion 6
Applications of SFM
Computer vision:
multiple-view shape reconstruction,
novel view synthesis,
autonomous vehicle navigation.
…
Film production:
insertion of computer-generated imagery (CGI) into live-
action backgrounds,
…
21-Jun-17 Structure from Motion 7
Stereopsis × SFM
Stereopsis
Intrinsic camera parameters known (calibrated
cameras), and
Extrinsic camera parameters known (camera pose
relative to a fixed world coordinate system known)
Structure from Motion
Camera parameters are (partially) unknown and
possibly change over time.
21-Jun-17 Structure from Motion 8
SFM problem formulation
Given the images pij seen by m cameras of n fixed
3D points Pj (homogeneous)
estimate the m projection matrices Mi and the 3D
points Pj from the mn correspondences pij.
21-Jun-17 Structure from Motion 9
Structure from Motion
Contents:
Introduction
Affine SFM
Projective SFM
21-Jun-17 Structure from Motion 10
Affine approximation If scene's relief is small compared with distance zj to the camera
the perspective projection may be approximated by affine
models.
increasing focal length / distance from camera
center at
infinite
21-Jun-17 Structure from Motion 11
Affine model
For affine cameras (see chapter on camera models), the
previous equation takes the (nonhomogeneous) form
where Mi is the affine projection matrix, for i=1,…, m,
j=1,…,n.
Dropping i and j from the notation above, yields
iji
j
iij bPP
p
M
1
2×1 2×3 3×1 2×1
bPp
2
1
232221
131211
b
b
z
y
x
aaa
aaa
v
u
2×3 3×1 2×1
21-Jun-17 Structure from Motion 12
Affine model
Thus
involves 2mn equations in 8m+3n unknowns.
Hence, for a sufficient number of views (m) and a sufficient number of points (n), camera motion (Mi) and
scene structure (Pi) may be recovered.
iji
j
iij bPP
p
M
1
2×1 2×4 3×1 2×3 3×1 2×1
21-Jun-17 Structure from Motion 13
Affine ambiguity
Affine Ambiguity
If are solutions to the SFM affine problem, so are , where
and Q is an arbitrary affine transformation matrix, -
that is, it can be written as
,
where C is a nonsingular 3x3 matrix and d R3.
ji P and Mji P and M
1
1 and 1 jj
ii QQPP
MM
1
with 1
11
1
TTQQ
00
dd CCC
21-Jun-17 Structure from Motion 14
Affine ambiguity
Affine Ambiguity
The affine SFM problem can only be defined up to an
affine transformation ambiguity.
Taking into account the 12 parameters of the general
affine transformation, a solution may exist as long as
2mn ≥ 8m+3n-12
Thus, for 2 views, we need 4 point correspondences.
# of Mi
# of Pj
21-Jun-17 Structure from Motion 15
Offset to the center of mass
Translate the origin of the image coordinate system to the
center of mass pi0 of the observed points (pij → pij - pi0 ),
by doing
For simplicity, the origin of the world coordinate system
is moved to the centroid of the 3D points (Pj → Pj - P0 )
This yields
n
k
n
k
kjiikiiji
n
k
ikijijnnn 1 11
111PPbPbPppp
jiij Pp
21-Jun-17 Structure from Motion 16
The measurement matrixx
Let’s create a 2m × n data (measurement) matrix D:
The measurement matrix D = P must have rank 3!
P
D
n
mmnmm
n
n
PPP
ppp
ppp
ppp
21
2
1
21
22221
11211
cam
eras
(2 m
)
points (n)
2m×n 2m×3 3×n
21-Jun-17 Structure from Motion 17
Tomasi-Kanade factorization
Applying SVD to matrix D D=UWVT
Since rank(D)=3 (noise free)
U= W= VT=
Therefore
D= U3 W3 VT
3 = P
In practice, due to noise, inaccuracy in location of the interest points, and the mere fact that cameras are not affine, the matrix D is full rank.
U3 Un-3 W3 0
0 0
VT3.
VTn-3
2m×n 2m×3 3×3 3×n
21-Jun-17 Structure from Motion 18
Tomasi-Kanade factorization algorithm
1. Compute the singular value decomposition.
D=UWVT
2. Construct the matrices U3, V3 and W3 formed by the three leftmost columns of the matrices U and V (the ones
corresponding to the largest singular values), and the corresponding 3×3 sub matrix of W.
3. Define
0= U3 (W3)½
P0 = (W3)½
VT
3
the 2m×3 matrix is an estimate of the camera motion, and the 3×n matrix is an estimate of the scene structure.
It can be proven that W3 is the closest rank-3 approximation of D.
21-Jun-17 Structure from Motion 19
Recalling the affine ambiguity
The decomposition
0= U3 (W3)½
P0 = (W3)½
VT
3
is not unique.
For any 3×3 non singular matrix C
= 0 C and P = C-1 P0
is also a solution (affine ambiguity).
21-Jun-17 Geometric Camera Models 20
Affine camera models
Summary of affine mappings
p=M WP , where
Note that in all cases the 3rd row is =(0 0 0 1)!
1
2
1
0
y
T
x
T
t
t
r
r
M
1
2
1
0
y
T
x
T
t
t
r
r
M
1
2
1
0
y
T
x
T
t
t
r
r
M
11
0sin
0cot
2
1
0
y
T
x
T
t
t
r
r
M
orthographic - 5 dof scaled orthographic – 6 dof
weak-perspective - 7 dof general affine - 8 dof
21-Jun-17 Structure from Motion 21
Eliminating the affine ambiguity
From a previous slide
the image coordinates pij are the projections of Pij, on the vectors aT
i1 and aTi2 , the rows of i.
Consider scaled orthographic projection: image
axes are perpendicular.
jiij Pp
2211
21
2211
21 0
ˆˆˆˆ
0ˆˆ
i
TT
ii
TT
i
i
TT
i
i
T
ii
T
i
i
T
i
aaaa
aa
aaaa
aa
CCCC
CC
21-Jun-17 Structure from Motion 22
Eliminating the affine ambiguity Scaled orthographic: image axes are perpendicular
This translates into 2m quadratic equations in the elements of C
Solve for C
Update and P → = 0C and P = C-1 P0
Note that the solution is defined up to an arbitrary rotation.
Common practice: → first camera frame ≡ world frame.
p
P i1a
2ˆ
ia
2211
21
2211
21 0
ˆˆˆˆ
0ˆˆ
i
TT
ii
TT
i
i
TT
i
i
T
ii
T
i
i
T
i
aaaa
aa
aaaa
aa
CCCC
CC
21-Jun-17 Structure from Motion 23
Tomasi-Kanade algorithm summary
Given: m images and n features pij
For each image i, center the feature coordinates
Construct a 2m × n measurement matrix D:
Column j has the projection of point Pj in all views
Row i has one coordinate of the projections of all n points in image i
Factorize D :
Compute SVD: D = U W VT
Create U3 by taking the first 3 columns of U
Create V3 by taking the first 3 columns of V
Create W3 by taking the upper left 3 × 3 block of W
Create the motion and shape matrices:
0 = U3W3½ and P0 = W3
½ V3T (or 0 = U3 and P0 = W3V3
T)
Eliminate affine ambiguity.
Source: M. Hebert
21-Jun-17 Structure from Motion 24
Reconstruction results
Figures from
C. Tomasi and T. Kanade.
Shape and motion from
image streams under
orthography: A
factorization method.
International Journal of
Computer Vision, 9(2):137-
154, November 1992.
21-Jun-17 Structure from Motion 25
Contents
Introduction
Affine SFM
Projective SFM
21-Jun-17 Structure from Motion 26
Projective model
For projective cameras we have
for i=1,…,m and j=1,…,n where mTi1, m
Ti2, m
Ti3, are the
rows of a 3 ×4 matrix Mi and Pj denotes the homo-
geneous coordinate vector of the point Pj in some world
coordinate system.
or 1
13
2
3
1
ji
ji
ij
ji
ji
ij
ji
ij
ij
ij
ij
v
u
zv
u
Pm
Pm
Pm
Pm
Pp M
21-Jun-17 Structure from Motion 27
Projective ambiguity
According to Theorem 1 (see chapter on Camera Model) Mi is an arbitrary rank-3 3×4 matrix.
Hence, if Mi and Pj are solutions for the equation
above, so are
M’i =Mi Q and P’j =Q-1 Pj ,
where Q is an arbitrary projective transformation matrix,
i.e., any non singular 4×4 matrix.
or 1
13
2
3
1
ji
ji
ij
ji
ji
ij
ji
ij
ij
ij
ij
v
u
zv
u
Pm
Pm
Pm
Pm
Pp M
21-Jun-17 Structure from Motion 28
Projective ambiguity
The 4×4 matrix Q is defined up to a scale, since multiplying it
by a nonzero scalar amounts to applying inverse scaling to Mi
and Pj.
Thus, the projective SFM problem can only be defined up to a
projective transformation ambiguity.
Taking into account the projective ambiguity the problem has a
finite number of solutions as soon as
2mn ≥ 11m+3n-15
For two views, we need seven point correspondences.
# of Mi
# of Pj
Bilinear projection SFM
As in algebraic reconstruction, we do
So, we look for a solution that minimizes
This can be done by alternating steps where Pj are kept constant (estimated) while Mi are estimated
(kept constant).
21-Jun-17 Structure from Motion 29
zij pij = Mi Pj pij Mi Pj = 0
2
ij
jiijE Pp M
21-Jun-17 Structure from Motion 30
Bundle Adjustment
Given estimates for the matrices Mi (i=1,…,m) and vectors
Pj (j=1,…,n) , refine them by using non linear least square
to minimize the global reprojection error E:
Expensive, but encompasses a physically significant error
measure.
As initial guesses use Pj from Affine SFM e and from them compute Mi.
ji ji
ji
ij
ji
ji
ij vumn
E,
2
3
2
2
3
11
Pm
Pm
Pm
Pm
21-Jun-17 Structure from Motion 31
Eliminating the projective ambiguity
Let us assume that Mi (i=1,…,m) and vectors Pj (j=1,…,n)
have been estimated.
If Mi and Pj are a reconstruction in a particular Euclidean
coordinate system, there exist a 4×4 matrix Q such that
The next slides show two methods of computing the Euclidean upgrade matrix Q, when (some of) the intrinsic
parameters are known.
iiii PP1ˆ and ˆ QQMM
ˆ ˆ
21-Jun-17 Structure from Motion 32
Eliminating the projective ambiguity
Method 1 (calibrated cameras):
Mi is defined up to a scale, so
where ρi accounts for the scale and Ki is the calibration
matrix.
Writing Q=(Q3 q4), where Q3 is 4×3 matrix and q4 is
a vector in R4 , yields
If Ki is known (calibrated), can
be treated as a scaled rotation matrix, Thus ….
)( iiiii t KM
3 iiii KQM
ˆ
ˆ
3
1
iiii QMK
21-Jun-17 Structure from Motion 33
Eliminating the projective ambiguity
Method 1 (cont.): … the rows (j=1,2,3) of are perpendicular vectors,
so
where Q3 is defined up to a scale. To determine it uniquely we assume that the world coordinate system coincides with the first camera’s frame, that means, =[Id 0].=
.
T
ijm
0
0
0
0
0
33332332
23321331
1333
3332
2331
i
TT
ii
TT
i
i
TT
ii
TT
i
i
TT
i
i
TT
i
i
TT
i
QQQQ
QQQQ
mmmm
mmmm
mm
mm
mm
ii MK 1
ii MK 1
21-Jun-17 Structure from Motion 34
Eliminating the projective ambiguity
Method 1 (cont.):
Given m images, we obtain 5(m-1) quadratic equations in the coefficients of Q that can be solved using non
linear least squares.
The vector q4 of matrix Q
Q=(Q3 q4),
can be determined by assuming (arbitrarily) that the
origin of world coordinate system and the origin of the
first camera’s frame coincide.
21-Jun-17 Structure from Motion 35
Eliminating the projective ambiguity
Method 2:
The previous equations are linear in the 10 coefficients of the symmetric matrix =Q3Q3
T (linear least
squares)
To enforce the rank 3 of , we do
=UΛUT,
where Λ is the diagonal matrix with the eigenvalues of and U is the orthogonal matrix formed by its
eigenvectors.
21-Jun-17 Structure from Motion 36
Eliminating the projective ambiguity
Method 2 (cont.):
To enforce rank 3 of we do
Q=U3Λ3½
where U3 is formed by the 3 columns of U associated to
the 3 largest eigenvalues of and Λ3 is the
corresponding sub matrix of Λ.
=
Vector q4 can be computed as in the previous method.
21-Jun-17 Structure from Motion 37
Eliminating the projective ambiguity
These methods can be adapted to the case where only some
of the intrinsic camera parameters are known.
Since i are orthogonal matrices
each image provides a set of constraints between the entries of i and .
2 T
iii
T
ii KKMM
21-Jun-17 Structure from Motion 38
Eliminating the projective ambiguity
Assuming, for example that the center of the image is known for
each camera, we can take u0=v0=0 and
Each zero entry of provides 2 independent linear equations
in the 10 coefficients of the 4×4 matrix . With m ≥ 5 images, these
parameters can be estimated via linear least squares.
What if zero skew can be assumed?
T
i iKK
0
0
32
31
i
T
i
i
T
i
mm
mm
100
0sin
1
sin
cos
0sin
cos
sin
1
2
2
2
2
2
2
2
2
i
i
i
iii
i
iii
i
i
T
i i
KK
21-Jun-17 Structure from Motion 39
Readings
C. Tomasi and T. Kanade. Shape and motion from image streams under
orthography: A factorization method. International Journal of Computer
Vision, 9(2):137-154, November 1992.
F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. Segmenting,
Modeling, and Matching Video Clips Containing Multiple Moving Objects.
PAMI 2007.
Muhamad, S. and Hebert, M. (2000), Iterative Projective Reconstruction
from Multiple Views, Proc. IEEE Conference on Computer Vision and
Pattern Recognition, 2000, pp. II, 430-437.
Computer Vision - A modern approach 2nd ED , by D. Forsyth and J.
Ponce, 2012, chapter 8.
21-Jun-17 Structure from Motion 40
Next Topic
Kalman Filter