3D from Pictures

3D from Pictures Jiajun Zhu Sept.29 2006 University of Virginia


Transcript of 3D from Pictures

Page 1: 3D from Pictures

3D from Pictures

Jiajun Zhu, Sept. 29, 2006

University of Virginia

Page 2: 3D from Pictures

What can we compute from a collection of pictures?

Page 3: 3D from Pictures

- 3D structure
- camera poses and parameters

Page 4: 3D from Pictures

One of the most important and exciting results in computer vision from the 1990s.

It is difficult in practice, largely because of the numerical computation involved.

Page 5: 3D from Pictures

But this is SO powerful!!!

Two SIGGRAPH papers, plus several sketches, this year!

(Show a few demo videos.)

Page 6: 3D from Pictures

Now let’s see how this works!

Input:
(1) a collection of pictures

Output:
(1) camera parameters
(2) sparse 3D scene structure

Page 7: 3D from Pictures

Consider 1 camera first

What’s the relation between pixels and rays in space?

Page 8: 3D from Pictures

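$$ (X, Y, Z)^T \mapsto (fX/Z,\; fY/Z)^T $$

$$ \begin{pmatrix} fX \\ fY \\ Z \end{pmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} = \operatorname{diag}(f, f, 1)\,[\,I \mid 0\,] \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} $$

$$ x = PX, \qquad P = \operatorname{diag}(f, f, 1)\,[\,I \mid 0\,] $$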
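A minimal NumPy sketch of this projection, assuming a camera at the origin looking along +Z. The helper names `pinhole_matrix` and `project` and the sample numbers are illustrative and not from the slides.

```python
import numpy as np

def pinhole_matrix(f):
    """Return P = diag(f, f, 1) [I | 0] for a camera at the origin."""
    return np.diag([f, f, 1.0]) @ np.hstack([np.eye(3), np.zeros((3, 1))])

def project(P, X):
    """Project a 3D point X (length 3) to image coordinates via x = P X."""
    x_h = P @ np.append(X, 1.0)   # homogeneous image point (fX, fY, Z)
    return x_h[:2] / x_h[2]       # divide by depth to get (fX/Z, fY/Z)

P = pinhole_matrix(f=2.0)
x = project(P, np.array([1.0, 2.0, 4.0]))  # (2*1/4, 2*2/4) = (0.5, 1.0)
```

Every point on the ray through the origin and (1, 2, 4) lands on the same pixel, which is the pixel-to-ray relation asked about on the previous slide.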

Page 9: 3D from Pictures

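$$ \tilde{X}_{cam} = R\,(\tilde{X} - \tilde{C}) $$

$$ X_{cam} = \begin{bmatrix} R & -R\tilde{C} \\ 0 & 1 \end{bmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} = \begin{bmatrix} R & -R\tilde{C} \\ 0 & 1 \end{bmatrix} X $$

$$ x = K\,[\,I \mid 0\,]\,X_{cam}, \qquad x = KR\,[\,I \mid -\tilde{C}\,]\,X $$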

Page 10: 3D from Pictures

$$ P = KR\,[\,I \mid -\tilde{C}\,] $$

P is a 3x4 matrix with 7 degrees of freedom:
- 1 from focal length
- 3 from rotation
- 3 from translation

$$ P = K\,[\,R \mid t\,], \qquad t = -R\tilde{C} $$

Simplified projective camera model P

x = P X = K [ R | t ] X
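As a hedged illustration of the simplified model, the sketch below assembles P = K[R | t] with K = diag(f, f, 1) and t = -R C. The function names and the sample camera position are assumptions made for the example.

```python
import numpy as np

def camera_matrix(f, R, C):
    """Build P = K [R | t] with K = diag(f, f, 1) and t = -R C."""
    K = np.diag([f, f, 1.0])
    t = -R @ C                      # translation implied by the camera centre C
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(P, X):
    """Apply x = P X and convert from homogeneous to image coordinates."""
    x_h = P @ np.append(X, 1.0)
    return x_h[:2] / x_h[2]

C = np.array([0.0, 0.0, -5.0])      # hypothetical camera centre
P = camera_matrix(f=2.0, R=np.eye(3), C=C)
x = project(P, np.array([1.0, 2.0, -1.0]))  # the point is 4 units in front of the camera
```

The camera centre itself satisfies P (C, 1)^T = 0, which is a quick sanity check on any matrix built this way.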

Page 11: 3D from Pictures

x = P X

Consider 1 camera: P (3x4) has 7 degrees of freedom.

Given one image, we observe x. Can we recover X or P?

If P is known, what do we know about X?
If X is known, can we recover P?

Number of unknowns = 7. Each X gives 2 equations:

2n >= 7, i.e. n >= 4

Page 12: 3D from Pictures

This is a Camera Calibration Problem

Input: n ≥ 4 world-to-image point correspondences {Xi ↔ xi}

Output: camera parameters P = K[R|t]

Page 13: 3D from Pictures

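Direct Linear Transform (DLT)

$$ x_i = P X_i \;\Rightarrow\; [x_i]_\times\, P X_i = 0 $$

where, for $x_i = (x, y, w)^T$,

$$ [x_i]_\times = \begin{bmatrix} 0 & -w & y \\ w & 0 & -x \\ -y & x & 0 \end{bmatrix} $$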

Page 14: 3D from Pictures

Direct Linear Transform (DLT)

n ≥ 4 points: minimize $\|Ap\|$ subject to the constraint $\|p\| = 1$

Use SVD: $A = U \Sigma V^T$

p is the last column vector of V: $p = V_n$
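The DLT step can be sketched in NumPy as follows. Two rows of the cross-product constraint are stacked per correspondence and p is read off the SVD. The function name and the synthetic data are assumptions. Note that this linear solve treats all 12 entries of P as unknowns (11 degrees of freedom up to scale), so the sketch uses at least 6 points in general position rather than the n ≥ 4 count that applies to the 7-parameter model.

```python
import numpy as np

def dlt_camera(X_world, x_image):
    """Estimate a 3x4 camera matrix from (n, 3) world points and (n, 2) image points."""
    rows = []
    for (X, Y, Z), (x, y) in zip(X_world, x_image):
        Xh = np.array([X, Y, Z, 1.0])
        w = 1.0                                           # image points are (x, y, 1)
        rows.append(np.concatenate([np.zeros(4), -w * Xh, y * Xh]))
        rows.append(np.concatenate([w * Xh, np.zeros(4), -x * Xh]))
    A = np.array(rows)
    _, _, Vt = np.linalg.svd(A)
    p = Vt[-1]                  # last row of V^T, i.e. last column of V
    return p.reshape(3, 4)      # rows of P stacked in p

# Synthetic check with hypothetical data: project known points, then re-estimate P.
rng = np.random.default_rng(1)
X_world = rng.uniform(-1, 1, size=(8, 3))
P_true = np.diag([2.0, 2.0, 1.0]) @ np.hstack([np.eye(3), [[0.1], [-0.2], [5.0]]])
x_h = np.column_stack([X_world, np.ones(8)]) @ P_true.T
x_image = x_h[:, :2] / x_h[:, 2:]
P_est = dlt_camera(X_world, x_image)
P_est = P_est * (P_true[2, 3] / P_est[2, 3])   # remove the arbitrary scale and sign
```

With noise-free data the rescaled estimate should agree with `P_true` to numerical precision. With real measurements the result is only an initial value.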

Page 15: 3D from Pictures

Implementation in Practice

Objective: Given n ≥ 4 3D-to-2D point correspondences {Xi ↔ xi’}, determine P.

Algorithm:

(i) Linear solution:
    (a) Normalization: $\tilde{X}_i = U X_i$, $\tilde{x}_i = T x_i$
    (b) DLT

(ii) Minimization of geometric error: iterative optimization (Levenberg-Marquardt)

(iii) Denormalization: $P = T^{-1} \tilde{P}\, U$
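One common choice for the normalizing transforms T and U (an assumption here, since the slide does not spell them out) is to move the centroid to the origin and scale so that the mean distance from the origin is sqrt(2) in 2D or sqrt(3) in 3D. A sketch with illustrative function names:

```python
import numpy as np

def normalizing_transform(points):
    """Homogeneous similarity transform for (n, d) points: centroid to the
    origin, mean distance from the origin scaled to sqrt(d)."""
    pts = np.asarray(points, dtype=float)
    d = pts.shape[1]
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(d) / mean_dist
    T = np.eye(d + 1)
    T[:d, :d] *= s
    T[:d, d] = -s * centroid
    return T

def denormalize(P_tilde, T, U):
    """Undo the normalization: P = T^-1 P_tilde U."""
    return np.linalg.inv(T) @ P_tilde @ U

# Hypothetical pixel coordinates, only to exercise the transform.
rng = np.random.default_rng(0)
pixels = rng.uniform(0, 640, size=(10, 2))
T = normalizing_transform(pixels)
normalized = (np.column_stack([pixels, np.ones(10)]) @ T.T)[:, :2]
```

The same function applied to the 3D points gives U. The DLT is run on the normalized coordinates and the result is mapped back with `denormalize`.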

Page 16: 3D from Pictures

How to recover K, R and t from P?

$$ P = K\,[\,R \mid t\,] = K\,[\,R \mid -R\tilde{C}\,] = KR\,[\,I \mid -\tilde{C}\,] $$

Camera centre C is the point for which PC = 0, i.e. the right null vector of P.

Objective: Given camera projection matrix P, decompose P = K[R|t].

Algorithm:

Write M = KR, then $P = M\,[\,I \mid -\tilde{C}\,]$.

Perform RQ decomposition of M, so that K is the upper-triangular matrix and R is the orthonormal matrix.
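A sketch of this decomposition using only NumPy. NumPy has no RQ routine, so the RQ factorization is built here from `np.linalg.qr` on a row-reversed matrix. The helper names and the sign conventions (positive diagonal of K, K[2,2] = 1) are assumptions of the example.

```python
import numpy as np

def rq3(M):
    """RQ decomposition of a 3x3 matrix: M = K R, K upper triangular, R orthogonal."""
    J = np.fliplr(np.eye(3))                 # row/column reversal, J @ J = I
    Q_t, R_t = np.linalg.qr((J @ M).T)       # (J M)^T = Q_t R_t
    K = J @ R_t.T @ J                        # upper triangular
    R = J @ Q_t.T                            # orthogonal
    D = np.diag(np.sign(np.diag(K)))         # make the diagonal of K positive
    return K @ D, D @ R

def decompose_camera(P):
    """Split P (defined up to scale) into K, R, t and the camera centre C."""
    if np.linalg.det(P[:, :3]) < 0:          # fix the overall sign so det(R) = +1
        P = -P
    K, R = rq3(P[:, :3])
    _, _, Vt = np.linalg.svd(P)
    C = Vt[-1, :3] / Vt[-1, 3]               # right null vector of P, dehomogenized
    return K / K[2, 2], R, -R @ C, C

# Hypothetical camera, rescaled by -2 to show that scale and sign do not matter.
f = 800.0
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
C_true = np.array([1.0, 2.0, -5.0])
P = np.diag([f, f, 1.0]) @ R_true @ np.hstack([np.eye(3), -C_true[:, None]])
K, R, t, C = decompose_camera(-2.0 * P)
```

If SciPy is available, `scipy.linalg.rq` could replace `rq3`. The sign fix-up on the diagonal of K is still needed in that case.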

Page 17: 3D from Pictures

This is what we learn from 1 Camera

Page 18: 3D from Pictures

Let’s consider 2 cameras

(i) Correspondence geometry: Given an image point x in the first image, how does this constrain the position of the corresponding point x’ in the second image?

(ii) Camera geometry (motion): Given a set of corresponding image points {xi ↔x’i}, i=1,…,n, what are the cameras P and P’ for the two views?

Page 19: 3D from Pictures

(i)Correspondence geometry: Given an image point x in the first image, how does this constrain the position of the corresponding point x’ in the second image?

Page 20: 3D from Pictures

The Fundamental Matrix F

$x'^T F x = 0$

Page 21: 3D from Pictures

What does Fundamental Matrix F tell us?

$x'^T F x = 0$

Fundamental matrix F relates corresponding pixels

If the intrinsic parameters (i.e. the focal lengths in our camera model) of both cameras are known, as K and K’, then we can derive (not here) that:

$$ K'^T F K = [t]_\times R $$

t and R are the translation and rotation of the 2nd camera relative to the 1st,

i.e. P = [I|0] and P’ = [R|t]
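The relation can be checked numerically. The sketch below builds F from an assumed pose and intrinsics via F = K'^{-T} [t]x R K^{-1}, then verifies the epipolar constraint on one synthetic point. All names and values are illustrative.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]x, so that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_from_pose(K1, K2, R, t):
    """F such that K2^T F K1 = [t]x R."""
    E = skew(t) @ R
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

# Hypothetical setup: first camera K1 [I | 0], second camera K2 [R | t].
K1 = np.diag([500.0, 500.0, 1.0])
K2 = np.diag([520.0, 520.0, 1.0])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
t = np.array([1.0, 0.2, 0.1])
F = fundamental_from_pose(K1, K2, R, t)

X = np.array([0.4, -0.3, 6.0])     # a 3D point in the first camera's frame
x1 = K1 @ X                        # homogeneous pixel in image 1
x2 = K2 @ (R @ X + t)              # homogeneous pixel in image 2
residual = x2 @ F @ x1             # should vanish up to rounding
```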

Page 22: 3D from Pictures

Good thing is that …

$x'^T F x = 0$

Fundamental matrix F can be computed from a set of pixel correspondences: {x’ ↔ x}

Page 23: 3D from Pictures

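Compute F from correspondences:

$$ x'^T F x = 0 $$

Separate known from unknown:

$$ x'x f_{11} + x'y f_{12} + x' f_{13} + y'x f_{21} + y'y f_{22} + y' f_{23} + x f_{31} + y f_{32} + f_{33} = 0 $$

$$ [\,x'x,\; x'y,\; x',\; y'x,\; y'y,\; y',\; x,\; y,\; 1\,]\;[\,f_{11}, f_{12}, f_{13}, f_{21}, f_{22}, f_{23}, f_{31}, f_{32}, f_{33}\,]^T = 0 $$

(data) (unknowns) (linear)

$$ Af = \begin{bmatrix} x'_1 x_1 & x'_1 y_1 & x'_1 & y'_1 x_1 & y'_1 y_1 & y'_1 & x_1 & y_1 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x'_n x_n & x'_n y_n & x'_n & y'_n x_n & y'_n y_n & y'_n & x_n & y_n & 1 \end{bmatrix} f = 0 $$

How many correspondences do we need?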
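The linear system Af = 0 can be solved with an SVD once at least 8 correspondences are stacked. The sketch below follows the row layout above. The rank-2 projection at the end is a standard extra step that the slide does not mention, and coordinate normalization is omitted for brevity. The function name and synthetic data are assumptions.

```python
import numpy as np

def eight_point(x1, x2):
    """Estimate F from (n, 2) arrays of corresponding points, n >= 8,
    such that [x2, 1]^T F [x1, 1] = 0."""
    x, y = x1[:, 0], x1[:, 1]
    xp, yp = x2[:, 0], x2[:, 1]
    A = np.column_stack([xp * x, xp * y, xp, yp * x, yp * y, yp,
                         x, y, np.ones(len(x))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)           # f = (f11, ..., f33), unit norm
    U, S, Vt = np.linalg.svd(F)        # added step: enforce rank 2
    S[2] = 0.0
    return U @ np.diag(S) @ Vt

# Synthetic check with hypothetical normalized cameras [I | 0] and [R | t].
rng = np.random.default_rng(0)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(12, 3))
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
t = np.array([0.5, 0.1, 0.0])
X2 = X @ R.T + t
x1 = X[:, :2] / X[:, 2:]
x2 = X2[:, :2] / X2[:, 2:]
F = eight_point(x1, x2)
```

With real pixel coordinates the system is usually solved on normalized points first, for the same numerical reasons as in camera calibration.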

Page 24: 3D from Pictures

What can we do now?

(1) Given F, K and K’, we can estimate the relative translation and rotation for two cameras: P = [I | 0] and P’ = [R | t]

(2) Given 8 correspondences {x’ ↔ x}, we can compute F

Given K and K’, and 8 correspondences {x’ ↔ x}, we can compute: P = [I | 0] and P’ = [R | t]
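Step (1) is commonly done through an SVD of E = K'^T F K, which yields four candidate poses. The sketch below lists them under that assumption. The slides do not give this procedure, and the function names are illustrative. The translation is recovered only up to scale.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]x."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def pose_candidates(E):
    """Return the four (R, t) candidates consistent with E = [t]x R, with |t| = 1."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:      # E is defined up to sign, so these flips
        U = -U                    # keep both rotation candidates proper
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    R_a, R_b = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]                   # left null vector of E
    return [(R_a, t), (R_a, -t), (R_b, t), (R_b, -t)]

# Hypothetical ground-truth pose used to build E.
c, s = np.cos(0.2), np.sin(0.2)
R_true = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])
t_true = np.array([1.0, 0.5, 0.2])
candidates = pose_candidates(skew(t_true) @ R_true)
```

The physically valid candidate is the one that places triangulated points in front of both cameras.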

Page 25: 3D from Pictures

This answers the 2nd question

(i) Correspondence geometry: Given an image point x in the first image, how does this constrain the position of the corresponding point x’ in the second image?

(ii) Camera geometry (motion): Given a set of corresponding image points {xi ↔x’i}, i=1,…,n, what are the cameras P and P’ for the two views?

Page 26: 3D from Pictures

But how to make this automatic?

Given K and K’, and 8 correspondences {x’ ↔ x}, we can compute: P = [I | 0] and P’ = [R | t]

(1) Estimating the intrinsics K and K’ (auto-calibration) will not be discussed here (it involves a good deal of projective geometry).

(2) Let’s see how to find correspondences automatically (i.e. feature detection and matching).

Page 27: 3D from Pictures

Lowe’s SIFT features: invariant to position, orientation and scale

Page 28: 3D from Pictures

Scale

• Look for strong responses of the DoG (Difference-of-Gaussian) filter over scale space

• Only consider local maxima in both position and scale

Page 29: 3D from Pictures

Orientation

• Create a histogram of local gradient directions computed at the selected scale

• Assign canonical orientation at peak of smoothed histogram

• Each key specifies stable 2D coordinates (x, y, scale, orientation)
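The orientation assignment can be sketched as a magnitude-weighted histogram of gradient directions. This simplified version assumes a fixed 36-bin histogram and a small circular smoothing kernel, and it skips the Gaussian blur at the selected scale. The function name and the test patch are illustrative.

```python
import numpy as np

def dominant_orientation(patch, n_bins=36):
    """Return the peak of the smoothed gradient-direction histogram, in radians."""
    gy, gx = np.gradient(patch.astype(float))      # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2 * np.pi)       # directions in [0, 2*pi)
    hist, edges = np.histogram(angle, bins=n_bins, range=(0, 2 * np.pi),
                               weights=magnitude)
    # Circular smoothing with weights (0.25, 0.5, 0.25).
    hist = 0.25 * np.roll(hist, 1) + 0.5 * hist + 0.25 * np.roll(hist, -1)
    k = np.argmax(hist)
    return 0.5 * (edges[k] + edges[k + 1])         # centre of the peak bin

# Hypothetical patch: intensity increases equally along x and y.
yy, xx = np.mgrid[0:16, 0:16]
theta = dominant_orientation(xx + yy)              # gradient points along 45 degrees
```

Rotating the descriptor window by this angle is what makes the resulting feature insensitive to image rotation.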


Page 30: 3D from Pictures

Simple matching

For each feature in image 1, find the feature in image 2 that is most similar (compute correlation of two vectors), and vice versa.

Keep mutual best matches.

Can design a very robust RANSAC-type algorithm.
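Mutual best matching can be sketched as below. Similarity is taken to be the normalized dot product (cosine similarity) of descriptor vectors, which is one reading of "correlation of two vectors". The function name and the random descriptors are assumptions.

```python
import numpy as np

def mutual_best_matches(desc1, desc2):
    """Return index pairs (i, j) where desc1[i] and desc2[j] are each
    other's most similar descriptor."""
    d1 = desc1 / np.linalg.norm(desc1, axis=1, keepdims=True)
    d2 = desc2 / np.linalg.norm(desc2, axis=1, keepdims=True)
    sim = d1 @ d2.T                      # sim[i, j] = cosine similarity
    best_12 = sim.argmax(axis=1)         # best match in image 2 for each feature in image 1
    best_21 = sim.argmax(axis=0)         # best match in image 1 for each feature in image 2
    return [(i, int(j)) for i, j in enumerate(best_12) if best_21[j] == i]

# Hypothetical 128-D descriptors: image 2 holds noisy copies of features 2, 0 and 4.
rng = np.random.default_rng(0)
desc1 = rng.normal(size=(5, 128))
desc2 = desc1[[2, 0, 4]] + 0.01 * rng.normal(size=(3, 128))
matches = mutual_best_matches(desc1, desc2)
```

Features 1 and 3 of image 1 have no counterpart in image 2 and are dropped by the mutual check. The surviving matches would then go to a RANSAC-style geometric verification.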

Page 31: 3D from Pictures

What have we learnt so far?

Page 32: 3D from Pictures

What have we learnt so far?

Page 33: 3D from Pictures

Consider more than 2 cameras

[Figure labels: K, K’, P, P’, P’’, X]

Page 34: 3D from Pictures

Objective

Given N images { Q1, …, QN } with reasonable overlaps,

compute N camera projection matrices { P1, …, PN }, where each Pi = Ki[Ri | ti]; Ki is the intrinsic parameter matrix, and Ri and ti are the rotation matrix and translation vector respectively.

Page 35: 3D from Pictures

Algorithm

(1) Find M tracks T = {T1, T2, …, TM}

    (i) for every pair of images {Qi, Qj}: detect SIFT feature points in Qi and Qj; match feature points robustly (RANSAC)
    (ii) match features across multiple images, construct tracks.

(2) Estimate { P1 … PN } and the 3D position for each track { X1 … XM }

    (i) select one pair of images {Q1’, Q2’} (well-conditioned). Let T1’2’ = {their associated overlapping tracks};
    (ii) estimate K1’ and K2’, compute {P1’, P2’} and the 3D positions of T1’2’ from the fundamental matrix.
    (iii) incrementally add a new camera Pk into the system, estimate its camera matrix by DLT (calibration)
    (iv) repeat (iii) until all the cameras are estimated.
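Step (1)(ii), turning pairwise matches into multi-image tracks, can be sketched with a union-find over (image, feature) pairs. The input layout, the function name, and the rule of discarding tracks that contain two features from the same image are assumptions of this example.

```python
def build_tracks(pairwise_matches):
    """pairwise_matches maps (image_i, image_j) to a list of
    (feature_in_i, feature_in_j) index pairs. Returns a list of tracks,
    each a sorted list of (image, feature) observations."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]    # path halving
            a = parent[a]
        return a

    for (i, j), matches in pairwise_matches.items():
        for fi, fj in matches:
            ra, rb = find((i, fi)), find((j, fj))
            if ra != rb:
                parent[rb] = ra              # merge the two groups

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)

    tracks = []
    for members in groups.values():
        images = [img for img, _ in members]
        # Keep only consistent tracks: at most one feature per image.
        if len(members) >= 2 and len(set(images)) == len(images):
            tracks.append(sorted(members))
    return sorted(tracks)

# Hypothetical matches between three images.
demo = {(0, 1): [(5, 7), (2, 3)], (1, 2): [(7, 9)], (0, 2): [(2, 4)]}
tracks = build_tracks(demo)
```

Each resulting track corresponds to one 3D point X to be estimated in step (2).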

Page 36: 3D from Pictures

However, this won’t work!

Page 37: 3D from Pictures

Algorithm

(1) Find M tracks T = {T1, T2, …, TM}

    (i) for every pair of images {Qi, Qj}: detect SIFT feature points in Qi and Qj; match feature points robustly (RANSAC)
    (ii) match features across multiple images, construct tracks.

(2) Estimate { P1 … PN } and the 3D position for each track { X1 … XM }

    (i) select one pair of images {Q1’, Q2’} (well-conditioned). Let T1’2’ = {their associated overlapping tracks};
    (ii) estimate K1’ and K2’, compute {P1’, P2’} and the 3D positions of T1’2’ from the fundamental matrix. Then non-linearly minimize reprojection errors (LM).
    (iii) incrementally add a new camera Pk into the system, estimate its initial value by DLT, then non-linearly optimize the system.
    (iv) repeat (iii) until all the cameras are estimated.

The purely linear estimates are replaced with more robust non-linear optimization.
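To make "minimize reprojection errors" concrete, the sketch below triangulates one point linearly and then refines it with a small Levenberg-Marquardt loop while the cameras stay fixed. A full system would optimize all cameras and points jointly. This reduced version, its numerical Jacobian, and all names and values are assumptions for illustration.

```python
import numpy as np

def project(P, X):
    x_h = P @ np.append(X, 1.0)
    return x_h[:2] / x_h[2]

def triangulate(Ps, xs):
    """Linear triangulation of one 3D point from cameras Ps and image points xs."""
    rows = []
    for P, (x, y) in zip(Ps, xs):
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.array(rows))
    return Vt[-1, :3] / Vt[-1, 3]

def residuals(Ps, xs, X):
    """Stacked reprojection errors of X in every camera."""
    return np.concatenate([project(P, X) - np.asarray(x) for P, x in zip(Ps, xs)])

def refine_point(Ps, xs, X0, n_iter=20, lam=1e-3, eps=1e-6):
    """Levenberg-Marquardt on a single point, using a forward-difference Jacobian."""
    X = np.array(X0, dtype=float)
    r = residuals(Ps, xs, X)
    for _ in range(n_iter):
        J = np.column_stack([(residuals(Ps, xs, X + eps * e) - r) / eps
                             for e in np.eye(3)])
        step = np.linalg.solve(J.T @ J + lam * np.eye(3), -J.T @ r)
        r_new = residuals(Ps, xs, X + step)
        if r_new @ r_new < r @ r:          # accept the step, relax damping
            X, r, lam = X + step, r_new, lam * 0.5
        else:                              # reject the step, increase damping
            lam *= 10.0
    return X

# Hypothetical two-view setup with slightly noisy observations.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), [[-1.0], [0.0], [0.0]]])
X_true = np.array([0.3, -0.2, 4.0])
xs = [project(P1, X_true), project(P2, X_true)]
X_lin = triangulate([P1, P2], xs)
noisy = [x + d for x, d in zip(xs, ([0.01, -0.01], [-0.01, 0.02]))]
X0 = triangulate([P1, P2], noisy)
X_ref = refine_point([P1, P2], noisy, X0)
```

The linear result minimizes an algebraic error, so it serves only as the starting value. The refinement lowers the geometric (pixel) error that the slide refers to.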

Page 38: 3D from Pictures

Tired?

Page 39: 3D from Pictures

Recall the camera calibration algorithm

Objective: Given n ≥ 4 3D-to-2D point correspondences {Xi ↔ xi’}, determine P.

Algorithm:

(i) Linear solution:
    (a) Normalization: $\tilde{X}_i = U X_i$, $\tilde{x}_i = T x_i$
    (b) DLT

(ii) Minimization of geometric error: iterative optimization (Levenberg-Marquardt)

(iii) Denormalization: $P = T^{-1} \tilde{P}\, U$

Page 40: 3D from Pictures

What’s the contribution of this paper?

We are lucky! For the first time, a huge amount of visual data is easily accessible. High-level descriptions of these data are also becoming available. How do we explore them? Analyze them? Use them wisely?

How to extract high-level information?
- Computer vision and machine learning tools: structure from motion and other computer vision tools have become robust enough for graphics applications.
- Internet: image search
- Human labels: games with a purpose

Page 41: 3D from Pictures

What is the space of all the pictures?

in the past

present

the future?

Page 42: 3D from Pictures

What’s the space of all the videos?

in the past

present

the future?

Page 43: 3D from Pictures

What else?

Page 44: 3D from Pictures

Using Search Engine?

Page 45: 3D from Pictures

Using human computation power?

Page 46: 3D from Pictures

Using human computation power?

Page 47: 3D from Pictures

Using human computation power?

Page 48: 3D from Pictures

What else?

Page 49: 3D from Pictures

What else?

Page 50: 3D from Pictures

Book: “Multiple View Geometry in Computer Vision”, Hartley and Zisserman

Online Tutorial:
http://www.cs.unc.edu/~marc/tutorial.pdf
http://www.cs.unc.edu/~marc/tutorial/

Matlab Toolbox: http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/TORR1/index.html