Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion...

22
Announcements

Transcript of Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion...

Page 1: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

Announcements

Page 2: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

Structure-from-Motion

• Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving camera.– Equivalently, we can think of the world as moving and the

camera as fixed.

• Like stereo, but the position of the camera isn’t known (and it’s more natural to use many images with little motion between them, not just two with a lot of motion).– We may or may not assume we know the parameters of the

camera, such as its focal length.

Page 3: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

Structure-from-Motion

• As with stereo, we can divide problem:– Correspondence.– Reconstruction.

• Again, we’ll talk about reconstruction first.– So for the next few classes we assume that

each image contains some points, and we know which points match which.

Page 4: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

Structure-from-Motion

Page 5: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

Movie

Page 6: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

Reconstruction

• A lot harder than with stereo.

• Start with simpler case: scaled orthographic projection (weak perspective).– Recall, in this we remove the z coordinate

and scale all x and y coordinates the same amount.

Page 7: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

First: Represent motion

• We’ll talk about a fixed camera, and moving object.• Key point:

111

...

21

21

21

n

n

n

zzz

yyy

xxx

P

Points

y

x

trrr

trrrS

3,22,21,2

3,12,11,1

Some matrix

n

n

YYY

XXXI

21

21...

The image

SPI Then:

Page 8: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

Structure-from-Motion

• S encodes:– Projection: only two lines– Scaling, since S can have a scale factor.– Translation, by tx and ty.– Rotation:

SPI

Page 9: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

Rotation

P

rrr

rrr

rrr

3,32,31,3

3,22,21,2

3,12,11,1 Represents a 3D rotation of the points in P.

Page 10: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

First, look at 2D rotation (easier)

n

n

yyy

xxx

21

21 ...

cossin

sincos

cossin

sincosR

Matrix R acts on points by rotating them.

• Also, RRT = Identity. RT is also a rotation matrix, in the opposite direction to R.

Page 11: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

Simple 3D Rotation

n

n

n

zzz

yyy

xxx

21

21

21...

100

0cossin

0sincos

Rotation about z axis.

Rotates x,y coordinates. Leaves z coordinates fixed.

Page 12: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

Full 3D Rotation

cossin0

sincos0

001

cos0sin

010

sin0cos

100

0cossin

0sincos

R

• Any rotation can be expressed as combination of three rotations about three axes.

100

010

001TRR

• Rows (and columns) of R are orthonormal vectors.

• R has determinant 1 (not -1).

Page 13: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

Putting it Together

Prrr

rrr

rrr

t

t

t

s

z

y

x

1000

0

0

0

100

010

001

010

001

3,32,31,3

3,22,21,2

3,12,11,1Scale

Projection

3D Translation

3D Rotation

),,(),,(

0),,(),,(

where

3,22,21,23,12,11,1

3,22,21,23,12,11,1

3,22,21,2

3,12,11,1

ssssss

ssssss

Pstsss

stsss

y

x

We can just write stx as tx and sty as ty.

Page 14: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

Affine Structure from Motion

),,(),,(

0),,(),,(

where

3,22,21,23,12,11,1

3,22,21,23,12,11,1

3,22,21,2

3,12,11,1

ssssss

ssssss

Ptsss

tsss

y

x

Page 15: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

Affine Structure-from-Motion: Two Frames (1)

111

......

21

21

21

22

3,2

2

2,2

2

1,2

22

3,1

2

2,1

2

1,1

11

3,2

1

2,2

1

1,2

11

3,1

1

2,1

1

1,1

22

2

2

1

22

2

2

1

11

2

1

1

11

2

1

1

n

n

n

y

x

y

x

n

n

n

n

zzz

yyy

xxx

tsss

tsss

tsss

tsss

vvv

uuu

vvv

uuu

Page 16: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

Affine Structure-from-Motion: Two Frames (2)

1111

1000

0100

0010

11114321

4321

4321

zzzz

yyyy

xxxx

To make things easy, suppose:

Page 17: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

Affine Structure-from-Motion: Two Frames (3)

111

......

21

21

21

22

3,2

2

2,2

2

1,2

22

3,1

2

2,1

2

1,1

11

3,2

1

2,2

1

1,2

11

3,1

1

2,1

1

1,1

22

2

2

1

22

2

2

1

11

2

1

1

11

2

1

1

n

n

n

y

x

y

x

n

n

n

n

zzz

yyy

xxx

tsss

tsss

tsss

tsss

vvv

uuu

vvv

uuu

Looking at the first four points, we get:

1111

1000

0100

0010

22

3,2

2

2,2

2

1,2

22

3,1

2

2,1

2

1,1

11

3,2

1

2,2

1

1,2

11

3,1

1

2,1

1

1,1

2

4

2

3

2

2

2

1

2

4

2

3

2

2

2

1

1

4

1

3

1

2

1

1

1

4

1

3

1

2

1

1

y

x

y

x

tsss

tsss

tsss

tsss

vvvv

uuuu

vvvv

uuuu

Page 18: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

Affine Structure-from-Motion: Two Frames (7)

1111

1000

0100

0010

11114321

4321

4321

zzzz

yyyy

xxxx

A

But, what if the first four points aren’t so simple?

Then we define A so that:

This is always possible as long as the points aren’t coplanar.

Page 19: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

Affine Structure-from-Motion: Two Frames (8)

111

......

21

21

21

1

22

3,2

2

2,2

2

1,2

22

3,1

2

2,1

2

1,1

11

3,2

1

2,2

1

1,2

11

3,1

1

2,1

1

1,1

22

2

2

1

22

2

2

1

11

2

1

1

11

2

1

1

n

n

n

y

x

y

x

n

n

n

n

zzz

yyy

xxx

AA

tsss

tsss

tsss

tsss

vvv

uuu

vvv

uuu

111

......

21

21

21

22

3,2

2

2,2

2

1,2

22

3,1

2

2,1

2

1,1

11

3,2

1

2,2

1

1,2

11

3,1

1

2,1

1

1,1

22

2

2

1

22

2

2

1

11

2

1

1

11

2

1

1

n

n

n

y

x

y

x

n

n

n

n

zzz

yyy

xxx

tsss

tsss

tsss

tsss

vvv

uuu

vvv

uuu

11111

1000

0100

...0010...

1

22

3,2

2

2,2

2

1,2

22

3,1

2

2,1

2

1,1

11

3,2

1

2,2

1

1,2

11

3,1

1

2,1

1

1,1

22

2

2

1

22

2

2

1

11

2

1

1

11

2

1

1

n

n

n

y

x

y

x

n

n

n

n

z

y

x

A

tsss

tsss

tsss

tsss

vvv

uuu

vvv

uuu

Then, given:

We have:

And:

Page 20: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

Affine Structure-from-Motion: Two Frames (9)

11111

1000

0100

...0010...

1

22

3,2

2

2,2

2

1,2

22

3,1

2

2,1

2

1,1

11

3,2

1

2,2

1

1,2

11

3,1

1

2,1

1

1,1

22

2

2

1

22

2

2

1

11

2

1

1

11

2

1

1

n

n

n

y

x

y

x

n

n

n

n

z

y

x

A

tsss

tsss

tsss

tsss

vvv

uuu

vvv

uuu

Given:

1

22

3,2

2

2,2

2

1,2

22

3,1

2

2,1

2

1,1

11

3,2

1

2,2

1

1,2

11

3,1

1

2,1

1

1,1

A

tsss

tsss

tsss

tsss

y

x

y

x

Then we just pretend that:

is our motion, and solve as before.

Page 21: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

Affine Structure-from-Motion: Two Frames (10)

111

......

21

21

21

22

3,2

2

2,2

2

1,2

22

3,1

2

2,1

2

1,1

11

3,2

1

2,2

1

1,2

11

3,1

1

2,1

1

1,1

22

2

2

1

22

2

2

1

11

2

1

1

11

2

1

1

n

n

n

y

x

y

x

n

n

n

n

zzz

yyy

xxx

tsss

tsss

tsss

tsss

vvv

uuu

vvv

uuu

This means that we can never determine the exact 3D structure of the scene. We can only determine it up to some transformation, A. Since if a structure and motion explains the points:

111

......

21

21

21

1

22

3,2

2

2,2

2

1,2

22

3,1

2

2,1

2

1,1

11

3,2

1

2,2

1

1,2

11

3,1

1

2,1

1

1,1

22

2

2

1

22

2

2

1

11

2

1

1

11

2

1

1

n

n

n

y

x

y

x

n

n

n

n

zzz

yyy

xxx

AA

tsss

tsss

tsss

tsss

vvv

uuu

vvv

uuuSo does another of the form:

Page 22: Announcements. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving.

Affine Structure-from-Motion: Two Frames (11)

111

......

21

21

21

1

22

3,2

2

2,2

2

1,2

22

3,1

2

2,1

2

1,1

11

3,2

1

2,2

1

1,2

11

3,1

1

2,1

1

1,1

22

2

2

1

22

2

2

1

11

2

1

1

11

2

1

1

n

n

n

y

x

y

x

n

n

n

n

zzz

yyy

xxx

AA

tsss

tsss

tsss

tsss

vvv

uuu

vvv

uuu

10004,33,32,31,3

4,23,22,21,2

4,13,12,11,1

aaaa

aaaa

aaaa

A

Note that A has the form:

A corresponds to translation of the points, plus a linear transformation.