University of Pennsylvania 1 GRASP
Single View Geometry
Jianbo Shi
Camera model & Orientation + Position estimation
What am I?
University of Pennsylvania 2 GRASP
Camera projection model
The overall goal is to compute 3D geometry of the scene from just 2D images. We will first study 3D->2D camera projection using linear algebra.
University of Pennsylvania 3 GRASP
The Pinhole Camera
● Light enters a darkened chamber through a pinhole opening and forms an image on the further surface
University of Pennsylvania 4 GRASP
The Pinhole Camera Model
University of Pennsylvania 5 GRASP
Ideal case:
y y’
f z
C
yc
xc
zc
p
Zx’ = f X Zy’ = f Y
Z = Z
Projection equation: x’ = f X / Z y’ = f Y / Z
University of Pennsylvania 6 GRASP
Step 1: Camera projection matrix
P0 X = x
y y’
f z
C
yc
xc
zc
p
Zx’ = f X Zy’ = f Y
Z = Z
x’ y’ 1
Xc Yc Zc 1
=
f 0 0 0 0 f 0 0 0 0 1 0
Zc
University of Pennsylvania 7 GRASP
Step 2: Intrinsic camera parameters: map camera coordinate to pixel coordinate
x y z
= y y’
f z
C
yc
xc
zc
p
αx s px 0 αy py 0 0 1
K (3x3 submatrix)
x’ y’ z’
αx, αy is pixel scaling factor
Px, py is the principle point (where optical axis hits image plane)
s is the slant factor, when the image plane is not normal to the optical axis
Optical world Pixel world
University of Pennsylvania 8 GRASP
Combine the Intrinsic camera parameters
P0 X = x
x y z
xc yc zc 1
=
αxf s px 0 0 αyf py 0 0 0 1 0
K (Calibration matrix)
P = [K , 0] = K [I, 0]
αx s px 0 αy py 0 0 1
x y 1
Xc Yc Zc 1
=
f 0 0 0 0 f 0 0 0 0 1 0
University of Pennsylvania 9 GRASP
Step 3: External parameters: rotation and translation map
the world to camera coordinates
y y’
f z
C
yc
xc
zc
p
(R,t)
World coordinate
Camera coordinate
3x3 3x1
University of Pennsylvania 10 GRASP
Combining Internal and External parameters
y y’
f z
C
yc
xc
zc
p
(R,t)
World coordinate
Camera coordinate 1)
1) Translate the world coordinate into the camera coordinate
2) Translate the camera coordinate into the pixel coordinate
2)
University of Pennsylvania 11 GRASP
Combining Internal and External parameters
y y’
f z
C
yc
xc
zc
p
(R,t)
World coordinate
Camera coordinate
After simplication:
x = K [R, t] X
pixel world
University of Pennsylvania 12 GRASP
Special case, planar world, homograph
(R,t)
Camera coordinate
y y’
f z
C
yc
xc
zc
p
World coordinate
X= (xw,yw,zw=0,1) x = K [R, t] X Expand:
x = K r1 r2 r3 t xw yw zw=0 1
University of Pennsylvania 13 GRASP
Special case, planar world, homograph
(R,t)
Camera coordinate
y y’
f z
C
yc
xc
zc
p
World coordinate
X= (xw,yw,zw=0,1)
x = K r1 r2 t xw yw
1
After simplication:
X = K H3x3 X We have the homographic mapping:
University of Pennsylvania 14 GRASP
Special case: rotating camera
(R,t)
Camera coordinate
y y’
f z
C
yc
xc
zc
p
World coordinate
x = K [R, t] X Expand:
with t = 0
x = K [R] X
xw yw
zw X = This is also a homography with
H = R
University of Pennsylvania 15 GRASP
Panoramas
x = K [R] X
University of Pennsylvania 16 GRASP
Visual Odometry: Overview
1) The transformation (R,t) tells us how to rotate + move the world coordinate to the camera coordinate.
2) In the following slides, we will see 1) how to get the world reference’s frame
orientation+position from (R,t), 2) how to get the pan/tilt/yaw angles from R
3) If we want to know the camera orientation+position, we need to look at its inverse transform, which we should see is , 1) The computation of the camera pan/tilt/
yaw angle is the same as in the previous case
y y’
f z
C
yc
xc
zc
p
World coordinate
(R,t)
University of Pennsylvania 17 GRASP
Recover Camera orientation Case 1: single vanishing point (z-direction)
University of Pennsylvania 18 GRASP
Pan/tilt/yaw angles
x z
x y
z y
Pan : rotation around y-axis
Yaw :rotation around z-axis
Tilt : rotation around x-axis
University of Pennsylvania 19 GRASP
Rotation: Pan/Tilt/Yaw
Given the 3x3 rotation matrix R, we can recover, pan/tilt/yaw angle, and vise. vica.
xcam,
ycam
zworld, (0,0,1)T
University of Pennsylvania 20 GRASP
y y’
f z
C
yc
xc
zc
p
(R,t)
World coordinate
Camera coordinate
The basic idea is that if we can see the “north” star, we know how to oriented ourselves (minus yaw angle) (why?)
And the ``north” star is just the point of infinity in some direction
University of Pennsylvania 21 GRASP
Columns of R matrix = image of vanishing points of world-space axes!
r1,r2,r3 == image of x,y,z axis vanishing points
xc yc zc
xw yw zw tw
=
• • • • • • • • • • • •
R x=K·[R,t]·X, or
[R,t]= • • • • • • • • • • • •
r1 r2 r3 t
f C
yc
zc
xc
P
t K
Camera projection equation:
• • • • • • • • • • • •
r1 r2 r3 t
0 0 1 0
= r3
z-direction
vanishing point
(need to normalize to norm =1)
Seeing the vanishing point can tell us:
University of Pennsylvania 22 GRASP
Recover Pan/Tilt angle from z-vanishing point
Geometric explanation: Pan Tilt The alignment between z-axis of the world and image is defined only
by the pan/tilt angle:
zcam
xcam,
ycam
zworld, (0,0,1)T
r3 =
Note: tilt angle with rotation around x-axis is positive when pointing down in this figure
University of Pennsylvania 23 GRASP
Sanity check Vanishing point in z
University of Pennsylvania 24 GRASP
y
x
z
x = -1
y = az -b
x = 1
Ground
plane
Image plane
In this example, we are walking down a track, the ground is tilted up(we are going up the hill). Assume we are b meter tall. We see two parallel lines to our left and right, with x = 1,-1. A line on the ground is described by equation
y = az - b, where a is the slope the ground.
So we know:
Tilt angle
y
z
y = az -b
University of Pennsylvania 25 GRASP
As a line gets far away, z -> infinity. If (1,y,z) is a point on this line, its image is f(1/z,y/z). As z -> infinity, 1/z -> 0. What about y/z?
y/z = (az-b)/z = a - b/z -> a.
So a point on the line appears at: (0,a).
y
x
z
y = az - b Ground
plane
Image plane
VanishingPoint (0,a)
University of Pennsylvania 26 GRASP
y
x z
y = az - b
(0,a)
zworld, (0,0,1)T rz = =
Now, we check
University of Pennsylvania 27 GRASP
Recover Pan/Tilt angle from z-vanishing point
Geometric explanation: Pan Tilt The alignment between z-axis of the world and image is defined only
by the pan/tilt angle:
xcam,
zcam
ycam
zworld, (0,0,1)T
r3 =
University of Pennsylvania 28 GRASP
Recover Camera orientation Case 2: two vanishing points, x-y direction
University of Pennsylvania 29 GRASP
In some case, we can’t measure the z-vanishing point directly, but if we have x-y vanishing points, we can
imagine what the z-vanishing point
University of Pennsylvania 30 GRASP
Recall: vanishing points of x,y direction, give us the, r1 and r2
• • • • • • • • • • • •
r1 r2 r3 t first two columns of R
Basic idea: 1) x-y vanishing point gives us the first two columns of R, r1 and r2
2) z-vanishing point, the third columns of R, can be computed by r3 = r1 r2 x
University of Pennsylvania 31 GRASP
Steps for Recovering R from Two vanishing Point
1. Measure X / Y direction vanishing point
University of Pennsylvania 32 GRASP
Recovering R from Two vanishing Point (2)
2. Translate to the camera coordinates by intrinsic matrix K
University of Pennsylvania 33 GRASP
Recovering R from Two vanishing Point (3)
3. Get r1 and r2 from vanishing point xvX and xvY
vanishing points of x,y direction,
give us the, r1 and r2
• • • • • • • • • • • •
r1 r2 r3 t first two columns of R
University of Pennsylvania 34 GRASP
Recovering R from Two vanishing Point (4)
4. Get r3
5. Get R
r3 = r1 r2 x
r1 r2 r3
University of Pennsylvania 35 GRASP
Example Recover Camera Pan /Tilt From r3
University of Pennsylvania 36 GRASP
Recover Camera Pan /Tilt From r3 (2)
α
Examples of Camera Pan/Tilt angle
x
z
y x
- -
University of Pennsylvania 37 GRASP
Recovering all three angles: yaw + pan/tilt
• • • • • • • • •
r1 r2 r3 R=
Depends on pan/tilt
Depends on yaw/pan/tilt
r1 1) Write out equation in yaw/pan/tilt angles for r2
2) Solve for yaw angle given pan/tilt angles
University of Pennsylvania 38 GRASP
Summary so far
1) If we have the vanishing point in z-direction, we can recover pan/tilt angle
1) This is when the planar target is on the ground
2) If don’t see the z-direction vanishing point directly, you would need two vanishing points of x-y directions
3) Given x-y directions vanishing point, we can cover the full rotation matrix, therefore its z-direction (pan/tilt)
4) Given pan/tilt + x-direction vanishing point, we can recover yaw angle
University of Pennsylvania 39 GRASP
Recover Camera orientation + position Case 3: homography matrix
University of Pennsylvania 40 GRASP
(R,t)
Camera coordinate
y y’
f z
C
yc
xc
zc
p
World coordinate
x = K r1 r2 t xw yw
1
Recall for planar surface
X = K H3x3 X We have the homographic mapping:
H3x3
University of Pennsylvania 41 GRASP
(R,t)
Camera coordinate
y y’
f z
C
yc
xc
zc
p
World coordinate
x = K r1 r2 t xw yw
1
Recall for planar surface
H3x3
This implies if we have H,
1) We can recover the full rotation matrix, R,
2) We can recover the position vector t
University of Pennsylvania 42 GRASP
Example Recover [R,t] from H (1)
1. Get H from 4 points
2. Compute the norm of first column
University of Pennsylvania 43 GRASP
Example Recover [R,t] from H (2)
3. Compute r1, r2, t
4. Compute r3
University of Pennsylvania 44 GRASP
x = K r1 r2 t xw
yw
1
Procedure of recovery R, and t from H
1) Compute the rotation of x-axis, set
2) Compute the rotation of the y-axis, set
3) Compute the translation vector t (the position of the robot)
4) Compute the rotation of the z-axis, set
5) Use the previous case(2), to compute the pan/tilt/yaw angle
University of Pennsylvania 45 GRASP
Summary so far
1) Each vanishing point in the image gives you one rotation vector 2) Given the z-vanishing point, we can detect the pan/tilt angle.
1) This is the case when the planar target is on the ground.
3) Given x-y vanishing point, we can compute the z-vanishing point(therefore pan/tilt angle). 1) This is the case when we are facing the planar surface 2) Given x-rotation vector, pan-tilt angle, we can determine the yaw angle 3) But we can’t determine the position of the robot
4) If we are given the homograph, (using 4 corresponding points), we can recover both the rotational angles, and the position of the robot.
University of Pennsylvania 46 GRASP
How to estimate the rotation and translation of the robot from the world point of view?
In the case of moving robot(rather than moving target), we need to know the orientation/position of the robot in the world ==> we need to how to pan/tilt the world oriented to the robot. Note: pan/tilt of the camera is very different from the pan/tilt of the world!
University of Pennsylvania 47 GRASP
Converting rotation/translation from (world->camera) to (camera->world)
y y’
f z
C
yc
xc
zc
p
(R,t)
World coordinate
Camera coordinate
With the fact: =
(world-> camera)
(camera->world)
University of Pennsylvania 48 GRASP
Converting rotation/translation from (world->camera) to (camera->world)
y y’
f z
C
yc
xc
zc
p
(R,t)
World coordinate
Camera coordinate
(world-> camera)
(camera-> world)
What does it mean:
1) The position of the camera in the world reference frame is:
2) The pan/tilt/yaw of the world to align with camera is computed from the rows of the R instead of columns of R
University of Pennsylvania 49 GRASP
Recovering World pan/tilt from Rotation matrix R^T
1. Compute the (world->camera) rotation R and translation t
2. Take the last row of the R, and recover pan/tilt angle of camera 3) Compute the yaw angle if needed, from first two rows of R 4) Compute the position of the robot, by
University of Pennsylvania 50 GRASP
Example Recover Pan/Tilt in World Frame
From World-> Camera: [R, t] èFrom Camera -> World : [RT, -RTt]
Pan
Tilt
x
z
x y
University of Pennsylvania 51 GRASP
Example Recover Translation in World Frame (2)
x
z target
University of Pennsylvania 52 GRASP
Final notes
1) There are two rotation/translation we care about: 1) (R,t): world->camera transformation. It tells us parameters of the camera.
The columns of R: rotation of camera so it aligns with the world. The vector t = coordinate of the world center in camera frame.
2) : camera-> world. It tells us parameters of the world. Columns of R^t: rotation of the world so it aligns with the camera. The vector R^t(-t) = coordinate of the camera in the world frame.
3) For the visual odometry task used in project 2, you need to use the transformation .
4) The pan angle of the world, is the robot orientation in the world frame.