What is Visual Odometry - UCSB (lbmedia.ece.ucsb.edu/.../Lecture6-visual-odometry.pdf)
What is Visual Odometry
• Visual odometry was originally used mainly in robotics, to solve the problem of autonomous localization and navigation of a mobile robot in an unknown environment.
• Its core function is to analyze the captured image sequence and determine from it the camera's current position and orientation. From the camera pose at every frame, the trajectory of the whole system can be obtained.
baseline
Epipolar plane
Epipoles
Epipolar lines
The epipolar geometry
C,C’,x,x’ and X are coplanar
The epipolar geometry
All points on the plane π project onto l and l′
The epipolar geometry
Family of planes π and lines l and l′
Intersection in e and e’
The epipolar geometry
epipoles e,e’
= intersection of baseline with image plane
= projection of projection center in other image
an epipolar plane = plane containing baseline (1-D family)
an epipolar line = intersection of epipolar plane with image
(always come in corresponding pairs)
Example: converging cameras
Example: motion parallel with image plane
Example: forward motion
(figure: epipoles e and e′)
Essential Matrix and Fundamental Matrix
The relationship between corresponding points in the left and right images of the same scene, captured by two cameras, can be expressed through the Essential matrix or the Fundamental matrix.
Matrix form of cross product

$$a \times b = \begin{bmatrix} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} a_2 b_3 - a_3 b_2 \\ a_3 b_1 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{bmatrix} = [a]_\times\, b$$

with $a \cdot (a \times b) = 0$ and $b \cdot (a \times b) = 0$.
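The matrix form above can be checked in a few lines of numpy (the `skew` helper name is ours, not from the slides):

```python
import numpy as np

def skew(a):
    """Return the 3x3 skew-symmetric matrix [a]_x such that skew(a) @ b == a x b."""
    return np.array([[0.0,   -a[2],  a[1]],
                     [a[2],   0.0,  -a[0]],
                     [-a[1],  a[0],  0.0]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# The matrix form reproduces the cross product exactly.
assert np.allclose(skew(a) @ b, np.cross(a, b))
# a x b is orthogonal to both a and b, as stated above.
assert np.isclose(a @ (skew(a) @ b), 0.0)
assert np.isclose(b @ (skew(a) @ b), 0.0)
```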
Calibrated Camera

With $p = (u, v, 1)^\top$ and $p' = (u', v', 1)^\top$ in normalized camera coordinates, the coplanarity of the two rays and the baseline gives

$$p'^\top \left[\, t \,\right]_\times (R\, p) = 0$$

$$p'^\top E\, p = 0 \quad \text{with} \quad E = [t]_\times R \quad \text{(Essential matrix)}$$
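A quick numerical check of the epipolar constraint, with a synthetic rotation, translation, and 3D point; the coordinate convention $P_2 = R P_1 + t$ is assumed here:

```python
import numpy as np

def skew(a):
    return np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0.0]])

# Synthetic calibrated setup: camera 1 at the origin, camera 2 rotated and translated.
th = 0.1
R = np.array([[np.cos(th), -np.sin(th), 0],
              [np.sin(th),  np.cos(th), 0],
              [0, 0, 1.0]])
t = np.array([1.0, 0.2, 0.0])
E = skew(t) @ R                      # essential matrix E = [t]_x R

# One 3D point expressed in each camera frame (convention: P2 = R @ P1 + t).
P1 = np.array([0.5, -0.3, 4.0])
P2 = R @ P1 + t

p  = P1 / P1[2]                      # normalized image point in camera 1
pp = P2 / P2[2]                      # normalized image point in camera 2

# Epipolar constraint p'^T E p = 0 holds exactly for noise-free data.
assert abs(pp @ E @ p) < 1e-10
```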
Uncalibrated Camera

$\hat p'^\top E\, \hat p = 0$, where $\hat p$ and $\hat p'$ are the camera-coordinate points corresponding to the pixel-coordinate points $p$ and $p'$:

$$\hat p = M_{int}^{-1}\, p, \qquad \hat p' = M'^{-1}_{int}\, p'$$

Substituting:

$$p'^\top F\, p = 0 \quad \text{with} \quad F = M'^{-\top}_{int}\, E\, M_{int}^{-1} \quad \text{(Fundamental matrix)}$$
Properties of the fundamental and
essential matrix
• The matrix is 3 × 3
• Transpose: if F is the fundamental matrix of the camera pair (P, P′),
then Fᵀ is the fundamental matrix of the pair (P′, P)
• Epipolar lines: thinking of p and p′ as points in the projective
plane, F p is a projective line in the right image.
That is, l′ = F p and l = Fᵀ p′
• The essential matrix encodes the extrinsic parameters only; the fundamental matrix also depends on the intrinsics
Least squares approach

Minimize $\sum_{i=1}^{n} \left( p_i'^\top F\, p_i \right)^2$ under the constraint $\|F\| = 1$.
We have a homogeneous system A f = 0.
The least-squares solution is the singular vector corresponding to the smallest singular value of A,
i.e. the last column of V in the SVD A = U D Vᵀ.
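A minimal sketch of this linear (eight-point style) estimate of F from exact synthetic correspondences; the camera values are illustrative, and Hartley's point normalization and the rank-2 enforcement are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic cameras with known geometry (illustrative values).
K = np.array([[500.0, 0, 320], [0, 500, 240], [0, 0, 1]])
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
t = np.array([[1.0], [0.1], [0.1]])
M1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = K @ np.hstack([R, t])

# 20 exact correspondences from random 3D points in front of both cameras.
X = np.vstack([rng.uniform(-1, 1, (2, 20)), rng.uniform(2, 4, (1, 20)), np.ones((1, 20))])
x1 = M1 @ X; x1 /= x1[2]
x2 = M2 @ X; x2 /= x2[2]

# One row of A per match, from the bilinear constraint p'^T F p = 0.
A = np.array([[u2*u1, u2*v1, u2, v2*u1, v2*v1, v2, u1, v1, 1.0]
              for (u1, v1), (u2, v2) in zip(x1[:2].T, x2[:2].T)])

# Least-squares solution of A f = 0: last row of V^T in the SVD of A.
_, _, Vt = np.linalg.svd(A)
F = Vt[-1].reshape(3, 3)

# Every correspondence should (nearly) satisfy the epipolar constraint.
residuals = [abs(x2[:, i] @ F @ x1[:, i]) for i in range(20)]
assert max(residuals) < 1e-4
```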
3D Reconstruction
• Stereo: we know the viewing geometry (extrinsic parameters) and the intrinsic parameters: Find correspondences exploiting epipolar geometry, then reconstruct
• Structure from motion (with calibrated cameras): Find correspondences, then estimate extrinsic parameters (rotation and direction of translation), then reconstruct.
• Uncalibrated cameras: Find correspondences,
Compute projection matrices (up to a projective transformation), then reconstruct up to a projective transformation.
Point reconstruction
$$x = M X, \qquad x' = M' X$$
Geometric error
Reconstruct matches in projective frame
by minimizing the reprojection error
Non-iterative optimal solution
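Point reconstruction from two views can be sketched with linear (DLT) triangulation; each view's equation $x = M X$ contributes two rows to a homogeneous system (the helper name `triangulate` and the camera values are ours):

```python
import numpy as np

def triangulate(M1, M2, x1, x2):
    """Linear (DLT) triangulation: x = M X yields two equations per view."""
    u1, v1 = x1; u2, v2 = x2
    A = np.array([u1 * M1[2] - M1[0],
                  v1 * M1[2] - M1[1],
                  u2 * M2[2] - M2[0],
                  v2 * M2[2] - M2[1]])
    _, _, Vt = np.linalg.svd(A)     # null vector of A
    X = Vt[-1]
    return X / X[3]                  # dehomogenize

# Synthetic check: known 3D point, two cameras, exact projections.
K = np.array([[500.0, 0, 320], [0, 500, 240], [0, 0, 1]])
M1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])

X_true = np.array([0.3, -0.2, 5.0, 1.0])
x1 = M1 @ X_true; x1 = x1[:2] / x1[2]
x2 = M2 @ X_true; x2 = x2[:2] / x2[2]

assert np.allclose(triangulate(M1, M2, x1, x2), X_true, atol=1e-6)
```

Note this minimizes an algebraic error, not the geometric reprojection error mentioned above.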
Reconstruction for intrinsically
calibrated cameras
• Compute the essential matrix E using normalized points.
• Select M = [I | 0], M′ = [R | T]; then E = [T]×R
• Find T and R using SVD of E
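Recovering T and R from the SVD of E can be sketched as follows. This yields four (R, T) candidates; in practice the correct one is chosen by requiring triangulated points to lie in front of both cameras, which is omitted here:

```python
import numpy as np

def skew(a):
    return np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0.0]])

# Ground-truth motion (assumed values for this check; ||T|| chosen as 1,
# since only the direction of translation is recoverable).
ang = 0.3
R_true = np.array([[np.cos(ang), 0, np.sin(ang)],
                   [0, 1, 0],
                   [-np.sin(ang), 0, np.cos(ang)]])
t_true = np.array([0.6, 0.8, 0.0])
E = skew(t_true) @ R_true

# SVD-based decomposition into four candidates.
U, _, Vt = np.linalg.svd(E)
W = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1.0]])
t_hat = U[:, 2]                       # null direction of E^T
candidates = []
for R_cand in (U @ W @ Vt, U @ W.T @ Vt):
    if np.linalg.det(R_cand) < 0:     # ensure a proper rotation
        R_cand = -R_cand
    for t_cand in (t_hat, -t_hat):
        candidates.append((R_cand, t_cand))

def matches(R, t):
    Ec = skew(t) @ R                  # candidate reproduces E up to sign/scale?
    Ec, En = Ec / np.linalg.norm(Ec), E / np.linalg.norm(E)
    return np.allclose(Ec, En) or np.allclose(Ec, -En)

assert any(matches(R, t) for R, t in candidates)
assert any(np.allclose(R, R_true) and np.allclose(t, t_true) for R, t in candidates)
```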
Reconstruction from uncalibrated
cameras

Reconstruction problem:
given $x_i \leftrightarrow x'_i$, compute $M$, $M'$ and $X_i$ such that
$x_i = M X_i$ and $x'_i = M' X_i$ for all $i$;
without additional information this is possible
only up to a projective ambiguity.
Projective Reconstruction
Theorem
• Assume we determine matching points xi and xi’. Then we can compute a unique Fundamental matrix F.
• The camera matrices M, M’ cannot be recovered uniquely
• Thus the reconstruction (Xi) is not unique
• There exists a projective transformation H such that
$$\tilde X_i = H X_i, \qquad \tilde M = M H^{-1}, \qquad \tilde M' = M' H^{-1}, \qquad i = 1, 2, \ldots$$
Reconstruction ambiguity:
projective

$$x_i = M X_i = \left( M H_P^{-1} \right) \left( H_P X_i \right)$$
Structure from Motion
or
Simultaneous Localization and Mapping (SLAM)
or
Visual Odometry
Camera calibration
• Determine camera parameters from known
3D points or calibration object(s)
1. internal or intrinsic parameters such as
focal length, optical center, aspect ratio:
what kind of camera?
2. external or extrinsic (pose)
parameters:
where is the camera?
• How can we do this?
Coordinate Systems
• World Coordinate System: It’s a known
reference coordinate system with respect to
which we calibrate the camera.
• Camera Coordinate System: It’s a
coordinate system with its origin at the
optical center of the camera.
• Pixel Coordinate System
Camera: Geometry Involved
• Mathematical Definition: A camera is a mapping
between a 3D world (object space) and a 2D
image.
• Calibration: The objective of calibration is to
calculate the intrinsic and/or extrinsic parameters
of a camera given a set of images taken using the
camera.
Camera Models
• Perspective: $x' = f\,\dfrac{x}{z}, \qquad y' = f\,\dfrac{y}{z}$
• Orthographic: $x' = x, \qquad y' = y$
Note: We will deal only with perspective projection
Intrinsic Parameters
• Let $(x, y, z)$ be the coordinates of a point in 3D. Its projection $(u, v)$ on
the image plane is given by:

Ideal camera, square pixels: $u = f\,\dfrac{x}{z}, \qquad v = f\,\dfrac{y}{z}$

Ideal camera, rectangular pixels: $u = f k\,\dfrac{x}{z}, \qquad v = f l\,\dfrac{y}{z}$

Displaced center: $u = f k\,\dfrac{x}{z} + u_0, \qquad v = f l\,\dfrac{y}{z} + v_0$

Rotated coordinate axes: $u = \alpha\,\dfrac{x}{z} - \alpha\cot(\theta)\,\dfrac{y}{z} + u_0, \qquad v = \dfrac{\beta}{\sin(\theta)}\,\dfrac{y}{z} + v_0$
Intrinsic Parameters
In matrix form, $z\,p = K P$ with $p = (u, v, 1)^\top$, $P = (x, y, z, 1)^\top$ and

$$K = \begin{bmatrix} \alpha & -\alpha\cot(\theta) & u_0 & 0 \\ 0 & \beta / \sin(\theta) & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

Hence, the 5 intrinsic parameters of a camera are: $\alpha,\ \beta,\ \theta,\ u_0,\ v_0$.
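The intrinsic projection z p = K P can be sketched numerically; the parameter values below are illustrative only:

```python
import numpy as np

# The 5 intrinsic parameters (illustrative values).
alpha, beta = 800.0, 820.0           # focal lengths in pixel units
theta = np.pi / 2                    # skew angle; pi/2 means perpendicular axes
u0, v0 = 320.0, 240.0                # principal point

K = np.array([[alpha, -alpha / np.tan(theta), u0, 0],
              [0.0,    beta / np.sin(theta),  v0, 0],
              [0.0,    0.0,                   1,  0]])

# z * p = K * P for P = (x, y, z, 1)^T in camera coordinates.
P = np.array([0.5, -0.25, 2.0, 1.0])
zp = K @ P
u, v = zp[:2] / zp[2]

# With theta = pi/2 this reduces to u = alpha*x/z + u0, v = beta*y/z + v0.
assert np.isclose(u, alpha * 0.5 / 2.0 + u0)
assert np.isclose(v, beta * -0.25 / 2.0 + v0)
```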
Extrinsic Parameters
• The camera frame (C) can be different from the world frame (W).

$$P_c = R\, P_w + t$$

where $R$ is a 3×3 rotation matrix and $t$ a 3×1 translation vector; in homogeneous coordinates

$$\begin{bmatrix} P_c \\ 1 \end{bmatrix} = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} \begin{bmatrix} P_w \\ 1 \end{bmatrix}$$

with $P_c = (x_c, y_c, z_c)^\top$, $P_w = (x_w, y_w, z_w)^\top$, rows $r_1^\top, r_2^\top, r_3^\top$ of $R$, and $t = (t_x, t_y, t_z)^\top$.

Hence, we have 6 extrinsic parameters.
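The world-to-camera transform and its equivalent homogeneous 4×4 form can be sketched as follows (rotation and translation values are illustrative):

```python
import numpy as np

# Rigid world-to-camera transform P_c = R P_w + t.
ang = np.pi / 2
R = np.array([[np.cos(ang), -np.sin(ang), 0],
              [np.sin(ang),  np.cos(ang), 0],
              [0, 0, 1.0]])            # 90-degree rotation about the z axis
t = np.array([1.0, 2.0, 3.0])

P_w = np.array([1.0, 0.0, 0.0])
P_c = R @ P_w + t

# Equivalent homogeneous form [R t; 0 1] applied to (P_w, 1).
T = np.eye(4)
T[:3, :3] = R
T[:3, 3] = t
P_c_h = T @ np.append(P_w, 1.0)

assert np.allclose(P_c, P_c_h[:3])
assert np.allclose(P_c, [1.0, 3.0, 3.0])   # (1,0,0) rotates to (0,1,0), then + t
```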
Perspective Projection Matrix
• M is a 3×4 matrix. Taking into consideration both the intrinsic
and extrinsic parameters:

$$M = K\,[R\ \ t] = \begin{bmatrix} \alpha\, r_1^\top - \alpha\cot(\theta)\, r_2^\top + u_0\, r_3^\top & \alpha\, t_x - \alpha\cot(\theta)\, t_y + u_0\, t_z \\ \dfrac{\beta}{\sin(\theta)}\, r_2^\top + v_0\, r_3^\top & \dfrac{\beta}{\sin(\theta)}\, t_y + v_0\, t_z \\ r_3^\top & t_z \end{bmatrix}$$

$z\,p = M P$, hence

$$u = \dfrac{m_1 \cdot P}{m_3 \cdot P}, \qquad v = \dfrac{m_2 \cdot P}{m_3 \cdot P}$$

where $m_1, m_2, m_3$ are the rows of $M$.
Rotation and Translation
Camera matrix
• Fold intrinsic calibration matrix K and
extrinsic pose parameters (R,t) together into
a camera matrix
• M = K [R | t ]
• (fixing the overall scale, e.g. putting a 1 in the lower right-hand corner, leaves 11 d.o.f.)
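Folding K, R, and t into one matrix and projecting with the row formulas above can be sketched as (camera values illustrative):

```python
import numpy as np

# Full camera matrix M = K [R | t].
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
R = np.eye(3)                        # camera axes aligned with the world frame
t = np.array([[0.0], [0.0], [5.0]])  # world origin 5 units in front of the camera
M = K @ np.hstack([R, t])            # 3x4, defined up to scale (11 d.o.f.)

# Project a world point: z p = M P, then divide by the third coordinate.
P = np.array([1.0, 0.5, 0.0, 1.0])   # homogeneous world point
m1, m2, m3 = M
u = (m1 @ P) / (m3 @ P)
v = (m2 @ P) / (m3 @ P)

assert np.isclose(u, 320 + 800 * 1.0 / 5.0)   # u0 + f*x/z
assert np.isclose(v, 240 + 800 * 0.5 / 5.0)   # v0 + f*y/z
```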
Camera matrix calibration
• Directly estimate 11 unknowns in the M
matrix using known 3D points (Xi,Yi,Zi) and
measured feature positions (ui,vi)
Camera matrix calibration
• Linear regression:
– Bring denominator over, solve set of (over-
determined) linear equations. How?
– Least squares (pseudo-inverse)
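The "bring the denominator over" step can be sketched as a DLT-style calibration: each known 3D/2D pair yields two rows of a homogeneous system in the 12 entries of M, solved by SVD. The ground-truth camera below is ours, used only to verify the recovery:

```python
import numpy as np

rng = np.random.default_rng(1)

# Ground-truth camera (what calibration should recover, up to scale).
K = np.array([[700.0, 0, 300], [0, 700, 200], [0, 0, 1]])
M_true = K @ np.hstack([np.eye(3), np.array([[0.2], [-0.1], [4.0]])])

# Known 3D points (non-coplanar) and their measured projections.
P = np.vstack([rng.uniform(-1, 1, size=(3, 10)), np.ones((1, 10))])
x = M_true @ P
u, v = x[0] / x[2], x[1] / x[2]

# u*(m3.P) = m1.P and v*(m3.P) = m2.P  ->  homogeneous system A m = 0.
rows = []
for i in range(10):
    rows.append(np.concatenate([P[:, i], np.zeros(4), -u[i] * P[:, i]]))
    rows.append(np.concatenate([np.zeros(4), P[:, i], -v[i] * P[:, i]]))
A = np.array(rows)

_, _, Vt = np.linalg.svd(A)
M_est = Vt[-1].reshape(3, 4)

# M is recovered only up to scale: normalize both before comparing.
M_est /= M_est[2, 3]
M_ref = M_true / M_true[2, 3]
assert np.allclose(M_est, M_ref, atol=1e-6)
```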
Projective structure from motion
• Given: m images of n fixed 3D points
• xij = Pi Xj , i = 1,… , m, j = 1, … , n
• Problem: estimate m projection matrices Pi and n 3D points Xj from the mn corresponding points xij
(figure: 3D points Xj observed by cameras P1, P2, P3 as image points x1j, x2j, x3j)
Slides from Lana Lazebnik
Bundle adjustment
• Non-linear method for refining structure and motion
• Minimizing the reprojection error

$$E(P, X) = \sum_{i=1}^{m} \sum_{j=1}^{n} D\!\left( x_{ij},\, P_i X_j \right)^2$$

(figure: measured points x1j, x2j, x3j vs. reprojections P1Xj, P2Xj, P3Xj for cameras P1, P2, P3)
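The objective being minimized can be sketched directly (the function name `reprojection_error` and the data layout are ours; a real bundle adjuster would feed these residuals to a non-linear least-squares solver):

```python
import numpy as np

def reprojection_error(cameras, points, observations):
    """E(P, X) = sum_ij D(x_ij, P_i X_j)^2, the bundle adjustment objective.

    cameras:      list of 3x4 projection matrices P_i
    points:       4xn array of homogeneous 3D points X_j
    observations: dict (i, j) -> measured 2D point x_ij
    """
    total = 0.0
    for (i, j), x_meas in observations.items():
        proj = cameras[i] @ points[:, j]
        proj = proj[:2] / proj[2]              # P_i X_j in inhomogeneous pixels
        total += np.sum((x_meas - proj) ** 2)  # squared image distance D(.,.)^2
    return total

# Perfect data has zero error; perturbing one point makes it positive.
K = np.array([[500.0, 0, 320], [0, 500, 240], [0, 0, 1]])
cams = [K @ np.hstack([np.eye(3), np.array([[dx], [0.0], [0.0]])]) for dx in (0.0, -1.0)]
X = np.array([[0.0, 0.5], [0.0, -0.5], [4.0, 5.0], [1.0, 1.0]])  # two points
obs = {}
for i, Pm in enumerate(cams):
    for j in range(2):
        x = Pm @ X[:, j]
        obs[(i, j)] = x[:2] / x[2]

assert np.isclose(reprojection_error(cams, X, obs), 0.0)
X_noisy = X.copy(); X_noisy[0, 0] += 0.1
assert reprojection_error(cams, X_noisy, obs) > 0
```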
Existing attempts : SLAM
• SLAM: Simultaneous Localization and Mapping
can use many different types of sensors to acquire the observation data used in building the
map, such as laser rangefinders, sonar sensors, and cameras.
– Well-established in robotics (using a rich array of sensors)
– Demonstrated with a single hand-held camera by Davison in 2003 (Mono-
SLAM).
– Mono-SLAM was applied to
an AR system in 2004.
Existing attempts : Model based tracking
• Model-based tracking is
– More robust
– More accurate
– Proposed by Lepetit et al.
at ISMAR 2003
Frame by Frame SLAM
• Why? Is SLAM fundamentally harder?
(per-frame timeline: find features → update camera pose and entire map (many DOF) → draw graphics)
Frame by Frame SLAM • SLAM
– Updating entire map every frame is so
expensive!!!
– Needs “sparse map of high-quality features”
- A. Davison
• Proposed approach
– Use dense map (of low quality features)
– Don’t update the map every frame : Keyframes
– Split the tracking and mapping into two
threads
Parallel Tracking and Mapping
• Proposed method: split the tracking and mapping into two threads
(per-frame timeline, tracking thread #1: find features → update camera pose only (simple & easy) → draw graphics; mapping thread #2: update map)
Parallel Tracking and Mapping
Tracking thread:
• Responsible for estimating the camera
pose and rendering the augmented
graphics
• Must run at 30 Hz
• Make as robust and accurate as
possible
Mapping thread:
• Responsible for providing the
map
• Can take lots of time per key
frame
• Make as rich and accurate as
possible
Tracking thread • Overall flow
Pre-process frame → coarse stage (project points, measure points, update camera pose) → fine stage (project points, measure points, update camera pose) → draw graphics, using points from the map
Pre-process frame
• Build four pyramid levels: 640×480, 320×240, 160×120, 80×60
• Detect FAST corners
– E. Rosten et al. (ECCV 2006)
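A pyramid of those four levels can be sketched with 2×2 block-mean downsampling (our simplification; the per-level FAST corner detection is omitted):

```python
import numpy as np

def build_pyramid(img, levels=4):
    """Halve the image repeatedly (2x2 block mean) to get levels
    640x480, 320x240, 160x120, 80x60."""
    pyramid = [img]
    for _ in range(levels - 1):
        h, w = pyramid[-1].shape
        half = pyramid[-1][: h // 2 * 2, : w // 2 * 2]       # crop to even size
        half = half.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        pyramid.append(half)
    return pyramid

img = np.random.default_rng(0).uniform(0, 255, size=(480, 640))
pyr = build_pyramid(img)

assert [p.shape for p in pyr] == [(480, 640), (240, 320), (120, 160), (60, 80)]
```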
Project Points
• Use motion model to update camera pose
– Constant velocity model
$$V_t = (P_t - P_{t-1}) / \Delta t$$
$$P_{t+1} = P_t + \Delta t'\, V_t$$
(figure: previous poses $P_{t-1}$ and $P_t$ separated by $\Delta t$, estimated current pose $P_{t+1}$ after $\Delta t'$)
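The constant-velocity prediction can be sketched on the camera position alone (the full system works on the 6-DOF pose in SE(3); a plain vector suffices to show the idea, and the function name is ours):

```python
import numpy as np

def predict(P_prev, P_curr, dt, dt_next):
    """Constant-velocity motion model."""
    V = (P_curr - P_prev) / dt        # V_t = (P_t - P_{t-1}) / dt
    return P_curr + dt_next * V       # P_{t+1} = P_t + dt' * V_t

P_tm1 = np.array([0.0, 0.0, 0.0])
P_t   = np.array([0.1, 0.0, 0.2])
P_pred = predict(P_tm1, P_t, dt=1/30, dt_next=1/30)

# Equal time steps simply extrapolate the last displacement.
assert np.allclose(P_pred, [0.2, 0.0, 0.4])
```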
Project Points
• Choose subset to measure
– ~ 50 biggest features for coarse stage
– 1000 randomly selected for fine stage
(figure: pyramid levels 640×480, 320×240, 160×120, 80×60; ~50 features for the coarse stage, 1000 for the fine stage)
Measure Points
• Generate an 8×8 matching template (warped
from the source key-frame in the map)
• Search a fixed radius around the projected
position
– Use zero-mean SSD
– Only search at FAST corner points
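The zero-mean SSD score can be sketched directly; subtracting each patch's own mean makes the match invariant to uniform brightness changes (the function name is ours):

```python
import numpy as np

def zssd(patch, template):
    """Zero-mean sum of squared differences between two equally-sized patches."""
    a = patch - patch.mean()
    b = template - template.mean()
    return np.sum((a - b) ** 2)

rng = np.random.default_rng(0)
template = rng.uniform(0, 255, size=(8, 8))   # 8x8 matching template

# Identical patch matches perfectly; a uniform brightness offset does not hurt.
assert np.isclose(zssd(template, template), 0.0)
assert np.isclose(zssd(template + 40.0, template), 0.0)
# An unrelated patch scores much worse.
assert zssd(rng.uniform(0, 255, size=(8, 8)), template) > 1.0
```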
Update camera pose • 6-DOF problem
– Obtained by SfM (three-point algorithm)
Mapping thread • Overall flow
Stereo Initialization
Wait for new key frame
Add new map points
Optimize map
Map maintenance
Tracker
Stereo Initialization
• Use five-point-pose algorithm
– D. Nistér et al., 2006
• Requires a pair of frames and feature correspondences
• Provides initial map
• User input required:
– Two clicks for two key-frames
– Smooth motion for feature correspondence
Wait for new key frame
• Key frames are only added if :
– There is a sufficient baseline to the other key frame
– Tracking quality is good
– A keyframe stores the 4-level pyramid images and their corners
• When a key frame is added :
– The mapping thread stops whatever it is doing
– All points in the map are measured in the keyframe
– New map points are found and added to the map
Add new map points • Want as many map points as possible
• Check all maximal FAST corners in the key
frame:
– Check score
– Check if already in map
• Epipolar search in a neighboring key frame
• Triangulate matches and add to map
• Repeat in four image pyramid levels
Optimize map
• Use batch SFM method: Bundle Adjustment
• Adjusts map point positions and key frame
poses
• Minimize reprojection error of all points in all
keyframes (or use only last N key frames)
Map maintenance
• When camera is not exploring, mapping thread
has idle time
• Data association in bundle adjustment is
reversible
• Re-attempt outlier measurements
• Try to measure new map features in all old key
frames
Comparison to EKF-SLAM • More Accurate
• More robust
• Faster tracking
(figure: SLAM-based AR vs. proposed AR)
System and Results • Environment
– Desktop PC (Intel Core 2 Duo, 2.66 GHz)
– OS: Linux
– Language: C++
• Tracking speed (per frame):
Key frame preparation   2.2 ms
Feature projection      3.5 ms
Patch search            9.8 ms
Iterative pose update   3.7 ms
Total                  19.2 ms
System and Results • Mapping scalability and speed
– Practical limit: 150 key frames, 6000 points
– Bundle adjustment timing:
Key frames                2-49     50-99    100-149
Local bundle adjustment   170 ms   270 ms   440 ms
Global bundle adjustment  380 ms   1.7 s    6.9 s
Demonstration
Remaining problems
• Outlier management
• Still brittle in some scenarios
– Repeated texture
– Passive stereo initialization
• Occlusion problem
• Relocalization problem