Summary - Forsiden - Universitetet i Oslo€¦ · – Pose in 2D and 3D ... – Robust estimation...
Transcript of Summary - Forsiden - Universitetet i Oslo€¦ · – Pose in 2D and 3D ... – Robust estimation...
Summary
IMAGE FORMATION, PROCESSING AND FEATURES • Image formation
– Light, cameras, optics and color – Pose in 2D and 3D – Basic projective geometry – The perspective camera model
• Image processing – Image filtering – Image pyramids – Laplace blending
• Feature detection – Line features – Local keypoint features – Robust estimation with RANSAC
• Feature matching – From keypoints to feature correspondences – Feature descriptors – Feature matching – Estimating homographies from feature
correspondences
WORLD GEOMETRY AND 3D • Single-view geometry
– The camera matrix P – Pose from known 3D points – Camera calibration
• Stereo imaging – Basic epipolar geometry – Stereo imaging – Stereo processing
• Two-view geometry – Epipolar geometry – Triangulation – Pose from epipolar geometry
• Multiple-view geometry – Multiple-view geometry – Structure from motion – Multiple-view stereo
SCENE ANALYSIS • Image analysis
– Image segmentation – Image feature extraction – Introduction to machine learning
• Object detection – Descriptor-based detection – Face recognition – Introduction to deep learning
• Computer Vision Applications – Visual words and computer vision
applications
Forelesninger 2017
IMAGE FORMATION, PROCESSING AND FEATURES • Image formation
– Light, cameras, optics and color – Pose in 2D and 3D – Basic projective geometry – The perspective camera model
• Image processing – Image filtering – Image pyramids – Laplace blending
• Feature detection – Line features – Local keypoint features – Robust estimation with RANSAC
• Feature matching – From keypoints to feature correspondences – Feature descriptors – Feature matching – Estimating homographies from feature
correspondences
WORLD GEOMETRY AND 3D • Single-view geometry
– The camera matrix P – Pose from known 3D points – Camera calibration
• Stereo imaging – Basic epipolar geometry – Stereo imaging – Stereo processing
• Two-view geometry – Epipolar geometry – Triangulation – Pose from epipolar geometry
• Multiple-view geometry – Multiple-view geometry – Structure from motion – Multiple-view stereo
SCENE ANALYSIS • Image analysis
– Image segmentation – Image feature extraction – Introduction to machine learning
• Object detection – Descriptor-based detection – Face recognition – Introduction to deep learning
• Computer Vision Applications – Visual words and computer vision
applications
Forelesninger 2017
Image formation Light, cameras, optics and color
4
Image formation: • Illumination • Cameras • Optics • Image Capture • Colour Sensing.
Image capture
5
Light source
CMOS image sensor (CMOSIS 48Mp)
Scene Imaging system (camera)
Shutter
Shutter: Mechanical / electronic Global / rolling
Detector
Image formation Pose in 2D and 3D
6
• Pose – General properties
• Representing pose in 2D
– Homogeneous transformations
• Representing pose in 3D – Homogeneous transformations – Other alternatives
• Representing rotation in 3D
– Rotation matrix – Euler angles – Angle-axis – Unit quaternion
Pose – Representation in 2D
• The pose of 𝐵𝐵 relative to 𝐴𝐴 can be represented by a homogeneous transformation 𝑇𝑇𝐵𝐵𝐴𝐴 ∈ 𝑆𝑆𝑆𝑆 2
• Properties
• Points are represented in homogeneous coordinates
7
1
A A B A A BB B
A A B A A BC B C C B C
A AB B
TT T T
T
ξξ ξ ξ
ξ −
= ⋅ == ⊕ =
p p p p
11 1
A A BB
A BA A
A BB B
T
x xR
y y
=
=
p p
t0
cos sinsin cos
10 0 1
ABxA A
A A AB BB B By
tR
T tθ θ
ξ θ θ −
= =
t0
𝑥𝑥𝐴𝐴
𝑦𝑦𝐴𝐴
𝐴𝐴
𝐵𝐵
𝑥𝑥𝐵𝐵
𝑦𝑦𝐵𝐵
𝜉𝜉𝐵𝐵𝐴𝐴
𝒑𝒑𝐴𝐴
𝒑𝒑𝐵𝐵 𝑃𝑃
Pose – Representation in 3D
• The pose of 𝐵𝐵 relative to 𝐴𝐴 can be represented by a homogeneous transformation 𝑇𝑇𝐵𝐵𝐴𝐴 ∈ 𝑆𝑆𝑆𝑆 3
• Properties
• Points are represented in homogeneous coordinates
8
11 12 13
21 22 23
31 32 3310 0 0 1
ABx
A A AA A B B By
B B ABz
r r r tR r r r t
Tr r r t
ξ
= =
t0
1
A A B A A BB B
A A B A A BC B C C B C
A AB B
TT T T
T
ξξ ξ ξ
ξ −
= ⋅ == ⊕ =
p p p p
11 1
A A BB
A B
A A A BB B
A B
T
x xy R yz z
=
=
p p
t0
𝑥𝑥𝐴𝐴
𝑦𝑦𝐴𝐴
𝐴𝐴
𝐵𝐵
𝑥𝑥𝐵𝐵
𝑦𝑦𝐵𝐵
𝜉𝜉𝐵𝐵𝐴𝐴 𝑧𝑧𝐵𝐵
𝑧𝑧𝐴𝐴
𝒑𝒑𝐴𝐴
𝒑𝒑𝐵𝐵 𝑃𝑃
Image formation Basic projective geometry
9
• The projective plane 2 – Homogeneous coordinates – Line at infinity – Points & lines are dual
• The projective space 3
– Homogeneous coordinates – Plane at infinity – Points & planes are dual
• Linear transformations of 2 and 3
– Represented by homogeneous matrices – Homographies ⊃ Affine ⊃ Similarities ⊃
Euclidean ⊃ Translations
Transformations of the projective plane
10
Transformation of 2 Matrix #DoF Preserves Visualization
Translation
𝐼𝐼 𝒕𝒕𝟎𝟎𝑻𝑻 1
2 Orientation + all below
Euclidean
𝑅𝑅 𝒕𝒕𝟎𝟎𝑻𝑻 1
3 Lengths + all below
Similarity
𝑠𝑠𝑅𝑅 𝒕𝒕𝟎𝟎𝑻𝑻 1
4 Angles + all below
Affine 𝑎𝑎11 𝑎𝑎12 𝑎𝑎13𝑎𝑎21 𝑎𝑎22 𝑎𝑎230 0 1
6 Parallelism,
line at infinity + all below
Homography /projective
ℎ11 ℎ12 ℎ13ℎ21 ℎ22 ℎ23ℎ31 ℎ32 ℎ33
8 Straight lines
Image formation The perspective camera model
11
• The perspective camera model – 𝑃𝑃 = 𝐾𝐾 𝑅𝑅, 𝒕𝒕 – The camera matrix – Intrinsic: 𝐾𝐾 – The camera calibration matrix – Extrinsic: 𝑅𝑅, 𝒕𝒕
• Lens distortion
– Radial distortion – Tangential distortion (often ignored)
The perspective camera model
12
𝑌𝑌 𝑋𝑋
𝑍𝑍
𝑊𝑊
𝐶𝐶
𝐼𝐼 𝑿𝑿
𝑥𝑥
𝑦𝑦
𝑧𝑧
𝑢𝑢
𝑣𝑣
𝒖𝒖
𝑧𝑧 = 1
• The perspective camera model describes the correspondence between observed points in the world and points in the captured image
𝒖𝒖� = 𝑃𝑃 𝑿𝑿�𝑊𝑊 𝒖𝒖� = 𝐾𝐾 𝑅𝑅 𝒕𝒕 𝑿𝑿�𝑊𝑊
The perspective camera model
13
𝑌𝑌 𝑋𝑋
𝑍𝑍
𝑊𝑊
𝐶𝐶
𝑿𝑿
𝑥𝑥
𝑦𝑦
𝑧𝑧
𝒙𝒙
𝑧𝑧 = 1
𝐼𝐼 𝑢𝑢
𝑣𝑣
𝒖𝒖
• The perspective camera model describes the correspondence between observed points in the world and points in the captured image
𝒖𝒖� = 𝑃𝑃 𝑿𝑿�𝑊𝑊 𝒖𝒖� = 𝐾𝐾 𝑅𝑅 𝒕𝒕 𝑿𝑿�𝑊𝑊
Intrinsic Extrinsic
𝑅𝑅 𝒕𝒕 𝐾𝐾
The perspective camera model
14
• The perspective camera model describes the correspondence between observed points in the world and points in the captured image
𝒖𝒖� =𝑓𝑓𝑢𝑢 𝑠𝑠 𝑐𝑐𝑢𝑢0 𝑓𝑓𝑣𝑣 𝑐𝑐𝑣𝑣0 0 1
1 00 10 0
0 00 01 0
𝑅𝑅 𝒕𝒕𝟎𝟎 1 𝑿𝑿�𝑊𝑊
Intrinsic Extrinsic
where 𝑅𝑅 𝒕𝒕𝟎𝟎 1 = 𝜉𝜉𝐶𝐶
−1𝑊𝑊 = 𝜉𝜉𝑊𝑊𝐶𝐶
𝑌𝑌 𝑋𝑋
𝑍𝑍
𝑊𝑊
𝐶𝐶
𝑿𝑿
𝑥𝑥
𝑦𝑦
𝑧𝑧
𝒙𝒙
𝑧𝑧 = 1
𝐼𝐼 𝑢𝑢
𝑣𝑣
𝒖𝒖 𝑅𝑅 𝒕𝒕 𝐾𝐾
Practical use
• The geometry of the perspective camera is simple since we assume the pinhole to be infinitely small
• In reality the light passes through a lens that complicates the camera intrinsics
• If we want to use images for geometrical computations based on the perspective camera model, we MUST compensate for distortion
– Radial distortion • Barrel • Pincushion
– Tangential distortion
15
Barrel
Pincushion
IMAGE FORMATION, PROCESSING AND FEATURES • Image formation
– Light, cameras, optics and color – Pose in 2D and 3D – Basic projective geometry – The perspective camera model
• Image processing – Image filtering – Image pyramids – Laplace blending
• Feature detection – Line features – Local keypoint features – Robust estimation with RANSAC
• Feature matching – From keypoints to feature correspondences – Feature descriptors – Feature matching – Estimating homographies from feature
correspondences
WORLD GEOMETRY AND 3D • Single-view geometry
– The camera matrix P – Pose from known 3D points – Camera calibration
• Stereo imaging – Basic epipolar geometry – Stereo imaging – Stereo processing
• Two-view geometry – Epipolar geometry – Triangulation – Pose from epipolar geometry
• Multiple-view geometry – Multiple-view geometry – Structure from motion – Multiple-view stereo
SCENE ANALYSIS • Image analysis
– Image segmentation – Image feature extraction – Introduction to machine learning
• Object detection – Descriptor-based detection – Face recognition – Introduction to deep learning
• Computer Vision Applications – Visual words and computer vision
applications
Forelesninger 2017
Image processing Image filtering
17
Image Processing • Point operators • Image filtering in spatial domain
– Linear filters – Non-linear filters
• Image filtering in frequency domain
– Fourier transforms – Gaussian (low pass) filtering
Linear filtering
18
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 0 0 0 0 0
0 0 0 90 90 90 0 0 0 0
0 0 90 90 90 90 0 0 0 0
0 0 90 90 90 90 90 0 0 0
0 90 90 90 90 90 90 90 0 0
0 0 90 90 90 90 90 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 10 20 20 10 0 0 0 0
0 0 20
1 1 1 1 1 1 1 1 1
Image processing Image pyramids
19
Image Pyramids: • Downsampling • Upsampling • Gaussian Pyramids
Filtering in frequency domain
Fourier (1807): Any univariate function can be rewritten
as a weighted sum of sines and cosines of different frequencies (true with some subtle restrictions).
This leads to: • Fourier Series • Fourier Transform (continuous and
discrete) • Fast Fourier Transform (FFT)
Jean Baptiste Joseph Fourier (1768-1830)
20
Image processing Laplace blending
21
Laplacian Pyramids: • Laplacian filter • Laplacian pyramid • Image blending
Laplacian pyramid
22
- -
-
Laplacian pyramid
23
Image blending with Laplacian pyramids
Weighted sum for each level of the pyramid
X X
L Laplacian
blend
L1 Laplacian of img 1
G Gaussian of mask
L2 Laplacian of img 2
1-G Flipped mask
24
IMAGE FORMATION, PROCESSING AND FEATURES • Image formation
– Light, cameras, optics and color – Pose in 2D and 3D – Basic projective geometry – The perspective camera model
• Image processing – Image filtering – Image pyramids – Laplace blending
• Feature detection – Line features – Local keypoint features – Robust estimation with RANSAC
• Feature matching – From keypoints to feature correspondences – Feature descriptors – Feature matching – Estimating homographies from feature
correspondences
WORLD GEOMETRY AND 3D • Single-view geometry
– The camera matrix P – Pose from known 3D points – Camera calibration
• Stereo imaging – Basic epipolar geometry – Stereo imaging – Stereo processing
• Two-view geometry – Epipolar geometry – Triangulation – Pose from epipolar geometry
• Multiple-view geometry – Multiple-view geometry – Structure from motion – Multiple-view stereo
SCENE ANALYSIS • Image analysis
– Image segmentation – Image feature extraction – Introduction to machine learning
• Object detection – Descriptor-based detection – Face recognition – Introduction to deep learning
• Computer Vision Applications – Visual words and computer vision
applications
Forelesninger 2017
Feature detection Line features
26
Line features: • Edge detectors • Line detection with the Hough transform
Thinning and thresholding
• Detection of local maxima (i.e. suppression of non-maxima)
• Thresholding
Binary image with isolated edges (single pixels at discrete locations along edge contours)
Edge image (Canny) Edge enhanced image (Sobel) 2
7
Line detection - Hough transform
28
The set of all lines going through a given point corresponds to a sinusoidal curve in the plane. Two or more points on a straight line will give rise to sinusoids intersecting at the point for that line.
The Hough transform can be generalized to other shapes.
Example
29
Original Edge image (Canny)
Example (2)
30
Detected lines
Feature detection Local keypoint features
31
• Corner detectors – Stable in space – Min eigenvalue, Harris
• Blob detectors
– Stable in scale and space – LoG, DoG
Characteristics of good features
• Repeatability • Distinctiveness
• Efficiency • Locality
32
Local measure of feature distinctiveness
• Consider a small window of pixels around a feature • How does the window change when you shift it?
33
“Flat” region: No change in all directions
“Edge”: No change along edge
“Corner”: Change in all directions
Corner detection summary
• Compute the gradient at each point in the image using derivatives of Gaussians • Create the second moment matrix M from the entries in the gradient • Compute the eigenvalues • Find points with large response (λmin > threshold) • Choose those points where λmin is a local maximum as features
34
Feature detection Robust estimation with RANSAC
35
• RANSAC – A robust iterative method for estimating the parameters of a mathematical model
from a set of observed data containing outliers – Separates the observed data into “inliers” and “outliers” which is very useful if we
want to use better, but less robust, estimation methods
RANSAC Objective Robustly fit a model 𝒚𝒚 = 𝑓𝑓 𝒙𝒙;𝜶𝜶 to a data set 𝑆𝑆 = 𝒙𝒙𝑖𝑖
36
Algorithm 1. Determine a test model 𝒚𝒚 = 𝑓𝑓 𝒙𝒙;𝜶𝜶𝑡𝑡𝑡𝑡𝑡𝑡 from 𝑛𝑛 random data points 𝒙𝒙1,𝒙𝒙2, … ,𝒙𝒙𝑛𝑛
2. Check how well each individual data point in 𝑆𝑆 fits with the test model
– Data points within a distance 𝑡𝑡 of the model constitute a set of inliers 𝑆𝑆𝑡𝑡𝑡𝑡𝑡𝑡 ⊆ 𝑆𝑆 – Data points outside a distance 𝑡𝑡 of the model are outliers
3. If 𝑆𝑆𝑡𝑡𝑡𝑡𝑡𝑡 is the largest set of inliers encountered so far, we keep this model – Set 𝜶𝜶 = 𝜶𝜶𝑡𝑡𝑡𝑡𝑡𝑡 and 𝑆𝑆𝐼𝐼𝐼𝐼 = 𝑆𝑆𝑡𝑡𝑡𝑡𝑡𝑡
4. Repeat steps 1-3 until 𝑁𝑁 models have been tested
RANSAC
Comments • Number of iterations required to achieve
confidence 𝑝𝑝 when testing random models from 𝑛𝑛-tuples of data elements from a dataset with inlier probability 𝜔𝜔
𝑁𝑁 = 𝑙𝑙𝑙𝑙𝑙𝑙 1−𝑝𝑝𝑙𝑙𝑙𝑙𝑙𝑙 1−𝜔𝜔𝑛𝑛
• Typical desired level of confidence
𝑝𝑝 = 0.99
• Inlier probability 𝜔𝜔 is typically unknown, but can be estimated per iteration
𝜔𝜔 =#𝑚𝑚𝑎𝑎𝑥𝑥 𝑒𝑒𝑠𝑠𝑡𝑡𝑒𝑒𝑚𝑚𝑎𝑎𝑡𝑡𝑒𝑒𝑒𝑒 𝑒𝑒𝑛𝑛𝑖𝑖𝑒𝑒𝑒𝑒𝑖𝑖𝑠𝑠
#𝑒𝑒𝑎𝑎𝑡𝑡𝑎𝑎 𝑒𝑒𝑖𝑖𝑒𝑒𝑚𝑚𝑒𝑒𝑛𝑛𝑡𝑡𝑠𝑠
• Instead of operating with a fixed and larger than necessary 𝑁𝑁 we can update 𝑁𝑁 for each iteration
Adaptive RANSAC Objective Robustly fit a model 𝒚𝒚 = 𝑓𝑓 𝒙𝒙;𝜶𝜶 to a data set 𝑆𝑆 = 𝒙𝒙𝑖𝑖
38
Algorithm 1. Let 𝑁𝑁 = ∞, 𝑆𝑆𝐼𝐼𝐼𝐼 = ∅
2. As long as #iterations < 𝑁𝑁 repeat steps 3-5
3. Determine a test model 𝒚𝒚 = 𝑓𝑓 𝒙𝒙;𝜶𝜶𝑡𝑡𝑡𝑡𝑡𝑡 from 𝑛𝑛 random data points 𝒙𝒙1,𝒙𝒙2, … ,𝒙𝒙𝑛𝑛
4. Determine how well the data points in 𝑆𝑆 fits with the test model
– Data points within a distance 𝑡𝑡 of the model constitute a set of inliers 𝑆𝑆𝑡𝑡𝑡𝑡𝑡𝑡 ⊆ 𝑆𝑆
5. If 𝑆𝑆𝑡𝑡𝑡𝑡𝑡𝑡 > 𝑆𝑆𝐼𝐼𝐼𝐼 , update the model parameters 𝜶𝜶, the inlier set 𝑆𝑆𝐼𝐼𝐼𝐼 and the max number of iterations 𝑁𝑁 – Set 𝜶𝜶 = 𝜶𝜶𝑡𝑡𝑡𝑡𝑡𝑡 and 𝑆𝑆𝐼𝐼𝐼𝐼 = 𝑆𝑆𝑡𝑡𝑡𝑡𝑡𝑡
– Compute 𝑁𝑁 = 𝑙𝑙𝑙𝑙𝑙𝑙 1−𝑝𝑝𝑙𝑙𝑙𝑙𝑙𝑙 1−𝜔𝜔𝑛𝑛 using that 𝜔𝜔 = 𝑆𝑆𝐼𝐼𝐼𝐼
𝑆𝑆 and 𝑝𝑝 = 0.99
IMAGE FORMATION, PROCESSING AND FEATURES • Image formation
– Light, cameras, optics and color – Pose in 2D and 3D – Basic projective geometry – The perspective camera model
• Image processing – Image filtering – Image pyramids – Laplace blending
• Feature detection – Line features – Local keypoint features – Robust estimation with RANSAC
• Feature matching – From keypoints to feature correspondences – Feature descriptors – Feature matching – Estimating homographies from feature
correspondences
WORLD GEOMETRY AND 3D • Single-view geometry
– The camera matrix P – Pose from known 3D points – Camera calibration
• Stereo imaging – Basic epipolar geometry – Stereo imaging – Stereo processing
• Two-view geometry – Epipolar geometry – Triangulation – Pose from epipolar geometry
• Multiple-view geometry – Multiple-view geometry – Structure from motion – Multiple-view stereo
SCENE ANALYSIS • Image analysis
– Image segmentation – Image feature extraction – Introduction to machine learning
• Object detection – Descriptor-based detection – Face recognition – Introduction to deep learning
• Computer Vision Applications – Visual words and computer vision
applications
Forelesninger 2017
Feature matching Feature descriptors and matching
• Matching keypoints – Comparing local patches in canonical scale and orientation
• Feature descriptors
– Robust, distinctive and efficient
• Descriptor types – HoG descriptors – Binary descriptors
• Putative matching
– Closest match, distance ratio, cross check
40
Feature matching From keypoints to feature correspondences
41
1. Detect a set of distinct feature points
2. Define a patch around each point
3. Extract and normalize the patch
4. Compute a local descriptor
5. Match local descriptors
Patch at detected position, scale, orientation
42
SIFT descriptor
• Extract a 16x16 patch around detected keypoint • Compute the gradients and apply a Gaussian weighting function • Divide the window into a 4x4 grid of cells • Compute gradient direction histograms over 8 directions in each cell • Concatenate the histograms to obtain a 128 dimensional feature vector • Normalize to unit length
43
Feature matching Estimating homographies from feature correspondences
44
• Homography 𝐻𝐻𝒖𝒖� = 𝒖𝒖�′
𝐻𝐻 =ℎ1 ℎ2 ℎ3ℎ4 ℎ5 ℎ6ℎ7 ℎ8 ℎ9
• Automatic point-correspondences
• Wrong correspondences are common
• RANSAC estimation
– Basic DLT (Direct Linear Transform) on 4 random correspondences
– Inliers determined from the reprojection error 𝜖𝜖𝑖𝑖 = 𝑒𝑒 𝐻𝐻𝒖𝒖𝑖𝑖 ,𝒖𝒖′𝑖𝑖 + 𝑒𝑒 𝒖𝒖𝑖𝑖 ,𝐻𝐻−1𝒖𝒖′𝑖𝑖
• Improve estimate by normalized DLT on inliers or iterative methods for an even better estimate
Homographies induced by central projection
45
• Homography 𝐻𝐻𝒖𝒖� = 𝒖𝒖�′
𝐻𝐻 =ℎ1 ℎ2 ℎ3ℎ4 ℎ5 ℎ6ℎ7 ℎ8 ℎ9
• Point-correspondences can be determined automatically • Erroneous correspondences are common • Robust estimation is required to find 𝐻𝐻
Direct Linear Transform (DLT)
46
• Solve the equation 𝐻𝐻𝒖𝒖� = 𝒖𝒖�′ for the entries of the homography matrix
1 2 3
4 5 6
7 8 9
1 2 3
4 5 6
7 8 9
1
2
9
1 1
1
0 0 0 1 01 0 0 0 0
0 0 0 0
h h h u uh h h v vh h h
uh vh h uuh vh h vuh vh h
hu v uv vv v
hu v uu vu u
uv vv v uu vu uh
′ ′=
′+ + = ′⇔ + + = + + =
′ ′ ′− − −
′ ′ ′⇔ − − − = ′ ′ ′ ′ ′ ′ − − −
A
⇔ =h 0
47
1 2 3
4 5 6
7 8 9 1
uh vh h uuh vh h vuh vh h
′+ + = ′+ + = + + =
( )( )( ) ( )
1 2 3
4 5 6
1 2 3 4 5 6
1 2 3 4 5 6 0
uh vh h v u vuh vh h u u v
uh vh h v uh vh h uuv h vv h v h uu h vu h u h
′ ′ ′+ + ⋅ = ⋅ ⇒ ′ ′ ′+ + ⋅ = ⋅ ′ ′⇒ + + = + +
′ ′ ′ ′ ′ ′⇒ + + − − − =
( )1 2 3
7 8 9
1 2 3 7 8 9
1 2 3 7 8 9
1
0
uh vh h uuh vh h u u
uh vh h uu h vu h u huh vh h uu h vu h u h
′+ + = ⇒ ′ ′+ + ⋅ = ⋅
′ ′ ′⇒ + + = + +′ ′ ′⇒ + + − − − =
( )4 5 6
7 8 9
4 5 6 7 8 9
4 5 6 7 8 9
1
0
uh vh h vuh vh h v v
uh vh h uv h vv h v huh vh h uv h vv h v h
′+ + = ⇒ ′ ′+ + ⋅ = ⋅
′ ′ ′⇒ + + = + +′ ′ ′⇒ + + − − − =
1
2
9
0 0 0 1 01 0 0 0 0
0 0 0 0
hu v uv vv v
hu v uu vu u
uv vv v uu vu uh
′ ′ ′− − −
′ ′ ′− − − = ′ ′ ′ ′ ′ ′ − − −
IMAGE FORMATION, PROCESSING AND FEATURES • Image formation
– Light, cameras, optics and color – Pose in 2D and 3D – Basic projective geometry – The perspective camera model
• Image processing – Image filtering – Image pyramids – Laplace blending
• Feature detection – Line features – Local keypoint features – Robust estimation with RANSAC
• Feature matching – From keypoints to feature correspondences – Feature descriptors – Feature matching – Estimating homographies from feature
correspondences
WORLD GEOMETRY AND 3D • Single-view geometry
– The camera matrix P – Pose from known 3D points – Camera calibration
• Stereo imaging – Basic epipolar geometry – Stereo imaging – Stereo processing
• Two-view geometry – Epipolar geometry – Triangulation – Pose from epipolar geometry
• Multiple-view geometry – Multiple-view geometry – Structure from motion – Multiple-view stereo
SCENE ANALYSIS • Image analysis
– Image segmentation – Image feature extraction – Introduction to machine learning
• Object detection – Descriptor-based detection – Face recognition – Introduction to deep learning
• Computer Vision Applications – Visual words and computer vision
applications
Forelesninger 2017
Single-view geometry The camera matrix P
49
• The camera matrix P – Contains useful geometrical
information – Vanishing points – Vanishing lines – Can be estimated from 3D-2D
correspondences
Projective cameras
50
Pinhole camera
𝑃𝑃 =𝑓𝑓 0 00 𝑓𝑓 00 0 1
𝑅𝑅 𝒕𝒕
Finite projective camera
𝑃𝑃 =𝑓𝑓𝑢𝑢 𝑠𝑠 𝑐𝑐𝑢𝑢0 𝑓𝑓𝑣𝑣 𝑐𝑐𝑣𝑣0 0 1
𝑅𝑅 𝒕𝒕
General projective camera
𝑃𝑃 =𝑝𝑝11 𝑝𝑝12𝑝𝑝21 𝑝𝑝22𝑝𝑝31 𝑝𝑝32
𝑝𝑝13 𝑝𝑝14𝑝𝑝23 𝑝𝑝24𝑝𝑝33 𝑝𝑝34
with rank = 3 and 11 dof
𝑌𝑌 𝑋𝑋
𝑍𝑍
𝑊𝑊
𝐶𝐶
𝐼𝐼 𝑿𝑿
𝑥𝑥
𝑦𝑦
𝑧𝑧
𝑢𝑢
𝑣𝑣
𝒖𝒖
𝑧𝑧 = 1
Projective cameras
51
• The camera matrix 𝑃𝑃 can be estimated directly from 3D-2D correspondences 𝒖𝒖𝑖𝑖 ↔ 𝑿𝑿𝑖𝑖
• Similar to homography estimation: At least 6 correspondences, Ransac, DLT, normalized DLT
𝑌𝑌 𝑋𝑋
𝑍𝑍
𝑊𝑊
𝐶𝐶
𝐼𝐼 𝑿𝑿
𝑥𝑥
𝑦𝑦
𝑧𝑧
𝑢𝑢
𝑣𝑣
𝒖𝒖
𝑧𝑧 = 1
11 12 13 14
21 22 23 24
31 32 33 3411
i i
ii
ii
i
PX
u p p p pY
v p p p pZ
p p p p
=
=
u X
Single-view geometry Pose from known 3D points
• Decomposition of P matrix • PnP
52
Pose estimation from known 3D points
Given at least 6 3D to 2D correspondences we can estimate a camera matrix 𝑃𝑃� that fits with the camera projection model where 𝑃𝑃 is given by
53
{ },i i
X u
i iP=
u X
[ | ]P K R= t
n-Point Pose Problem (PnP)
• Several different methods available – Typically fast non-iterative methods – Minimal in number of points – Accuracy comparable to iterative methods
• Examples
– P3P • Estimate pose, known K
– P4Pf • Estimate pose and focal length
– P6P • DLT!
– R6P • Estimate pose with rolling shutter
54
Single-view geometry Camera calibration
55
• Calibration – Zhangs method using planar 3D-points – Matlab app – DLT method: Est 𝑃𝑃, decompose into 𝐾𝐾 𝑅𝑅 𝒕𝒕
• Undistortion
– For geometrical computations we work on undistorted images/feature points
Zhang’s method in short
• Zhang’s method requires that the calibration object is planar
• Then the 3D-2D relationship is described by a homography
[ ] [ ]1 2 3 01 1 1
1
Xu X X
Yv K K Y H Y
= = =
1 2r r r t r r t
56
𝑋𝑋𝑊𝑊
𝑌𝑌𝑊𝑊
𝑍𝑍𝑊𝑊
𝑥𝑥𝐶𝐶 𝑦𝑦𝐶𝐶
𝑧𝑧𝐶𝐶 𝐶𝐶
𝑊𝑊
IMAGE FORMATION, PROCESSING AND FEATURES • Image formation
– Light, cameras, optics and color – Pose in 2D and 3D – Basic projective geometry – The perspective camera model
• Image processing – Image filtering – Image pyramids – Laplace blending
• Feature detection – Line features – Local keypoint features – Robust estimation with RANSAC
• Feature matching – From keypoints to feature correspondences – Feature descriptors – Feature matching – Estimating homographies from feature
correspondences
WORLD GEOMETRY AND 3D • Single-view geometry
– The camera matrix P – Pose from known 3D points – Camera calibration
• Stereo imaging – Basic epipolar geometry – Stereo imaging – Stereo processing
• Two-view geometry – Epipolar geometry – Triangulation – Pose from epipolar geometry
• Multiple-view geometry – Multiple-view geometry – Structure from motion – Multiple-view stereo
SCENE ANALYSIS • Image analysis
– Image segmentation – Image feature extraction – Introduction to machine learning
• Object detection – Descriptor-based detection – Face recognition – Introduction to deep learning
• Computer Vision Applications – Visual words and computer vision
applications
Forelesninger 2017
Stereo imaging Basic epipolar geometry
58
• Epipolar geometry – Epipolar planes – Epipolar lines – Epipoles
Stereo imaging Stereo imaging
• Stereo imaging – Horizontal epipolar lines – Disparity – 3D from disparity – Stereo rectification
59
Stereo geometry
• Parallel identical cameras – Translated along x-axis
60
𝑥𝑥𝐿𝐿 𝑦𝑦𝐿𝐿
𝑧𝑧𝐿𝐿
𝑏𝑏𝑥𝑥
𝑷𝑷𝐿𝐿 = 𝑋𝑋,𝑌𝑌,𝑍𝑍
Stereo geometry
• Parallel identical cameras – Translated along x-axis
• Horizontal epipolar lines
– Corresponding points lie along the same row in the two images
61
𝑥𝑥𝐿𝐿 𝑦𝑦𝐿𝐿
𝑧𝑧𝐿𝐿
𝑏𝑏𝑥𝑥
𝑷𝑷𝐿𝐿 = 𝑋𝑋,𝑌𝑌,𝑍𝑍
Stereo geometry
• Parallel identical cameras – Translated along x-axis
• Horizontal epipolar lines
– Corresponding points lie along the same row in the two images
• Depth from disparity
62
𝑥𝑥𝐿𝐿 𝑦𝑦𝐿𝐿
𝑧𝑧𝐿𝐿
𝑏𝑏𝑥𝑥
𝑷𝑷𝐿𝐿 = 𝑋𝑋,𝑌𝑌,𝑍𝑍 𝑏𝑏𝑥𝑥
𝑍𝑍
𝑒𝑒
Stereo geometry
• Parallel identical cameras – Translated along x-axis
• Horizontal epipolar lines
– Corresponding points lie along the same row in the two images
• 3D from disparity
63
𝑥𝑥𝐿𝐿 𝑦𝑦𝐿𝐿
𝑧𝑧𝐿𝐿
𝑏𝑏𝑥𝑥
𝑷𝑷𝐿𝐿 = 𝑋𝑋,𝑌𝑌,𝑍𝑍 𝑏𝑏𝑥𝑥
𝑍𝑍
𝑒𝑒
xL
bX xd
= xL
bY yd
=
Stereo rectification
• Reproject image planes onto a common plane parallel to the line between the camera centers
• The epipolar lines are horizontal after this transformation
• Two homographies
• C. Loop and Z. Zhang. Computing Rectifying Homographies for Stereo Vision. IEEE Conf. Computer Vision and Pattern Recognition, 1999.
64
𝑥𝑥𝐶𝐶1 𝑦𝑦𝐶𝐶1
𝑧𝑧𝐶𝐶1
𝑥𝑥𝐶𝐶2 𝑦𝑦𝐶𝐶2
𝑧𝑧𝐶𝐶2
Stereo imaging Stereo processing
• Stereo processing – Sparse vs dense matching – DSI – Typical failures – Removing failures vs smoothness
65
Stereo processing
• Sparse stereo – Extract keypoints – Match keypoints along the same row – Compute 3D from disparity
• Dense stereo – Try to match all pixels along rows – Compute disparity image by finding the best disparity for each pixel – Refine and clean disparity image – Compute dense 3D point cloud or surface from disparity
66
419
657
1038
642
610
566
526
574
534
672
571
667
563
476
562
620
616
638
548
637
584 560
475
617
564
584
1019
567
613
565 569
469
665
578
651
597
431
683
558
512
613
445
473
547
571
630
657
443
657
1046
537
546
668
554
708 693
655
465
623
559
593
511
1063
504
523
664
680
625
573 563
567
691
614
645
799
653
578
512
564
618
567
491
560
535
618
518
655
567
555
447
420 419
614
696
566
694
418
640
514
507
655
423
482
657
648
567
69
415
598
529
557
605
469
508
1041 651
470
721
450
561
554
525
774
654
510
423
599
661
418
587 650
491
530
570
628
622
663
578 546
558
651
555
610 656
488
657
434
595
685
1067
775
556
472
510
395
563
416
494
1040
504
632
601
569
625
476 488
683
530
550
432 418
614
403
419
572 547
515
635
415
640
658
551
696
513
547
540 567
682
487 517
529
1076
462
434
409
416
-300-200-1000100200300
Disparity (pixels)
-0.5
0
0.5
Sim
ilarit
y
Dense stereo matching
67
• For a patch in the left image – Compare with patches along
the same row in the right image – Select patch with highest score
• Repeat for all pixels in the left image
IMAGE FORMATION, PROCESSING AND FEATURES • Image formation
– Light, cameras, optics and color – Pose in 2D and 3D – Basic projective geometry – The perspective camera model
• Image processing – Image filtering – Image pyramids – Laplace blending
• Feature detection – Line features – Local keypoint features – Robust estimation with RANSAC
• Feature matching – From keypoints to feature correspondences – Feature descriptors – Feature matching – Estimating homographies from feature
correspondences
WORLD GEOMETRY AND 3D • Single-view geometry
– The camera matrix P – Pose from known 3D points – Camera calibration
• Stereo imaging – Basic epipolar geometry – Stereo imaging – Stereo processing
• Two-view geometry – Epipolar geometry – Triangulation – Pose from epipolar geometry
• Multiple-view geometry – Multiple-view geometry – Structure from motion – Multiple-view stereo
SCENE ANALYSIS • Image analysis
– Image segmentation – Image feature extraction – Introduction to machine learning
• Object detection – Descriptor-based detection – Face recognition – Introduction to deep learning
• Computer Vision Applications – Visual words and computer vision
applications
Forelesninger 2017
Two-view geometry Epipolar geometry
69
• Algebraic representation of epipolar geometry – The essential matrix – The fundamental matrix
• Estimating the epipolar geometry
– Estimate 𝐹𝐹: 7pt, 8pt, RANSAC – Estimate 𝑆𝑆: 5pt
𝒖𝒖 𝒖𝒖′
𝐶𝐶
𝒙𝒙 𝒙𝒙′
𝐶𝐶′
𝑿𝑿
𝐹𝐹
𝐾𝐾 𝐾𝐾′
𝑆𝑆
Representations of epipolar geometry • Observing the same points in two views puts a
strong geometrical constraint on the cameras • Algebraically this epipolar constraint is usually
represented by two related 3 × 3 matrices
• The fundamental matrix 𝐹𝐹 𝒖𝒖�′𝑇𝑇𝐹𝐹𝒖𝒖�
• The essential matrix 𝑆𝑆
𝒙𝒙�′𝑇𝑇𝑆𝑆𝒙𝒙�
• These are related through the two calibration matrices 𝐾𝐾 and 𝐾𝐾′
70
𝒖𝒖 𝒖𝒖′
𝐶𝐶
𝒙𝒙 𝒙𝒙′
𝐶𝐶′
𝑿𝑿
𝐹𝐹
𝐾𝐾 𝐾𝐾′
𝑆𝑆
The essential matrix E • Let 𝒙𝒙𝐶𝐶 ↔ 𝒙𝒙′𝐶𝐶𝐶 be corresponding points in the
normalized image planes and let the pose of 𝐶𝐶 relative to 𝐶𝐶′ be
𝜉𝜉𝐶𝐶𝐶𝐶𝐶 = 𝑅𝑅 𝒕𝒕𝟎𝟎 1
• In terms of vectors, the equation for the epipolar
plane can be written like 𝒙𝒙�′𝐶𝐶𝐶 × 𝒕𝒕 ∙ 𝑅𝑅 𝒙𝒙�𝐶𝐶 = 0
• Rewritten in terms of matrices this takes the form
𝒙𝒙�′𝑇𝑇𝐶𝐶𝐶 𝒕𝒕 ×𝑅𝑅 𝒙𝒙�𝐶𝐶 = 0
• This relationship defines the essential matrix 𝑆𝑆 = 𝒕𝒕 ×𝑅𝑅
71
𝒙𝒙�′𝑇𝑇𝑆𝑆𝒙𝒙� = 0
𝒙𝒙𝐶𝐶
𝒙𝒙′𝐶𝐶𝐶
𝐶𝐶
𝐶𝐶′
𝒕𝒕 𝒙𝒙�′𝐶𝐶𝐶
𝒙𝒙�𝐶𝐶
The fundamental matrix F
• The epipolar constraint on image points is naturally connected to the essential matrix by the calibration matrices 𝐾𝐾 and 𝐾𝐾′
• Combined with the epipolar constraint for normalized image points we get
• This defines the fundamental matrix 𝐹𝐹 = 𝐾𝐾′−𝑇𝑇𝑆𝑆𝐾𝐾−1
72
𝒖𝒖�′𝑇𝑇𝐹𝐹𝒖𝒖� = 0
𝒖𝒖 𝒖𝒖′
𝐶𝐶
𝒙𝒙 𝒙𝒙′
𝐶𝐶′
𝑿𝑿
𝐹𝐹
𝐾𝐾 𝐾𝐾′
𝑆𝑆 1
1
C C
C C C T T T
K KK K K
−
′ ′ ′− −
= ⇒ =
′ ′ ′ ′ ′ ′ ′ ′ ′= ⇒ = ⇒ =
x u x ux u x u x u
1
00
C T C
T T
EK EK
′
− −
′ =
′ ′ =
x xu u
Two-view geometry Triangulation
73
• Triangulation – Estimate a 3D point 𝑿𝑿𝑖𝑖 for a noisy 2D correspondence under the assumption that camera matrices 𝑃𝑃 and 𝑃𝑃′ are known
• Minimal 3D error – Choose 𝑿𝑿𝑖𝑖 to be the mid-point between back projected image points
• Minimal algebraic error – Combine the two perspective models to get a homogeneous system of linear equations, then determine 𝑿𝑿𝑖𝑖 by SVD
• Minimal reprojection error – Determine the epipolar plane (and points 𝒖𝒖�𝒊𝒊 and 𝒖𝒖�𝑖𝑖𝐶) that minimize the reprojection error by minimizing a 6th order polynomial 𝐶𝐶
𝐶𝐶′
𝑿𝑿𝑖𝑖
𝒙𝒙𝒊𝒊 𝒙𝒙𝑖𝑖𝐶
𝒙𝒙�𝒊𝒊
𝒙𝒙�𝑖𝑖𝐶
Two-view geometry Pose from epipolar geometry
74
• Pose from epipolar geometry
• Non-planar case – Estimate epipolar geometry – Estimate relative pose from 𝑆𝑆
• Planar case
– Estimate homography – Estimate relative pose from 𝐻𝐻
• Visual odometry
𝜉𝜉𝑘𝑘0 cam0 cam𝑘𝑘
cam𝑘𝑘−1 cam𝑘𝑘+1
𝜉𝜉𝑘𝑘+1𝑘𝑘
Pose from epipolar geometry
75
• Since we only can estimate 𝑆𝑆 up to scale, we can always rescale it so that the SVD of 𝑆𝑆 has the form where det 𝑈𝑈 = det 𝑉𝑉 = 1
• Then one can show that 𝑅𝑅 ∈ 𝑈𝑈𝑊𝑊𝑉𝑉𝑇𝑇 ,𝑈𝑈𝑊𝑊𝑇𝑇𝑉𝑉𝑇𝑇 𝒕𝒕 = ±𝜆𝜆𝒖𝒖3; 𝜆𝜆 ∈ ℝ\0
where
𝑊𝑊 =0 1 0−1 0 00 0 1
[ ]1
1 2 3 2
3
1 0 00 1 00 0 0
T
T T
T
E UDV = =
vu u u v
v
Visual odometry
• Based on what we now know it is possible to do visual odometry, i.e. estimating the motion of a single camera from captured images
• A visual odometry algorithm can look like this – How to compute 𝒕𝒕𝑘𝑘𝑘𝑘+1 from 𝒕𝒕𝑘𝑘−1𝑘𝑘 ? – Determine two scene points 𝑿𝑿𝑘𝑘−1,𝑘𝑘
𝑘𝑘 and 𝑿𝑿′𝑘𝑘−1,𝑘𝑘𝑘𝑘 by triangulation of two 2D-
correspondences 𝒙𝒙𝑘𝑘−1 ↔ 𝒙𝒙𝑘𝑘 and 𝒙𝒙′𝑘𝑘−1 ↔ 𝒙𝒙′𝑘𝑘 – Determine the same two scene points 𝑿𝑿𝑘𝑘,𝑘𝑘+1
𝑘𝑘 and 𝑿𝑿′𝑘𝑘,𝑘𝑘+1𝑘𝑘 by triangulation of two 2D-correspondences 𝒙𝒙𝑘𝑘 ↔ 𝒙𝒙𝑘𝑘+1 and 𝒙𝒙′𝑘𝑘 ↔ 𝒙𝒙′𝑘𝑘+1
– Then
Visual odometry from 2D-correspondences 1. Capture new frame 𝑒𝑒𝑚𝑚𝑖𝑖𝑘𝑘+1 2. Extract and match features between 𝑒𝑒𝑚𝑚𝑖𝑖𝑘𝑘+1
and 𝑒𝑒𝑚𝑚𝑖𝑖𝑘𝑘 3. Estimate the essential matrix 𝑆𝑆𝑘𝑘,𝑘𝑘+1 4. Decompose the 𝑆𝑆𝑘𝑘,𝑘𝑘+1 into 𝑅𝑅𝑘𝑘+1𝑘𝑘 and 𝒕𝒕𝑘𝑘+1𝑘𝑘 to
get the relative pose 𝜉𝜉𝑘𝑘+1𝑘𝑘 = 𝑅𝑅𝑘𝑘+1𝑘𝑘 𝒕𝒕𝑘𝑘+1𝑘𝑘
5. Compute 𝒕𝒕𝑘𝑘+1𝑘𝑘 from 𝒕𝒕𝑘𝑘𝑘𝑘−1 and rescale 𝒕𝒕𝑘𝑘+1𝑘𝑘 accordingly
6. Calculate the pose of camera 𝑘𝑘 + 1 relative to the first camera
𝜉𝜉𝑘𝑘+10 = 𝜉𝜉𝑘𝑘0 𝜉𝜉𝑘𝑘+1𝑘𝑘
76
11, 1,
1 , 1 , 1
k k kk k k k k
k k kk k k k k
−− −
+ + +
′−=
′−
t X X
t X X
𝜉𝜉𝑘𝑘0 cam0 cam𝑘𝑘
cam𝑘𝑘−1 cam𝑘𝑘+1
𝜉𝜉𝑘𝑘+1𝑘𝑘
IMAGE FORMATION, PROCESSING AND FEATURES • Image formation
– Light, cameras, optics and color – Pose in 2D and 3D – Basic projective geometry – The perspective camera model
• Image processing – Image filtering – Image pyramids – Laplace blending
• Feature detection – Line features – Local keypoint features – Robust estimation with RANSAC
• Feature matching – From keypoints to feature correspondences – Feature descriptors – Feature matching – Estimating homographies from feature
correspondences
WORLD GEOMETRY AND 3D • Single-view geometry
– The camera matrix P – Pose from known 3D points – Camera calibration
• Stereo imaging – Basic epipolar geometry – Stereo imaging – Stereo processing
• Two-view geometry – Epipolar geometry – Triangulation – Pose from epipolar geometry
• Multiple-view geometry – Multiple-view geometry – Structure from motion – Multiple-view stereo
SCENE ANALYSIS • Image analysis
– Image segmentation – Image feature extraction – Introduction to machine learning
• Object detection – Descriptor-based detection – Face recognition – Introduction to deep learning
• Computer Vision Applications – Visual words and computer vision
applications
Forelesninger 2017
Multiple-view geometry Multiple-view geometry
78
• Multiple-view geometry • Correspondences
– Two-view vs Three-view – Fundamental matrix vs Trifocal tensor
Correspondences
79
Three views • As we just saw, point transfer can be done
directly from the epipolar constraints 𝒖𝒖�3 = 𝐹𝐹31𝒖𝒖�1 × 𝐹𝐹32𝒖𝒖�2
• However, this fails for points in the plane
defined by the three camera centers – the trifocal plane – since the epipolar lines then will coincide
• The trifocal tensor allows point transfer also for points in the trifocal plane
img1
𝒖𝒖1 ⊗
img2
𝒖𝒖2 ×
img3
𝒖𝒖3 ⊠
𝐹𝐹31
𝐹𝐹32 𝐹𝐹12
𝐹𝐹31𝒖𝒖�1
𝐹𝐹32𝒖𝒖�2
𝒖𝒖�3 = 𝐹𝐹31𝒖𝒖�1 × 𝐹𝐹32𝒖𝒖�2
Example Point transfer based on epipolar constraints
80
𝒖𝒖1 𝒖𝒖2
𝒖𝒖3
𝐹𝐹31 𝐹𝐹32
𝐹𝐹12
𝒖𝒖�3 = 𝐹𝐹31𝒖𝒖�1 × 𝐹𝐹32𝒖𝒖�2
Uncertainty in feature points transfer to uncertainty in the epipolar lines Hence the reliability of the predicted point depends on the angle between the epipolar lines A large angle is good!
Multiple-view geometry Structure from motion
81
• Structure from motion – Sequential SfM – Bundle adjustment
𝜖𝜖 = ��𝑒𝑒 𝒖𝒖�𝑖𝑖𝑖𝑖 ,𝑃𝑃𝑖𝑖𝑿𝑿�𝑖𝑖2
𝑛𝑛
𝑖𝑖=1
𝑚𝑚
𝑖𝑖=1
Structure from Motion Problem Given 𝑚𝑚 images of 𝑛𝑛 fixed 3D points, estimate the 𝑚𝑚 projection matrices 𝑃𝑃𝑖𝑖 and the 𝑛𝑛 points 𝑿𝑿𝑖𝑖 from the 𝑚𝑚 ∙ 𝑛𝑛 correspondences 𝒖𝒖𝑖𝑖𝑖𝑖 ↔ 𝒖𝒖𝑘𝑘𝑖𝑖 • We can solve for structure and motion when
2𝑚𝑚𝑛𝑛 ≥ 11𝑚𝑚 + 3𝑛𝑛 − 15 • In the general/uncalibrated case, cameras and
points can only be recovered up to a projective ambiguity (𝒖𝒖�𝑖𝑖𝑖𝑖 = 𝑃𝑃𝑖𝑖𝑄𝑄−1𝑄𝑄𝑿𝑿�𝑖𝑖)
• In the calibrated case, they can be recovered up to a similarity (scale)
– Known as Euclidean/metric reconstruction
82
𝒖𝒖�𝑖𝑖𝑖𝑖 = 𝑃𝑃𝑖𝑖𝑿𝑿�𝑖𝑖
Multiple-view geometry Multiple-view stereo
83
• Multi-view stereo – Plane-sweep – Volumetric stereo – Surface expansion
• Surface reconstruction
• Multi-view stereo software
– Patch-based Multi-view Stereo (PVMS)
Plane sweep
• Sweep planes at different depths
84
Robert Collins, A Space-Sweep Approach to True Multi-Image Matching, CVPR 1996. D. Gallup, J.-M. Frahm, P. Mordohai, Q. Yang and M. Pollefeys, Real-Time Plane-Sweeping Stereo with Multiple Sweeping Directions, CVPR 2007
𝑲𝑲𝒌𝒌
Reference camera Camera k
𝑿𝑿
𝐻𝐻
𝑲𝑲𝒓𝒓𝒓𝒓𝒓𝒓
𝑃𝑃𝑟𝑟𝑟𝑟𝑟𝑟 𝑃𝑃𝑘𝑘
𝒖𝒖 𝒖𝒖′
𝒙𝒙 𝒙𝒙′
𝒖𝒖 = 𝑲𝑲𝑟𝑟𝑟𝑟𝑟𝑟 𝐼𝐼 | 𝟎𝟎 𝑿𝑿 𝒖𝒖′ = 𝑲𝑲𝑘𝑘 𝑅𝑅𝑘𝑘 | 𝒕𝒕𝑘𝑘 𝑿𝑿
𝒏𝒏𝑚𝑚−1
𝑒𝑒𝑚𝑚−1
Plane sweep
• Sweep planes at different depths
85
𝑿𝑿
𝐻𝐻
𝑲𝑲𝒓𝒓𝒓𝒓𝒓𝒓
𝑃𝑃𝑟𝑟𝑟𝑟𝑟𝑟 𝑃𝑃𝑘𝑘
𝒖𝒖 𝒖𝒖′
𝒙𝒙 𝒙𝒙′
𝒖𝒖 = 𝑲𝑲𝑟𝑟𝑟𝑟𝑟𝑟 𝐼𝐼 | 𝟎𝟎 𝑿𝑿 𝒖𝒖′ = 𝑲𝑲𝑘𝑘 𝑅𝑅𝑘𝑘 | 𝒕𝒕𝑘𝑘 𝑿𝑿
𝒏𝒏𝑚𝑚
𝑒𝑒𝑚𝑚
Robert Collins, A Space-Sweep Approach to True Multi-Image Matching, CVPR 1996. D. Gallup, J.-M. Frahm, P. Mordohai, Q. Yang and M. Pollefeys, Real-Time Plane-Sweeping Stereo with Multiple Sweeping Directions, CVPR 2007
𝑲𝑲𝒌𝒌
Reference camera Camera k
Plane sweep
• Sweep planes at different depths
86
𝐻𝐻
𝑲𝑲𝒓𝒓𝒓𝒓𝒓𝒓
𝑃𝑃𝑟𝑟𝑟𝑟𝑟𝑟 𝑃𝑃𝑘𝑘
𝒖𝒖 𝒖𝒖′
𝒙𝒙 𝒙𝒙′
𝒖𝒖 = 𝑲𝑲𝑟𝑟𝑟𝑟𝑟𝑟 𝐼𝐼 | 𝟎𝟎 𝑿𝑿 𝒖𝒖′ = 𝑲𝑲𝑘𝑘 𝑅𝑅𝑘𝑘 | 𝒕𝒕𝑘𝑘 𝑿𝑿
𝒏𝒏𝑚𝑚+1
𝑒𝑒𝑚𝑚+1
Robert Collins, A Space-Sweep Approach to True Multi-Image Matching, CVPR 1996. D. Gallup, J.-M. Frahm, P. Mordohai, Q. Yang and M. Pollefeys, Real-Time Plane-Sweeping Stereo with Multiple Sweeping Directions, CVPR 2007
𝑲𝑲𝒌𝒌
Reference camera Camera k
Plane sweep and ambiguities
• Multiple views can resolve ambiguities in difficult areas!
87
Plane sweep through oriented planes
• Fronto-parallel
• Other plane orientations
88
𝑿𝑿
𝐻𝐻
𝑲𝑲𝒓𝒓𝒓𝒓𝒓𝒓
𝑃𝑃𝑟𝑟𝑟𝑟𝑟𝑟 𝑃𝑃𝑘𝑘
𝒖𝒖 𝒖𝒖′
𝒙𝒙 𝒙𝒙′
𝒖𝒖 = 𝑲𝑲𝑟𝑟𝑟𝑟𝑟𝑟 𝐼𝐼 | 𝟎𝟎 𝑿𝑿 𝒖𝒖′ = 𝑲𝑲𝑘𝑘 𝑅𝑅𝑘𝑘 | 𝒕𝒕𝑘𝑘 𝑿𝑿
𝒏𝒏𝑚𝑚
𝑒𝑒𝑚𝑚
Robert Collins, A Space-Sweep Approach to True Multi-Image Matching, CVPR 1996. D. Gallup, J.-M. Frahm, P. Mordohai, Q. Yang and M. Pollefeys, Real-Time Plane-Sweeping Stereo with Multiple Sweeping Directions, CVPR 2007
𝑲𝑲𝒌𝒌
Reference camera Camera k
[ ]( , )
1m
m Tref m
dZ u vu v K −
−=
n
[ ]0 0 1 Tm = −n
( , )m mZ u v d=
Plane sweep with ground normal
89
𝑒𝑒𝑚𝑚 = 200 meter below reference camera
Red:
Green:
Blue:
𝑍𝑍𝑚𝑚 = 790 meter
Plane sweep with ground normal
90
𝑒𝑒𝑚𝑚 = 261 meter below reference camera
Red:
Green:
Blue:
𝑍𝑍𝑚𝑚 = 800 meter
Plane sweep with ground normal
91
𝑒𝑒𝑚𝑚 = 298 meter below reference camera
Red:
Green:
Blue:
𝑍𝑍𝑚𝑚 = 811 meter
𝑍𝑍𝑚𝑚 = 645 meter
Plane sweep with ground normal
92
𝑒𝑒𝑚𝑚 = 471 meter below reference camera
Red:
Green:
Blue:
𝑍𝑍𝑚𝑚 = 2169 meter
𝑍𝑍𝑚𝑚 = 1967 meter
IMAGE FORMATION, PROCESSING AND FEATURES • Image formation
– Light, cameras, optics and color – Pose in 2D and 3D – Basic projective geometry – The perspective camera model
• Image processing – Image filtering – Image pyramids – Laplace blending
• Feature detection – Line features – Local keypoint features – Robust estimation with RANSAC
• Feature matching – From keypoints to feature correspondences – Feature descriptors – Feature matching – Estimating homographies from feature
correspondences
WORLD GEOMETRY AND 3D • Single-view geometry
– The camera matrix P – Pose from known 3D points – Camera calibration
• Stereo imaging – Basic epipolar geometry – Stereo imaging – Stereo processing
• Two-view geometry – Epipolar geometry – Triangulation – Pose from epipolar geometry
• Multiple-view geometry – Multiple-view geometry – Structure from motion – Multiple-view stereo
SCENE ANALYSIS • Image analysis
– Image segmentation – Image feature extraction – Introduction to machine learning
• Object detection – Descriptor-based detection – Face recognition – Introduction to deep learning
• Computer Vision Applications – Visual words and computer vision
applications
Forelesninger 2017
Image analysis Image segmentation
94
Image Segmentation: • Thresholding techniques • Clustering methods for segmentation • Morphological operations
Segmentation methods
• Active contours (Snakes, Scissors, Level Sets) • Split and merge (Watershed, Divisive & agglomerative clustering, Graph-
based segmentation) • K-means (parametric clustering) • Mean shift (non-parametric clustering) • Normalized cuts • Graph cuts • Gray level thresholding • …
95
Image analysis Image feature extraction
96
Image feature extraction: • Feature extraction • Feature selection
Feature extraction
The goal is to generate features that exhibit high information-packing properties: • Extract the information from the raw data that is most
relevant for discrimination between the classes • Extract features with low within-class variability and high
between class variability • Discard redundant information. • The information in an image f[i,j] must be reduced to enable
reliable classification (generalization) • A 64x64 image 4096-dimensional feature space!
97
Image analysis Introduction to machine learning
98
Machine learning: • Pattern classification • Training of classifiers (supervised
learning) • Parametric and non-parametric
methods • Discriminant functions • Dimensionality reduction
IMAGE FORMATION, PROCESSING AND FEATURES • Image formation
– Light, cameras, optics and color – Pose in 2D and 3D – Basic projective geometry – The perspective camera model
• Image processing – Image filtering – Image pyramids – Laplace blending
• Feature detection – Line features – Local keypoint features – Robust estimation with RANSAC
• Feature matching – From keypoints to feature correspondences – Feature descriptors – Feature matching – Estimating homographies from feature
correspondences
WORLD GEOMETRY AND 3D • Single-view geometry
– The camera matrix P – Pose from known 3D points – Camera calibration
• Stereo imaging – Basic epipolar geometry – Stereo imaging – Stereo processing
• Two-view geometry – Epipolar geometry – Triangulation – Pose from epipolar geometry
• Multiple-view geometry – Multiple-view geometry – Structure from motion – Multiple-view stereo
SCENE ANALYSIS • Image analysis
– Image segmentation – Image feature extraction – Introduction to machine learning
• Object detection – Descriptor-based detection – Face recognition – Introduction to deep learning
• Computer Vision Applications – Visual words and computer vision
applications
Forelesninger 2017
Object detection Descriptor-based detection
100
Descriptor-based detection: • Feature Descriptors • Object Detection • Instance Recognition
Object detection Face recognition
101
Face recognition: • Face detection • Face recognition.
Object detection Introduction to deep learning
102
Topics covered: • Deep learning • Artificial neural networks • Convolutional neural networks
Software: • Caffe (http://caffe.berkeleyvision.org) • TensorFlow (https://www.tensorflow.org/) • MatConvNet (http://www.vlfeat.org/matconvnet)
Deep Learning for Object Recognition
«Ship»
Millions of images Millions of parameters Thousands of classes
(AlexNet)
103
Semantic Segmentation (meaningful regions)
Road
Building
Sky
Building
104
IMAGE FORMATION, PROCESSING AND FEATURES • Image formation
– Light, cameras, optics and color – Pose in 2D and 3D – Basic projective geometry – The perspective camera model
• Image processing – Image filtering – Image pyramids – Laplace blending
• Feature detection – Line features – Local keypoint features – Robust estimation with RANSAC
• Feature matching – From keypoints to feature correspondences – Feature descriptors – Feature matching – Estimating homographies from feature
correspondences
WORLD GEOMETRY AND 3D • Single-view geometry
– The camera matrix P – Pose from known 3D points – Camera calibration
• Stereo imaging – Basic epipolar geometry – Stereo imaging – Stereo processing
• Two-view geometry – Epipolar geometry – Triangulation – Pose from epipolar geometry
• Multiple-view geometry – Multiple-view geometry – Structure from motion – Multiple-view stereo
SCENE ANALYSIS • Image analysis
– Image segmentation – Image feature extraction – Introduction to machine learning
• Object detection – Descriptor-based detection – Face recognition – Introduction to deep learning
• Computer Vision Applications – Visual words and computer vision
applications
Forelesninger 2017
Computer Vision Applications Visual words and computer vision applications
• Vocabulary of visual words
• Computer Vision applications – Scene Reconstruction and Visualization From Community Photo Collections
https://www.microsoft.com/en-us/research/wp-content/uploads/2010/08/Snavely-PIEEE10.pdf
– ORB-SLAM: a Versatile and Accurate Monocular SLAM System http://webdiis.unizar.es/~raulmur/MurMontielTardosTRO15.pdf
– A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots http://rpg.ifi.uzh.ch/docs/RAL16_Giusti.pdf
– RI Seminar: Tim Barfoot : Long-Term Visual Route Following for Mobile Robots https://www.youtube.com/watch?v=W_zVn9y9wLs
106