EECS 274 Computer Vision Local Invariant Features.
-
Upload
sarah-clark -
Category
Documents
-
view
228 -
download
0
Transcript of EECS 274 Computer Vision Local Invariant Features.
EECS 274 Computer Vision
Local Invariant Features
:Local features
• Matching with Harris Detector• Scale-invariant Feature Detection• Scale Invariant Image Descriptors• Affine-invariant Feature Detection• Object Recognition• SIFT Features
• Reading: S Chapter 4
Examples
Features
What local features to use?
Aperture problem
Ambiguity of 1-dimensional motion perception
Stripes moved upward 6 pixels
Stripes moved left 5 pixels
Introduction
Local invariant photometric descriptors
( )local descriptor
Local : robust to occlusion/clutter + no segmentationPhotometric : distinctiveInvariant : to image transformations + illumination changes
History - Recognition
Color histogram [Swain 91]
b
g
r
Each pixel is describedby a color vector
Distribution of color vectorsis described by a histogram
not robust to occlusion, not invariant, not distinctive
History - Recognition
Eigenimages [Turk 91]• Each face vector is represented in the eigenimage space
– eigenvectors with the highest eigenvalues = eigenimages
• The new image is projected into the eigenimage space– determine the closest face
. .1v.
3v
2v.
not robust to occlusion, requires segmentation, not invariant, discriminant
History - Recognition
Geometric invariants [Rothwell 92]• Function with a value independent of the
transformation
• Invariant for image rotation : distance of two points
• Invariant for planar homography : cross-ratio local and invariant, not discriminant, requires sub-pixel extraction of primitives
tt yxTyxyxfyxf ),(),( where),(),(
History - Recognition
Problems : occlusion, clutter, image transformations, distinctiveness
Solution : recognition with local photometric invariants
[ Local greyvalue invariants for image retrieval, C. Schmid and R. Mohr, PAMI 1997 ]
Approach
1) Extraction of interest points (characteristic locations)
2) Computation of local descriptors
3) Determining correspondences
4) Selection of similar images
( )local descriptor
Matching with interest points
• Extraction of interest points with the Harris detector
• Comparison of points with cross-correlation
• Verification with the fundamental matrix
Moravec corner detector
• Developed for Stanford Cart in 1977
Moravec corner detectorChange of intensity for the shift [u,v]:
2,
( , ) ( , ) ( , ) ( , )x y
E u v w x y I x u y v I x y
IntensityShifted intensity
Window function
Four shifts: (u,v) = (1,0), (1,1), (0,1), (-1, 1)Look for local maxima in min(E)
Problems of Moravec detector• Noisy response due to a binary window function• Only a set of shifts at every 45 degree is
considered• Only minimum of E is taken into account
Harris corner detector (1988) solves these problems.
Harris detector
Based on the idea of auto-correlation
Important difference in all directions interest point
Interest points
Geometric featuresrepeatable under transformations
2D characteristics of the signalhigh informational content
Comparison of different detectors [Schmid98] Harris detector
Harris detectorAuto-correlation function for a point and a shift
Discrete shifts can be avoided with the auto-correlation matrix
),( yx ),(),( yxvu
x y
vyuxIyxIyxwvuE 2)),(),()(,(),(
v
uyxIyxIyxIvyuxI yx )),(),((),(),(with
2
2
,
2
,
,,
2
2
*
)),()(,(),(),(*),(
),(),(*),()),((*),(
),(),(),(),(
yyx
yxx
T
yxy
yxyx
yxyx
yxx
yyx
x
III
IIIwA
A
v
u
yxIyxwyxIyxIyxw
yxIyxIyxwyxIyxw
vu
v
uyxIyxIyxwvuE
uu
Comparison of detectors: Rotation
[Comparing and Evaluating Interest Points, Schmid, Mohr & Bauckhage, ICCV 98]
repeatability = #good matches/mean(#points)
Comparison of detectors: Perspective
[Comparing and Evaluating Interest Points, Schmid, Mohr & Bauckhage, ICCV 98]
repeatability = #good matches/mean(#points)
Harris corner detectorIntensity change in shifting window: eigenvalue analysis
1, 2 – eigenvalues of A
direction of the slowest change
direction of the fastest change
(max)-1/2
(min)-1/2
Ellipse E(u,v) = const
2
2
*
),()(
yyx
yxx
T
III
IIIwA
AvuEE uuu
uncertainty ellipse
Shi and Tomasi use min(1, 2)
to locate good features to track
Harris corner detector
1
2
Corner1 and 2 are large, 1 ~ 2;E increases in all directions
1 and 2 are small;E is almost constant in all directions
edge 1 >> 2
edge 2 >> 1
flat
Classification of image points using eigenvalues of M:
Harris detection• Auto-correlation matrix
– captures the structure of the local neighborhood
– measure based on eigenvalues of this matrix• 2 strong eigenvalues => interest point• 1 strong eigenvalue => contour• 0 eigenvalue => uniform region
• Interest point detection– threshold on the eigenvalues– local maximum for localization
Harris corner detectorMeasure of corner response:
(k – empirical constant, k = 0.04-0.06)
22121
2
)(
))(()det(
k
AtracekAR
Example
Example
Example
Good features
Using auto-correlation or Hessian matrix
Local descriptors
Descriptors characterize the local neighborhood of a point
local descriptor
( )
Local jet
Convolution of image I with Gaussian derivatives
)(*),(
)(*),(
)(*),(
)(*),(
)(),(
)(),(
),(
yy
xy
xx
y
x
GyxI
GyxI
GyxI
GyxI
GyxI
GyxI
yxv
)2
),(exp(
2
1),),((
),(),()(),(
2
2
2
tt yx
yxG
ydxdyyxxIyxGGyxI
N-Jet, local jet
nsor epsilon te ricantisymmet theis where
2
2
)(
ij
yyyyxyxyxxxx
yyxx
yyyyyxxyxxxx
yyxx
kjiijk
lkijklij
ljiijkkkjiij
llijkklkijklij
ijij
ii
jiji
ii
LLLLLL
LL
LLLLLLLL
LLLL
L
LLLL
LLLL
LLLLLLLL
LLLLLLLL
LL
L
LLL
LL
L
• Invariance to image rotation : differential invariants [Koen87]
Local descriptors
Robustness to illumination changes
In case of an affine transformation
2
2
2
2
2/1
2/3
)(
)(
)(
)
)(
)(
)(
)(
ii
kjiijk
ii
lkijklij
ii
kjiijkkkjiij
ii
llijkklkijklij
ii
jiij
ii
ii
ii
jiji
LL
LLLL
LL
LLLL
LL
LLLLLLLL
LL
LLLLLLLL
LL
LL
LL
L
LL
LLL
baII )()( 21 xx
or normalization of the image patch with mean and variance
Determining correspondences
Vector comparison using the Mahalanobis distance
=?
)()(),( 1 qpqpqp TMdist
( )( )
Selection of similar images
• In a large database – voting algorithm– additional constraints
• Rapid acces with an indexing mechanism
Voting algorithm
local characteristicsvector of
( )
Voting algorithm
} }
1 1 02 1 1
I is the corresponding model image1
2 1 1
Additional constraints
• Semi-local constraints– neighboring points should match– angles, length ratios should be similar
• Global constraints– robust estimation of the image transformation (homogaphy,
epipolar geometry)
11
2
3
2
3
Results
database with ~1000 images
Results
Harris detector
Interest points extracted with Harris (~ 500 points)
Cross-correlation matching
Initial matches (188 pairs)
Global constraints
Robust estimation of the fundamental matrix
99 inliers 89 outliers
Summary of Harris detector
• Very good results in the presence of occlusion and clutter– local information– discriminant greyvalue information – invariance to image rotation and illumination
• Not invariance to scale and affine changes
• Solution for more general view point changes– local invariant descriptors to scale and rotation– extraction of invariant points and regions
Scale Invariant Feature Detection• Consider two images of the same scene,
related by a scale change (i.e. zooming)• How are their scale space representations
related? Scale Space Theory (Lindeberg ’98):
Normalized derivatives have the same value atcorresponding relative scales
handat task for theparameter free a is
),,()',','('
'
)','(),(
),()','(
, s,derivative normalized
)1(
2
2/2/
tyxLstyxL
tst
yxIyxI
sysxyx
tt
m
yx
Harris detector + scale changes
Scale Adapted Harris Detector
Many corresponding points at which the scale factor correspondsto scale change between images
Scale invariant Harris points• Multi-scale extraction of Harris interest points
• Selection of points at characteristic scale in scale space
Laplacian
Chacteristic scale :
- maximum in scale space
- scale invariant
Scale invariant interest points
invariant points + associated regions [Mikolajczyk & Schmid’01] Harris-Laplacian Feature
multi-scale Harris points
selection of points
at the characteristic scale
with Laplacian
Harris detector – adaptation to scale
Evaluation of scale invariant detectors
repeatability – scale changes
SIFT: Overview
• 1999• Generates image features,
“keypoints”– invariant to image scaling and rotation– partially invariant to change in
illumination and 3D camera viewpoint– many can be extracted from typical
images– highly distinctive
Algorithm overview
1. Scale-space extrema detection– Uses difference-of-Gaussian function
2. Keypoint localization– Sub-pixel location and scale fit to a
model
3. Orientation assignment– 1 or more for each keypoint
4. Keypoint descriptor– Created from local image gradients
Scale space
• Definition: where
),(),,(),,( yxIyxGyxL 222 2/)(
22
1),,(
yxeyxG
Scale space
• Keypoints are detected using scale-space extrema in difference-of-Gaussian function D
• D definition:
• Efficient to compute
),,(),,(
),()),,(),,((),,(
yxLkyxL
yxIyxGkyxGyxD
Relationship of D to
• Close approximation to scale-normalized Laplacian of Gaussian,
• Diffusion equation:
• Approximate ∂G/∂σ:– giving,
• When D has scales differing by a constant
factor it already incorporates the σ2 scale normalization required for scale-invariance
• e.g.,
G22
G22
GG 2
k
yxGkyxGG ),,(),,(
Gk
yxGkyxG 2),,(),,(
GkyxGkyxG 22)1(),,(),,(
2k
Scale space construction
2k2σ
2kσ
2σ
kσ
σ
2kσ
2σ
kσ
σ
Scale space• A collection of images obtained by progressively smoothing
the input image• Analogous to gradually reducing image resolution • See Vedaldi’s implementation (http://www.vlfeat.org/ ) for
details• Discretized scales
)),(min(logO 3,S
octaves of # : octave,per scales of # :
1,,,1,,0,6.1
,2),(
2
minmin0
),()1,(),()/(
0
hw
soosoosooSs
II
Os
OoooSs
IIDoGos
Scale space images
…
first octave
…
…
second octave
…
…
third octave
…
fourth octave
…
…
Difference-of-Gaussian images
…
first octave
…
…
second octave
…
…
third octave
…
fourth octave
…
…
Frequency of sampling
• There is no minimum• Best frequency determined experimentally
Prior smoothing for each octave
• Increasing σ increases robustness, but costs• σ = 1.6 a good tradeoff• Doubling the image initially increases
number of keypoints
Finding extrema
• Sample point D(x,y,σ) is selected only if it is a minimum or a maximum of these points
DoG scale space
Extrema in this image
Localization
• 3D quadratic function is fit to the local sample points
• Start with Taylor expansion with sample point as the origin– where
• Take the derivative with respect to X, and set it to 0, giving
• is the location of the keypoint• This is a 3x3 linear system
2
2
2
1)(
DDDD T
T
Tyx ),,(
XX
D
X
D ˆ02
2
DD2
12
ˆ
Localization
• Hessian and derivative approximated by finite differences,– example:
• If X is > 0.5 in any dimension, process repeated
x
Dy
D
D
x
y
x
D
yx
D
x
Dyx
D
y
D
y
Dx
D
y
DD
2
222
2
2
22
22
2
2
4
)()(
1
2
2
,11
,11
,11
,11
2
,1
,,1
2
2
,1
,1
jik
jik
jik
jik
jik
jik
jik
jik
jik
DDDD
y
D
DDDD
DDD
XX
D
X
D ˆ02
2
Filtering
• Contrast (use prev. equation):– If |D(X)| < 0.03, throw it out
• Edge-iness:– Use ratio of principal curvatures to throw out
poorly defined peaks– Curvatures come from Hessian:– Ratio of Trace(H)2 and Determinant(H)
– If ratio > (r+1)2/(r), throw it out (SIFT uses r=10)
XD
DDT
ˆ2
1)ˆ(
yyxy
xyxx
DD
DDH
22
2 )(
)(
)(,)()(
)(
HDet
HTrratioDDDHDet
DDHTr
xyyyxx
yyxx
Orientation assignment
• Descriptor computed relative to keypoint’s orientation achieves rotation invariance
• Precomputed along with mag. for all levels (useful in descriptor computation)
• Multiple orientations assigned to keypoints from an orientation histogram– Significantly improve stability of matching
))),1(),1(/())1,()1,(((2tan),(
))1,()1,(()),1(),1((),( 22
yxLyxLyxLyxLayx
yxLyxLyxLyxLyxm
Keypoint images
Keypoint Selection
Finding extremawith DoG
Removing|D(X)| < 0.03
Removing|D(X)| < 0.03
Select canonical orientation• Create histogram of local
gradient directions computed at selected scale
• Assign canonical orientation at peak of smoothed histogram
• Each key specifies stable 2D coordinates (x, y, scale, orientation)
0 2
Descriptor
• Descriptor has 3 dimensions (x,y,θ)• Orientation histogram of gradient magnitudes• Position and orientation of each gradient sample
rotated relative to keypoint orientation
Descriptor
• Weight magnitude of each sample point by Gaussian weighting function
• Distribute each sample to adjacent bins by trilinear interpolation (avoids boundary effects)
Descriptor• Best results achieved with 4x4x8 =
128 descriptor size• Normalize to unit length
– Reduces effect of illumination change
• Cap each element to 0.2, normalize again– Reduces non-linear illumination changes– 0.2 determined experimentally
Object detection
• Create a database of keypoints from training images
• Match keypoints to a database– Nearest neighbor
search
PCA-SIFT
• Different descriptor (same keypoints)• Apply PCA to the gradient patch• Descriptor size is 20 (instead of 128)• More robust, faster
SIFT: Summary
• Scale space• Difference-of-Gaussian• Localization• Filtering• Orientation assignment• Descriptor, 128 elements
Object recognition
• Definition: Identify an object and determine its pose and model parameters
• Commercial object recognition– $4 billion/year industry for inspection and
assembly– Almost entirely based on template matching
• Upcoming applications– Mobile robots, toys, user interfaces– Location recognition– Digital camera panoramas, 3D scene modeling
Invariant local features• Image content => local feature coordinates invariant to translation, rotation, scale• Technical details regarding
– Keypoint matching (using 1st and 2nd nearest neighbors)– Efficient nearest neighbor indexing – Clustering with Hough transform (model location, orientation, scale)– Account for affine distortion
Features
Advantages of invariant local features
• Locality: features are local, so robust to occlusion and clutter (no prior segmentation)
• Distinctiveness: individual features can be matched to a large database of objects
• Quantity: many features can be generated for even small objects
• Efficiency: close to real-time performance
Experimental evaluation
Scale change (factor 2.5)
Harris-Laplace DoG
Viewpoint change (60 degrees)
Harris-Affine (Harris-Laplace)
Descriptors - conclusion
• SIFT + steerable perform best
• Performance of the descriptor independent of the detector
• Errors due to imprecision in region estimation, localization
Image retrieval
…> 5000images
change in viewing angle
Matches
22 correct matches
Image retrieval
…> 5000images
change in viewing angle
+ scale change
Matches
33 correct matches
Multiple panoramas from an unordered image set
Image registration and blending
Location recognition
Robot localization• Joint work with Stephen Se, Jim Little
Recognizing panoramas• Matthew Brown and David Lowe• Recognize overlap from an unordered set of
images and automatically stitch together• SIFT features provide initial feature matching• Image blending at multiple scales hides the
seams
Panorama automatically assembled from 143 images
Map continuously built over time
Planar recognition• Planar surfaces can be
reliably recognized at a rotation of 60° away from the camera
• Affine fit approximates perspective projection
• Only 3 points are needed for recognition
3D object recognition• Extract
outlines with background subtraction
3D object recognition
• Only 3 keys are needed for recognition, so extra keys provide robustness
• Affine model is no longer as accurate
Recognition under occlusion
Comparison to template matching
• Costs of template matching– 250,000 locations x 30 orientations x 4 scales =
30,000,000 evaluations– Does not easily handle partial occlusion and other
variation without large increase in template numbers– Viola & Jones cascade must start again for each
qualitatively different template
• Costs of local feature approach– 3000 evaluations (reduction by factor of 10,000)– Features are more invariant to illumination, 3D rotation,
and object variation– Use of many small subtemplates increases robustness to
partial occlusion and other variations
More local features
• Harris Laplace• Harris Affine• Hessian detector• Hessian Laplace• Hessian Affine• MSER (Maximally Stable Extremal
Regions)• SURF (Speeded-Up Robust Feature)