Object Recognition from Local Scale-Invariant Features David G. Lowe Presented by Ashley L. Kapron.
-
Upload
milton-colden -
Category
Documents
-
view
218 -
download
0
Transcript of Object Recognition from Local Scale-Invariant Features David G. Lowe Presented by Ashley L. Kapron.
Object Recognition from Local Scale-Invariant Features
David G. Lowe
Presented by Ashley L. Kapron
Introduction
• Object Recognition– Recognize known objects in unknown
configurations
Previous Work
• Zhang et al – Harris Corner Detection – Detect peaks in local image
variation
• Schmid and Mohr– Harris Corner Detection– Local image descriptor at each
interest pt from an orientation-invariant vector of derivative-of-Gaussian image measurements
Motivation
• Limitations of previous work: – Examine image only on a single scale
• Current paper addresses this concern by identifying stable key locations in scale space
• Identify features that are invariant
Invariance
• Illumination
• Scale
• Rotation
• Affine
Scale Space• Different scales are appropriate for
describing different objects in the image, and we may not know the correct scale/size ahead of time.
Difference of Gaussian
1. A = Convolve image with vertical and horizontal 1D Gaussians, σ=sqrt(2)
2. B = Convolve A with vertical and horizontal 1D Gaussians, σ=sqrt(2)
3. DOG (Difference of Gaussian) = A – B
4. Downsample B with bilinear interpolation with pixel spacing of 1.5 (linear combination of 4 adjacent pixels)
A1
B1
Difference of Gaussian Pyramid
Input Image
Blur
Blur
Blur
Downsample
Downsample
B2
B3
A2
A3
A3-B3
A2-B2
A1-B1
DOG2
DOG1
DOG3
Blur
Pyramid Example
A1 B1 DOG1
DOG3
DOG3A2
A3 B3
B2
Feature detection
• Find maxima and minima of scale space• For each point on a DOG level:
– Compare to 8 neighbors at same level– If max/min, identify corresponding point at pyramid
level below– Determine if the corresponding point is max/min of its 8
neighbors– If so, repeat at pyramid level above
• Repeat for each DOG level• Those that remain are key points
Identifying Max/Min
DOG L-1
DOG L
DOG L+1
Refining Key List: Illumination
• For all levels, use the “A” smoothed image to compute– Gradient Magnitude
• Threshold gradient magnitudes: – Remove all key points with MIJ less than 0.1
times the max gradient value
• Motivation: Low contrast is generally less reliable than high for feature points
Assigning Canonical Orientation
• For each remaining key point:– Choose surrounding N x N window at DOG
level it was detected
DOG image
Assigning Canonical Orientation
• For all levels, use the “A” smoothed image to compute– Gradient Orientation
+
Gaussian Smoothed Image Gradient Orientation Gradient Magnitude
Assigning Canonical Orientation
• Gradient magnitude weighted by 2D gaussian
Gradient Magnitude 2D Gaussian Weighted Magnitude
* =
Assigning Canonical Orientation• Accumulate in histogram
based on orientation• Histogram has 36 bins with
10° increments
Weighted Magnitude
Gradient OrientationGradient OrientationS
um o
f W
eigh
ted
Mag
nitu
des
Assigning Canonical Orientation• Identify peak and assign
orientation and sum of magnitude to key point
Weighted Magnitude
Gradient OrientationGradient OrientationS
um o
f W
eigh
ted
Mag
nitu
des
Peak*
Refining Key List: Rotation
• The user may choose a threshold to exclude key points based on their assigned sum of magnitudes.
Example of Refinement
Max/mins from DOG pyramid
Filter for illumination
Filter for edge orientation
Local Image Description
• SIFT keys each assigned:– Location– Scale (analogous to level it was detected)– Orientation (assigned in previous canonical
orientation steps)
• Now: Describe local image region invariant to the above transformations
SIFT key example
Local Image Description
For each key point:
• Identify 8x8 neighborhood (from DOG level it was detected)
• Align orientation to x-axis
Local Image Description
3. Calculate gradient magnitude and orientation map
4. Weight by Gaussian
Local Image Description
5. Calculate histogram of each 4x4 region. 8 bins for gradient orientation. Tally weighted gradient magnitude.
Local Image Description
6. This histogram array is the image descriptor. (Example here is vector, length 8*4=32. Best suggestion: 128 vector for 16x16 neighborhood)
Database Creation
• Index all key points of reference model image(s)– Store key point
descriptor vectors in database
Image Matching
• Find all key points identified in target image– Each key point will have 2d location, scale and
orientation, as well as invariant descriptor vector
• For each key point, find similar descriptor vectors in reference image database. – Descriptor vector may match more than one reference
image database– The key point “votes” for image(s)
• Use best-bin-first algorithm
Hough Transform Clustering• Create 4D Hough Transform (HT) Space
for each reference image1. Orientation bin = 30° bin
2. Scale bin = 2
3. X location bin = 0.25*ref image width
4. Y location bin = 0.25*ref image height
• If key point “votes” for reference image, tally its vote in 4D HT Space.
– This gives estimate of location and pose– Keep list of which key points vote for a bin
Verification
• Identify bins with largest votes (must have at least 3).
• Using list of key points which voted for a cell, compute affine transformation parameters (m, t)– Use corresponding coordinates of reference
model (x,y) and target image (u,v).
Verification
• If more than three points, solve in least-squares sense
Verification: Remove Outliers
• After applying affine transformation to key points, determine difference between calculated location and actual target image location
• Throw out if:– Orientation different by 15°– Scale off by sqrt(2)– X,Y location by 0.2*model size
• Repeat least-squares solution until no points are thrown out
SIFT Example
SIFT Example
SIFT example
Advantages of SIFT• Numerous keys can be generated for even small objects
• Partial occlusion/image clutter ok because dozens of SIFT keys may be associated with an object, but only need to find 3
• Object models can undergo limited affine projection.
– Planar shapes can be recognized at 60 degree rotation away from camera.
• Individual features can be matched to a large database of objects
Limitations of SIFT
• Fully affine tranformations require additional steps
• Many parameters “engineered” for specific application. May need to be evaluated on case-to-case basis
Thank you!