Post on 20-Dec-2015
Distinctive Image Features from Scale-Invariant Keypoints
David G. Lowe, 2004
Presentation Content
• Introduction
• Related Research
• Algorithm
– Keypoint localization
– Orientation assignment
– Keypoint descriptor
• Recognizing images using keypoint descriptors
• Achievements and Results
• Conclusion
Introduction
• Image matching is a fundamental aspect of many problems in computer vision.
So how do we do that?
Scale Invariant Feature Transform (SIFT)
• Object or scene recognition.
• Using local invariant image features (keypoints).
– Scaling
– Rotation
– Illumination
– 3D camera viewpoint (affine)
– Clutter / noise
– Occlusion
• Realtime
Related Research
– Corner detectors
• Moravec 1981
• Harris and Stephens 1988
• Harris 1992
• Zhang 1995
• Torr 1995
• Schmid and Mohr 1997
– Scale invariant
• Crowley and Parker 1984
• Shokoufandeh 1999
• Lindeberg 1993, 1994
• Lowe 1999 (this author)
– Invariant to full affine transformation
• Baumberg 2000
• Tuytelaars and Van Gool 2000
• Mikolajczyk and Schmid 2002
• Schaffalitzky and Zisserman 2002
• Brown and Lowe 2002
Keypoint Detection
• Goal: Identify locations and scales that can be repeatably assigned under differing views of the same object.
• Keypoint detection is done at a specific scale and location
• Difference-of-Gaussian function
• Search for stable features across all possible scales
D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) ∗ I (x, y) = L(x, y, kσ) − L(x, y, σ).
σ = amount of smoothing
k = constant: 2^(1/s), where s is the number of scale intervals per octave
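The difference-of-Gaussian formula above can be sketched in plain Python. This is a minimal illustration, not the author's implementation: the helper names (`gaussian_kernel`, `blur`, `difference_of_gaussian`), the 3σ kernel truncation, the clamped borders, and the starting σ of 1.6 are all assumptions made for the example.

```python
import math

def gaussian_kernel(sigma):
    """1D Gaussian kernel, truncated at 3 sigma and normalized to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(image, sigma):
    """Separable Gaussian blur of a 2D list of floats; borders are clamped."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    h, w = len(image), len(image[0])

    def clamp(v, size):
        return min(max(v, 0), size - 1)

    # Horizontal pass, then vertical pass.
    rows = [[sum(kernel[j + radius] * image[y][clamp(x + j, w)]
                 for j in range(-radius, radius + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(kernel[j + radius] * rows[clamp(y + j, h)][x]
                 for j in range(-radius, radius + 1))
             for x in range(w)] for y in range(h)]

def difference_of_gaussian(image, sigma, k):
    """D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)."""
    coarse = blur(image, k * sigma)
    fine = blur(image, sigma)
    return [[c - f for c, f in zip(c_row, f_row)]
            for c_row, f_row in zip(coarse, fine)]

s = 3                      # scale intervals per octave
k = 2 ** (1 / s)           # constant factor between adjacent scales
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0          # a single bright pixel
dog = difference_of_gaussian(image, 1.6, k)
flat = difference_of_gaussian([[0.5] * 9 for _ in range(9)], 1.6, k)
```

On the single bright pixel the response at the center is negative, because the wider Gaussian spreads the peak more than the narrower one; on a constant image the response is zero everywhere.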
Keypoint Detection
• Reasonably low cost
• Scale sensitive
• Number of scale samples per octave?
• 3 scale samples per octave were used (more samples yield more keypoints, at higher computational cost).
• Determine amount of smoothing (σ)
• Smoothing loses high-frequency information, so the input image is doubled in size first
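A rough sketch of how the scale samples and the extremum search fit together. The function names are hypothetical; the s + 3 blurred images per octave and the comparison against 26 neighbors (8 in the same level, 9 in each adjacent level) follow the paper rather than these slides.

```python
def octave_sigmas(sigma0, s):
    """Smoothing levels for one octave: s + 3 images, each k = 2^(1/s) apart."""
    k = 2 ** (1 / s)
    return [sigma0 * k ** i for i in range(s + 3)]

def is_extremum(dogs, level, y, x):
    """True if dogs[level][y][x] is strictly above or below all 26 neighbors.

    dogs is a list of difference-of-Gaussian images (2D lists) ordered by scale;
    the caller must keep level, y and x away from the borders.
    """
    value = dogs[level][y][x]
    neighbors = [dogs[level + dl][y + dy][x + dx]
                 for dl in (-1, 0, 1)
                 for dy in (-1, 0, 1)
                 for dx in (-1, 0, 1)
                 if (dl, dy, dx) != (0, 0, 0)]
    return value > max(neighbors) or value < min(neighbors)

sigmas = octave_sigmas(1.6, 3)   # sigma doubles after s = 3 steps

# Tiny stack of three 3x3 levels with a single peak in the middle.
stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
stack[1][1][1] = 1.0
peak_found = is_extremum(stack, 1, 1, 1)
```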
Accurate Keypoint Localization (1/2)
• Use a Taylor expansion of D to determine the interpolated location of each extremum (local maximum or minimum). Evaluate D at this interpolated location and discard low-contrast extrema whose value differs from the surroundings by less than 3% (|D| < 0.03 for pixel values in [0, 1]).
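For reference, the quadratic fit behind this step can be written out as follows, using the paper's notation. Here x = (x, y, σ)^T is the offset from the sample point, and D and its derivatives are evaluated at the sample point; in practice the derivatives are approximated by differences of neighboring samples.

```latex
D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T}\mathbf{x}
              + \frac{1}{2}\,\mathbf{x}^{T}\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\,\mathbf{x}

\hat{\mathbf{x}} = -\left(\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\right)^{-1}
                   \frac{\partial D}{\partial \mathbf{x}}

D(\hat{\mathbf{x}}) = D + \frac{1}{2}\,\frac{\partial D}{\partial \mathbf{x}}^{T}\hat{\mathbf{x}}
```

The second line gives the interpolated offset of the extremum (set the derivative of the first line to zero); the third gives the value used for the contrast check.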
Accurate Keypoint Localization (2/2)
• Eliminating edge responses
• Define a Hessian matrix from second derivatives of the pixel values (Dxx, Dxy, Dyx, Dyy)
• Determine the ratio of the larger eigenvalue to the smaller one; a large ratio indicates an edge.
• Number of keypoints: 832 initially, 729 after the contrast threshold, 536 after the edge-response threshold.
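The eigenvalue ratio does not require computing the eigenvalues themselves. A minimal sketch, assuming the trace/determinant form of the test and the threshold r = 10 used in the paper; the function name and argument order are made up for this example.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint only if its principal-curvature ratio is below r.

    With eigenvalues a >= b of the 2x2 Hessian, trace^2 / det = (a + b)^2 / (a * b),
    which depends only on the ratio a / b and grows as the ratio grows.
    """
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures have different signs (or one is zero): not a stable extremum.
        return False
    return trace * trace / det < (r + 1) ** 2 / r

blob_like = passes_edge_test(1.0, 1.0, 0.0)    # equal curvatures
edge_like = passes_edge_test(20.0, 1.0, 0.0)   # one curvature dominates
```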
Orientation Assignment
• Calculate the orientation and magnitude of the gradient at each pixel
• Histogram of orientations of sample points near the keypoint.
• Each sample is weighted by its gradient magnitude and by a Gaussian-weighted circular window with a σ that is 1.5 times the scale of the keypoint.
Stable orientation results
• Multiple keypoints for multiple histogram peaks
• Interpolation
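The orientation steps above might look roughly like this in code. It is a sketch under assumptions: the 36-bin histogram, the rule that any peak within 80% of the highest peak spawns a keypoint, and the parabola fit for interpolation follow the paper, while the function names, the `radius` parameter, and the central-difference gradients are choices made for the example.

```python
import math

def orientation_histogram(L, cx, cy, scale, radius, bins=36):
    """Histogram of gradient orientations around (cx, cy) in the smoothed image L."""
    sigma = 1.5 * scale                      # Gaussian window from the slide
    hist = [0.0] * bins
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            if not (1 <= y < len(L) - 1 and 1 <= x < len(L[0]) - 1):
                continue                     # skip samples without both neighbors
            dx = L[y][x + 1] - L[y][x - 1]
            dy = L[y + 1][x] - L[y - 1][x]
            magnitude = math.hypot(dx, dy)
            theta = math.atan2(dy, dx) % (2 * math.pi)
            weight = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            hist[int(theta / (2 * math.pi) * bins) % bins] += weight * magnitude
    return hist

def dominant_orientations(hist, ratio=0.8):
    """Angles (degrees) of all local peaks within `ratio` of the highest peak."""
    bins = len(hist)
    highest = max(hist)
    angles = []
    for i, h in enumerate(hist):
        left, right = hist[i - 1], hist[(i + 1) % bins]
        if h >= ratio * highest and h > left and h > right:
            # Fit a parabola through the three bins to interpolate the peak.
            offset = 0.5 * (left - right) / (left - 2 * h + right)
            angles.append(((i + 0.5 + offset) % bins) * 360.0 / bins)
    return angles

# Diagonal intensity ramp: every gradient points the same way (45 degrees).
ramp = [[float(x + y) for x in range(11)] for y in range(11)]
hist = orientation_histogram(ramp, 5, 5, scale=1.6, radius=4)
angles = dominant_orientations(hist)
```

Each returned angle would become its own keypoint at the same location and scale, which is how multiple histogram peaks lead to multiple keypoints.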
The Local Image Descriptor
• We can now find keypoints invariant to location, scale, and orientation.
• Now compute descriptors for each keypoint.
• Highly distinctive, yet invariant to illumination and 3D viewpoint changes.
• Biologically inspired approach.
• Divide the sample points around the keypoint into 16 regions (only 4 regions are shown in the picture)
• Create a histogram of orientations for each region (8 bins)
• Trilinear interpolation.
• Vector normalization
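A simplified sketch of building the descriptor vector from a patch of gradients: 16 regions × 8 orientation bins gives 128 values. To stay short it leaves out the trilinear interpolation, the Gaussian weighting, and the rotation of the patch to the keypoint orientation. The clamp at 0.2 followed by renormalization is taken from the paper; the function name and inputs are hypothetical.

```python
import math

def build_descriptor(magnitudes, angles, grid=4, bins=8, clamp=0.2):
    """Descriptor from a square patch of gradient magnitudes and angles (radians).

    The patch side must be divisible by `grid` (e.g. 16x16 samples, 4x4 regions).
    """
    size = len(magnitudes)
    cell = size // grid
    vec = [0.0] * (grid * grid * bins)
    for y in range(size):
        for x in range(size):
            region = (y // cell) * grid + (x // cell)
            angle = angles[y][x] % (2 * math.pi)
            b = int(angle / (2 * math.pi) * bins) % bins
            vec[region * bins + b] += magnitudes[y][x]   # hard binning, no interpolation

    def normalize(v):
        norm = math.sqrt(sum(c * c for c in v))
        return [c / norm for c in v] if norm > 0 else v

    vec = normalize(vec)                    # cancels uniform contrast changes
    vec = [min(c, clamp) for c in vec]      # limits the influence of large gradients
    return normalize(vec)

magnitudes = [[1.0] * 16 for _ in range(16)]
angles = [[0.0] * 16 for _ in range(16)]
descriptor = build_descriptor(magnitudes, angles)
```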
Descriptor Testing
This graph shows the percent of keypoints giving the correct match to a database of 40,000 keypoints as a function of width of the n×n keypoint descriptor and the number of orientations in each histogram. The graph is computed for images with affine viewpoint change of 50 degrees and addition of 4% noise.
Keypoint Matching
• Look for the nearest neighbor in the database (Euclidean distance)
• Compare the distance of the closest neighbor to that of the second-closest neighbor.
• If (distance to closest) / (distance to second-closest) > 0.8, discard the match.
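The matching rule can be sketched with a brute-force search. The function name is made up, and the exhaustive scan stands in for whatever index is actually used; `math.dist` needs Python 3.8 or later.

```python
import math

def match_keypoint(query, database, max_ratio=0.8):
    """Index of the nearest database descriptor, or None if the match is ambiguous.

    database must contain at least two descriptors.
    """
    distances = sorted((math.dist(query, d), i) for i, d in enumerate(database))
    (best, index), (second, _) = distances[0], distances[1]
    if second == 0 or best / second > max_ratio:
        return None          # closest and second-closest are too similar
    return index

database = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
clear_match = match_keypoint((1.0, 0.0), database)   # ratio 1/9, accepted
ambiguous = match_keypoint((5.0, 0.0), database)     # ratio 1.0, rejected
```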
Efficient Nearest Neighbor Indexing
• 128-dimensional feature vector
• Best-Bin-First (BBF)
• Modified k-d tree algorithm.
• Finds only an approximate answer.
• Works well because of the 0.8 distance rule.
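To make the idea concrete, here is a small sketch of a best-bin-first search over a k-d tree. It is an illustration under assumptions, not the paper's implementation: the dictionary-based tree, the function names, and the cycling split axis are choices for the example, while the cutoff of 200 checked candidates mirrors the paper.

```python
import heapq
import math

def build_kdtree(points, indices=None, depth=0):
    """Balanced k-d tree over `points`; nodes are dicts holding a point index."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    indices.sort(key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return {"index": indices[mid], "axis": axis,
            "left": build_kdtree(points, indices[:mid], depth + 1),
            "right": build_kdtree(points, indices[mid + 1:], depth + 1)}

def bbf_nearest(tree, points, query, max_checks=200):
    """Approximate nearest neighbor: visit bins in order of distance to the query."""
    best_index, best_dist = None, math.inf
    heap = [(0.0, 0, tree)]     # (lower bound on distance, tie-breaker, subtree)
    counter, checks = 1, 0
    while heap and checks < max_checks:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_dist:
            break               # no remaining bin can hold a closer point
        while node is not None:  # descend to a leaf, queueing the skipped branches
            point = points[node["index"]]
            d = math.dist(query, point)
            checks += 1
            if d < best_dist:
                best_index, best_dist = node["index"], d
            diff = query[node["axis"]] - point[node["axis"]]
            if diff < 0:
                near, far = node["left"], node["right"]
            else:
                near, far = node["right"], node["left"]
            if far is not None:
                heapq.heappush(heap, (abs(diff), counter, far))
                counter += 1
            node = near
    return best_index, best_dist

points = [(float(x), float(y)) for x in range(5) for y in range(5)]
tree = build_kdtree(points)
index, distance = bbf_nearest(tree, points, (3.2, 1.9))
```

With a generous `max_checks` the search is exact; lowering it trades accuracy for speed, which is the approximation the slide refers to. The budget is only tested between descents, so a few more points than `max_checks` may be examined.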
Clustering with the Hough Transform
• Can select as few as 1% inliers among 99% outliers
• Find clusters of features that vote for the same object pose.
– 2D location
– Scale
– Orientation
– Location relative to original training image.
• Use broad bin sizes.
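A simplified sketch of the voting step. Each match is assumed to carry the object pose it predicts, stored in a dict with hypothetical field names. The 30-degree orientation bins, factor-of-2 scale bins, and minimum of 3 votes follow the paper; the fixed location bin size is an arbitrary stand-in, and each match votes for a single bin rather than for the neighboring bins as well.

```python
import math
from collections import defaultdict

def hough_clusters(matches, location_bin=64.0, min_votes=3):
    """Group matches whose predicted pose falls into the same broad bin."""
    bins = defaultdict(list)
    for m in matches:
        key = (m["object"],
               int(math.floor(m["x"] / location_bin)),
               int(math.floor(m["y"] / location_bin)),
               int(round(math.log2(m["scale"]))),            # factor-of-2 scale bins
               int(math.floor((m["orientation"] % 360.0) / 30.0)))
        bins[key].append(m)
    return [votes for votes in bins.values() if len(votes) >= min_votes]

matches = [
    {"object": "book", "x": 100, "y": 120, "scale": 1.0, "orientation": 10},
    {"object": "book", "x": 110, "y": 125, "scale": 1.1, "orientation": 15},
    {"object": "book", "x": 105, "y": 118, "scale": 0.9, "orientation": 20},
    {"object": "book", "x": 400, "y": 30, "scale": 4.0, "orientation": 200},
    {"object": "cup", "x": 100, "y": 120, "scale": 1.0, "orientation": 10},
]
clusters = hough_clusters(matches)
```

Voting for only one bin makes the result sensitive to poses that fall near a bin boundary, which is one reason broad bins help.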
Solution for Affine Parameters
• An affine transformation correctly accounts for 3D rotation of a planar surface under orthographic projection, but the approximation can be poor for 3D rotation of non-planar objects. Basically: we do not create a 3D representation of the object.
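• The affine transformation of a model point [x y] to an image point [u v] can be written as [u v]^T = [m1 m2; m3 m4] [x y]^T + [tx ty]^T, where m1..m4 encode rotation, scale, and stretch and [tx ty] is the translation.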
• Outliers are discarded
• New matches can be found by top-down matching
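A minimal sketch of solving for the affine parameters by least squares, assuming the mapping u = m1·x + m2·y + tx and v = m3·x + m4·y + ty. Because u and v depend on separate parameters, the six unknowns split into two 3-parameter fits that share the same normal equations. The function names are made up for the example, and at least 3 non-collinear matches are required.

```python
def solve3(a, b):
    """Solve the 3x3 system a * p = b by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def fit_affine(model_points, image_points):
    """Least-squares (m1, m2, m3, m4, tx, ty) mapping model points to image points."""
    rows = [(x, y, 1.0) for x, y in model_points]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]

    def solve_for(values):
        atb = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(3)]
        return solve3(ata, atb)

    m1, m2, tx = solve_for([u for u, _ in image_points])
    m3, m4, ty = solve_for([v for _, v in image_points])
    return m1, m2, m3, m4, tx, ty

model = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
image = [(2 * x + 1 * y + 5, -1 * x + 3 * y - 2) for x, y in model]
params = fit_affine(model, image)
```

After the fit, a match whose image point lies far from its predicted position can be treated as an outlier and the fit repeated on the remaining matches.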
Results
Conclusion
• Invariant to image rotation and scale and robust across a substantial range of affine distortion, addition of noise, and change in illumination.
• Realtime
• Lots of applications
Further Research
• Color
• 3D representation of world.