Local feature descriptors for visual recognition

BMVA2013 1K. Mikolajczyk, Local Feature Descriptors for Visual Recognition

Krystian Mikolajczyk

Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, UK

Local Feature Descriptors for Visual Recognition

BMVA2014 Tutorial


Introduction
• Krystian Mikolajczyk, Reader in Robot Vision
• University of Surrey, Guildford, UK
• 50 km south west of London
• 5 Faculties
• 12 000 students, 2500 staff
• Faculty of Electronic and Physical Sciences
• Electronic Engineering Department
• CVSSP ‐ Centre for Vision, Speech, and Signal Processing
– 19 Academics
– 30 Research Fellows
– 60 PhD students


Research

Image enhancement, video restoration, super‐resolution, HDR imaging

Image and video representation: local descriptors, motion estimation, segmentation, clustering

Machine learning methods: LDA, KDA, SVM

Retrieval, indexing, data structures

Image and video recognition

Object detection, scene classification

Activity recognition


Program
• Local Feature Definitions / Properties
• Applications
• Interest point detectors
• Local Descriptors
• Evaluations


Feature Detector
• Definition: A feature detector (extractor) is an algorithm taking an image as input and outputting a set of regions (“local features”).
• “Local features” are regions, i.e. in principle arbitrary sets of pixels, not necessarily contiguous, which are at least:
– distinguishable in an image regardless of viewpoint/illumination
– robust to occlusion: they must be local
– equipped with a discriminative neighborhood: they are “features”
• Terminology has not stabilised: Local Feature = Interest “Point” = Keypoint = Feature “Point” = The “Patch” = Distinguished Region = Features = (Transformation) Covariant Region
• Definition: A descriptor is computed on an image region defined by a detector. The descriptor is a representation of the intensity (colour, …) function on the region.


Feature Detectors
• Invariance (or covariance) to a broad class of geometric and photometric transforms
• Efficiency: close to real‐time performance
• Quantity/density of features to cover small objects/parts of scenes
• Robustness to:
– occlusion and clutter (requires locality)
– noise, blur, discretization, compression
• Distinctiveness: individual features can be matched to a large database of objects
• Stability over time (to support long‐temporal‐baseline matching)
• Geometric accuracy: precise localization
• Generalization to similar objects
• Even coverage, complementarity, number of geometric constraints, …
• No detector dominates in all aspects; some properties are competing, e.g. level of invariance vs. speed


Feature Descriptor
• Definition: A descriptor is computed on an image region defined by a detector. The descriptor is a representation of the intensity (colour, …) function on the region.

Desiderata for feature descriptors:
• Discriminability
• Robustness to misalignment, illumination, blur, compression, …
• Efficiency: real‐time often required
• Compactness: small memory footprint. Very significant in mobile and large‐scale applications

• Note: The region on which a descriptor is computed is called a measurement region. This may be directly the feature detector output or any other function of it (e.g. convex hull, triple area region, …)


Local Features

• Methods based on “Local Features” are the state‐of‐the‐art for a number of computer vision problems (mostly those that require local correspondences).
– Registration
– Stereo vision
– Motion estimation
– Matching
– Retrieval
– Image & Video Classification
– Detection
– Action recognition
– Robot navigation

However, there are still many issues to address


Example 1: Wide baseline matching
• Establish correspondence between two (or more) images
• Useful in visual geometry: camera calibration, 3D reconstruction, structure and motion estimation, …

Local transf.: scale/affine – Detector: affine‐Harris – Descriptor: SIFT


Example 2: Panoramic mosaic


M. Brown, D. Lowe, B. Hearn, J. Beis

Example 3: 3D reconstruction
• Photo Tourism overview

[Figure: Input photographs → Scene reconstruction (relative camera positions and orientations, point cloud, sparse correspondence) → Photo Explorer]

Slide: N. Snavely

Noah Snavely, Steven M. Seitz, Richard Szeliski, "Photo tourism: Exploring photo collections in 3D," ACM Transactions on Graphics (SIGGRAPH Proceedings)


Example 3: 3D reconstruction
• 57,845 downloaded images, 11,868 registered images. This video: 4,619 images.
• The Old City of Dubrovnik

Building Rome in a Day, Agarwal, Snavely, Simon, Seitz, Szeliski, ICCV 2009. See also [Havlena, Torii, Knopp and Pajdla, CVPR 2009].


Example 4: Query by example search in large scale image datasets

Find these objects ...in these images and 1M more

Search the web with a visual query …


Example 5: Google goggles

Slide credit: I. Laptev

Example 6: Where am I?
• Place recognition ‐ retrieval in a structured (on a map) database

[Knopp, Sivic, Pajdla, ECCV 2010] http://www.di.ens.fr/willow/research/confusers/

[Figure: Query → query expansion (Panoramio, Flickr, …) → image indexing with spatial verification against the image database, with confuser suppression using only negative training data (from geotags) → best match]


Example 7: Re‐acquisition in tracking
• Tracking loop
– Detect: correspondence generation + PROSAC
– Update: structured SVM + stochastic gradient descent

ECCV 2012 Modern features: … Introduction.

Hare, Saffari, Torr, CVPR 2012


Example 8: Object category recognition
Sliding window detector

• Classifier: SVM with linear kernel

• BOW representation for ROI

Example detections for dog

Lampert et al., CVPR 08: Efficient branch and bound search over all windows


Human Action Recognition


Local features meet Invariants: Schmid and Mohr, 1997.

C. Schmid, R. Mohr, "Local Gray-Value Invariants for Image Retrieval", IEEE Trans. PAMI, vol. 19 (5), 1997, pp. 530--535.

• Multi scale differential gray value invariants computed at Harris points

• Similarity‐based geometric constraint to reject mismatches

• Canonical frame not used.


D. Lowe, Object recognition from local scale‐invariant features, ICCV, 1999

Detector:
• Scale‐space peaks of Difference‐of‐Gaussians filter response (Lindeberg 1995)
• Similarity frame from modes of gradient histogram

SIFT Descriptor:
• Local histograms of gradient orientation
• Allows for small misalignments => robust to non‐similarity transforms

Indexing:
• Modified kD‐tree structure

Verification:
• Hough transform based clustering of correspondences with similar transformations

Fast, efficient implementation, real‐time recognition

D. G. Lowe: “Distinctive image features from scale-invariant keypoints”. IJCV, 2004.


J. Sivic & A. Zisserman, Video Google …, ICCV 2003

• Given an image or a part of it, return its label or a ranking of labels

• Local image patches
– Interest points
– Regular grid sampling

• Image descriptors
– Histogram of gradients

• Visual vocabulary
– Clustering of descriptors
– Learning codeword weights per category

• Codeword occurrence distribution per image

[Figure: inverted file mapping each visual word to a list of image indexes (e.g. I14, I22, …), used for voting]
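The indexing figure above pairs each visual word with the images that contain it and lets a query vote for database images. A minimal sketch of that idea follows, assuming descriptors have already been quantized to integer visual word ids. The function names and the toy database are hypothetical, and the tf‐idf weighting used in practical systems is left out.

```python
from collections import Counter, defaultdict

def build_inverted_file(image_words):
    """Map each visual word id to the set of images that contain it.

    image_words: dict of image id -> iterable of visual word ids (assumed input).
    """
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for word in words:
            index[word].add(image_id)
    return index

def vote(index, query_words):
    """Each distinct query word casts one vote for every image listed under it."""
    votes = Counter()
    for word in set(query_words):
        for image_id in index.get(word, ()):
            votes[image_id] += 1
    return votes.most_common()  # images ranked by number of shared words

# Hypothetical toy database of three images.
database = {"I1": [1, 2, 3], "I2": [3, 4], "I3": [7]}
index = build_inverted_file(database)
ranking = vote(index, [2, 3, 9])
```

Only the lists of words that occur in the query are visited, which is what keeps retrieval fast on large collections.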


Image Classification
http://kahlan.eps.surrey.ac.uk/featurespace/web/Classification_Exercise.zip

• Local image patches

• Image descriptors

• Visual vocabulary

• Codeword occurrence distribution per image

• Machine learning– classifiers

[Figure: 4. Classification: codeword frequency histograms → kernel matrix → Kernel Discriminant Analysis → class labels]


Properties of the ideal feature

• Local: features are local, so robust to occlusion and clutter (no prior segmentation)
• Invariant (or covariant)
• Robust: noise, blur, discretization, compression, etc. do not have a big impact on the feature
• Distinctive: individual features can be matched to a large database of objects
• Quantity: many features can be generated for even small objects
• Accurate: precise localization
• Efficient: close to real‐time performance


How to cope with transformations?

• Exhaustive search
• Robustness
• Invariance


Invariance

• Integration, e.g.
– moment invariants, …
• Heuristics, e.g.
– Difference of intensity values for photom. offset
– Ratio of intensity values for photom. scale factor
• Selection and normalization, e.g.
– Automatic scale selection (Lindeberg et al., 1996)
– Orientation assignment
– Affine normalization (‘deskewing’)

• …


Photometric transformations

Modelled as a linear transformation: scaling + offset (Color features, T. Gevers)

$I' = aI + b$
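One common way to cancel the scaling and offset in this model is to normalize a patch by its mean and standard deviation. The sketch below illustrates that under stated assumptions: the patch is a flat list of intensities, the gain a is positive, and the helper name and the gain/offset values are hypothetical.

```python
from statistics import mean, pstdev

def normalize_patch(pixels):
    """Subtract the mean (removes offset b) and divide by the std (removes gain a)."""
    m = mean(pixels)
    s = pstdev(pixels)
    if s == 0:
        return [0.0 for _ in pixels]  # constant patch: nothing left to describe
    return [(p - m) / s for p in pixels]

patch = [10.0, 20.0, 30.0, 60.0]
a, b = 1.5, 12.0                      # hypothetical photometric gain and offset
transformed = [a * p + b for p in patch]

n1 = normalize_patch(patch)
n2 = normalize_patch(transformed)     # equal to n1 up to rounding error
```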


Geometric transformations

• Translation
• Euclidean (translation + rotation)
• Similarity (transl. + rotation + scale)
• Affine transformations
• Projective transformations

Only holds for planar patches!


The need for geometric invariance


Overview of existing detectors

• Hessian & Harris
• Lowe: DoG
• Mikolajczyk & Schmid: Hessian/Harris‐Laplacian/Affine
• Tuytelaars & Van Gool: EBR and IBR
• Matas: MSER
• Kadir & Brady: Salient Regions
• Others


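Hessian detector (Beaudet, 1978)

• Hessian determinant

$\mathrm{Hessian}(I) = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix}$

$\det(\mathrm{Hessian}(I)) = I_{xx} I_{yy} - I_{xy}^2$

In Matlab: Ixx.*Iyy - (Ixy).^2

[Figure: derivative images Ixx, Iyy, Ixy]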
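The determinant formula above can be checked on small synthetic images. This is a minimal sketch, assuming the image is a list of rows and that plain central finite differences stand in for the Gaussian derivatives a real detector would use; the function name is hypothetical.

```python
def hessian_determinant(img):
    """Return Ixx * Iyy - Ixy^2 at interior pixels (border entries stay 0)."""
    h, w = len(img), len(img[0])
    det = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ixx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
            iyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
            ixy = (img[y + 1][x + 1] - img[y + 1][x - 1]
                   - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
            det[y][x] = ixx * iyy - ixy ** 2
    return det

# Blob-like surface (inverted paraboloid): both curvatures agree, determinant > 0.
blob = [[-((x - 3) ** 2 + (y - 3) ** 2) for x in range(7)] for y in range(7)]
# Saddle surface: curvatures of opposite sign, determinant < 0.
saddle = [[(x - 3) * (y - 3) for x in range(7)] for y in range(7)]

det_blob = hessian_determinant(blob)
det_saddle = hessian_determinant(saddle)
```

A positive determinant marks blob‐like structure, while saddles give a negative value, which is why local maxima of the determinant are used as interest points.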


Hessian detector (Beaudet, 1978)


• Second moment matrix / autocorrelation matrix

$\mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix}$

1. Image derivatives: Ix, Iy

2. Square of derivatives: Ix2, Iy2, IxIy

3. Gaussian filter g(σI): g(Ix2), g(Iy2), g(IxIy)

4. Cornerness function – both eigenvalues are strong

$har = \det[\mu(\sigma_I, \sigma_D)] - \alpha\,\mathrm{trace}^2[\mu(\sigma_I, \sigma_D)] = g(I_x^2)\,g(I_y^2) - [g(I_x I_y)]^2 - \alpha\,[g(I_x^2) + g(I_y^2)]^2$

5. Non‐maxima suppression

Harris detector (Harris, 1988)
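Steps 1 to 4 above can be sketched in a few lines. This is an illustration under stated assumptions, not the reference implementation: central differences replace the Gaussian derivatives at scale σD, a 3x3 box sum replaces the Gaussian window g(σI), α = 0.04 is an assumed value, and non‐maxima suppression (step 5) is omitted.

```python
def harris_response(img, alpha=0.04):
    """Cornerness det(mu) - alpha * trace(mu)^2 using a 3x3 box window."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    # 1. Image derivatives (central differences, zero on the border).
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    har = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # 2.-3. Products of derivatives accumulated over the window.
            sxx = syy = sxy = 0.0
            for v in range(y - 1, y + 2):
                for u in range(x - 1, x + 2):
                    sxx += ix[v][u] ** 2
                    syy += iy[v][u] ** 2
                    sxy += ix[v][u] * iy[v][u]
            # 4. Cornerness function.
            har[y][x] = sxx * syy - sxy ** 2 - alpha * (sxx + syy) ** 2
    return har

# Synthetic corner: a bright quadrant in the lower right of an 11x11 image.
corner_img = [[1.0 if x >= 5 and y >= 5 else 0.0 for x in range(11)]
              for y in range(11)]
har = harris_response(corner_img)
```

On this image the response is positive at the corner (5, 5), negative along the straight edge and zero in flat areas, matching the "both eigenvalues are strong" criterion.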


Scale invariant detectors
Laplacian of Gaussian

• Local maxima in scale space of Laplacian of Gaussian (LoG)

$L_{xx}(\sigma) + L_{yy}(\sigma)$

[Figure: LoG responses at scale levels σ, σ2, σ3, σ4, σ5 → list of (x, y, σ)]


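Lowe: DoG

• LoG → diffusion equation → derivative with respect to scale

$\frac{\partial L}{\partial s} = \nabla \cdot \nabla L = \Delta L = L_{xx}(\sigma) + L_{yy}(\sigma)$, with $s = \sigma^2 / 2$

$\frac{\partial L}{\partial \sigma} = \sigma \Delta L$

$\sigma \Delta L = \frac{\partial L}{\partial \sigma} \approx \frac{L(k\sigma) - L(\sigma)}{k\sigma - \sigma}$

$(k - 1)\,\sigma^2 \Delta L \approx L(k\sigma) - L(\sigma)$

Scale normalized Laplacian: $\sigma^2 (L_{xx}(\sigma) + L_{yy}(\sigma))$

Difference of Gaussians: $L(k\sigma) - L(\sigma)$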


Lowe: DoG


Properties
• Scale‐invariant
• Simple, efficient scheme
• Laplacian fires more on edges than determinant of Hessian


Harris Laplace

1. Initialization: Multiscale Harris corner detection

[Figure: detecting local maxima at scale levels σ, σ2, σ3, σ4]


Harris Laplace

1. Initialization: Multiscale Harris corner detection
2. Scale selection based on Laplacian

[Figure: Harris points compared with Harris‐Laplace points]


Harris Affine

1. Detect multi‐scale Harris points
2. Automatically select the scales
3. Adapt affine shape based on second order moment matrix
4. Refine point location


Harris & Hessian Affine


T. Tuytelaars, B. Leibe

Properties
• Scale or affine invariant
• Detects blob‐ and corner‐like structures
• Large number of regions
• Well suited for object class recognition
• Less accurate than some competitors


Matas: Maximally Stable Extremal Regions (MSERs)

• Based on watershed algorithm











Maximally Stable Extremal Regions


Properties
• Affine invariant
• Detects blob‐like structures
• Simple, efficient scheme
• High repeatability
• Fires on similar features as IBR (regions need not be convex, but need to be closed)
• Sensitive to image blur


Kadir & Brady: salient regions
• Based on entropy
• Maxima in entropy, combined with inter‐scale saliency
• Extended to affine invariance

Properties
• Scale or affine invariant
• Detects blob‐like structures
• Very good for object class recognition
• Limited number of regions
• Slow to extract



Affine normalization (‘deskewing’)

rotate

rescale


Local descriptors - rotation invariance

• Estimation of the dominant orientation
– extract gradient orientation
– histogram over gradient orientation (from 0 to 2π)
– peak in this histogram
• Rotate patch in dominant direction
• Plus: invariance
• Minus: less discriminant, additional noise
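The estimation steps listed above (gradient orientation, histogram, peak) can be sketched as follows. The sketch assumes the patch is a list of rows, uses central differences for the gradient, weights votes by gradient magnitude and uses 36 bins; the function name, the weighting and the bin count are illustrative choices rather than values taken from the slides.

```python
import math

def dominant_orientation(patch, bins=36):
    """Return the centre angle (radians, in [0, 2*pi)) of the strongest bin."""
    hist = [0.0] * bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            b = int(angle / (2 * math.pi) * bins) % bins
            hist[b] += mag          # magnitude-weighted vote
    peak = max(range(bins), key=hist.__getitem__)
    return (peak + 0.5) * 2 * math.pi / bins

# Intensity ramp increasing with y: the gradient points along +y (angle pi/2).
ramp_y = [[float(y) for x in range(8)] for y in range(8)]
theta = dominant_orientation(ramp_y)
```

The result is accurate only to within one bin width, which is one source of the "additional noise" noted above.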


Affine Normalization

• Scale, stretch and skew
• Fixed size disk


ASIFT ‐ a new affine invariant detector?
Idea:
• (due to Lepetit et al. and others) Synthesize warped views of both images in two‐view matching
• Match all pairs of synthesized images.
• Impose a geometric constraint to prune the tentative correspondences
• Positives:
– yes, more correct correspondences are found =>
– some (very) difficult matching problems solvable
• Negatives:
– detection time goes up significantly
– matching time goes up even more significantly (quadratically)
– problematic use in e.g. retrieval
– issue with evaluation (not all reported matches are inliers)
• ASIFT is NOT a detector
– rather a “matching” scheme
– generates a “redundant” representation that slows down the matching significantly
– any detector may benefit from this “matching” scheme

Guoshen Yu, Jean‐Michel Morel: A fully affine invariant image comparison method. ICASSP 2009: 1597‐1600
G. Yu and J.M. Morel, ASIFT: An Algorithm for Fully Affine Invariant Comparison, Image Processing On Line, 2011.

ECCV 2012 Modern features: … Detectors.


Affine Invariance by Sampling Viewsphere
• Why use DoG (“SIFT”)? Replacing DoG by HessianAffine or MSER is beneficial and more efficient!
• HessianAffine matches from direct matching:
• HessianAffine matches with synthesized images (viewsphere sampling)
• Conclusions:
1. generating synthesized views works
2. no reason to use DoG

Efficient methods

• Consider that descriptor calculation may take longer than the detection process! Sometimes, “auxiliary calculations” like non‐maximum suppression dominate computation time.

• Consider the required level of invariance: in some applications, a reduced level of invariance is sufficient

• Consider fast approximations
• Use fast implementations, e.g. on GPU (GPU SURF, GPU SIFT)


Speeded Up Robust Features
• Idea:
– Approximate Hessian + SIFT calculation with a computationally efficient algorithm.
• Properties:
– exploit the integral image
– the SURF detector is an approximation to the Hessian
– reuse the calculations needed for detection in descriptor computation
– maintain robustness to rotation, scale and illumination change
– approximately 2x faster than DoG, 10x faster than the Hessian‐Laplace detector

Herbert Bay, Tinne Tuytelaars, and Luc Van Gool, SURF: Speeded Up Robust Features, ECCV 2006.

Citations: 2300 (2010), 4000 (2012)



The Integral image (Sum Table)

To calculate the sum in the DBCA rectangle, only 3 additions are needed
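A minimal sketch of the sum table follows, assuming the image is a list of rows and padding the table with an extra row and column of zeros so that no border checks are needed. The function names and the half‐open rectangle convention are choices made for this illustration.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom][left:right]: four lookups, three additions/subtractions."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
lower_right = box_sum(ii, 1, 1, 3, 3)   # 5 + 6 + 8 + 9
```

The cost of `box_sum` does not depend on the rectangle size, which is what makes box filters of any scale equally cheap.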




SURF Detection

• Approximate second order derivatives with box filters (mean/average filter)

Hessian-based interest point localization: Lxx(x,y,σ) is the convolution of the Gaussian second order derivative with the image

SURF Detection
• Scale analysis easily handled with the integral image

Filter sizes: 9 x 9, 15 x 15, 21 x 21, 27 x 27 (1st octave); 39 x 39, 51 x 51, … (2nd octave)



CenSurE‐Oct detector
• Approximation of LoG / DoG by octagonal box filters
• Sum of intensities inside an octagon calculated in O(1) using 3 integral images
• Zero DC response requires normalisation
• Constant DC response over scales
• 3x3x3 non‐maxima suppression
• Edge responses suppressed using Harris measure
• Scale sampling:

M. Agrawal, K. Konolige, M. R. Blas. CenSurE: Center Surround Extremas for Realtime Feature Detection and Matching, ECCV 2008


STAR detector


• Approximating LoG / DoG using 2 integral images
• 5x5 spatial non-max suppression
• A single response at a point: maximum over scales
• Edge responses suppressed using Hessian

K. Konolige et al. View-Based Maps. Journal of Robotics 2010 http://pr.willowgarage.com/wiki/Star_Detector


CenSurE and STAR det. vs. DoG
• Small scale features: only integer locations and scales
• Limited rotation invariance
• Only 2x speedup over the DoG detector

Det. time [s]     Without SSE instructions   With SSE instructions
VLF SIFT (DoG)    0.34                       0.14
STAR              0.16                       0.08
CenSurE           0.28                       X


Fast‐9 and Fast‐ER (E. Rosten)
• in some situations (controlled lighting, tracking), invariance/robustness is less important than speed
• a simple detector based on intensity comparisons can be very fast, and yet “repeatable enough”
• detection: 12 contiguous pixels (on a 16‐pixel circle around the candidate) are darker/brighter than the central pixel by at least t.

• http://www.edwardrosten.com/work/fast.html
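The detection rule above can be sketched as a plain segment test. This is a simplified illustration, assuming a 16‐pixel circle of radius 3 around the candidate and an image stored as a list of rows; it omits the learned decision tree and the non‐maximum suppression of the actual FAST implementation, and the names below are hypothetical.

```python
# Offsets (dx, dy) of the 16-pixel circle of radius 3, in circular order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_fast_corner(img, x, y, t, n=12):
    """True if n contiguous circle pixels are all brighter or all darker by at least t."""
    centre = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):                 # +1: brighter test, -1: darker test
        flags = [sign * (p - centre) >= t for p in ring]
        run = 0
        for f in flags + flags:          # doubled list handles wrap-around runs
            run = run + 1 if f else 0
            if run >= n:
                return True
    return False

# Dark pixel on a bright background: every circle pixel is brighter.
spot = [[100] * 7 for _ in range(7)]
spot[3][3] = 0
spot_is_corner = is_fast_corner(spot, 3, 3, t=20)
```

Lowering `n` to 9 gives the looser test that the name Fast‐9 refers to.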

Citations: 730 (2012)



Fast‐9 and Fast‐ER (E. Rosten)


Other feature detectors
• Edge‐based detectors
– Jurie et al., Mikolajczyk et al., …
• Combinations of small‐scale features
– Brown & Lowe
• Vertical line segments
– Goedeme et al.
• Speeded‐Up Robust Features (SURF)
– Bay et al.
• Fast Features
– Rosten et al.
• Segmentation based features
– Malik et al., Koniusz et al.


Program
• Local Feature Definitions / Properties
• Applications
• Interest point detectors
• Local Descriptors
• Evaluations


Descriptors

Extract affine regions → Normalize regions → Eliminate rotational + illumination → Compute appearance descriptors

SIFT (Lowe ’04)


Descriptors history

• Normalized cross-correlation (NCC) [~ 60s]

• Gaussian derivative-based descriptors– Differential invariants [Koenderink and van Doorn’87]– Steerable filters [Freeman and Adelson’91]

• Moment invariants [Van Gool et al.’96]

• SIFT [Lowe’99]

• Shape context [Belongie et al.’02]

• Gradient PCA [Ke and Sukthankar’04]

• SURF descriptor [Bay et al.’08]

• DAISY descriptor [Tola et al.’08, Winder et al.’09]

• …….


SIFT descriptor [Lowe’99]

• Spatial binning and binning of the gradient orientation
• 4x4 spatial grid, 8 orientations of the gradient, dim 128
• Soft-assignment to spatial bins
• Normalization of the descriptor to norm one (robust to illumination)
• Comparison with Euclidean distance

[Figure: image patch → gradient → histograms over the spatial (x, y) grid]
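The binning scheme above (4x4 spatial grid, 8 orientations, 128 dimensions, unit norm) can be sketched as follows. This is a deliberately simplified SIFT‐like descriptor, not Lowe's implementation: it uses hard assignment instead of the soft assignment mentioned above, applies no Gaussian weighting or clipping, and assumes an already normalized square patch whose interior side is divisible by the grid size. The function name is hypothetical.

```python
import math

def sift_like_descriptor(patch, grid=4, bins=8):
    """Concatenate per-cell gradient orientation histograms and L2-normalize."""
    n = len(patch) - 2                  # gradients exist on interior pixels only
    cell = n // grid
    hist = [0.0] * (grid * grid * bins)
    for y in range(1, n + 1):
        for x in range(1, n + 1):
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            mag = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % (2 * math.pi)
            b = int(angle / (2 * math.pi) * bins) % bins
            cy = min((y - 1) // cell, grid - 1)
            cx = min((x - 1) // cell, grid - 1)
            hist[(cy * grid + cx) * bins + b] += mag   # hard assignment
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist

# 18x18 ramp patch: 16x16 interior pixels, i.e. 4x4 cells of 4x4 pixels.
ramp = [[float(x) for x in range(18)] for y in range(18)]
desc = sift_like_descriptor(ramp)
# The same patch under a hypothetical gain and offset gives the same descriptor.
desc_lit = sift_like_descriptor([[3.0 * v + 7.0 for v in row] for row in ramp])
```

Taking gradients removes the intensity offset and the final normalization removes the gain, which is the illumination robustness noted in the list above.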


SURF: Speeded Up Robust Features

• Approximate derivatives with Haar wavelets• Exploit integral images

Herbert Bay, Andreas Ess, Tinne Tuytelaars, Luc Van Gool "SURF: Speeded Up Robust Features", Computer Vision and Image Understanding (CVIU), Vol. 110, No. 3, pp. 346‐‐359, 2008

Citations: 4500 (2012)


DAISY
• Optimized for dense sampling
• Log‐polar grid
• Gaussian smoothing
• Dealing with occlusions

Engin Tola, Vincent Lepetit, and Pascal Fua, DAISY: An Efficient Dense Descriptor Applied to Wide‐Baseline Stereo, TPAMI 32(5), 2010.



Fast and compact descriptors
• Binary descriptors
• Comparison of pairs of intensity values
– LBP
– BRIEF
– ORB
– BRISK


LBP: Local Binary Patterns

T. Ojala, M. Pietikäinen, and D. Harwood (1994), "Performance evaluation of texture measures with classification based on Kullback discrimination of distributions", ICPR 1994, pp. 582‐585.
M. Heikkilä, M. Pietikäinen, C. Schmid, Description of interest regions with LBP, Pattern Recognition 42 (3), 425‐436

• First proposed for texture recognition in 1994.
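A minimal sketch of the basic 8‐neighbour operator is given below, assuming an image stored as a list of rows. The neighbour ordering, the "greater or equal" comparison and the function names are conventions chosen for this illustration; published variants differ in these details.

```python
# Neighbour offsets (dx, dy), clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]

def lbp_code(img, x, y):
    """8-bit code: bit i is set when neighbour i is at least as bright as the centre."""
    centre = img[y][x]
    code = 0
    for bit, (dx, dy) in enumerate(OFFSETS):
        if img[y + dy][x + dx] >= centre:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, x, y)] += 1
    return hist

tile = [[9, 1, 1],
        [1, 5, 1],
        [1, 1, 1]]
code = lbp_code(tile, 1, 1)   # only the top-left neighbour is brighter
```

Because only the order of intensities matters, the code is unchanged by any increasing intensity transform.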



BRIEF: Binary Robust Independent Elementary Features
• Random selection of pairs of intensity values.

• Fixed sampling pattern of 128, 256 or 512 pairs.

• Hamming distance to compare descriptors (XOR).
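The three points above can be sketched as follows: a fixed random pattern of pixel pairs, one bit per intensity comparison, and an XOR‐based Hamming distance. The sketch assumes a square patch stored as a list of rows, packs the bits into a Python integer and skips the patch smoothing that the method applies before sampling. The function names and the seed are hypothetical.

```python
import random

def make_pattern(patch_size, n_pairs=256, seed=0):
    """Draw the sampling pattern once; it stays fixed for all patches."""
    rng = random.Random(seed)
    return [((rng.randrange(patch_size), rng.randrange(patch_size)),
             (rng.randrange(patch_size), rng.randrange(patch_size)))
            for _ in range(n_pairs)]

def brief_descriptor(patch, pattern):
    """Bit i is 1 when the first pixel of pair i is darker than the second."""
    bits = 0
    for i, ((x1, y1), (x2, y2)) in enumerate(pattern):
        if patch[y1][x1] < patch[y2][x2]:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of differing bits, computed from the XOR of the two descriptors."""
    return bin(a ^ b).count("1")

pattern = make_pattern(patch_size=8)
patch = [[(3 * x + 5 * y) % 17 for x in range(8)] for y in range(8)]
brighter = [[v + 10 for v in row] for row in patch]   # hypothetical offset
d1 = brief_descriptor(patch, pattern)
d2 = brief_descriptor(brighter, pattern)
```

Adding a constant to every pixel changes no comparison, so `d1` and `d2` are identical and their Hamming distance is zero.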

M. Calonder, V. Lepetit, C. Strecha, P. Fua, BRIEF: Binary Robust Independent Elementary Features, 11th European Conference on Computer Vision, 2010.



Various others
• BRISK: Binary Robust Invariant Scalable Keypoints
• FREAK: Fast Retina Keypoint
• CARD: Compact and Realtime Descriptor
• LDB: Local Difference Binary


LIOP: Local Intensity Order Pattern for Feature Description

Zhenhua Wang Bin Fan Fuchao Wu, Local Intensity Order Pattern for Feature Description, ICCV 2011.

• Robustness to monotonic intensity changes• Data‐driven division into cells

(and predecessors MROGH and MRRID)


Linear Discriminant Projections

M. Brown, G. Hua and S. Winder, Discriminant Learning of Local Image Descriptors. PAMI 2010.
H. Cai, K. Mikolajczyk, J. Matas, Linear Discriminant Projections, PAMI 2010.

• Learn configuration and other parameters from training data obtained from 3D reconstructions



Linear Discriminant Projections
• Training data = set of corresponding image patches


Descriptor Learning Using Optimisation

• Learning of
– spatial pooling regions
– dimensionality reduction

• Learning from very weak supervision

Non‐linear transform

Spatial pooling

Dimensionality reduction

Pre‐rectified keypoint patch

Descriptor vector

Normalisation and cropping

learning

learning

K. Simonyan et al., Descriptor Learning Using Convex Optimisation, ECCV 2012

M. Brown, G. Hua and S. Winder, Discriminant Learning of Local Image Descriptors. PAMI 2010.


D‐BRIEF: Discriminative BRIEF

T. Trzcinski and V. Lepetit, Efficient Discriminative Projections for Compact Binary Descriptors, European Conference on Computer Vision (ECCV) 2012

• Learn linear projections that map image patches to a more discriminative subspace

• Exploit integral images


Dimensionality Reduction
• PCA or LDE can reduce dimensionality to 30% without performance decrease
• Improves clustering performance
• Especially useful with combined grayvalue and color descriptors
• Improves matching/recognition performance, e.g. Scene 15
– SIFT 128 dim 83.5%
– PCA 30 dim 82.9%
– LDE 30 dim 84.5%

H. Cai, K. Mikolajczyk, J Matas, Linear Discriminant Projections, PAMI 2010.


Program
• Local Feature Definitions / Properties
• Applications
• Interest point detectors
• Local Descriptors
• Evaluations


Setting up an evaluation
• Which problem? Performance in different applications/niches may vary significantly.
– Category recognition
– Matching
– Retrieval
• What dataset?
– Pascal VOC 2007
– Oxford image pairs
– Oxford ‐ Paris buildings
• Protocol and criteria?
– Public dataset
– Avoiding the risk of over‐fitting/optimizing to the data


Detector evaluations

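$\mathrm{precision} = \frac{\#\,\text{correct matches}}{\#\,\text{all matches}}$

Two points are correctly matched if their regions A and B, related by the homography, overlap by more than T = 40%:

$\frac{A \cap B}{A \cup B} > T$

$\mathrm{recall} = \frac{\#\,\text{correct matches}}{\#\,\text{ground truth correspondences}}$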
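The overlap criterion can be illustrated with regions represented as sets of pixels. This sketch assumes region B has already been mapped into the image of region A with the ground truth homography, and it rasterizes axis‐aligned ellipses for simplicity; the function names and the example regions are hypothetical.

```python
def ellipse_pixels(cx, cy, rx, ry, size):
    """Set of integer pixels inside an axis-aligned ellipse on a size x size grid."""
    return {(x, y) for y in range(size) for x in range(size)
            if ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0}

def overlap(a, b):
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def is_correct_match(a, b, threshold=0.4):
    """Correct when the overlap exceeds T (40% in the criterion above)."""
    return overlap(a, b) > threshold

region_a = ellipse_pixels(20, 20, 8, 5, size=40)
region_b = ellipse_pixels(21, 20, 8, 5, size=40)   # shifted by one pixel
far_away = ellipse_pixels(5, 5, 2, 2, size=40)
```

With these regions, `is_correct_match(region_a, region_b)` accepts the slightly shifted ellipse, while `far_away` shares no pixel with `region_a` and is rejected.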


Performance measure
Precision‐Recall

$\mathrm{precision} = \frac{\text{correct}}{\text{correct} + \text{incorrect}}$

$\mathrm{recall} = \frac{\text{correct}}{\text{ground truth}}$

• High precision = very few incorrect images
• Low precision = all images
• High recall = all ground truth images
• Low recall = none of ground truth images

[Figure: recall (0 to 1) plotted against 1‐precision (0 to 1), with curves for a good approach and a bad approach]

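Matching test
Precision‐recall area

$\mathrm{precision} = \frac{\#\,\text{correct matches}}{\#\,\text{all matches}}$

$\mathrm{recall} = \frac{\#\,\text{correct matches}}{\#\,\text{ground truth correspondences}}$

[Figure: precision‐recall area plot]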


Previous Evaluations• 2D Scene – Homography

– C. Schmid, R. Mohr, and C. Bauckhage, “Evaluation of interest point detectors,” IJCV, 2000.

– K. Mikolajczyk and C. Schmid, “A performance evaluation of local descriptors,” CVPR, 2003.

– T. Kadir, M. Brady, and A. Zisserman, “An affine invariant method for selecting salient regions in images,” in ECCV, 2004.

– K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky,T. Kadir, and L. Van Gool, “A comparison of affine region detectors,” IJCV, 2005.

– A. Haja, S. Abraham, and B. Jahne, Localization accuracy of region detectors, CVPR 2008

– T. Dickscheid, F. Schindler, W. Förstner, Coding Images with Local Features, IJCV 2011


Previous Evaluations• 3D Scene ‐ epipolar constraints

– F. Fraundorfer and H. Bischof, “Evaluation of local detectors on non‐planar scenes,” in AAPR, 2004.

– P. Moreels and P. Perona, “Evaluation of features detectors and descriptors based on 3D objects,” IJCV, 2007.

– S. Winder and M. Brown, “Learning local image descriptors,” CVPR, 2007, 2009.

– A.L. Dahl, H. Aanæs and K.S. Pedersen, Finding the Best Feature Detector‐Descriptor Combination. 3DIMPVT, 2011.


Recent Evaluations
• Recent detectors

O. Miksik and K. Mikolajczyk, Evaluation of Local Detectors and Descriptors for Fast Feature Matching, ICPR 2012


Recent Descriptor Evaluations
• Computation times for the different descriptors for 1000 SURF keypoints

O. Miksik and K. Mikolajczyk, Evaluation of Local Detectors and Descriptors for Fast Feature Matching, ICPR 2012
J. Heinly, E. Dunn, J‐M. Frahm, Comparative Evaluation of Binary Features, ECCV 2012


Previous Evaluations• Image/object categories

– K. Mikolajczyk, B. Leibe, and B. Schiele, “Local features for object class recognition,” in ICCV, 2005

– E. Seemann, B. Leibe, K. Mikolajczyk, and B. Schiele, “An evaluation of local shape‐based features for pedestrian detection,” in BMVC, 2005.

– M. Stark and B. Schiele, “How good are local features for classes of geometric objects,” in ICCV, 2007.

– K. E. A. van de Sande, T. Gevers and C. G. M. Snoek, Evaluation of Color Descriptors for Object and Scene Recognition. CVPR, 2008.


Approach
• Bags‐of‐features
1. Interest point / region detector
2. Descriptors
3. K‐means clustering (4000 clusters)
4. Histogram of cluster occurrences (NN assignment)
5. Chi‐square distance and RBF kernel for KDA or SVM classifier

• J. Zhang, M. Marszalek, S. Lazebnik and C. Schmid, Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study, IJCV, 2007
• K. E. A. van de Sande, T. Gevers and C. G. M. Snoek, Evaluation of Color Descriptors for Object and Scene Recognition. CVPR, 2008
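Step 5 compares two codeword histograms with a chi‐square distance and turns it into a kernel value. A minimal sketch follows, assuming non‐negative histograms of equal length; the factor 1/2, the default gamma and the function names are assumptions of this illustration, since conventions vary between papers.

```python
import math

def chi_square_distance(h1, h2):
    """0.5 * sum((a - b)^2 / (a + b)), skipping bins that are empty in both."""
    return 0.5 * sum((a - b) ** 2 / (a + b)
                     for a, b in zip(h1, h2) if a + b > 0)

def chi_square_kernel(h1, h2, gamma=1.0):
    """RBF-style kernel exp(-gamma * chi-square distance)."""
    return math.exp(-gamma * chi_square_distance(h1, h2))

# Hypothetical normalized codeword histograms for two images.
h_a = [0.5, 0.5, 0.0, 0.0]
h_b = [0.0, 0.0, 0.5, 0.5]
k_same = chi_square_kernel(h_a, h_a)
k_diff = chi_square_kernel(h_a, h_b)
```

Evaluating the kernel for every pair of training images gives the kernel matrix that an SVM or KDA classifier takes as input.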


Evaluation

• PASCAL VOC measures
– Average precision (AP) for every object category
– Mean average precision (MAP)

[Figure: precision‐recall curve for one category => AP output]
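The two measures can be sketched from a ranked list of test images per category. Note the assumption: this is the plain, non‐interpolated average precision, whereas PASCAL VOC defines its own interpolated variant that this sketch does not reproduce. The function names and the example rankings are hypothetical.

```python
def average_precision(ranked_labels):
    """Mean of the precision values measured at the rank of each positive image."""
    hits = 0
    total = 0.0
    for k, is_positive in enumerate(ranked_labels, start=1):
        if is_positive:
            hits += 1
            total += hits / k       # precision among the top k results
    return total / hits if hits else 0.0

def mean_average_precision(rankings):
    """Average the per-category AP values."""
    aps = [average_precision(r) for r in rankings.values()]
    return sum(aps) / len(aps) if aps else 0.0

# Hypothetical rankings: True marks an image that contains the category.
rankings = {"dog": [True, False, True], "car": [True, True, False]}
ap_dog = average_precision(rankings["dog"])
m_ap = mean_average_precision(rankings)
```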


[Figure: MAP ranking of descriptors, with columns for MAP, #dimensions and density (color/gray, density, dimensionality, ...)]

• SIFT still dominates (histograms of gradient locations and orientations)
• Opponent chromatic space (normalized red‐green, blue‐yellow, and intensity Y)

Grayvalue descriptors

• Observations
• Color improves performance
• All based on histograms of gradient locations and orientations
• Dimensionality not much correlated with the performance
• Density strongly correlated with performance (the more the better)
• Results biased by density
• Implementation details matter

[Figure: MAP ranking, with columns for density and #dimensions]


Features• Which features to use ?

– Affine invariant features if large viewpoint changes are expected (>30 degrees)

– Level of invariance needed depends on number of model images

– Features need to be distinctive: risk for false matches is large

– At least a few good matches (if time for post‐processing is not an issue)

– Take into account typical image content (blobs/corners/prints/…)

• MSER, SURF, DoG, Harris/Hessian‐Laplace/Affine
• www.featurespace.org, VLFeat, OpenCV (data, code)


Conclusions
• Histograms of gradient location‐orientation dominate

• Color brings improvement for most classes
– opponent chromatic space

• Feature number
– the more the better

• Similar ranking in image matching
– performance generalizes across applications

• Exercise: http://kahlan.eps.surrey.ac.uk/featurespace/web/Classification_Exercise.zip
