Groups of Adjacent Contour Segments for Object Detection
Vittorio Ferrari, Loic Fevrier, Frederic Jurie, Cordelia Schmid
Problem: object class detection & localization
Training
Testing
Focus: classes with characteristic shape
Features: pairs of adjacent segments (PAS)
Contour segment network [Ferrari et al. ECCV 2006]
1) edgels extracted with Berkeley boundary detector
2) edgel-chains partitioned into straight contour segments
3) segments connected at edgel-chains’ endpoints and junctions
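Step 2 can be illustrated with a small sketch. The slides do not say how edgel-chains are partitioned, so the recursive split below (break the chain at the edgel farthest from the straight line joining its endpoints) is a swapped-in, commonly used technique rather than the authors' method. The function names and the tolerance `tol` are hypothetical.

```python
import math

def point_line_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    # |cross(b - a, a - p)| / |b - a|
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length

def split_chain(chain, tol=1.0):
    """Split a chain of (x, y) edgels into roughly straight pieces.

    Returns a list of (start_index, end_index) pairs into `chain`.
    `tol` is the largest allowed deviation from a straight line (assumed value).
    """
    def recurse(i, j):
        worst, worst_d = None, tol
        for k in range(i + 1, j):
            d = point_line_distance(chain[k], chain[i], chain[j])
            if d > worst_d:
                worst, worst_d = k, d
        if worst is None:          # every edgel is within tol: one segment
            return [(i, j)]
        return recurse(i, worst) + recurse(worst, j)
    return recurse(0, len(chain) - 1)

# An L-shaped chain is split at its corner.
corner = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print(split_chain(corner))  # [(0, 3), (3, 6)]
```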
Features: pairs of adjacent segments (PAS)
segments connected in the network
PAS = groups of two connected segments
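As a minimal sketch of this definition, assuming the network is stored as a dictionary from a segment id to the ids of the segments it is connected to (a hypothetical representation, not taken from the slides), the PAS are simply the connected pairs:

```python
def enumerate_pas(adjacency):
    """Return every pair of connected segments as a sorted (i, j) tuple.

    `adjacency` maps a segment id to the ids of the segments it is
    connected to in the contour segment network (assumed layout).
    """
    pairs = set()
    for seg, neighbours in adjacency.items():
        for other in neighbours:
            if other != seg:
                # store each unordered pair once
                pairs.add((min(seg, other), max(seg, other)))
    return sorted(pairs)

# Toy network: segment 1 is connected to segments 0, 2 and 3.
network = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
print(enumerate_pas(network))  # [(0, 1), (1, 2), (1, 3)]
```

Segments 2 and 3 do not form a PAS here because they are not directly connected to each other.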
PAS descriptor:
• encodes geometric properties of the PAS
• scale and translation invariant
• compact, 5D
Features: pairs of adjacent segments (PAS)
Example PAS
Why PAS?
+ intermediate complexity: good repeatability-informativeness trade-off
+ scale-translation invariant
+ connected: natural grouping criterion (need not choose a grouping neighborhood or scale)
+ can cover pure portions of the object boundary
PAS codebook
Based on descriptors, cluster PAS into types
a few of the most frequent types based on 10 outdoor images (5 horses and 5 background).
types based on 15 indoor images (bottles)
• Frequently occurring PAS have intuitive, natural shapes
• As we add images, number of PAS types converges to just ~100
• Very similar codebooks come out, regardless of source images
+ general, simple features. We use a single, universal codebook (1st row) for all classes
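Clustering PAS into types can be sketched as follows. The slides do not name the clustering algorithm or the dissimilarity measure, so this uses simple leader clustering with Euclidean distance as a stand-in; `build_codebook` and `radius` are hypothetical names.

```python
import math

def build_codebook(descriptors, radius):
    """Greedy leader clustering (a stand-in, not the slides' algorithm).

    A descriptor joins the nearest existing type if it lies within
    `radius` of that type's representative; otherwise it founds a new type.
    Returns (representatives, member_counts).
    """
    centers, counts = [], []
    for d in descriptors:
        best, best_dist = None, radius
        for idx, c in enumerate(centers):
            dist = math.dist(d, c)
            if dist <= best_dist:
                best, best_dist = idx, dist
        if best is None:
            centers.append(tuple(d))
            counts.append(1)
        else:
            counts[best] += 1
    return centers, counts

# Toy 2D descriptors: two tight groups give two types.
toy = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
print(build_codebook(toy, radius=1.0))  # ([(0.0, 0.0), (5.0, 5.0)], [3, 2])
```

With a scheme like this, new images stop creating new types once every common shape already has a representative, which is one way to picture the convergence noted above.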
Window descriptor
1. Subdivide window into tiles
2. Compute a separate bag of PAS per tile
3. Concatenate these semi-local bags
[Lazebnik et al. CVPR 2006]; [Dalal and Triggs CVPR 2005]
+ distinctive: records which PAS appear where; weight PAS by average edge strength
+ flexible: soft-assign PAS to types; rather coarse tiling
+ fast to compute using Integral Histograms
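The three steps and the weighting can be put together in a short sketch. It makes several assumptions: a regular `tiles_x` by `tiles_y` grid, a Gaussian kernel for the soft assignment, and a tile-major layout of the concatenated vector. All names are hypothetical. It loops over the PAS directly rather than using Integral Histograms.

```python
import math

def soft_assign(descriptor, codebook, sigma=1.0):
    """Weights over PAS types that sum to 1 (Gaussian kernel: an assumption)."""
    dists = [math.dist(descriptor, c) for c in codebook]
    weights = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in dists]
    total = sum(weights)
    if total == 0.0:  # all types too far away: fall back to the nearest one
        nearest = dists.index(min(dists))
        return [1.0 if i == nearest else 0.0 for i in range(len(codebook))]
    return [w / total for w in weights]

def window_descriptor(pas, window, tiles_x, tiles_y, codebook, sigma=1.0):
    """Concatenate one bag of PAS per tile.

    pas:    list of (x, y, descriptor, edge_strength)
    window: (x0, y0, width, height)
    Output index = (tile_row * tiles_x + tile_col) * n_types + type.
    """
    x0, y0, w, h = window
    n_types = len(codebook)
    bags = [0.0] * (tiles_x * tiles_y * n_types)
    for x, y, desc, strength in pas:
        if not (x0 <= x < x0 + w and y0 <= y < y0 + h):
            continue  # PAS lies outside the window
        tx = min(int((x - x0) * tiles_x / w), tiles_x - 1)
        ty = min(int((y - y0) * tiles_y / h), tiles_y - 1)
        offset = (ty * tiles_x + tx) * n_types
        for t, weight in enumerate(soft_assign(desc, codebook, sigma)):
            bags[offset + t] += strength * weight  # weight by edge strength
    return bags
```

For a 2 x 2 grid and a codebook of two types, the result has 2 * 2 * 2 = 8 dimensions, each one meaning "this PAS type in this tile".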
Training
1. Learn mean positive window dimensions
2. Determine number of tiles T
3. Collect positive example descriptors
4. Collect negative example descriptors: slide window over negative training images
Training
5. Train a linear SVM
Here are a few of the top-weighted descriptor vector dimensions (= 'PAS + tile'):
+ lie on object boundary (= local shape structure common to many training examples)
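Because the SVM is linear, each weight belongs to exactly one descriptor dimension, so the top-weighted dimensions can be read back as (tile, PAS type) pairs. The sketch below assumes the descriptor is laid out tile by tile with `n_types` entries per tile; that layout and the function names are assumptions, and the weights are made-up numbers rather than learned ones.

```python
def svm_score(weights, bias, descriptor):
    """Linear SVM decision value: w . x + b."""
    return sum(w * x for w, x in zip(weights, descriptor)) + bias

def top_weighted_dimensions(weights, n_types, top=3):
    """Return the `top` largest weights as (tile_index, pas_type, weight)."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return [(i // n_types, i % n_types, weights[i]) for i in ranked[:top]]

# Made-up weights for 2 tiles x 3 PAS types.
w = [0.1, -0.3, 0.9, 0.0, 0.5, 0.2]
print(top_weighted_dimensions(w, n_types=3))
# [(0, 2, 0.9), (1, 1, 0.5), (1, 2, 0.2)]
```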
Testing
1. Slide window of fixed aspect ratio, at multiple scales
2. SVM classify each window + non-maxima suppression
detections
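A minimal sketch of the two testing steps, under stated assumptions: the stride (a fraction of the window height), the greedy suppression scheme and the 0.5 overlap threshold are not given in the slides, and the classifier scores are supplied by the caller.

```python
def sliding_windows(img_w, img_h, aspect, heights, stride_frac=0.25):
    """Yield (x0, y0, x1, y1) windows of fixed aspect ratio (width / height)."""
    for h in heights:                      # one pass per scale
        w = aspect * h
        step = max(1, int(stride_frac * h))
        y = 0
        while y + h <= img_h:
            x = 0
            while x + w <= img_w:
                yield (x, y, x + w, y + h)
                x += step
            y += step

def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def non_maxima_suppression(detections, max_overlap=0.5):
    """Greedy NMS over (score, box) pairs: keep a box only if it does not
    overlap an already kept, higher-scoring box by more than max_overlap."""
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, k[1]) <= max_overlap for k in kept):
            kept.append((score, box))
    return kept

dets = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 11, 11)), (0.7, (50, 50, 60, 60))]
print(non_maxima_suppression(dets))
# [(0.9, (0, 0, 10, 10)), (0.7, (50, 50, 60, 60))]
```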
Results – INRIA horses
+ tiling brings a substantial improvement; optimum at T=30 -> keep this setting on all other experiments
+ works well: 86% det-rate at 0.3 FPPI (with 50 pos + 50 neg training images)
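The det-rate at a given FPPI (false positives per image) can be computed from scored detections roughly as below. This is a sketch under assumptions: each detection already carries a flag saying whether it matched a ground-truth object (the matching criterion is not stated in the slides), at most one detection is counted per object, and scores are distinct. The function name and the toy numbers are hypothetical.

```python
def detection_rate_at_fppi(detections, n_objects, n_images, target_fppi):
    """Highest detection rate reachable while keeping the number of
    false positives per image at or below target_fppi.

    detections: list of (score, is_true_positive) over the whole test set.
    """
    best = 0.0
    tp = fp = 0
    # Lower the score threshold one detection at a time.
    for score, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        if fp / n_images <= target_fppi:
            best = max(best, tp / n_objects)
    return best

# Toy example: 5 test images containing 4 objects in total.
dets = [(0.9, True), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]
print(detection_rate_at_fppi(dets, n_objects=4, n_images=5, target_fppi=0.3))  # 0.5
print(detection_rate_at_fppi(dets, n_objects=4, n_images=5, target_fppi=0.4))  # 0.75
```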
Dataset: ~ Jurie and Schmid, CVPR 2004
170 positive + 170 negative images (training = 50 pos + 50 neg)
wide range of scales; clutter
(missed and FP)
Results – INRIA horses
+ PAS better than any IP
All interest point (IP) comparisons use T=10 and 120 feature types (= optimum over INRIA horses and ETHZ Shape Classes; all IP codebooks are class-specific)
(missed and FP)
Results – Weizmann-Shotton horses
Dataset: Shotton et al., ICCV 2005
327 positive + 327 negative images (training = 50 pos + 50 neg)
no scale changes; modest clutter
Shotton’s EER
- exact comparison to Shotton et al.: use their images and search at a single scale
- PAS same performance (~92% precision-recall EER), but:
+ no need for segmented training images (only bounding-boxes)
+ can detect objects at multiple scales (see other experiments)
Results – ETHZ Shape Classes
Dataset: Ferrari et al., ECCV 2006
255 images, over 5 classes
training = half of positive images for a class + same number from the other classes (1/4 from each)
testing = all other images
large scale changes; extensive clutter
Missed
Results – ETHZ Shape Classes
+ mean det-rate at 0.4 FPPI = 79%
+ PAS >> IP for apple logos, bottles, mugs
  PAS ~= IP for giraffes (texture!)
  PAS < IP for swans
+ overall best IP: Harris-Laplace
+ class specific IP codebooks
[Plots: Apple logos, Bottles, Giraffes, Mugs, Swans]
Results – Caltech 101
Dataset: Fei-Fei et al., GMBV 2004
42 anchor, 62 chair, 67 cup images
train = half + same number of Caltech 101 background
testing = other half pos + same number of background
scale changes; only little clutter
Results – Caltech 101
On Caltech 101's anchor, chair, cup:
+ PAS better than Harris-Laplace
+ mean PAS det-rate at 0.4 FPPI: 85%
Comparison to Dalal and Triggs CVPR 2005
[Plots: Apple logos, Bottles, Giraffes, Mugs, Swans]
Comparison to Dalal and Triggs CVPR 2005
[Plots: Caltech anchors, Caltech chairs, Caltech cups, INRIA horses, Shotton horses]
+ overall mean det-rate at 0.4 FPPI: PAS 82% >> HoG 58%
PAS >> HoG for 6 datasets
PAS ~= HoG for 2 datasets
PAS < HoG for 2 datasets
Generalizing PAS to kAS
kAS: any path of length k through the contour segment network
[Figure: segments connected in the network; example 3AS and 4AS]
• scale+translation invariant descriptor with dimensionality 4k-2
• k = feature complexity; higher k -> more informative, but less repeatable kAS
• overall mean det-rates (%)

            1AS   PAS   3AS   4AS
0.3 FPPI     69    77    64    57
0.4 FPPI     76    82    70    64

PAS do best!
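One way to obtain a scale- and translation-invariant descriptor with 4k-2 values is sketched below. The layout is an assumption chosen only to match that count: 2(k-1) relative midpoint coordinates, k orientations and k lengths, with distances divided by the largest midpoint-to-midpoint distance. For k=2 it yields 6 numbers, whereas the PAS slide describes a 5D descriptor, so the authors' exact parameterization presumably differs. The function name and input format are hypothetical.

```python
import math
from itertools import combinations

def kas_descriptor(segments):
    """Descriptor of a kAS given as an ordered list of segments.

    Each segment is (mid_x, mid_y, orientation, length).
    Returns 2(k-1) + k + k = 4k-2 values.
    """
    mids = [(s[0], s[1]) for s in segments]
    # Normaliser: largest distance between two segment midpoints.
    norm = max((math.dist(a, b) for a, b in combinations(mids, 2)), default=0.0)
    if norm == 0.0:  # k = 1 or coincident midpoints: use the longest segment
        norm = max(s[3] for s in segments)
    x0, y0 = mids[0]
    desc = []
    for mx, my in mids[1:]:                      # positions relative to segment 1
        desc += [(mx - x0) / norm, (my - y0) / norm]
    desc += [s[2] for s in segments]             # orientations
    desc += [s[3] / norm for s in segments]      # normalised lengths
    return desc

pas = [(0.0, 0.0, 0.0, 4.0), (3.0, 4.0, 1.0, 2.0)]
# Same PAS, twice as large and shifted by (10, 20).
moved = [(10.0, 20.0, 0.0, 8.0), (16.0, 28.0, 1.0, 4.0)]
print(kas_descriptor(pas))
print(kas_descriptor(moved))
```

Both calls give the same values up to floating-point rounding, which is the invariance the first bullet refers to.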
Conclusions
Connected local shape features for object class detection
Experiments on 10 diverse classes from 4 datasets show:
+ better suited than interest points for these shape-based classes
- fixed aspect-ratio window: sometimes inaccurate bounding-boxes
+ object detector deals with clutter, scale changes, intra-class variability
- single viewpoint
+ PAS have the best intermediate complexity among kAS
+ object detector compares favorably to HoG-based one
Current work: detecting object outlines
Training: learn the common boundaries from examples
Model
• collection of PAS and their spatial variability
• only common boundary
Current work: detecting object outlines
Detection on a new image
1. detect edges
2. match PAS based on descriptors
3. vote for translation + scale initializations
4. match deformable thin-plate spline based on deterministic annealing
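Step 3 can be pictured as Hough-style voting. The sketch below is only an illustration under assumptions not stated in the slides: each matched PAS has a position and a size, a match implies scale = image size / model size and translation = image position - scale * model position, and votes are collected in coarse bins. The bin sizes and all names are hypothetical.

```python
import math
from collections import Counter

def vote_translation_scale(matches, pos_bin=10.0, scale_base=1.2):
    """Accumulate votes for (translation, scale) from PAS matches.

    matches: list of ((model_x, model_y, model_size),
                      (image_x, image_y, image_size)).
    Returns [(bin, votes), ...] sorted by decreasing vote count, where
    bin = (tx_bin, ty_bin, log_scale_bin).
    """
    votes = Counter()
    for (mx, my, msize), (ix, iy, isize) in matches:
        s = isize / msize                       # scale implied by this match
        tx, ty = ix - s * mx, iy - s * my       # translation implied by it
        key = (round(tx / pos_bin), round(ty / pos_bin),
               round(math.log(s, scale_base)))
        votes[key] += 1
    return votes.most_common()

matches = [
    ((0.0, 0.0, 10.0), (100.0, 50.0, 20.0)),    # consistent: scale 2, shift (100, 50)
    ((10.0, 0.0, 10.0), (120.0, 50.0, 20.0)),   # consistent
    ((0.0, 10.0, 5.0), (100.0, 70.0, 10.0)),    # consistent
    ((0.0, 0.0, 10.0), (300.0, 300.0, 10.0)),   # outlier match
]
print(vote_translation_scale(matches)[0])  # ((10, 5, 4), 3)
```

The strongest bins would then serve as initializations for step 4.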
Outline object in test image, without segmented training images!
A few preliminary results