Visual Element Discovery as Discriminative Mode Seeking

Carl Doersch (CMU), Abhinav Gupta (CMU), Alexei A. Efros (UCB)

The need for mid-level representations

• 6 billion images
• 70 billion images
• 1 billion images served daily
• 10 billion images
• 60 hours uploaded per minute

Almost 90% of web traffic is visual!

Discriminative patches

• Visual words are too simple

• Objects are too difficult

• Something in the middle? (Felzenszwalb et al. 2008)

(Singh et al. 2012)

Mid-level “Visual Elements”

• Simple enough to be detected easily
• Complex enough to be meaningful
  – “Meaningful” as measured by weak labels

(Doersch et al. 2012)

(Singh et al. 2012)

Mid-level “Visual Elements”

(Singh et al. 2012)

• Doersch et al. 2012
• Singh et al. 2012
• Jain et al. 2013
• Endres et al. 2013
• Juneja et al. 2013
• Li et al. 2013
• Sun et al. 2013
• Wang et al. 2013
• Fouhey et al. 2013
• Lee et al. 2013

Our goal

• Provide a mathematical optimization for visual elements

• Improve performance of mid-level representations.

Elements as Patch Classifiers

What if the labels are weak?

• E.g. image has horse/no-horse
• (Or even weaker, like Paris/not-Paris)

• Idea: Label these all as “horse”

• Problem: 10,000 patches per image, most of which are unclassifiable.

The weaker the label, the bigger the problem.

Task: Learn to classify Paris from Not-Paris

Paris Also Paris

Other approaches

• Latent SVM:
  – Assumes we have one instance per positive image
• Multiple instance learning:
  – Not clear how to define the bags

What if the labels are weak?

• Negatives are negatives, positives might not be positive

• Most of our data can be ignored
• First: how to cluster without clustering everything

(Singh et al. 2012)

Mean shift

Patch distances

Min distance: 2.59e-4

Max distance: 1.22e-4

Input Nearest neighbor

Mean shift

Negative Set (figure legend: Paris, Not Paris)

Density Ratios (figure legend: Paris, Not Paris)

Adaptive Bandwidth (figure legend: Positive, Negative; annotation: Bandwidth)
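
As a rough illustration of the density-ratio and adaptive-bandwidth ideas named in the slide titles above, the following Python sketch estimates a local positive-to-negative ratio around a candidate centroid. It assumes Euclidean distance and a hypothetical rule that grows the bandwidth until a fixed number of negative points (`k_neg`, an assumed parameter) falls inside the window. Neither choice is taken from the talk, so this is one possible reading rather than the authors' estimator.

```python
import math

def adaptive_bandwidth(center, negatives, k_neg):
    """Smallest radius around `center` that contains `k_neg` negative points."""
    if not 1 <= k_neg <= len(negatives):
        raise ValueError("k_neg must be between 1 and the number of negatives")
    dists = sorted(math.dist(center, x) for x in negatives)
    return dists[k_neg - 1]

def density_ratio(center, positives, negatives, k_neg=2):
    """Positives per negative inside the adaptive window (assumed estimator)."""
    b = adaptive_bandwidth(center, negatives, k_neg)
    n_pos = sum(1 for x in positives if math.dist(center, x) <= b)
    return n_pos / k_neg

# Toy 2-D "patch features": positives cluster near the origin,
# negatives cluster near (3, 3).
positives = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0)]
negatives = [(1.0, 0.0), (0.0, 1.0), (3.0, 3.1), (3.1, 3.0), (2.9, 3.0)]

ratio_near_origin = density_ratio((0.0, 0.0), positives, negatives)
ratio_near_negatives = density_ratio((3.0, 3.0), positives, negatives)
```

In this toy data the window around the origin must grow wide before it reaches two negatives, so it captures several positives; around (3, 3) the negatives are dense, the window stays small, and the ratio is lower.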

Discriminative Mode Seeking

• Find local optima of an estimate of the density ratio

• Allow an adaptive bandwidth
• Be extremely fast
  – Minimize the number of passes through the data

• Mean shift: maximize (w.r.t. w)

[Equation not recovered; its annotated terms were Centroid, Patch Feature, Bandwidth, and Distance]
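
For readers unfamiliar with mean shift, here is a minimal Python sketch of the classic procedure: a centroid `w` is repeatedly moved to the mean of the points that lie within the bandwidth, until it settles on a local density mode. The flat window, the Euclidean distance, and the parameter names are assumptions made for illustration; the exact kernel used on the slide is not recoverable from this transcript.

```python
import math

def mean_shift(points, start, bandwidth, max_iters=100, tol=1e-6):
    """Move `start` toward a local density mode using a flat window."""
    w = list(start)
    for _ in range(max_iters):
        # Points within `bandwidth` of the current centroid.
        near = [p for p in points if math.dist(p, w) <= bandwidth]
        if not near:
            break  # empty window: nothing to average
        # New centroid is the coordinate-wise mean of the nearby points.
        new_w = [sum(coords) / len(near) for coords in zip(*near)]
        converged = math.dist(new_w, w) < tol
        w = new_w
        if converged:
            break
    return w

# Two well-separated toy clusters.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
mode_a = mean_shift(points, start=(0.5, 0.5), bandwidth=1.0)
mode_b = mean_shift(points, start=(4.5, 4.5), bandwidth=1.0)
```

Each starting point converges to the mean of its own cluster, which is why only the data near a candidate centroid ever needs to be touched.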

B(w) is the value of b satisfying:

[Equation not recovered; annotation: optimize]

• Distance metric: Normalized Correlation

[Equation not recovered; annotation: optimize. Figure legend: Negative, Positive]
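
The normalized-correlation metric mentioned above can be sketched in a few lines of Python. This version assumes the usual definition (subtract the mean, scale to unit length, take the dot product); the exact feature normalization used in the talk is not stated in this transcript, and the function names are hypothetical.

```python
import math

def normalize(x):
    """Subtract the mean and scale to unit length (zero vector if constant)."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0.0:
        return [0.0] * len(x)
    return [v / norm for v in centered]

def normalized_correlation(a, b):
    """Dot product of the normalized vectors; lies in [-1, 1]."""
    return sum(p * q for p, q in zip(normalize(a), normalize(b)))

same_shape = normalized_correlation([1, 2, 3], [2, 4, 6])
with_offset = normalized_correlation([1, 2, 3], [11, 12, 13])
reversed_shape = normalized_correlation([1, 2, 3], [3, 2, 1])
```

For vectors that have already been normalized this way, the correlation reduces to a plain dot product, which is the form of the score w^T x used elsewhere in the slides.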

Optimization

• Initialization is straightforward
• For each element, keep only about 500 patches where w^T x - b > 0
• Trivially parallelizable in MapReduce
• Optimization is piecewise quadratic
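
The second bullet above, keeping only the patches with a positive score, can be sketched as a simple filter. The function name, the `limit` cap, and the highest-score-first ordering are assumptions for illustration; the slide only says that roughly 500 patches with w^T x - b > 0 are kept per element.

```python
def active_patches(features, w, b, limit=500):
    """Indices of patches with positive score w.x - b, highest scores first."""
    scored = []
    for i, x in enumerate(features):
        score = sum(wi * xi for wi, xi in zip(w, x)) - b
        if score > 0:
            scored.append((score, i))
    scored.sort(reverse=True)
    # Keep at most `limit` patches (hypothetical cap).
    return [i for _, i in scored[:limit]]

# Toy 2-D features and a centroid pointing along the first axis.
features = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8), (-1.0, 0.0)]
kept = active_patches(features, w=(1.0, 0.0), b=0.5)
kept_top1 = active_patches(features, w=(1.0, 0.0), b=0.5, limit=1)
```

Because every patch is scored independently, a filter like this is straightforward to run as the map step of a MapReduce job.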

Evaluation via Purity-Coverage Plot

• Analogous to Precision-Recall Plot

[Figure: example detections for Elements 1 to 5, contrasting "Low Purity" with "High purity, Low Coverage"]

[Figure: Purity-Coverage Curve. x-axis: Coverage (x1e4 pixels); y-axis: Purity; legend: Paris, Not Paris]

• Coverage for multiple elements is simply the union.
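
A purity-coverage curve of this kind can be sketched as follows. The sketch assumes each detection is a hypothetical `(score, is_positive, pixels)` tuple, that purity is the fraction of detections above the threshold that come from the positive set, and that coverage is the number of distinct pixels covered by those positive detections. Only the "union over elements" rule comes from the slide; the rest is an assumed formulation.

```python
def purity_coverage_curve(detections):
    """Sweep the score threshold from high to low.

    detections: iterable of (score, is_positive, pixels) tuples.
    Returns one (coverage, purity) point per detection.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    covered = set()  # union of pixels covered by positive detections
    n_pos = 0
    curve = []
    for rank, (_, is_positive, pixels) in enumerate(ranked, start=1):
        if is_positive:
            n_pos += 1
            covered |= set(pixels)
        curve.append((len(covered), n_pos / rank))
    return curve

# Detections pooled from two hypothetical elements; overlapping pixels
# (pixel 3 here) are counted once because coverage is a union.
element_1 = [(0.9, True, {1, 2, 3}), (0.4, False, {7, 8})]
element_2 = [(0.8, True, {3, 4}), (0.2, True, {9})]
curve = purity_coverage_curve(element_1 + element_2)
```

As with a precision-recall plot, lowering the threshold can only increase coverage, while purity may rise or fall.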

Purity-Coverage

[Figure: purity-coverage curves in two panels, "Top 25 Elements" and "Top 200 Elements". x-axis: Coverage (fraction of positive dataset). Legend: This work; This work, no inter-element; SVM Retrained 5x (Doersch et al. 2012); LDA Retrained 5x; LDA Retrained; Exemplar LDA (Hariharan et al. 2012)]

Results on Indoor 67 Scenes

[Figure panels: Kitchen, Grocery, Bowling, Elevator, Bakery, Bathroom]

Results on Indoor 67 Scenes

Method                               Accuracy (%)
ROI+Gist (Quattoni et al.)           26.05
MM-Scene (Zhu et al.)                28.00
Scene-DPM (Pandey et al.)            30.40
CENTRIST (Wu et al.)                 36.90
Object Bank (Li et al.)              37.60
RBoW (Parizi et al.)                 37.93
Discr. Patches (Singh et al.)        38.10
Latent Pyramid. (Sadeghi et al.)     44.84
Bag of Parts (Juneja et al.)         46.10
miSVM (Li et al.)                    46.40
D. Patches (full) (Singh et al.)     49.40
MMDL (Wang et al.)                   50.15
Discr. Parts (Sun et al.)            51.40
IFV (Juneja et al.)                  60.77
Bag of Parts+IFV (Juneja et al.)     63.10
Ours (no inter-element)              63.36
Ours                                 64.03
Ours+IFV                             66.87

Qualitative Indoor67 Results

Indoor67: Error Analysis

• Ground Truth (GT): deli; Guess: grocery store
• GT: corridor; Guess: staircase
• GT: laundromat; Guess: closet
• GT: museum; Guess: garage


Thank you!

More results at http://graphics.cs.cmu.edu/projects/discriminativeModeSeeking/

Paris Elements • Indoor 67 Elements • Indoor 67 Heatmaps • Source code (soon)

Some New Paris Elements
