Lecture 17: object detection - Stanford Computer Vision...

68
Lecture 17 - Fei-Fei Li Lecture 17: object detection Professor Fei-Fei Li Stanford Vision Lab 30-Nov-11 1

Transcript of Lecture 17: object detection - Stanford Computer Vision...

Page 1: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Lecture 17:

object detection

Professor Fei-Fei Li

Stanford Vision Lab

30-Nov-111

Page 2: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Object detection

30-Nov-112

Page 3: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

What we will learn today?

• Implicit Shape Model

– Representation

– Recognition

– Experiments and results

• Deformable Models

– The PASCAL challenge

– Latent SVM Model

30-Nov-113

Page 4: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

What we will learn today?

• Implicit Shape Model

– Representation

– Recognition

– Experiments and results

• Deformable Models

– The PASCAL challenge

– Latent SVM Model

30-Nov-114

Page 5: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Implicit Shape Model (ISM)

• Basic ideas

– Learn an appearance codebook

– Learn a star-topology structural model

• Features are considered independent given obj. center

• Algorithm: probabilistic Gen. Hough Transform

– Exact correspondences → Prob. match to object part

– NN matching → Soft matching

– Feature location on obj. → Part location distribution

– Uniform votes → Probabilistic vote weighting

– Quantized Hough array → Continuous Hough space

x1

x3

x4

x6

x5

x2

Source: Bastian Leibe

30-Nov-115

Page 6: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Implicit Shape Model: Basic Idea

• Visual vocabulary is used to index votes for object

position [a visual word = “part”].

Training image

Visual codeword with

displacement vectors

Source: Bastian Leibe

B. Leibe, A. Leonardis, and B. Schiele, Robust Object Detection with Interleaved Categorization and

Segmentation, International Journal of Computer Vision, Vol. 77(1-3), 2008.

30-Nov-116

Page 7: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

• Objects are detected as consistent configurations of

the observed parts (visual words).

Test image

Implicit Shape Model: Basic Idea

Source: Bastian Leibe

B. Leibe, A. Leonardis, and B. Schiele, Robust Object Detection with Interleaved Categorization and

Segmentation, International Journal of Computer Vision, Vol. 77(1-3), 2008.

30-Nov-117

Page 8: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Implicit Shape Model - Representation

• Learn appearance codebook

– Extract local features at interest points

– Agglomerative clustering ⇒ codebook

• Learn spatial distributions

– Match codebook to training images

– Record matching positions on object

Training images(+reference segmentation)

Appearance codebook…

………

Spatial occurrence distributionsx

y

sx

y

s

x

y

sx

y

s

+ local figure-ground labelsSource: Bastian Leibe

30-Nov-118

Page 9: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Implicit Shape Model - Recognition

Interest Points Matched Codebook Entries

Probabilistic Voting

3D Voting Space(continuous)

x

y

s

Object Position

o,x

Image Feature

f

Interpretation(Codebook match)

Ci

)( fCp i ),,( lin Cxop

Probabilistic vote weighting(will be explained later in detail)

[Leib

e,

Leonard

is,

Schie

le,

SLCV’0

4;

IJCV’0

8]

30-Nov-119

Page 10: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Implicit Shape Model - Recognition[L

eib

e,

Leonard

is,

Schie

le,

SLCV’0

4;

IJCV’0

8]

BackprojectedHypotheses

Interest Points Matched Codebook Entries

Probabilistic Voting

3D Voting Space(continuous)

x

y

s

Backprojectionof Maxima

30-Nov-1110

Page 11: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Original image

Example: Results on Cows

Source: Bastian Leibe

30-Nov-1111

Page 12: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Original imageInterest points

Example: Results on Cows

Source: Bastian Leibe

30-Nov-1112

Page 13: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Original imageInterest pointsMatched patches

Example: Results on Cows

Source: Bastian Leibe

30-Nov-1113

Page 14: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Example: Results on Cows

Prob. VotesSource: Bastian Leibe

30-Nov-1114

Page 15: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

1st hypothesis

Example: Results on Cows

So

urc

e:

K.

Gra

um

an

& B

. Le

ibe

30-Nov-1115

Page 16: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

2nd hypothesis

Example: Results on Cows

Source: Bastian Leibe

30-Nov-1116

Page 17: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Example: Results on Cows

3rd hypothesisSource: Bastian Leibe

30-Nov-1117

Page 18: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

• Scale-invariant feature selection– Scale-invariant interest points

– Rescale extracted patches

– Match to constant-size codebook

• Generate scale votes– Scale as 3rd dimension in voting space

– Search for maxima in 3D voting space

Scale Invariant Voting

Search window

x

y

s

Source: Bastian Leibe

30-Nov-1118

Page 19: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Scale Voting: Efficient Computation

• Continuous Generalized Hough Transform

� Binned accumulator array similar to standard Gen. Hough Transf.

� Quickly identify candidate maxima locations

� Refine locations by Mean-Shift search only around those points

⇒ Avoid quantization effects by keeping exact vote locations.

⇒ Mean-shift interpretation as kernel prob. density estimation.

y

s

xRefinement(Mean-Shift)

y

s

xCandidatemaxima

y

s

Scale votesx

y

s

Binned accum. array

x

Source: Bastian Leibe

30-Nov-1119

Page 20: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

• Scale-adaptive Mean-Shift search for refinement

– Increase search window size with hypothesis scale

– Scale-adaptive balloon density estimatorThis image cannot currently be displayed.

Scale Voting: Efficient Computation

y

s

xRefinement(Mean-Shift)

y

s

xCandidatemaxima

y

s

Scale votesx

y

s

Binned accum. array

x

Source: Bastian Leibe

30-Nov-1120

Page 21: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -

Detection Results

• Qualitative Performance

– Recognizes different kinds of objects

– Robust to clutter, occlusion, noise, low contrast

Source: Bastian Leibe

21

Page 22: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Figure-Ground Segregation

• What happens first – segmentation or recognition?

• Problem extensively studied in Psychophysics

• Experiments with ambiguousfigure-ground stimuli

• Results:

– Evidence that object recognition canand does operate before figure-ground organization

– Interpreted as Gestalt cue familiarity.

M.A. Peterson, “Object Recognition Processes Can an d Do Operate Before Figure-Ground Organization”, Cur. Dir. in Psych. Sc., 3:105-111, 1994.

30-Nov-1122

Page 23: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

ISM – Top-Down Segmentation

BackprojectedHypotheses

Interest Points Matched Codebook Entries

Probabilistic Voting

Segmentation3D Voting Space

(continuous)

x

y

s

Backprojectionof Maxima

p(figure)Probabilities

[Leibe, Leonardis, Schiele, SLCV’04; IJCV’08]

30-Nov-1123

Page 24: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Top-Down Segmentation: Motivation

• Secondary hypotheses (“mixtures of cars/cows/etc.”)– Desired property of algorithm! ⇒ robustness to occlusion

– Standard solution: reject based on bounding box overlap

⇒ Problematic - may lead to missing detections!

⇒ Use segmentations to resolve ambiguities instead.

– Basic idea: each observed pixel can only be explained by (at most) one detection.

Source: Bastian Leibe

30-Nov-1124

Page 25: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

• Secondary hypotheses (“mixtures of cars/cows/etc.”)– Desired property of algorithm! ⇒ robustness to occlusion

– Standard solution: reject based on bounding box overlap

⇒ Problematic - may lead to missing detections!

⇒ Use segmentations to resolve ambiguities instead.

– Basic idea: each observed pixel can only be explained by (at most) one detection.

Top-Down Segmentation: Motivation

Source: Bastian Leibe

30-Nov-1125

Page 26: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Segmentation: Probabilistic Formulation

• Influence of patch on object hypothesis (vote weight)

( ) ( ) ( ) ( )( )xop

f,pfCpCxopxofp

n

i iin

n ,

||,,,

∑=l

l

( ) ( ) ( )∑∈

===),(

,|,,,,|,|l

ll

fnnn xofpxoffigurepxofigurep

p

pp

• Backprojection to features f and pixels p:

Segmentationinformation

Influence on object hypothesis

[Leibe, Leonardis, Schiele, SLCV’04; IJCV’08]

30-Nov-1126

Page 27: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Derivation: ISM Recognition

• Algorithm stages

1. Voting

2. Mean-shift search

3. Backprojection

• Vote weights: contribution of a single feature f

[Leib

e,

Leonard

is,

Schie

le,

SLCV’0

4;

IJCV’0

8]

Object location

on,x

Image Feature fat location l

f

Codebook matches

Ci

)( fCp i ),,( lin Cxop

Matching probability

Occurrence distribution

30-Nov-1127

Page 28: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

• Algorithm stages

1. Voting

2. Mean-shift search

3. Backprojection

• Vote weights: contribution of a single feature f

� Probability that object on occurs at location x given (f,l)

( , , )ni

p o x f =∑l

)( fCp i

Matching probability

),,( lin Cxop

Occurrence distribution

)( fCp i

Matching probability

),,( lin Cxop

Occurrence distribution

Derivation: ISM Recognition

[Leib

e,

Leonard

is,

Schie

le,

SLCV’0

4;

IJCV’0

8]

30-Nov-1128

Page 29: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Derivation: ISM Recognition

[Leib

e,

Leonard

is,

Schie

le,

SLCV’0

4;

IJCV’0

8]

• Algorithm stages

1. Voting

2. Mean-shift search

3. Backprojection

• Vote weights: contribution of a single feature f

� Probability that object on occurs at location x given (f,l)

� How to measure those probabilities?

( , , )ni

p o x f =∑l

{ }1( ) , where | ( , )

| |i i ip C f C C d C fC

θ= = ≤

1( , , )

# ( )n ii

p o x Coccurrences C

=l

)( fCp i ),,( lin Cxop

θ f

Activatedcodebook entries

30-Nov-1129

Page 30: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Derivation: ISM Recognition

[Leib

e,

Leonard

is,

Schie

le,

SLCV’0

4;

IJCV’0

8]

• Algorithm stages

1. Voting

2. Mean-shift search

3. Backprojection

• Vote weights: contribution of a single feature f

� Probability that object on occurs at location x given (f,l)

� Likelihood of the observed features given the object hypothesis

( , , )ni

p o x f =∑l )( fCp i ),,( lin Cxop

( ) ( ) ( )( )

( ) ( ) ( )( )

, | , |, || ,

, ,n i in i

nn n

p o x C p C f p f,p o x f, p f,p f, o x

p o x p o x= = ∑

l ll ll

( ),np o x : Prior for the object location( )p f,l : Indicator variable for sampled features

30-Nov-1130

Page 31: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Derivation: ISM Recognition

[Leib

e,

Leonard

is,

Schie

le,

SLCV’0

4;

IJCV’0

8]

• Algorithm stages

1. Voting

2. Mean-shift search

3. Backprojection

• Vote weights: contribution of a single feature f

( ) ( ) ( )( )

( ) ( ) ( )( )

, | , |, || ,

, ,n i in i

nn n

p o x C p C f p f,p o x f, p f,p f, o x

p o x p o x= = ∑

l ll ll

( ) ( ) ( )( )

( ) ( ) ( )( )

, | , |, || ,

, ,n i in i

nn n

p o x C p C f p f,p o x f, p f,p f, o x

p o x p o x= = ∑

l ll ll

30-Nov-1131

Page 32: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Derivation: ISM Recognition

[Leib

e,

Leonard

is,

Schie

le,

SLCV’0

4;

IJCV’0

8]

• Algorithm stages

1. Voting

2. Mean-shift search

3. Backprojection

• Vote weights: contribution of a single feature f

( ) ( ) ( )( )

( ) ( ) ( )( )

, | , |, || ,

, ,n i in i

nn n

p o x C p C f p f,p o x f, p f,p f, o x

p o x p o x= = ∑

l ll ll

30-Nov-1132

Page 33: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Derivation: ISM Recognition

[Leib

e,

Leonard

is,

Schie

le,

SLCV’0

4;

IJCV’0

8]

• Algorithm stages

1. Voting

2. Mean-shift search

3. Backprojection

• Vote weights: contribution of a single feature f

( ) ( ) ( )( )

( ) ( ) ( )( )

, | , |, || ,

, ,n i in i

nn n

p o x C p C f p f,p o x f, p f,p f, o x

p o x p o x= = ∑

l ll ll

x

y

s

30-Nov-1133

Page 34: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Derivation: ISM Top-Down Segmentation

[Leib

e,

Leonard

is,

Schie

le,

SLCV’0

4;

IJCV’0

8]

• Algorithm stages

1. Voting

2. Mean-shift search

3. Backprojection

• Vote weights: contribution of a single feature f

• Figure-ground backprojection

Fig./Gnd. labelfor each occurrence

Influence on object hypothesis

( ) ( ) ( ) ( )( )∑ ∑

==)p

pl

lll

f, i n

iinin xop

f,pfCpCxopCxofig.p

( ,

|,|,,,,|

( ) ( ) ( )( )

( ) ( ) ( )( )

, | , |, || ,

, ,n i in i

nn n

p o x C p C f p f,p o x f, p f,p f, o x

p o x p o x= = ∑

l ll ll

( ) == l,,,,| in Cfxofigurep p

30-Nov-1134

Page 35: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Derivation: ISM Top-Down Segmentation

[Leib

e,

Leonard

is,

Schie

le,

SLCV’0

4;

IJCV’0

8]

• Algorithm stages

1. Voting

2. Mean-shift search

3. Backprojection

• Vote weights: contribution of a single feature f

• Figure-ground backprojection

Fig./Gnd. labelfor each occurrence

Influence on object hypothesis

( ) ( ) ( ) ( )( )∑ ∑

==)p

pl

lll

f, i n

iinin xop

f,pfCpCxopCxofig.p

( ,

|,|,,,,|

( ) ( ) ( )( )

( ) ( ) ( )( )

, | , |, || ,

, ,n i in i

nn n

p o x C p C f p f,p o x f, p f,p f, o x

p o x p o x= = ∑

l ll ll

( ) ∑==i

n fxofigurep l,,,|p

Marginalize overall codebook entries

matched to f

30-Nov-1135

Page 36: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Derivation: ISM Top-Down Segmentation

[Leib

e,

Leonard

is,

Schie

le,

SLCV’0

4;

IJCV’0

8]

• Algorithm stages

1. Voting

2. Mean-shift search

3. Backprojection

• Vote weights: contribution of a single feature f

• Figure-ground backprojection

Fig./Gnd. labelfor each occurrence

Influence on object hypothesis

( ) ( ) ( ) ( )( )∑ ∑

==)p

pl

lll

f, i n

iinin xop

f,pfCpCxopCxofig.p

( ,

|,|,,,,|

( ) ( ) ( )( )

( ) ( ) ( )( )

, | , |, || ,

, ,n i in i

nn n

p o x C p C f p f,p o x f, p f,p f, o x

p o x p o x= = ∑

l ll ll

( ) ∑ ∑∈

==),(

,|lf i

n xofigurepp

p

Marginalize overall features contai-

ning pixel p

30-Nov-1136

Page 37: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Top-Down Segmentation Algorithm

• This may sound quite complicated, but it boils down to a very simple algorithm…

[Leib

e,

Leonard

is,

Schie

le,

SLCV’0

4;

IJCV’0

8]

30-Nov-1137

Page 38: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Segmentation

• Interpretation of p(figure) map

� per-pixel confidence in object hypothesis

� Use for hypothesis verification

p(figure)

p(ground)

Segmentation

p(figure)

p(ground)

Original image

[Leib

e,

Leonard

is,

Schie

le,

SLCV’0

4;

IJCV’0

8]

30-Nov-1138

Page 39: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Example Results: Motorbikes

[Leibe, Leonardis, Schiele, SLCV’04; IJCV’08]

30-Nov-1139

Page 40: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

• Training

– 112 hand-segmented images

• Results on novel sequences:

Single-frame recognition - No temporal continuity used!

Example Results: Cows

[Leib

e,

Leonard

is,

Schie

le,

SLCV’0

4;

IJCV’0

8]

30-Nov-1140

Page 41: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Office chairs

Dining room chairs

Example Results: Chairs

Source: Bastian Leibe

30-Nov-1141

Page 42: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Detections Using Ground Plane Constraints

left camera 1175 frames

Battery of 5ISM detectorsfor differentcar views

[Leib

e,

Leonard

is,

Schie

le,

SLCV’0

4;

IJCV’0

8]

30-Nov-1142

Page 43: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

TrainingTraining

TestTest OutputOutput

[Thom

as,

Ferr

ari

, Tu

yte

laars

, Leib

e,

Van G

ool,

3D

RR’0

7;

RSS’0

8]

Inferring Other Information: Part Labels (1)

30-Nov-1143

Page 44: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Inferring Other Information: Part Labels (2)

[Thom

as,

Ferr

ari

, Tu

yte

laars

, Leib

e,

Van G

ool,

3D

RR’0

7;

RSS’0

8]

30-Nov-1144

Page 45: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Inferring Other Information: Depth Maps

“Depth from a single image”

[Thom

as,

Ferr

ari

, Tu

yte

laars

, Leib

e,

Van G

ool,

3D

RR’0

7;

RSS’0

8]

30-Nov-1145

Page 46: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

• Try to fit silhouette to detected person

• Basic idea– Search for the silhouette that simultaneously optimizes the

• Chamfer match to the distance-transformed edge image

• Overlap with the top-down segmentation

– Enforces global consistency

– Caveat: introduces again reliance on global model

Extension: Estimating Articulation

[Leibe, Seemann, Schiele, CVPR’05]

30-Nov-1146

Page 47: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

• Polar instead of Cartesian voting scheme

• Benefits:– Recognize objects under image-plane rotations

– Possibility to share parts between articulations.

• Caveats:– Rotation invariance should only be used when it’s really needed.

(Also increases false positive detections)

Extension: Rotation-Invariant Detection

[Mikolajczyk, Leibe, Schiele, CVPR’06]

θq

φ

dq

φ

θ

d

30-Nov-1147

Page 48: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Sometimes, Rotation Invariance Is Needed…

[Mikolajczyk et al., CVPR’06]

30-Nov-1148

Page 49: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

You Can Try It At Home…

• Linux binaries available

– Including datasets & several pre-trained detectors

– http://www.vision.ee.ethz.ch/bleibe/code

x

y

s

Source: Bastian Leibe

30-Nov-1149

Page 50: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Discussion: Implicit Shape Model• Pros:

– Works well for many different object categories• Both rigid and articulated objects

– Flexible geometric model• Can recombine parts seen on different training examples

– Learning from relatively few (50-100) training examples– Optimized for detection, good localization properties

• Cons:– Needs supervised training data

• Object bounding boxes for detection• Reference segmentations for top-down segm.

– Only weak geometric constraints• Result segmentations may contain superfluous

body parts.

– Purely representative model• No discriminative learning

Source: Bastian Leibe

30-Nov-1150

Page 51: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

What we will learn today?

• Implicit Shape Model

– Representation

– Recognition

– Experiments and results

• Deformable Models

– The PASCAL challenge

– Latent SVM Model

30-Nov-1151

Page 52: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Object Detection

– the PASCAL Challenge

• ~10,000 images, with ~25,000 target objects.

– Objects from 20 categories (person, car, bicycle, cow,

table...).

– Objects are annotated with labeled bounding boxes.

30-Nov-1152

So

urc

e:

Pe

dro

Fe

lze

nsw

alb

Page 53: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li 30-Nov-1153

Page 54: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Latent SVM Model: an Overview

30-Nov-1154

root filter part filters deformation

modelsdetection

So

urc

e:

Pe

dro

Fe

lze

nsw

alb

Page 55: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Histogram of Oriented Gradient (HOG) Features

• Image is partitioned into 8x8 pixel blocks.

• In each block we compute a histogram of gradient

orientations.

– Invariant to changes in lighting, small deformations, etc.

• We compute features at different resolutions (pyramid).

30-Nov-1155

So

urc

e:

Pe

dro

Fe

lze

nsw

alb

Page 56: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Filters

• Filters are rectangular templates defining weights for features.

• Score is dot product of filter and subwindow of HOG pyramid.

30-Nov-1156

HOG pyramid

W

Score of H at this location is H ⋅ W

H

Source: Pedro Felzenswalb

Page 57: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Object Hypothesis

30-Nov-1157

Multiscale model captures features at two-resolutions

Score is sum of filter scores

plus deformation scores

Page 58: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Training the Latent SVM Model

• Training data consists of images with labeled bounding boxes.

• Need to learn the model structure, filters and deformation costs.

30-Nov-1158

Training

Source: Pedro Felzenswalb

Page 59: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Connection with Linear Classifiers

• Score of model is sum of filter scores plus

deformation scores

– Bounding box in training data specifies that the score

should be high for some placement in a range

30-Nov-1159

w is a model

x is a detection window

z are filter placements

Concatenation of filters and

deformation parameters

Concatenation of features

and part displacements

Latent

SVM

Standard

SVM

Weight vector Features

Page 60: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Latent SVM Training

• Semi-convex optimization problem

– is convex in w

– convex if we fix z for positive examples

• Iterative optimization procedure:

– Initialize w

– Iterate:

• Pick best z for each positive example

• Optimize w via gradient descent with data mining

30-Nov-1160

Linear in w if z is fixed Observed variables Latent variables

Page 61: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Latent SVM Training: Initializing w

30-Nov-1161

• For k component mixture model:– Split examples into k sets based on bounding box aspect

ratio

• Learn k root filters using standard SVM– Training data: Warped positive examples and random

windows from negative images (Dalal & Triggs)

• Initialize parts by selecting patches from root filters:– Sub-windows with strong coefficients

– Interpolate to get higher resolution filters

– Initialize spatial model using fixed spring constants

Page 62: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Learned Models

30-Nov-1162

Page 63: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Example Results

30-Nov-1163

So

urc

e:

Pe

dro

Fe

lze

nsw

alb

Page 64: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

More Results

30-Nov-1164

Page 65: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Quantitative Results

• 9 systems competed in the 2007 challenge.

• Out of 20 classes:

– First place in 10 classes

– Second place in 6 classes

• Some statistics:

– It takes ~2 seconds to evaluate a model in one image.

– It takes ~3 hours to train a model.

– MUCH faster than most systems.

30-Nov-1165

Source: Pedro Felzenswalb

Page 66: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Code for Latent SVM

Source code for the system and models trained on PASCAL 2006, 2007 and 2008

data are available at:

http://www.cs.uchicago.edu/~pff/latent

30-Nov-1166

Source: Pedro Felzenswalb

Page 67: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

Summary

• Deformable models provide an elegant framework

for object detection and recognition.

– Efficient algorithms for matching models to images.

– Applications: pose estimation, medical image analysis,

object recognition, etc.

• We can learn models from partially labeled data.

– Generalized standard ideas from machine learning.

– Leads to state-of-the-art results in PASCAL challenge.

• Future work: hierarchical models, grammars, 3D

objects.

30-Nov-1167

Source: Pedro Felzenswalb

Page 68: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous

Lecture 17 -Fei-Fei Li

What we have learned today

• Implicit Shape Model

– Representation

– Recognition

– Experiments and results

• Deformable Models

– The PASCAL challenge

– Latent SVM Model

30-Nov-1168