AAAI08 tutorial: visual object recognition

Visual Object Recognition Tutorial. Bastian Leibe, Computer Vision Laboratory, ETH Zurich & Kristen Grauman, Department of Computer Sciences, University of Texas in Austin. Chicago, 14.07.2008. Perceptual and Sensory Augmented Computing.

Transcript of AAAI08 tutorial: visual object recognition

Page 1: AAAI08 tutorial: visual object recognition

Perceptual and Sensory Augmented Computing

Visual Object Recognition Tutorial

Visual Object Recognition


Bastian Leibe &

Computer Vision Laboratory, ETH Zurich

Chicago, 14.07.2008

Kristen Grauman

Department of Computer Sciences, University of Texas in Austin

Page 2: AAAI08 tutorial: visual object recognition


Identification vs. Categorization


Page 3: AAAI08 tutorial: visual object recognition


Object Categorization

• How to recognize ANY car


• How to recognize ANY cow

Page 4: AAAI08 tutorial: visual object recognition


What could be done with recognition algorithms?

There is a wide range of applications, including…


Medical image analysis

Navigation, driver safety

Autonomous robots

Situated search

Content-based retrieval and analysis for images and videos

Page 5: AAAI08 tutorial: visual object recognition


Object Categorization

• Task Description

◦ “Given a small number of training images of a category, recognize a-priori unknown instances of that category and assign the correct category label.”

• Which categories are feasible visually?


◦ Extensively studied in Cognitive Psychology, e.g. [Brown ’58]

[Figure: the same dog named at different levels of abstraction: “Fido”, German shepherd, dog, animal, living being]

Page 6: AAAI08 tutorial: visual object recognition


Visual Object Categories

• Basic Level Categories in human categorization [Rosch 76, Lakoff 87]

◦ The highest level at which category members have similar perceived shape

◦ The highest level at which a single mental image reflects the entire category

◦ The level at which human subjects are usually fastest at identifying category members

◦ The first level named and understood by children

◦ The highest level at which a person uses similar motor actions for interaction with category members

Page 7: AAAI08 tutorial: visual object recognition


Visual Object Categories

• Basic-level categories in humans seem to be defined predominantly visually.

• There is evidence that humans (usually) start with basic-level categorization before doing identification.

⇒ Basic-level categorization is easier and faster for humans than object identification!

⇒ Most promising starting point for visual classification

[Figure: category hierarchy from abstract levels (animal, quadruped) through the basic level (dog, cat, cow, …) and breeds (German shepherd, Doberman, …) down to the individual level (“Fido”)]

Page 8: AAAI08 tutorial: visual object recognition


Other Types of Categories

• Functional Categories

◦ e.g. chairs = “something you can sit on”

Page 9: AAAI08 tutorial: visual object recognition


Other Types of Categories

• Ad-hoc categories

◦ e.g. “something you can find in an office environment”

Page 10: AAAI08 tutorial: visual object recognition


Levels of Object Categorization

“cow”

“motorbike”

“car”


• Different levels of recognition

◦ Which object class is in the image? ⇒ Obj/Img classification

◦ Where is it in the image? ⇒ Detection/Localization

◦ Where exactly, which pixels? ⇒ Figure/Ground segmentation

Page 11: AAAI08 tutorial: visual object recognition


Challenges: robustness

Illumination Object pose Clutter


Viewpoint

Intra-class appearance

Occlusions

Page 12: AAAI08 tutorial: visual object recognition


Challenges: robustness


• Detection in Crowded Scenes

◦ Learn object variability

– Changes in appearance, scale, and articulation

◦ Compensate for clutter, overlap, and occlusion

Page 13: AAAI08 tutorial: visual object recognition


Challenges: context and human experience


Page 14: AAAI08 tutorial: visual object recognition


Challenges: context and human experience


Context cues Dynamics

Video credit: J. Davis

Image credit: D. Hoiem

Page 15: AAAI08 tutorial: visual object recognition


Challenges: scale, efficiency

• Thousands to millions of pixels in an image

• Estimated 30 Gigapixels of image/video content generated per second

• About half of the cerebral cortex in primates is devoted to processing visual information [Felleman and van Essen 1991]

• 3,000-30,000 human recognizable object categories

• 30+ degrees of freedom in the pose of articulated objects (humans)

• Billions of images indexed by Google Image Search

• 18 billion+ prints produced from digital camera images in 2004

• 295.5 million camera phones sold in 2005

Page 16: AAAI08 tutorial: visual object recognition


Challenges: learning with minimal supervision

[Figure: training examples ordered by amount of supervision, from “Less” to “More”]

Page 17: AAAI08 tutorial: visual object recognition


Rough evolution of focus in recognition research


[Figure: timeline panels: 1980s; 1990s to early 2000s; Currently]

Page 18: AAAI08 tutorial: visual object recognition


This tutorial

• Intended for broad AAAI audience

◦ Assuming basic familiarity with machine learning, linear algebra, probability

◦ Not assuming significant vision background

• Our goals

◦ Describe main approaches to recognition

◦ Highlight past successes and future challenges

◦ Provide the pointers (to literature and tools) that would allow you to take advantage of existing techniques in your research

• Questions welcome

Page 19: AAAI08 tutorial: visual object recognition


Outline

1. Detection with Global Appearance & Sliding Windows

2. Local Invariant Features: Detection & Description

3. Specific Object Recognition with Local Features

― Coffee Break ―


4. Visual Words: Indexing, Bags of Words Categorization

5. Matching Local Features

6. Part-Based Models for Categorization

7. Current Challenges and Research Directions

Page 20: AAAI08 tutorial: visual object recognition


Visual Object Recognition

Bastian Leibe &

Computer Vision Laboratory, ETH Zurich

Chicago, 14.07.2008

Kristen Grauman

Department of Computer Sciences, University of Texas in Austin

Page 21: AAAI08 tutorial: visual object recognition


Outline

1. Detection with Global Appearance & Sliding Windows

2. Local Invariant Features: Detection & Description

3. Specific Object Recognition with Local Features

― Coffee Break ―

4. Visual Words: Indexing, Bags of Words Categorization

5. Matching Local Features

6. Part-Based Models for Categorization

7. Current Challenges and Research Directions

Page 22: AAAI08 tutorial: visual object recognition


Detection via classification: Main idea

Car/non-car Classifier

Yes, car.

No, not a car.

Basic component: a binary classifier

Page 23: AAAI08 tutorial: visual object recognition


Detection via classification: Main idea

Car/non-car Classifier


If object may be in a cluttered scene, slide a window around looking for it.

Page 24: AAAI08 tutorial: visual object recognition


Detection via classification: Main idea

Car/non-car Classifier

Feature extraction

Training examples

Fleshing out this pipeline a bit more, we need to:

1. Obtain training data
2. Define features
3. Define classifier

Page 25: AAAI08 tutorial: visual object recognition


Detection via classification: Main idea

• Consider all subwindows in an image

◦ Sample at multiple scales and positions

• Make a decision per window:

◦ “Does this contain object category X or not?”

• In this section, we’ll focus specifically on methods using a global representation (i.e., not part-based, not local features).
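As a rough illustration of the window enumeration described above, the sketch below lists square windows over several positions and scales and applies a binary classifier to each. The function names, the base window size, the stride, and the scale factors are hypothetical choices for illustration, not parameters given in the tutorial.

```python
def sliding_windows(img_w, img_h, base=24, scales=(1.0, 1.5, 2.0), stride=8):
    """Yield (x, y, size) for square windows at several scales and positions."""
    for s in scales:
        size = int(round(base * s))
        step = max(1, int(round(stride * s)))  # larger windows take larger steps
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield x, y, size


def detect(image, classify, **kw):
    """Return the windows accepted by `classify`; image is a list of pixel rows."""
    h, w = len(image), len(image[0])
    hits = []
    for x, y, size in sliding_windows(w, h, **kw):
        patch = [row[x:x + size] for row in image[y:y + size]]
        if classify(patch):  # "does this window contain category X or not?"
            hits.append((x, y, size))
    return hits
```

Any callable that maps a patch to True/False can be passed as `classify`, which keeps the search loop independent of the feature and classifier choices discussed next.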

Page 26: AAAI08 tutorial: visual object recognition


Feature extraction: global appearance

Feature extraction

Simple holistic descriptions of image content

◦ grayscale / color histogram

◦ vector of pixel intensities
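The two holistic descriptors named above can be sketched in a few lines. This is a minimal illustration assuming a grayscale patch stored as a list of rows with integer intensities in [0, 256); the function names and the bin count are hypothetical.

```python
def pixel_vector(patch):
    """Flatten a 2-D grayscale patch (list of rows) into one feature vector."""
    return [p for row in patch for p in row]


def gray_histogram(patch, bins=8, max_val=256):
    """Normalized histogram of intensities in [0, max_val)."""
    hist = [0] * bins
    for p in pixel_vector(patch):
        hist[min(bins - 1, int(p) * bins // max_val)] += 1
    total = sum(hist)
    return [h / total for h in hist]
```

Note the trade-off: the pixel vector keeps spatial layout but changes under small shifts, while the histogram discards layout entirely and is unchanged when pixels are rearranged.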

Page 27: AAAI08 tutorial: visual object recognition


Eigenfaces: global appearance description

An early appearance-based approach to face recognition

Turk & Pentland, 1991

Generate low-dimensional representation of appearance with a linear subspace.

Training images

Mean

Eigenvectors computed from covariance matrix

[Figure: face image ≈ Mean + weighted sum of eigenvectors + ...]

Project new images to “face space”.

Recognition via nearest neighbors in face space
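In symbols, the eigenface steps on this slide can be written roughly as follows. The notation is introduced here for illustration and is not taken from the slides: $x_i$ are the $N$ vectorized training images, $\mu$ their mean, $C$ the covariance matrix, $u_k$ its leading eigenvectors, and $w_k$ the coordinates of a new image $x$ in "face space" with $K$ dimensions.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
C = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)(x_i - \mu)^{\top}, \qquad
C\,u_k = \lambda_k u_k
\\
w_k = u_k^{\top}(x - \mu), \qquad
x \approx \mu + \sum_{k=1}^{K} w_k u_k
```

Recognition then compares the coefficient vectors $(w_1, \dots, w_K)$ of the new image and of the training images by nearest-neighbor search.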

Page 28: AAAI08 tutorial: visual object recognition


Feature extraction: global appearance

• Pixel-based representations sensitive to small shifts

• Color or grayscale-based appearance description can be sensitive to illumination and intra-class appearance variation


Cartoon example: an albino koala

Page 29: AAAI08 tutorial: visual object recognition


Gradient-based representations

• Consider edges, contours, and (oriented) intensity gradients


Page 30: AAAI08 tutorial: visual object recognition


Gradient-based representations: Matching edge templates

• Example: Chamfer matching

Template shape

Input image

Edges detected

Distance transform

Gavrila & Philomin ICCV 1999

Best match

At each window position, compute average min distance between points on template (T) and input (I).
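The score described above, the average over template points of the distance to the nearest input edge point, can be sketched as below. This is a brute-force illustration with hypothetical function names; it assumes edge points are given as (x, y) tuples. A practical implementation would instead precompute the distance transform of the edge image once, so that each template point costs a single lookup.

```python
import math


def chamfer_score(template_pts, edge_pts, offset=(0, 0)):
    """Average distance from each shifted template point to its nearest edge point."""
    ox, oy = offset
    total = 0.0
    for tx, ty in template_pts:
        total += min(math.hypot(tx + ox - ex, ty + oy - ey) for ex, ey in edge_pts)
    return total / len(template_pts)


def best_offset(template_pts, edge_pts, offsets):
    """Pick the window position with the lowest chamfer score."""
    return min(offsets, key=lambda o: chamfer_score(template_pts, edge_pts, o))
```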


Page 31: AAAI08 tutorial: visual object recognition


Gradient-based representations: Matching edge templates

• Chamfer matching

Hierarchy of templates

Gavrila & Philomin ICCV 1999

Page 32: AAAI08 tutorial: visual object recognition


Gradient-based representations

• Consider edges, contours, and (oriented) intensity gradients

• Summarize local distribution of gradients with histogram

◦ Locally orderless: offers invariance to small shifts and rotations

◦ Contrast-normalization: try to correct for variable illumination

Page 33: AAAI08 tutorial: visual object recognition


Gradient-based representations: Histograms of oriented gradients (HoG)

Dalal & Triggs, CVPR 2005

Map each grid cell in the input window to a histogram counting the gradients per orientation.

Code available: http://pascal.inrialpes.fr/soft/olt/
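To make the per-cell histogram concrete, here is a minimal sketch for a single grid cell. It is not the Dalal & Triggs implementation: the function name is hypothetical, and the use of centered differences, unsigned orientations over 0 to 180 degrees, magnitude-weighted votes, 9 bins, and per-cell L2 normalization are assumptions made for illustration (the published method normalizes over blocks of cells).

```python
import math


def cell_orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 deg)."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # centered differences
            gy = cell[y + 1][x] - cell[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle / (180.0 / bins)) % bins] += mag
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0  # contrast normalization
    return [v / norm for v in hist]
```

Concatenating such histograms over all cells of the detection window gives the window's descriptor.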


Page 34: AAAI08 tutorial: visual object recognition


Gradient-based representations: SIFT descriptor

Lowe, ICCV 1999

Local patch descriptor (more on this later)

Code: http://vision.ucla.edu/~vedaldi/code/sift/sift.html

Binary: http://www.cs.ubc.ca/~lowe/keypoints/

Page 35: AAAI08 tutorial: visual object recognition


Gradient-based representations: Biologically inspired features

Serre, Wolf, Poggio, CVPR 2005

Mutch & Lowe, CVPR 2006

Convolve with Gabor filters at multiple orientations

Pool nearby units (max)

Intermediate layers compare input to prototype patches

Page 36: AAAI08 tutorial: visual object recognition


Gradient-based representations: Rectangular features

Compute differences between sums of pixels in rectangles

Captures contrast in adjacent spatial regions

Similar to Haar wavelets, efficient to compute

Viola & Jones, CVPR 2001

Page 37: AAAI08 tutorial: visual object recognition


Gradient-based representations: Shape context descriptor

Local descriptor (more on this later)

Count the number of points inside each bin, e.g.:

Count = 4

Count = 10

...

Log-polar binning: more precision for nearby points, more flexibility for farther points.

Belongie, Malik & Puzicha, ICCV 2001
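The log-polar counting can be sketched as follows. This is an illustrative simplification rather than the published implementation: the function name is hypothetical, distances are normalized by the mean distance from the reference point (an assumption made here), and the defaults of 5 radial and 12 angular bins over normalized radii 0.125 to 2 are chosen for illustration.

```python
import math


def shape_context(points, index, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Log-polar histogram of the other points relative to points[index]."""
    px, py = points[index]
    others = [(x - px, y - py) for i, (x, y) in enumerate(points) if i != index]
    dists = [math.hypot(dx, dy) for dx, dy in others]
    mean_d = (sum(dists) / len(dists)) or 1.0  # normalize for scale
    hist = [[0] * n_theta for _ in range(n_r)]
    log_span = math.log(r_max / r_min)
    for (dx, dy), d in zip(others, dists):
        r = d / mean_d
        if r < r_min or r >= r_max:
            continue  # outside the binned annulus
        r_bin = int(n_r * math.log(r / r_min) / log_span)  # log-spaced rings
        t_bin = int(n_theta * (math.atan2(dy, dx) % (2 * math.pi)) / (2 * math.pi))
        hist[min(r_bin, n_r - 1)][min(t_bin, n_theta - 1)] += 1
    return hist
```

Because the ring boundaries are spaced logarithmically, inner rings are narrow (precise for nearby points) and outer rings are wide (tolerant for distant points).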

Page 38: AAAI08 tutorial: visual object recognition


Classifier construction

• How to compute a decision for each subwindow?

Image feature

Page 39: AAAI08 tutorial: visual object recognition


Discriminative vs. generative models

Generative: separately model class-conditional and prior densities

[Plot: Pr(image, car) and Pr(image, ¬car) versus image feature; x = data]

Discriminative: directly model posterior

[Plot: Pr(car | image) and Pr(¬car | image) versus image feature; x = data]

Plots from Antonio Torralba 2007
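The two views are linked by Bayes' rule. As a brief worked relation (notation chosen here for illustration), a generative model obtains the posterior from the class-conditional and prior densities on the right-hand side, while a discriminative model estimates the left-hand side directly:

```latex
\Pr(\text{car} \mid \text{image})
= \frac{\Pr(\text{image} \mid \text{car})\,\Pr(\text{car})}
       {\Pr(\text{image} \mid \text{car})\,\Pr(\text{car})
        + \Pr(\text{image} \mid \neg\text{car})\,\Pr(\neg\text{car})}
```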

Page 40: AAAI08 tutorial: visual object recognition


Discriminative vs. generative models

• Generative:

+ possibly interpretable
+ can draw samples
- models variability unimportant to classification task
- often hard to build good model with few parameters

• Discriminative:

+ appealing when infeasible to model data itself
+ excel in practice
- often can’t provide uncertainty in predictions
- non-interpretable

Page 41: AAAI08 tutorial: visual object recognition


Discriminative methods

Nearest neighbor (10^6 examples): Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; ...

Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998; …

Support Vector Machines: Guyon, Vapnik; Heisele, Serre, Poggio, 2001; …

Boosting: Viola, Jones 2001; Torralba et al. 2004; Opelt et al. 2006; …

Conditional Random Fields: McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003; …

Slide adapted from Antonio Torralba

Page 42: AAAI08 tutorial: visual object recognition


Boosting

• Build a strong classifier by combining a number of “weak classifiers”, which need only be better than chance

• Sequential learning process: at each iteration, add a weak classifier

• Flexible to choice of weak learner

◦ including fast simple classifiers that alone may be inaccurate

• We’ll look at Freund & Schapire’s AdaBoost algorithm

◦ Easy to implement

◦ Base learning algorithm for Viola-Jones face detector

Page 43: AAAI08 tutorial: visual object recognition


AdaBoost: Intuition


Figure adapted from Freund and Schapire

Consider a 2-d feature space with positive and negative examples.

Each weak classifier splits the training examples with better than 50% accuracy.

Examples misclassified by a previous weak learner are given more emphasis at future rounds.

Page 44: AAAI08 tutorial: visual object recognition


AdaBoost: Intuition


Page 45: AAAI08 tutorial: visual object recognition


AdaBoost: Intuition


Final classifier is combination of the weak classifiers

Page 46: AAAI08 tutorial: visual object recognition


AdaBoost Algorithm

Start with uniform weights on training examples {x1,…,xn}

Evaluate weighted error for each feature, pick best.

Incorrectly classified -> more weight

Correctly classified -> less weight

Final classifier is combination of the weak ones, weighted according to error they had.

Freund & Schapire 1995
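The steps above can be sketched with threshold "stumps" as weak classifiers. This is a minimal illustration of discrete AdaBoost under stated assumptions, not the tutorial's code: labels are in {-1, +1}, each weak classifier thresholds one feature, and the function names and the exhaustive threshold search are choices made here for clarity rather than speed.

```python
import math


def train_adaboost(X, y, rounds=10):
    """X: list of feature vectors, y: labels in {-1, +1}. Returns weighted stumps."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n  # uniform initial weights
    model = []
    for _ in range(rounds):
        best = None
        for j in range(d):  # try every feature ...
            for thr in sorted({x[j] for x in X}):  # ... threshold ...
                for sign in (1, -1):  # ... and polarity
                    pred = [sign if x[j] >= thr else -sign for x in X]
                    err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign, pred)
        err, j, thr, sign, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # low error -> large vote
        model.append((alpha, j, thr, sign))
        # misclassified examples gain weight, correctly classified ones lose weight
        w = [wi * math.exp(-alpha * t * p) for wi, t, p in zip(w, y, pred)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (s if x[j] >= thr else -s) for a, j, thr, s in model)
    return 1 if score >= 0 else -1
```

No single stump can separate a one-dimensional pattern such as `+ + - - + +`, but three boosted stumps can, which is the point of combining weak classifiers.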

Page 47: AAAI08 tutorial: visual object recognition


Cascading classifiers for detection

For efficiency, apply less accurate but faster classifiers first to immediately discard windows that clearly appear to be negative; e.g.,

Filter for promising regions with an initial inexpensive classifier

Build a chain of classifiers, choosing cheap ones with low false negative rates early in the chain

Fleuret & Geman, IJCV 2001

Rowley et al., PAMI 1998

Viola & Jones, CVPR 2001

Figure from Viola & Jones CVPR 2001
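The early-rejection idea behind such a chain can be sketched in a few lines. The function name and the representation of a stage as a (score function, threshold) pair are assumptions for illustration; in a real cascade each stage's threshold would be tuned for a low false negative rate.

```python
def cascade_classify(window, stages):
    """stages: list of (score_fn, threshold), ordered cheap-to-expensive.

    A window is rejected as soon as one stage scores below its threshold,
    so most negatives never reach the costlier later stages.
    """
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False  # early rejection
    return True  # survived every stage
```

Since the vast majority of windows in an image are negatives, the average cost per window is dominated by the first, cheapest stages.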

Page 48: AAAI08 tutorial: visual object recognition


Example: Face detection

• Frontal faces are a good example of a class where global appearance models + a sliding window detection approach fit well:

Regular 2D structure

Center of face almost shaped like a “patch”/window

• Now we’ll take AdaBoost and see how the Viola-Jones face detector works


Page 49: AAAI08 tutorial: visual object recognition


Feature extraction

“Rectangular” filters

Feature output is difference between adjacent regions

Efficiently computable with integral image: any sum can be computed in constant time

Integral image: value at (x,y) is sum of pixels above and to the left of (x,y)

Avoid scaling images: scale features directly for same cost

Viola & Jones, CVPR 2001
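A small sketch shows why any rectangle sum costs constant time once the integral image is built. The function names are hypothetical, and this version pads the table with a zero row and column, so `ii[y][x]` holds the sum strictly above and to the left of (x, y); that indexing convention is an assumption made here to avoid boundary checks.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (one-pixel zero border)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Left half minus right half of a (2w)-by-h region, a Haar-like edge feature."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)
```

Scaling a feature only changes `w` and `h`, not the number of lookups, which is why features can be rescaled instead of the image.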

Page 50: AAAI08 tutorial: visual object recognition


Large library of filters

Considering all possible filter parameters: position, scale, and type:

180,000+ possible features associated with each 24 x 24 window

Use AdaBoost both to select the informative features and to form the classifier

Viola & Jones, CVPR 2001

Page 51: AAAI08 tutorial: visual object recognition


AdaBoost for feature+classifier selection

• Want to select the single rectangle feature and threshold that best separates positive (faces) and negative (non-faces) training examples, in terms of weighted error.

Outputs of a possible rectangle feature on faces and non-faces.

Resulting weak classifier:
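The formula itself is not reproduced in this transcript. A common form for such a thresholded single-feature classifier, given here as an assumption rather than as the slide's exact expression, uses the rectangle feature output $f_t(x)$ on window $x$, a threshold $\theta_t$, and a polarity $p_t \in \{+1, -1\}$ that sets the direction of the inequality:

```latex
h_t(x) =
\begin{cases}
+1 & \text{if } p_t\, f_t(x) > p_t\, \theta_t \\
-1 & \text{otherwise}
\end{cases}
```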

For next round, reweight the examples according to errors, choose another filter/threshold combo.

Viola & Jones, CVPR 2001

Page 52: AAAI08 tutorial: visual object recognition


Viola-Jones Face Detector: Summary

• Train with 5K positives, 350M negatives

• Real-time detector using 38 layer cascade

• 6061 features in final layer

• [Implementation available in OpenCV: http://www.intel.com/technology/computing/opencv/]

[Figure: faces and non-faces → train cascade of classifiers with AdaBoost → selected features, thresholds, and weights → apply to each subwindow of a new image]

Page 53: AAAI08 tutorial: visual object recognition


Viola-Jones Face Detector: Results


First two features selected

Page 54: AAAI08 tutorial: visual object recognition


Viola-Jones Face Detector: Results

Page 55: AAAI08 tutorial: visual object recognition


Viola-Jones Face Detector: Results

Page 56: AAAI08 tutorial: visual object recognition


Viola-Jones Face Detector: Results

Page 57: AAAI08 tutorial: visual object recognition


Profile Features

Detecting profile faces requires training a separate detector with profile examples.

Page 58: AAAI08 tutorial: visual object recognition


Paul Viola, ICCV tutorial

Viola-Jones Face Detector: Results

Page 59: AAAI08 tutorial: visual object recognition


Everingham, M., Sivic, J. and Zisserman, A. "Hello! My name is... Buffy" - Automatic naming of characters in TV video, BMVC 2006. http://www.robots.ox.ac.uk/~vgg/research/nface/index.html

Example application

Frontal faces detected and then tracked, character names inferred with alignment of script and subtitles.

Page 60: AAAI08 tutorial: visual object recognition


Pedestrian detection

• Detecting upright, walking humans also possible using sliding window’s appearance/texture; e.g.,

SVM with Haar wavelets [Papageorgiou & Poggio, IJCV 2000]

Space-time rectangle features [Viola, Jones & Snow, ICCV 2003]

SVM with HoGs [Dalal & Triggs, CVPR 2005]

Page 61: AAAI08 tutorial: visual object recognition


Highlights

• Sliding window detection and global appearance descriptors:

◦ Simple detection protocol to implement

◦ Good feature choices critical

◦ Past successes for certain classes

Page 62: AAAI08 tutorial: visual object recognition


Limitations

• High computational complexity

◦ For example: 250,000 locations x 30 orientations x 4 scales = 30,000,000 evaluations!

◦ If training binary detectors independently, cost increases linearly with number of classes

• With so many windows, false positive rate better be low
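A short calculation makes the last point concrete. The window count comes from the example above; the per-window false positive rate of one in a million is a hypothetical value chosen for illustration, as are the function names.

```python
def windows_per_image(locations, orientations, scales):
    """Number of classifier evaluations for one image."""
    return locations * orientations * scales


def expected_false_positives(n_windows, fp_rate_per_window):
    """Expected number of false detections per image."""
    return n_windows * fp_rate_per_window


n = windows_per_image(250_000, 30, 4)
print(n)                                   # 30000000
print(expected_false_positives(n, 1e-6))   # about 30 per image
```

Even a one-in-a-million error rate per window would, under these assumptions, still yield on the order of 30 false detections in every image.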

Page 63: AAAI08 tutorial: visual object recognition


Limitations (continued)

• Not all objects are “box” shaped


Page 64: AAAI08 tutorial: visual object recognition


Limitations (continued)

• Non-rigid, deformable objects not captured well with representations assuming a fixed 2d structure; or must assume fixed viewpoint

• Objects with less-regular textures not captured well with holistic appearance-based descriptions


Page 65: AAAI08 tutorial: visual object recognition

Limitations (continued)

• If considering windows in isolation, context is lost

[Figure: sliding window vs. detector’s view]

Figure credit: Derek Hoiem

Page 66: AAAI08 tutorial: visual object recognition

Limitations (continued)

• In practice, this often entails a large, cropped training set (expensive)

• Requiring a good match to a global appearance description can lead to sensitivity to partial occlusions

Image credit: Adam, Rivlin, & Shimshoni

Page 67: AAAI08 tutorial: visual object recognition


Outline

1. Detection with Global Appearance & Sliding Windows

2. Local Invariant Features: Detection & Description

3. Specific Object Recognition with Local Features

― Coffee Break ―

4. Visual Words: Indexing, Bags of Words Categorization

5. Matching Local Features

6. Part-Based Models for Categorization

7. Current Challenges and Research Directions

Page 68: AAAI08 tutorial: visual object recognition

Visual Object Recognition

Bastian Leibe

Computer Vision Laboratory, ETH Zurich

Kristen Grauman

Department of Computer Sciences, University of Texas in Austin

Chicago, 14.07.2008

Page 69: AAAI08 tutorial: visual object recognition


Outline

1. Detection with Global Appearance & Sliding Windows

2. Local Invariant Features: Detection & Description

3. Specific Object Recognition with Local Features

― Coffee Break ―


4. Visual Words: Indexing, Bags of Words Categorization

5. Matching Local Features

6. Part-Based Models for Categorization

7. Current Challenges and Research Directions

Page 70: AAAI08 tutorial: visual object recognition

Motivation

• Global representations have major limitations

• Instead, describe and match only local regions

• Increased robustness to

– Occlusions

– Articulation

– Intra-category variations

Page 71: AAAI08 tutorial: visual object recognition

Approach

1. Find a set of distinctive keypoints

2. Define a region around each keypoint

3. Extract and normalize the region content

4. Compute a local descriptor from the normalized region

5. Match local descriptors

[Figure: regions A1, A2, A3 around keypoints are normalized to N × N pixel patches; descriptors $f_A$ and $f_B$ (e.g. color) are compared with a similarity measure, and a match is accepted if $d(f_A, f_B) < T$]
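Step 5 can be sketched as nearest-neighbour matching with a distance threshold T. This is a minimal sketch, assuming Euclidean distance as the measure d and using made-up 3-D "color" descriptors; the function names are hypothetical and not taken from the tutorial.

```python
import math


def euclidean(f_a, f_b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_a, f_b)))


def match_descriptors(descs_a, descs_b, threshold):
    """For each descriptor in A, return (i, j) for its nearest neighbour j in B
    if d(f_A, f_B) < T; otherwise the feature stays unmatched."""
    matches = []
    for i, f_a in enumerate(descs_a):
        j, dist = min(
            ((j, euclidean(f_a, f_b)) for j, f_b in enumerate(descs_b)),
            key=lambda pair: pair[1],
        )
        if dist < threshold:
            matches.append((i, j))
    return matches


# Toy descriptors (hypothetical values).
image_a = [(0.9, 0.1, 0.1), (0.1, 0.8, 0.2), (0.5, 0.5, 0.5)]
image_b = [(0.1, 0.75, 0.25), (0.88, 0.12, 0.1), (0.0, 0.0, 1.0)]
print(match_descriptors(image_a, image_b, threshold=0.1))
```

The third descriptor of the first image has no neighbour closer than T and is therefore left unmatched.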

Page 72: AAAI08 tutorial: visual object recognition

Requirements

• Region extraction needs to be repeatable and precise under

– Translation, rotation, scale changes

– (Limited) out-of-plane (≈ affine) transformations

– Lighting variations

• We need a sufficient number of regions to cover the object

• The regions should contain “interesting” structure

Page 73: AAAI08 tutorial: visual object recognition

Many Existing Detectors Available

• Hessian & Harris [Beaudet ‘78], [Harris ‘88]

• Laplacian, DoG [Lindeberg ‘98], [Lowe 1999]

• Harris-/Hessian-Laplace [Mikolajczyk & Schmid ‘01]

• Harris-/Hessian-Affine [Mikolajczyk & Schmid ‘04]

• EBR and IBR [Tuytelaars & Van Gool ‘04]

• MSER [Matas ‘02]

• Salient Regions [Kadir & Brady ‘01]

• Others…

Page 74: AAAI08 tutorial: visual object recognition

Keypoint Localization

• Goals:

– Repeatable detection

– Precise localization

– Interesting content

⇒ Look for two-dimensional signal changes

Page 75: AAAI08 tutorial: visual object recognition

Hessian Detector [Beaudet78]

• Hessian determinant

$$\mathrm{Hessian}(I) = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix}$$

[Figure: image $I$ and its second-derivative responses $I_{xx}$, $I_{yy}$, $I_{xy}$]

Intuition: Search for strong derivatives in two orthogonal directions

Page 76: AAAI08 tutorial: visual object recognition

Hessian Detector [Beaudet78]

• Hessian determinant

$$\mathrm{Hessian}(I) = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix}$$

$$\det(\mathrm{Hessian}(I)) = I_{xx} I_{yy} - I_{xy}^2$$

In Matlab: Ixx.*Iyy - (Ixy).^2
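The same determinant can be evaluated per pixel in a few lines of plain Python. This is a rough sketch under simplifying assumptions: second derivatives are taken with finite differences rather than Gaussian derivative filters, and the function name and test images are made up for illustration.

```python
def hessian_determinant(img):
    """det(Hessian) = Ixx * Iyy - Ixy**2 at every interior pixel (borders stay 0)."""
    h, w = len(img), len(img[0])
    det = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ixx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
            iyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
            ixy = (img[y + 1][x + 1] - img[y + 1][x - 1]
                   - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
            det[y][x] = ixx * iyy - ixy ** 2
    return det


# A single bright pixel changes in two directions; a vertical step edge in only one.
blob = [[1.0 if (x, y) == (3, 3) else 0.0 for x in range(7)] for y in range(7)]
edge = [[1.0 if x >= 3 else 0.0 for x in range(7)] for y in range(7)]
print(hessian_determinant(blob)[3][3])
print(max(max(row) for row in hessian_determinant(edge)))
```

The blob centre gives a strong response while the straight edge gives none, which matches the intuition of searching for strong derivatives in two orthogonal directions.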

Page 77: AAAI08 tutorial: visual object recognition

Hessian Detector – Responses [Beaudet78]

Effect: Responses mainly on corners and strongly textured areas.

Page 78: AAAI08 tutorial: visual object recognition

Hessian Detector – Responses [Beaudet78]

Page 79: AAAI08 tutorial: visual object recognition

Harris Detector [Harris88]

• Second moment matrix (autocorrelation matrix)

$$\mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix}$$

Intuition: Search for local neighborhoods where the image content has two main directions (eigenvectors).

Page 80: AAAI08 tutorial: visual object recognition

Harris Detector [Harris88]

• Second moment matrix (autocorrelation matrix)

$$\mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix}$$

1. Image derivatives $I_x$, $I_y$ (filters $g_x(\sigma_D)$, $g_y(\sigma_D)$)

Page 81: AAAI08 tutorial: visual object recognition

Harris Detector [Harris88]

• Second moment matrix (autocorrelation matrix)

$$\mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix}$$

1. Image derivatives $I_x$, $I_y$ (filters $g_x(\sigma_D)$, $g_y(\sigma_D)$)

2. Square of derivatives: $I_x^2$, $I_y^2$, $I_x I_y$

Page 82: AAAI08 tutorial: visual object recognition

Harris Detector [Harris88]

• Second moment matrix (autocorrelation matrix)

$$\mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix}$$

1. Image derivatives $I_x$, $I_y$ (filters $g_x(\sigma_D)$, $g_y(\sigma_D)$)

2. Square of derivatives: $I_x^2$, $I_y^2$, $I_x I_y$

3. Gaussian filter $g(\sigma_I)$: $g(I_x^2)$, $g(I_y^2)$, $g(I_x I_y)$

Page 83: AAAI08 tutorial: visual object recognition

Harris Detector [Harris88]

• Second moment matrix (autocorrelation matrix)

$$\mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix}$$

1. Image derivatives $I_x$, $I_y$ (filters $g_x(\sigma_D)$, $g_y(\sigma_D)$)

2. Square of derivatives: $I_x^2$, $I_y^2$, $I_x I_y$

3. Gaussian filter $g(\sigma_I)$: $g(I_x^2)$, $g(I_y^2)$, $g(I_x I_y)$

4. Cornerness function – both eigenvalues are strong

$$har = \det[\mu(\sigma_I, \sigma_D)] - \alpha\,[\mathrm{trace}(\mu(\sigma_I, \sigma_D))]^2 = g(I_x^2)\,g(I_y^2) - [g(I_x I_y)]^2 - \alpha\,[g(I_x^2) + g(I_y^2)]^2$$

5. Non-maxima suppression
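The five steps can be traced on a toy image. The sketch below is not the tutorial's implementation: it assumes central differences for the derivatives, a plain box window in place of the Gaussian filter g(σ_I), and α = 0.06 (the slides give no value); non-maxima suppression is reduced to taking the global maximum.

```python
def harris_response(img, alpha=0.06, radius=1):
    """Cornerness det(mu) - alpha * trace(mu)^2 with a box window (assumed)."""
    h, w = len(img), len(img[0])
    # 1. Image derivatives (central differences; borders stay 0).
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    har = [[0.0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            # 2. + 3. Products of derivatives, summed over the window.
            sxx = syy = sxy = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            # 4. Cornerness function.
            har[y][x] = sxx * syy - sxy ** 2 - alpha * (sxx + syy) ** 2
    return har


# Bright square occupying the lower-right quadrant; its corner sits at (4, 4).
img = [[1.0 if x >= 4 and y >= 4 else 0.0 for x in range(9)] for y in range(9)]
har = harris_response(img)
# 5. Stand-in for non-maxima suppression: location of the strongest response.
best = max((har[y][x], (y, x)) for y in range(9) for x in range(9))[1]
print(best, har[4][4], har[6][4])
```

The response is positive at the corner and negative along the straight edge, where only one eigenvalue of the second moment matrix is large.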

Page 84: AAAI08 tutorial: visual object recognition

Harris Detector – Responses [Harris88]

Effect: A very precise corner detector.

Page 85: AAAI08 tutorial: visual object recognition

Harris Detector – Responses [Harris88]

Page 86: AAAI08 tutorial: visual object recognition

Automatic Scale Selection

$$f(I_{i_1 \ldots i_m}(x, \sigma)) = f(I_{i_1 \ldots i_m}(x', \sigma'))$$

Same operator responses if the patch contains the same image up to a scale factor

How to find corresponding patch sizes?

Page 87: AAAI08 tutorial: visual object recognition

Automatic Scale Selection

• Function responses for increasing scale (scale signature)

[Figure: scale signatures $f(I_{i_1 \ldots i_m}(x, \sigma))$ and $f(I_{i_1 \ldots i_m}(x', \sigma))$ plotted against scale]

Page 88: AAAI08 tutorial: visual object recognition

Automatic Scale Selection

• Function responses for increasing scale (scale signature)

[Figure: scale signatures $f(I_{i_1 \ldots i_m}(x, \sigma))$ and $f(I_{i_1 \ldots i_m}(x', \sigma))$ plotted against scale]

Page 89: AAAI08 tutorial: visual object recognition

Automatic Scale Selection

• Function responses for increasing scale (scale signature)

[Figure: scale signatures $f(I_{i_1 \ldots i_m}(x, \sigma))$ and $f(I_{i_1 \ldots i_m}(x', \sigma))$ plotted against scale]

Page 90: AAAI08 tutorial: visual object recognition

Automatic Scale Selection

• Function responses for increasing scale (scale signature)

[Figure: scale signatures $f(I_{i_1 \ldots i_m}(x, \sigma))$ and $f(I_{i_1 \ldots i_m}(x', \sigma))$ plotted against scale]

Page 91: AAAI08 tutorial: visual object recognition

Automatic Scale Selection

• Function responses for increasing scale (scale signature)

[Figure: scale signatures $f(I_{i_1 \ldots i_m}(x, \sigma))$ and $f(I_{i_1 \ldots i_m}(x', \sigma))$ plotted against scale]

Page 92: AAAI08 tutorial: visual object recognition

Automatic Scale Selection

• Function responses for increasing scale (scale signature)

[Figure: scale signatures $f(I_{i_1 \ldots i_m}(x, \sigma))$ and $f(I_{i_1 \ldots i_m}(x', \sigma'))$ plotted against scale, with the selected scales $\sigma$ and $\sigma'$ marked]

Page 93: AAAI08 tutorial: visual object recognition

What Is A Useful Signature Function?

• Laplacian-of-Gaussian = “blob” detector

Page 94: AAAI08 tutorial: visual object recognition

Laplacian-of-Gaussian (LoG)

• Local maxima in scale space of Laplacian-of-Gaussian

$$L_{xx}(\sigma) + L_{yy}(\sigma)$$

[Figure: LoG responses at scale levels $\sigma$, $\sigma^2$, $\sigma^3$, $\sigma^4$, $\sigma^5$]

⇒ List of (x, y, s)
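Scale selection with the LoG can be illustrated on an ideal blob. The sketch below assumes the scale-normalized form σ²(L_xx + L_yy), which follows Lindeberg's formulation rather than anything stated on the slide, and evaluates the response only at the centre of a bright disc by summing kernel values over the disc pixels. All function names are hypothetical.

```python
import math


def normalized_log(x, y, sigma):
    """Scale-normalised Laplacian-of-Gaussian kernel value: sigma^2 * (Gxx + Gyy)."""
    r2 = x * x + y * y
    gauss = math.exp(-r2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
    return sigma ** 2 * (r2 - 2 * sigma ** 2) / sigma ** 4 * gauss


def blob_response(radius, sigma):
    """Filter response at the centre of a bright disc of the given pixel radius."""
    total = 0.0
    for y in range(-radius, radius + 1):
        for x in range(-radius, radius + 1):
            if x * x + y * y <= radius * radius:
                total += normalized_log(x, y, sigma)
    return total


def characteristic_scale(radius, sigmas):
    """Scale at which the magnitude of the normalised response peaks."""
    return max(sigmas, key=lambda s: abs(blob_response(radius, s)))


sigmas = [2 + 0.5 * i for i in range(40)]  # candidate scales 2.0, 2.5, ..., 21.5
for radius in (8, 16):
    print(radius, characteristic_scale(radius, sigmas))
```

For an ideal disc the continuous analysis puts the peak at σ = r/√2, so doubling the blob size should roughly double the selected scale; this is what makes the (x, y, s) triples comparable across images of different size.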

Page 95: AAAI08 tutorial: visual object recognition

Results: Laplacian-of-Gaussian

Page 96: AAAI08 tutorial: visual object recognition

Difference-of-Gaussian (DoG)

• Difference of Gaussians as approximation of the Laplacian-of-Gaussian

[Figure: the difference of two Gaussian kernels gives the DoG kernel]
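How close the approximation is can be checked numerically. The sketch assumes the standard relation G(kσ) − G(σ) ≈ (k − 1) σ² ∇²G for k close to 1, which comes from Lowe's SIFT paper and is not spelled out on the slide; σ = 2 and k = 1.05 are arbitrary example values.

```python
import math


def gaussian(r, sigma):
    """Isotropic 2D Gaussian evaluated at distance r from the centre."""
    return math.exp(-r * r / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)


def log_kernel(r, sigma):
    """Laplacian of the 2D Gaussian at distance r."""
    return (r * r - 2 * sigma ** 2) / sigma ** 4 * gaussian(r, sigma)


def dog_kernel(r, sigma, k):
    """Difference of two Gaussians with scales k*sigma and sigma."""
    return gaussian(r, k * sigma) - gaussian(r, sigma)


def dog_to_log_ratio(r, sigma, k):
    """Ratio of the DoG value to the scaled LoG value (1.0 means a perfect match)."""
    return dog_kernel(r, sigma, k) / ((k - 1) * sigma ** 2 * log_kernel(r, sigma))


sigma, k = 2.0, 1.05
for r in (0.0, 1.0, 4.0, 6.0):
    print(f"r={r}: ratio={dog_to_log_ratio(r, sigma, k):.3f}")
```

The sample radii avoid the zero crossing of the LoG at r = √2·σ, where the ratio is not meaningful.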

Page 97: AAAI08 tutorial: visual object recognition

DoG – Efficient Computation

• Computation in Gaussian scale pyramid

[Figure: Gaussian scale pyramid built from the original image with scale levels spaced by $\sigma = 2^{1/4}$; sampling with step $\sigma^4 = 2$]

Page 98: AAAI08 tutorial: visual object recognition

Results: Lowe’s DoG

Page 99: AAAI08 tutorial: visual object recognition

Harris-Laplace [Mikolajczyk ‘01]

1. Initialization: Multiscale Harris corner detection

[Figure: computing the Harris function and detecting local maxima at scale levels $\sigma$, $\sigma^2$, $\sigma^3$, $\sigma^4$]

Page 100: AAAI08 tutorial: visual object recognition

Harris-Laplace [Mikolajczyk ‘01]

1. Initialization: Multiscale Harris corner detection

2. Scale selection based on Laplacian (same procedure with Hessian ⇒ Hessian-Laplace)

[Figure: Harris points and Harris-Laplace points]

Page 101: AAAI08 tutorial: visual object recognition

Maximally Stable Extremal Regions [Matas ‘02]

• Based on Watershed segmentation algorithm

• Select regions that stay stable over a large parameter range

Page 102: AAAI08 tutorial: visual object recognition

Example Results: MSER

Page 103: AAAI08 tutorial: visual object recognition


You Can Try It At Home…

• For most local feature detectors, executables are available online:

• http://robots.ox.ac.uk/~vgg/research/affine

• http://www.cs.ubc.ca/~lowe/keypoints/

• http://www.vision.ee.ethz.ch/~surf


Page 104: AAAI08 tutorial: visual object recognition

Orientation Normalization

• Compute orientation histogram

• Select dominant orientation

• Normalize: rotate to fixed orientation

[Lowe, SIFT, 1999]

[Figure: orientation histogram over the range 0 to 2π]

T. Tuytelaars, B. Leibe
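The first two bullets can be sketched as a magnitude-weighted histogram of gradient orientations. This is a simplified illustration: the 36-bin resolution follows Lowe's paper rather than the slide, and the Gaussian weighting and peak interpolation used in SIFT are left out. The function name is hypothetical.

```python
import math


def dominant_orientation(patch, num_bins=36):
    """Centre angle (radians, in [0, 2*pi)) of the strongest bin in a
    magnitude-weighted histogram of gradient orientations."""
    h, w = len(patch), len(patch[0])
    hist = [0.0] * num_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * num_bins) % num_bins] += magnitude
    best = max(range(num_bins), key=lambda b: hist[b])
    return (best + 0.5) * 2 * math.pi / num_bins


# Intensity ramp rising equally along x and y: every gradient points at 45 degrees.
ramp = [[float(x + y) for x in range(8)] for y in range(8)]
print(math.degrees(dominant_orientation(ramp)))
```

Rotating the patch by the negative of this angle before computing the descriptor is what makes the description rotation invariant.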

Page 105: AAAI08 tutorial: visual object recognition

Local Descriptors

• The ideal descriptor should be

– Repeatable

– Distinctive

– Compact

– Efficient

• Most available descriptors focus on edge/gradient information

– Capture texture information

– Color is still relatively seldom used (more suitable for homogeneous regions)

Page 106: AAAI08 tutorial: visual object recognition

Local Descriptors: SIFT Descriptor

[Lowe, ICCV 1999]

Histogram of oriented gradients

• Captures important texture information

• Robust to small translations / affine deformations
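A stripped-down gradient-histogram descriptor shows the idea. The 4 × 4 cells with 8 orientation bins (128 values) follow Lowe's paper, not the slide, and this sketch deliberately omits the Gaussian weighting, interpolation between bins and clipping of the real SIFT descriptor. The function name and test patch are made up.

```python
import math


def gradient_histogram_descriptor(patch, cells=4, bins=8):
    """Concatenate per-cell orientation histograms of a square patch and
    L2-normalise the result (simplified SIFT-like layout)."""
    size = len(patch)
    desc = [0.0] * (cells * cells * bins)
    for y in range(size):
        for x in range(size):
            # Central differences with indices clamped at the patch border.
            gx = patch[y][min(x + 1, size - 1)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, size - 1)][x] - patch[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % (2 * math.pi)
            b = int(angle / (2 * math.pi) * bins) % bins
            cy, cx = y * cells // size, x * cells // size
            desc[(cy * cells + cx) * bins + b] += magnitude
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc


patch = [[(x * x + 3 * y) % 17 for x in range(16)] for y in range(16)]
brighter = [[v + 10 for v in row] for row in patch]
d1 = gradient_histogram_descriptor(patch)
d2 = gradient_histogram_descriptor(brighter)
print(len(d1), d1 == d2)
```

Because only gradients enter the histograms, adding a constant brightness offset leaves the descriptor unchanged, and pooling over cells gives the tolerance to small shifts mentioned in the bullets.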

Page 107: AAAI08 tutorial: visual object recognition

Local Descriptors: SURF

• Fast approximation of SIFT idea

– Efficient computation by 2D box filters & integral images ⇒ 6 times faster than SIFT

– Equivalent quality for object identification

• GPU implementation available

– Feature extraction @ 100 Hz (detector + descriptor, 640×480 img)

– http://www.vision.ee.ethz.ch/~surf

[Bay, ECCV’06], [Cornelis, CVGPU’08]
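The speed of box filters comes from the integral image, which turns any rectangular sum into four lookups. The sketch below shows only that building block, not SURF itself; the function names are hypothetical.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of img over rows top..bottom-1 and columns left..right-1 in O(1)."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 0, 0, 3, 3), box_sum(ii, 1, 1, 3, 3))
```

The cost of a box filter is therefore independent of its size, which is what lets the filter be scaled up instead of shrinking the image.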

Page 108: AAAI08 tutorial: visual object recognition

Local Descriptors: Shape Context

Count the number of points inside each bin, e.g.:

Count = 4

...

Count = 10

Log-polar binning: more precision for nearby points, more flexibility for farther points.

[Belongie & Malik, ICCV 2001]
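The counting step can be sketched as a log-polar histogram around one reference point. The bin layout (5 radial × 12 angular bins, radii from 1/8 to 2 after normalization) is taken from Belongie et al.'s paper rather than the slide, and normalizing by the mean distance from the reference point is a simplification made for this example.

```python
import math


def shape_context(points, index, radial_bins=5, angular_bins=12,
                  r_min=0.125, r_max=2.0):
    """Log-polar histogram of all other points relative to points[index]."""
    px, py = points[index]
    others = [(x - px, y - py) for i, (x, y) in enumerate(points) if i != index]
    dists = [math.hypot(dx, dy) for dx, dy in others]
    mean_dist = sum(dists) / len(dists)  # normalise for scale (simplified)
    hist = [[0] * angular_bins for _ in range(radial_bins)]
    log_min, log_max = math.log(r_min), math.log(r_max)
    for (dx, dy), d in zip(others, dists):
        r = d / mean_dist
        if r <= 0 or r >= r_max:
            continue  # coincident or too distant points are not counted
        # Radial bin edges are uniformly spaced in log(r).
        rb = int((math.log(max(r, r_min)) - log_min) / (log_max - log_min) * radial_bins)
        angle = math.atan2(dy, dx) % (2 * math.pi)
        ab = int(angle / (2 * math.pi) * angular_bins) % angular_bins
        hist[rb][ab] += 1
    return hist


# Points sampled on the outline of a square (hypothetical shape).
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
shifted = [(x + 10, y - 3) for x, y in square]
print(shape_context(square, 0) == shape_context(shifted, 0))
```

Since only relative positions are binned, the histogram does not change when the whole shape is translated.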

Page 109: AAAI08 tutorial: visual object recognition

Local Descriptors: Geometric Blur

Compute edges at four orientations

Extract a patch in each channel

Apply spatially varying blur and sub-sample

[Figure: example descriptor (idealized signal)]

[Berg & Malik, CVPR 2001]

Page 110: AAAI08 tutorial: visual object recognition

So, What Local Features Should I Use?

• There have been extensive evaluations/comparisons

– [Mikolajczyk et al., IJCV’05, PAMI’05]

– All detectors/descriptors shown here work well

• Best choice often application dependent

– MSER works well for buildings and printed things

– Harris-/Hessian-Laplace/DoG work well for many natural categories

• More features are better

– Combining several detectors often helps

Page 111: AAAI08 tutorial: visual object recognition


Outline

1. Detection with Global Appearance & Sliding Windows

2. Local Invariant Features: Detection & Description

3. Specific Object Recognition with Local Features

― Coffee Break ―


4. Visual Words: Indexing, Bags of Words Categorization

5. Matching Local Features

6. Part-Based Models for Categorization

7. Current Challenges and Research Directions

Page 112: AAAI08 tutorial: visual object recognition

Visual Object Recognition

Bastian Leibe

Computer Vision Laboratory, ETH Zurich

Kristen Grauman

Department of Computer Sciences, University of Texas in Austin

Chicago, 14.07.2008

Page 113: AAAI08 tutorial: visual object recognition


Outline

1. Detection with Global Appearance & Sliding Windows

2. Local Invariant Features: Detection & Description

3. Specific Object Recognition with Local Features

― Coffee Break ―


4. Visual Words: Indexing, Bags of Words Categorization

5. Matching Local Features

6. Part-Based Models for Categorization

7. Current Challenges and Research Directions

Page 114: AAAI08 tutorial: visual object recognition


Recognition with Local Features

• Image content is transformed into local features that are invariant to translation, rotation, and scale

• Goal: Verify if they belong to a consistent configuration


Local Features, e.g. SIFT

Slide credit: David Lowe

Page 115: AAAI08 tutorial: visual object recognition

Finding Consistent Configurations

• Global spatial models

– Generalized Hough Transform [Lowe99]

– RANSAC [Obdrzalek02, Chum05, Nister06]

– Basic assumption: object is planar

• Assumption is often justified in practice

– Valid for many structures on buildings

– Sufficient for small viewpoint variations on 3D objects

Page 116: AAAI08 tutorial: visual object recognition

Hough Transform

• Origin: Detection of straight lines in clutter

– Basic idea: each candidate point votes for all lines that it is consistent with.

– Votes are accumulated in a quantized array

– Local maxima correspond to candidate lines

• Representation of a line

– Usual form y = a x + b has a singularity around 90º.

– Better parameterization: x cos(θ) + y sin(θ) = ρ

[Figure: a line in the (x, y) plane described by its angle θ and distance ρ]
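The voting scheme for lines fits in a few lines of Python. This is a minimal sketch: the 1-degree angular step, the unit ρ resolution and the sample points are arbitrary choices for the example.

```python
import math
from collections import Counter


def hough_lines(points, theta_steps=180, rho_res=1.0):
    """Accumulate votes in a quantised (theta, rho) array for
    x*cos(theta) + y*sin(theta) = rho."""
    votes = Counter()
    for x, y in points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho / rho_res))] += 1
    return votes


# Ten points on the vertical line x = 5 (where y = a x + b breaks down) plus two outliers.
points = [(5, y) for y in range(10)] + [(1, 7), (8, 2)]
(t_best, rho_bin), count = hough_lines(points).most_common(1)[0]
print(t_best, rho_bin, count)
```

Several neighbouring (θ, ρ) cells receive the same number of votes in this example, a small-scale version of the difficulty of locating the true maximum in a quantized accumulator.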

Page 117: AAAI08 tutorial: visual object recognition

Hough Transform: Noisy Line

[Figure: tokens and their votes in (θ, ρ) space]

• Problem: Finding the true maximum

Slide credit: David Lowe

Page 118: AAAI08 tutorial: visual object recognition

Hough Transform: Noisy Input

[Figure: tokens and their votes in (θ, ρ) space]

• Problem: Lots of spurious maxima

Slide credit: David Lowe

Page 119: AAAI08 tutorial: visual object recognition

Generalized Hough Transform [Ballard81]

• Generalization for an arbitrary contour or shape

– Choose a reference point for the contour (e.g. center)

– For each point on the contour, remember where it is located w.r.t. the reference point

– Remember radius r and angle φ relative to the contour tangent

– Recognition: whenever you find a contour point, calculate the tangent angle and ‘vote’ for all possible reference points

– Instead of a reference point, can also vote for a transformation

⇒ The same idea can be used with local features!

Slide credit: Bernt Schiele

Page 120: AAAI08 tutorial: visual object recognition


Gen. Hough Transform with Local Features

• For every feature, store possible “occurrences”


– Object identity

– Pose

– Relative position

• For new image, let the matched features vote for possible object positions
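Voting for object positions can be sketched as follows. This is a deliberately reduced illustration: each stored occurrence holds only the offset from the feature to the object's reference point, so the sketch handles translation only, whereas a full system would also vote on scale and orientation. The feature ids, offsets and cell size are made up.

```python
from collections import Counter


def vote_for_positions(model, matches, cell=10):
    """model: {feature_id: (dx, dy)}, offset from the feature to the reference point.
    matches: [(feature_id, x, y)], matched features found in the new image.
    Returns a Counter over quantised reference-point positions."""
    votes = Counter()
    for feature_id, x, y in matches:
        dx, dy = model[feature_id]
        votes[((x + dx) // cell, (y + dy) // cell)] += 1
    return votes


model = {"a": (20, 0), "b": (-20, 0), "c": (0, 30)}
# Three consistent matches for an object centred near (105, 205) and one false match.
matches = [("a", 85, 205), ("b", 125, 205), ("c", 105, 175), ("a", 300, 40)]
print(vote_for_positions(model, matches).most_common(1))
```

Consistent matches pile up in one accumulator cell while the false match votes elsewhere, so a peak in the accumulator indicates an object hypothesis.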

Page 121: AAAI08 tutorial: visual object recognition

3D Object Recognition

• Gen. HT for Recognition [Lowe99]

– Typically only 3 feature matches needed for recognition

– Extra matches provide robustness

– Affine model can be used for planar objects

Slide credit: David Lowe

Page 122: AAAI08 tutorial: visual object recognition

View Interpolation

• Training

– Training views from similar viewpoints are clustered based on feature matches.

– Matching features between adjacent views are linked.

• Recognition

– Feature matches may be spread over several training viewpoints.

⇒ Use the known links to “transfer votes” to other viewpoints.

Slide credit: David Lowe

[Lowe01]

Page 123: AAAI08 tutorial: visual object recognition

Recognition Using View Interpolation

Slide credit: David Lowe

[Lowe01]

Page 124: AAAI08 tutorial: visual object recognition

Location Recognition

[Figure: training images]

Slide credit: David Lowe

[Lowe04]

Page 125: AAAI08 tutorial: visual object recognition

Applications

• Sony Aibo (Evolution Robotics)

• SIFT usage

– Recognize docking station

– Communicate with visual cards

• Other uses

– Place recognition

– Loop closure in SLAM

Slide credit: David Lowe

Page 126: AAAI08 tutorial: visual object recognition

RANSAC (RANdom SAmple Consensus) [Fischler81]

• Randomly choose a minimal subset of data points necessary to fit a model (a sample)

• Points within some distance threshold t of the model are a consensus set. The size of the consensus set is the model’s support.

• Repeat for N samples; the model with the biggest support is the most robust fit

– Points within distance t of the best model are inliers

– Fit final model to all inliers

Slide credit: David Lowe
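The RANSAC loop above can be sketched for the 2-D line case as follows. This is a minimal illustration, not the tutorial's code: the function names `fit_line` and `ransac_line`, the fixed random seed, and the toy data are assumptions made for the example.

```python
import math
import random


def fit_line(p, q):
    """Return (a, b, c) with a*x + b*y + c = 0 through p and q, scaled so a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = math.hypot(a, b)
    if norm == 0:
        return None  # degenerate sample: the two points coincide
    return a / norm, b / norm, -(a * x1 + b * y1) / norm


def ransac_line(points, t, num_samples, rng=None):
    """Try num_samples random 2-point hypotheses; keep the one with the largest consensus set."""
    rng = rng or random.Random(0)  # fixed seed only to make the example repeatable
    best_line, best_inliers = None, []
    for _ in range(num_samples):
        line = fit_line(*rng.sample(points, 2))  # minimal subset: 2 points define a line
        if line is None:
            continue
        a, b, c = line
        # Consensus set: points whose perpendicular distance to the line is at most t.
        consensus = [(x, y) for x, y in points if abs(a * x + b * y + c) <= t]
        if len(consensus) > len(best_inliers):
            best_line, best_inliers = line, consensus
    return best_line, best_inliers


points = [(float(x), 2.0 * x + 1.0) for x in range(10)]  # 10 points on y = 2x + 1
points += [(1.0, 9.0), (4.0, -3.0), (7.0, 30.0)]          # 3 gross outliers
line, inliers = ransac_line(points, t=0.1, num_samples=50)
```

The final step on the slide, fitting the model to all inliers, is left out here; `line` is still the estimate from the best minimal sample.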

Page 127: AAAI08 tutorial: visual object recognition


RANSAC: How many samples?

• How many samples are needed?
  - Suppose w is the fraction of inliers (points from the line).
  - n points are needed to define a hypothesis (2 for lines).
  - k samples are chosen.
• Prob. that a single sample of n points is correct: $w^n$
• Prob. that all samples fail is: $(1 - w^n)^k$

⇒ Choose k high enough to keep this below desired failure rate.

Slide credit: David Lowe
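Solving $(1 - w^n)^k \le$ (failure rate) for k gives a direct rule for the sample count. A small sketch, where the function name `ransac_num_samples` and the example values are assumptions made for illustration:

```python
import math


def ransac_num_samples(w, n, max_failure):
    """Smallest k such that (1 - w**n)**k <= max_failure."""
    p_good = w ** n  # probability that one sample of n points is all inliers
    if p_good <= 0.0:
        raise ValueError("no inliers: no number of samples can succeed")
    if p_good >= 1.0:
        return 1
    return math.ceil(math.log(max_failure) / math.log(1.0 - p_good))


# Line fitting (n = 2) with half the points being inliers, 1% allowed failure rate.
k = ransac_num_samples(w=0.5, n=2, max_failure=0.01)  # 17
```

With w = 0.5 and n = 2 a single sample is correct with probability 0.25, so 0.75^16 ≈ 0.0100 is still slightly above 1% and 17 samples are needed.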

Page 128: AAAI08 tutorial: visual object recognition


After RANSAC

• RANSAC divides data into inliers and outliers and yields estimate computed from minimal set of inliers

• Improve this initial estimate with estimation over all inliers (e.g. with standard least-squares minimization)

• But this may change inliers, so alternate fitting with re-classification as inlier/outlier


Slide credit: David Lowe
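The alternation between fitting and re-classification can be sketched as below. This is only an illustration under simplifying assumptions: the model is y = m·x + b fitted by ordinary least squares, residuals are vertical distances, and the names `least_squares_line` and `refine` are hypothetical.

```python
def least_squares_line(points):
    """Ordinary least-squares fit of y = m*x + b (assumes the points do not share one x)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def refine(points, initial_inliers, t, max_rounds=10):
    """Alternate least-squares fitting and inlier re-classification until the set is stable."""
    inliers = list(initial_inliers)
    model = least_squares_line(inliers)
    for _ in range(max_rounds):
        m, b = model
        new_inliers = [(x, y) for x, y in points if abs(y - (m * x + b)) <= t]
        if new_inliers == inliers or len(new_inliers) < 2:
            break  # inlier set no longer changes (or too few points to refit)
        inliers = new_inliers
        model = least_squares_line(inliers)
    return model, inliers


points = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.0), (4.0, 9.1), (2.0, 20.0)]
# Start from a minimal set of two inliers, as RANSAC would deliver.
model, inliers = refine(points, initial_inliers=points[:2], t=0.5)
```

Here the first fit on two points pulls in three more inliers, the refit on all five leaves the inlier set unchanged, and the outlier (2, 20) stays excluded.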

Page 129: AAAI08 tutorial: visual object recognition


Example: Finding Feature Matches

• Find best stereo match within a square search window (here 300 pixels²)

• Global transformation model: epipolar geometry


from Hartley & Zisserman

Slide credit: David Lowe

Page 130: AAAI08 tutorial: visual object recognition


Example: Finding Feature Matches

• Find best stereo match within a square search window (here 300 pixels²)

• Global transformation model: epipolar geometry

[Figure: feature matches before RANSAC and after RANSAC]

from Hartley & Zisserman

Slide credit: David Lowe

Page 131: AAAI08 tutorial: visual object recognition


Comparison

Gen. Hough Transform

• Advantages
  - Very effective for recognizing arbitrary shapes or objects
  - Can handle high percentage of outliers (>95%)
  - Extracts groupings from clutter in linear time
• Disadvantages
  - Quantization issues
  - Only practical for small number of dimensions (up to 4)
• Improvements available
  - Probabilistic Extensions
  - Continuous Voting Space

RANSAC

• Advantages
  - General method suited to large range of problems
  - Easy to implement
  - Independent of number of dimensions
• Disadvantages
  - Only handles moderate number of outliers (<50%)
• Many variants available, e.g.
  - PROSAC: Progressive RANSAC [Chum05]
  - Preemptive RANSAC [Nister05]

[Leibe08]

Page 132: AAAI08 tutorial: visual object recognition


Example Applications

Mobile tourist guide
• Self-localization
• Object/building recognition
• Photo/video augmentation

[Quack, Leibe, Van Gool, CIVR’08]

Page 133: AAAI08 tutorial: visual object recognition


Web Demo: Movie Poster Recognition


http://www.kooaba.com/en/products_engine.html#

50’000 movie posters indexed

Query-by-image from mobile phone available in Switzerland

Page 134: AAAI08 tutorial: visual object recognition


Application: Large-Scale Retrieval

[Philbin CVPR’07]

Query Results from 5k Flickr images (demo available for 100k set)

Page 135: AAAI08 tutorial: visual object recognition


Application: Image Auto-Annotation

[Figure labels: Moulin Rouge, Tour Montparnasse, Colosseum, Old Town Square (Prague), Viktualienmarkt, Maypole]

Left: Wikipedia image

Right: closest match from Flickr

[Quack CIVR’08]

Page 136: AAAI08 tutorial: visual object recognition


Outline

1. Detection with Global Appearance & Sliding Windows

2. Local Invariant Features: Detection & Description

3. Specific Object Recognition with Local Features

― Coffee Break ―


4. Visual Words: Indexing, Bags of Words Categorization

5. Matching Local Features

6. Part-Based Models for Categorization

7. Current Challenges and Research Directions

Page 137: AAAI08 tutorial: visual object recognition


Visual Object Recognition

Bastian Leibe
Computer Vision Laboratory, ETH Zurich

&

Kristen Grauman
Department of Computer Sciences, University of Texas in Austin

Chicago, 14.07.2008

Page 138: AAAI08 tutorial: visual object recognition


Outline

1. Detection with Global Appearance & Sliding Windows

2. Local Invariant Features: Detection & Description

3. Specific Object Recognition with Local Features

― Coffee Break ―


4. Visual Words: Indexing, Bags of Words Categorization

5. Matching Local Feature Sets

6. Part-Based Models for Categorization

7. Current Challenges and Research Directions

Page 139: AAAI08 tutorial: visual object recognition


Global representations: limitations

• Success may rely on alignment -> sensitive to viewpoint

• All parts of the image or window impact the description -> sensitive to occlusion, clutter


Page 140: AAAI08 tutorial: visual object recognition


Local representations

• Describe component regions or patches separately.

• Many options for detection & description…

  - Superpixels [Ren et al.]
  - Shape context [Belongie 02]
  - Maximally Stable Extremal Regions [Matas 02]
  - Geometric Blur [Berg 05]
  - SIFT [Lowe 99]
  - Salient regions [Kadir 01]
  - Harris-Affine [Mikolajczyk 04]
  - Spin images [Johnson 99]

Page 141: AAAI08 tutorial: visual object recognition


Recall: Invariant local features

Subset of local feature types designed to be invariant to
  - Scale
  - Translation
  - Rotation
  - Affine transformations
  - Illumination

1) Detect interest points
2) Extract descriptors

[Figure: descriptor vectors (x1, x2, …, xd) and (y1, y2, …, yd)]

[Mikolajczyk01, Matas02, Tuytelaars04, Lowe99, Kadir01,… ]

Page 142: AAAI08 tutorial: visual object recognition


Recognition with local feature sets

• Previously, we saw how to use local invariant features + a global spatial model to recognize specific objects, using a planar object assumption.

• Now, we’ll use local features for
  - Indexing-based recognition
  - Bags of words representations
  - Correspondence / matching kernels

Page 143: AAAI08 tutorial: visual object recognition


Basic flow

1. Detect or sample features → list of positions, scales, orientations
2. Describe features → associated list of d-dimensional descriptors
3. Index each one into pool of descriptors from previously seen images

Page 144: AAAI08 tutorial: visual object recognition


Indexing local features

• Each patch / region has a descriptor, which is a point in some high-dimensional feature space (e.g., SIFT)


Page 145: AAAI08 tutorial: visual object recognition


Indexing local features

• When we see close points in feature space, we have similar descriptors, which indicates similar local content.

Figure credit: A. Zisserman

Page 146: AAAI08 tutorial: visual object recognition


Indexing local features

• We saw in the previous section how to use voting and pose clustering to identify objects using local features


Figure credit: David Lowe

Page 147: AAAI08 tutorial: visual object recognition


Indexing local features

• With potentially thousands of features per image, and hundreds to millions of images to search, how can we efficiently find those that are relevant to a new image?
  - Low-dimensional descriptors: can use standard efficient data structures for nearest neighbor search
  - High-dimensional descriptors: approximate nearest neighbor search methods are more practical
  - Inverted file indexing schemes

Page 148: AAAI08 tutorial: visual object recognition


Indexing local features: approximate nearest neighbor search

Best-Bin First (BBF), a variant of k-d trees that uses priority queue to examine most promising branches first [Beis & Lowe, CVPR 1997]


Locality-Sensitive Hashing (LSH), a randomized hashing technique using hash functions that map similar points to the same bin, with high probability [Indyk & Motwani, 1998]
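To make the hashing idea concrete, here is a minimal sketch of one LSH family, random-hyperplane (sign) hashing, which tends to place vectors with a small angle between them in the same bin. It is an assumption chosen for illustration, not necessarily the scheme from the cited paper; the class name `HyperplaneLSH` and its parameters are hypothetical, and practical systems typically use several such tables.

```python
import random
from collections import defaultdict


class HyperplaneLSH:
    """Single hash table keyed by the sign pattern of a vector against random hyperplanes."""

    def __init__(self, dim, num_bits, seed=0):
        rng = random.Random(seed)
        # One random Gaussian normal vector per hash bit.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
        self.buckets = defaultdict(list)

    def key(self, v):
        # Bit i records on which side of hyperplane i the vector lies.
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0.0 for plane in self.planes)

    def insert(self, item_id, v):
        self.buckets[self.key(v)].append(item_id)

    def candidates(self, v):
        # Only items in the query's bin are examined, not the whole database.
        return list(self.buckets.get(self.key(v), []))


index = HyperplaneLSH(dim=4, num_bits=8)
index.insert("a", [1.0, 0.0, 0.0, 0.0])
index.insert("b", [-1.0, 0.0, 0.0, 0.0])
same_bin = index.candidates([2.0, 0.0, 0.0, 0.0])  # same direction as "a"
```

The query points in the same direction as "a", so it hashes to the same bin, while "b" points the opposite way and lands elsewhere.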

Page 149: AAAI08 tutorial: visual object recognition


Indexing local features: inverted file index

• For text documents, an efficient way to find all pages on which a word occurs is to use an index…
• We want to find all images in which a feature occurs.
• To use this idea, we’ll need to map our features to “visual words”.

Page 150: AAAI08 tutorial: visual object recognition


Visual words: main idea

• Extract some local features from a number of images …

e.g., SIFT descriptor space: each point is 128-dimensional

Slide credit: D. Nister

Page 151: AAAI08 tutorial: visual object recognition


Visual words: main idea

Slide credit: D. Nister

Page 152: AAAI08 tutorial: visual object recognition


Visual words: main idea

Slide credit: D. Nister

Page 153: AAAI08 tutorial: visual object recognition


Visual words: main idea

Slide credit: D. Nister

Page 154: AAAI08 tutorial: visual object recognition

Slide credit: D. Nister

Page 155: AAAI08 tutorial: visual object recognition

Slide credit: D. Nister

Page 156: AAAI08 tutorial: visual object recognition


Visual words: main idea

Map high-dimensional descriptors to tokens/words by quantizing the feature space

• Quantize via clustering, let cluster centers be the prototype “words”

Descriptor space

Page 157: AAAI08 tutorial: visual object recognition


Visual words: main idea

Map high-dimensional descriptors to tokens/words by quantizing the feature space

• Determine which word to assign to each new image region by finding the closest cluster center.

Descriptor space
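The two steps (cluster descriptors to obtain the vocabulary, then assign each new descriptor to its closest center) can be sketched with plain Lloyd's k-means. This is a toy illustration with 2-D points standing in for 128-dimensional SIFT descriptors; the function names and data are assumptions made for the example.

```python
import random


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_word(descriptor, centers):
    """Index of the closest cluster center, i.e. the visual word id."""
    return min(range(len(centers)), key=lambda i: sq_dist(descriptor, centers[i]))


def kmeans(descriptors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns k cluster centers (the vocabulary)."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest_word(d, centers)].append(d)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster went empty
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers


descriptors = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0], [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
vocabulary = kmeans(descriptors, k=2)
word_a = nearest_word([0.05, 0.05], vocabulary)  # word of a new region near the first group
word_b = nearest_word([4.9, 5.2], vocabulary)    # word of a new region near the second group
```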

Page 158: AAAI08 tutorial: visual object recognition


Visual words

• Example: each group of patches belongs to the same visual word


Figure from Sivic & Zisserman, ICCV 2003

Page 159: AAAI08 tutorial: visual object recognition


Visual words

• First explored for texture and material representations

• Texton = cluster center of filter responses over collection of images


• Describe textures and materials based on distribution of prototypical texture elements.

Leung & Malik 1999; Varma & Zisserman, 2002; Lazebnik, Schmid & Ponce, 2003;

Page 160: AAAI08 tutorial: visual object recognition


Visual words

• More recently used for describing scenes and objects for the sake of indexing or classification.

Sivic & Zisserman 2003; Csurka, Bray, Dance, & Fan 2004; many others.

Page 161: AAAI08 tutorial: visual object recognition


Inverted file index for images comprised of visual words

[Figure: table with columns "Word number" and "List of image numbers"]

Image credit: A. Zisserman
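An inverted file of this kind is essentially a map from word number to the images that contain it. A minimal sketch, in which the function names, image ids, and word ids are made up for illustration:

```python
from collections import Counter, defaultdict


def build_inverted_index(image_words):
    """image_words: dict image_id -> list of visual word ids. Returns word id -> set of image ids."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for w in words:
            index[w].add(image_id)
    return index


def candidate_images(index, query_words):
    """For each database image, count how many distinct query words it shares."""
    votes = Counter()
    for w in set(query_words):
        for image_id in index.get(w, ()):
            votes[image_id] += 1
    return votes.most_common()  # best-supported images first


database = {"img1": [3, 7, 7, 42], "img2": [7, 8], "img3": [1, 2]}
index = build_inverted_index(database)
ranked = candidate_images(index, query_words=[7, 42, 99])
```

Only images that share at least one word with the query are touched, so "img3" never enters the ranking.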

Page 162: AAAI08 tutorial: visual object recognition


Bags of visual words

• Summarize entire image based on its distribution (histogram) of word occurrences.

• Analogous to bag of words representation commonly used for documents.

Image credit: Fei-Fei Li
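Given the word id of every feature in an image, the bag of words vector is just a histogram over the vocabulary. A short sketch; the L1 normalization and the name `bag_of_words` are assumptions for the example rather than a prescribed choice:

```python
def bag_of_words(word_ids, vocabulary_size):
    """L1-normalized histogram of visual word occurrences for one image."""
    counts = [0] * vocabulary_size
    for w in word_ids:
        counts[w] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * vocabulary_size


# An image whose four features were quantized to words 0, 2, 2 and 3.
histogram = bag_of_words([0, 2, 2, 3], vocabulary_size=5)
```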

Page 163: AAAI08 tutorial: visual object recognition


Video Google System

1. Collect all words within query region

2. Inverted file index to find relevant frames

3. Compare word counts

4. Spatial verification

[Figure: query region and retrieved frames]

Sivic & Zisserman, ICCV 2003

• Demo online at: http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html

Page 164: AAAI08 tutorial: visual object recognition


Basic flow

1. Detect or sample features → list of positions, scales, orientations
2. Describe features → associated list of d-dimensional descriptors
3. Index each one into pool of descriptors from previously seen images
   or
   Quantize to form bag of words vector for the image

Page 165: AAAI08 tutorial: visual object recognition


Visual vocabulary formation

Issues:

• Sampling strategy

• Clustering / quantization algorithm

• Unsupervised vs. supervised

• What corpus provides features (universal vocabulary?)


• Vocabulary size, number of words


Page 166: AAAI08 tutorial: visual object recognition


Sampling strategies

[Figure panels: Dense, uniformly | Sparse, at interest points | Randomly | Multiple interest operators]

• To find specific, textured objects, sparse sampling from interest points is often more reliable.
• Multiple complementary interest operators offer more image coverage.
• For object categorization, dense sampling offers better coverage.

Image credits: F-F. Li, E. Nowak, J. Sivic

[See Nowak, Jurie & Triggs, ECCV 2006]

Page 167: AAAI08 tutorial: visual object recognition


Clustering / quantization methods

• k-means (typical choice), agglomerative clustering, mean-shift,…

• Hierarchical clustering: allows faster insertion / word assignment while still allowing large vocabularies

  - Vocabulary tree [Nister & Stewenius, CVPR 2006]

Page 168: AAAI08 tutorial: visual object recognition


Example: Recognition with Vocabulary Tree

• Tree construction:

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

Page 169: AAAI08 tutorial: visual object recognition


Vocabulary Tree

• Training: Filling the tree

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

Page 170: AAAI08 tutorial: visual object recognition


Vocabulary Tree

• Training: Filling the tree

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

Page 171: AAAI08 tutorial: visual object recognition


Vocabulary Tree

• Training: Filling the tree

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

Page 172: AAAI08 tutorial: visual object recognition


Vocabulary Tree

• Training: Filling the tree

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

Page 173: AAAI08 tutorial: visual object recognition


Vocabulary Tree

• Training: Filling the tree

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

Page 174: AAAI08 tutorial: visual object recognition


Vocabulary Tree

• Recognition

RANSAC verification

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]
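The tree construction and word lookup can be sketched as recursive k-means followed by a greedy descent. This is a simplified illustration, not the authors' implementation: the dictionary-based tree layout, the function names, and the 1-D toy descriptors are all assumptions, and the scoring of database images at the leaves is omitted.

```python
import random


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iterations=10, rng=None):
    """Lloyd's k-means; returns the centers and the points grouped by center."""
    rng = rng or random.Random(0)
    centers = [list(p) for p in rng.sample(points, k)]
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: sq_dist(p, centers[i]))].append(p)
        for i, g in enumerate(groups):
            if g:
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers, groups


def build_tree(points, branching, depth, rng=None):
    """Recursively split the descriptors with k-means; leaves act as visual words."""
    rng = rng or random.Random(0)
    if depth == 0 or len(points) < branching:
        return {"children": []}  # leaf
    centers, groups = kmeans(points, branching, rng=rng)
    return {"centers": centers,
            "children": [build_tree(g, branching, depth - 1, rng) for g in groups]}


def word_path(tree, descriptor):
    """Descend greedily; the sequence of branch indices identifies the leaf word."""
    path = []
    while tree["children"]:
        centers = tree["centers"]
        i = min(range(len(centers)), key=lambda j: sq_dist(descriptor, centers[j]))
        path.append(i)
        tree = tree["children"][i]
    return tuple(path)


points = [[0.0], [1.0], [10.0], [11.0], [100.0], [101.0], [110.0], [111.0]]
tree = build_tree(points, branching=2, depth=2)
path = word_path(tree, [0.5])
```

Assigning a word costs only `branching` distance computations per level, which is what lets the vocabulary grow large without slowing lookup much.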

Page 175: AAAI08 tutorial: visual object recognition


Vocabulary Tree: Performance

• Evaluated on large databases
  - Indexing with up to 1M images
• Online recognition for database of 50,000 CD covers
  - Retrieval in ~1s
• Experiments show that large vocabularies can be beneficial for recognition

[Nister & Stewenius, CVPR’06]

Page 176: AAAI08 tutorial: visual object recognition


Vocabulary formation

• Ensembles of trees provide additional robustness

Figure credit: F. Jurie

Moosmann, Jurie, & Triggs 2006; Yeh, Lee, & Darrell 2007;

Bosch, Zisserman, & Munoz 2007; …

Page 177: AAAI08 tutorial: visual object recognition


Supervised vocabulary formation

• Recent work considers how to leverage labeled images when constructing the vocabulary

Perronnin, Dance, Csurka, & Bressan, Adapted Vocabularies for Generic Visual Categorization, ECCV 2006.

Page 178: AAAI08 tutorial: visual object recognition


Supervised vocabulary formation

• Merge words that don’t aid in discriminability

Winn, Criminisi, & Minka, Object Categorization by Learned Universal Visual Dictionary, ICCV 2005

Page 179: AAAI08 tutorial: visual object recognition


Supervised vocabulary formation

• Consider vocabulary and classifier construction jointly.

Yang, Jin, Sukthankar, & Jurie, Discriminative Visual Codebook Generation with Classifier Training for Object Category Recognition, CVPR 2008.

Page 180: AAAI08 tutorial: visual object recognition


Learning and recognition with bag of words histograms

• Bag of words representation makes it possible to describe the unordered point set with a single vector (of fixed dimension across image examples)


• Provides easy way to use distribution of feature types with various learning algorithms requiring vector input.
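As one example of a learner that takes such fixed-length vectors, here is a nearest-neighbor classifier using histogram intersection as the similarity. Both choices are assumptions made for illustration (any vector-based learner could be substituted), and the labels and histograms are invented toy data.

```python
def histogram_intersection(h1, h2):
    """Similarity of two histograms: the mass they have in common."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def classify(query_hist, labeled_hists):
    """Return the label of the training histogram most similar to the query."""
    best_label, _ = max(labeled_hists,
                        key=lambda item: histogram_intersection(query_hist, item[1]))
    return best_label


# Toy 4-word vocabulary; each training image is (label, bag of words histogram).
training = [("car", [0.6, 0.3, 0.1, 0.0]), ("cow", [0.0, 0.1, 0.3, 0.6])]
label = classify([0.5, 0.4, 0.1, 0.0], training)
```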


Page 181: AAAI08 tutorial: visual object recognition


Learning and recognition with bag of words histograms

• …including unsupervised topic models designed for documents.
• Hierarchical Bayesian text models (pLSA and LDA)
  - Hofmann 2001; Blei, Ng & Jordan, 2004
  - For object and scene categorization: Sivic et al. 2005, Sudderth et al. 2005, Quelhas et al. 2005, Fei-Fei et al. 2005

Figure credit: Fei-Fei Li

Page 182: AAAI08 tutorial: visual object recognition


Learning and recognition with bag of words histograms

• …including unsupervised topic models designed for documents.

Probabilistic Latent Semantic Analysis (pLSA)

[Figure: pLSA graphical model with variables d, z, w and plates N, D; example discovered topic: “face”]

Sivic et al. ICCV 2005

[pLSA code available at: http://www.robots.ox.ac.uk/~vgg/software/]

Figure credit: Fei-Fei Li
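For reference, the standard pLSA model behind this diagram explains each visual word w in an image (document) d through a latent topic z. The formula below is the usual textbook form, stated here as background rather than taken from the slide:

```latex
P(w \mid d) = \sum_{z} P(w \mid z)\, P(z \mid d)
```

Here $P(w \mid z)$ is the word distribution of topic z and $P(z \mid d)$ is the topic mixture of image d; in the object categorization setting a topic such as “face” would ideally correspond to one category.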

Page 183: AAAI08 tutorial: visual object recognition


Bags of words: pros and cons

+ flexible to geometry / deformations / viewpoint

+ compact summary of image content

+ provides vector representation for sets

+ has yielded good recognition results in practice


- basic model ignores geometry – must verify afterwards, or encode via features

- background and foreground mixed when bag covers whole image

- interest points or sampling: no guarantee to capture object-level parts

- optimal vocabulary formation remains unclear


Page 184: AAAI08 tutorial: visual object recognition


Outline

1. Detection with Global Appearance & Sliding Windows

2. Local Invariant Features: Detection & Description

3. Specific Object Recognition with Local Features

― Coffee Break ―


4. Visual Words: Indexing, Bags of Words Categorization

5. Matching Local Feature Sets

6. Part-Based Models for Categorization

7. Current Challenges and Research Directions

Page 185: AAAI08 tutorial: visual object recognition


Visual Object Recognition

Bastian Leibe
Computer Vision Laboratory, ETH Zurich

&

Kristen Grauman
Department of Computer Sciences, University of Texas in Austin

Chicago, 14.07.2008

Page 186: AAAI08 tutorial: visual object recognition


Outline

1. Detection with Global Appearance & Sliding Windows

2. Local Invariant Features: Detection & Description

3. Specific Object Recognition with Local Features

― Coffee Break ―


4. Visual Words: Indexing, Bags of Words Categorization

5. Matching Local Feature Sets

6. Part-Based Models for Categorization

7. Current Challenges and Research Directions

Page 187: AAAI08 tutorial: visual object recognition

Basic flow

• Detect or sample features: list of positions, scales, orientations

• Describe features: associated list of d-dimensional descriptors

• Then either

– index each descriptor into a pool of descriptors from previously seen images, or

– compute a match with another image, or

– quantize to form a bag-of-words vector for the image

Page 188: AAAI08 tutorial: visual object recognition

Local feature correspondences

• Matching between sets of local features helps to establish overall similarity between objects or shapes.

• The assigned matches are also useful for localization.

Examples: shape context [Belongie & Malik 2001]; low-distortion matching [Berg & Malik 2005]; match kernel [Wallraven, Caputo & Graf 2003]

Page 189: AAAI08 tutorial: visual object recognition

Local feature correspondences

• Least cost match: minimize total cost between matched points

$\min_{\pi: X \to Y} \sum_{x_i \in X} \lVert x_i - \pi(x_i) \rVert$

• Least cost partial match: match all of smaller set to some portion of larger set.
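The least-cost partial match can be illustrated with a brute-force sketch. It assumes Euclidean cost and a one-to-one assignment of the smaller set into the larger one; the function name `partial_match_cost` is hypothetical, and the exhaustive search is only practical for a handful of points.

```python
from itertools import permutations
import math

def partial_match_cost(X, Y):
    """Brute-force least-cost partial match between two point sets.

    Every point of the smaller set is assigned to a distinct point of the
    larger set; the total Euclidean distance is minimized.
    """
    small, large = (X, Y) if len(X) <= len(Y) else (Y, X)
    best_cost, best_assign = math.inf, None
    # Try every injective assignment small[i] -> large[perm[i]].
    for perm in permutations(range(len(large)), len(small)):
        cost = sum(math.dist(small[i], large[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_assign = cost, perm
    return best_cost, best_assign

cost, assign = partial_match_cost([(0, 0), (1, 0)], [(0, 0.1), (5, 5), (1, 0.1)])
```

The exhaustive loop makes the cost of exact matching tangible; practical systems would use an assignment solver or an approximation instead.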

Page 190: AAAI08 tutorial: visual object recognition

Pyramid match kernel (PMK)

• Optimal matching is expensive relative to the number of features per image (m).

• PMK is an approximate partial match for efficient discriminative learning from sets of local features.

Optimal match: O(m^3)

Greedy match: O(m^2 log m)

Pyramid match: O(m)

[Grauman & Darrell, ICCV 2005]

Page 191: AAAI08 tutorial: visual object recognition

Pyramid match kernel: pyramid extraction

Histogram pyramid: level i has bins of size $2^i$

Page 192: AAAI08 tutorial: visual object recognition


Pyramid match kernel: counting matches

Histogram intersection


Page 193: AAAI08 tutorial: visual object recognition

Pyramid match kernel: counting new matches

Histogram intersection

new matches at level i = (matches at this level) − (matches at previous level)

The difference in histogram intersections across levels counts the number of new pairs matched.

Page 194: AAAI08 tutorial: visual object recognition

Pyramid match kernel

The kernel compares two histogram pyramids by summing, over levels i, the product of

– a weight: measure of difficulty of a match at level i

– a count: number of newly matched pairs at level i

• For similarity, weights are inversely proportional to bin size (or may be learned discriminatively)

• Normalize kernel values to avoid favoring large sets
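A minimal sketch of the weighted sum above, assuming 1-D features, uniform bins of width 2^i at level i, and weights 1/2^i. The function names and the square-root normalization by self-similarity are illustrative assumptions, not the libpmk API.

```python
from collections import Counter

def histogram(points, bin_size):
    """Count 1-D points per bin of width bin_size."""
    return Counter(int(p // bin_size) for p in points)

def intersection(h1, h2):
    """Histogram intersection: sum of bin-wise minima."""
    return sum(min(count, h2[b]) for b, count in h1.items())

def pyramid_match(X, Y, levels):
    """Unnormalized pyramid match score with weights 1 / 2**i."""
    score, prev = 0.0, 0
    for i in range(levels):
        bin_size = 2 ** i
        matches = intersection(histogram(X, bin_size), histogram(Y, bin_size))
        # Only pairs newly matched at this level count, weighted by 1 / bin size.
        score += (matches - prev) / bin_size
        prev = matches
    return score

def normalized_pyramid_match(X, Y, levels):
    """Divide by the self-similarities so that large sets are not favored."""
    kxx = pyramid_match(X, X, levels)
    kyy = pyramid_match(Y, Y, levels)
    return pyramid_match(X, Y, levels) / (kxx * kyy) ** 0.5

score = pyramid_match([0.2, 3.5, 6.1], [0.7, 2.4], levels=3)
```

Each level costs one pass over the features, which is where the linear-time behavior comes from.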

Page 195: AAAI08 tutorial: visual object recognition


Example pyramid match



Page 198: AAAI08 tutorial: visual object recognition

Example pyramid match

[Figure: pyramid match vs. optimal match]

Page 199: AAAI08 tutorial: visual object recognition


Pyramid match kernel

• Forms a Mercer kernel -> allows classification with SVMs, use of other kernel methods

• Bounded error relative to optimal partial match

• Linear time -> efficient learning with large feature sets


Page 200: AAAI08 tutorial: visual object recognition

Pyramid match kernel

[Figure: accuracy and time (s) vs. mean number of features on the ETH-80 data set; pyramid match, O(m), compared with the match kernel of Wallraven et al., O(m^2)]

Page 201: AAAI08 tutorial: visual object recognition

Pyramid match kernel

• Use data-dependent pyramid partitions for high-d feature spaces

[Figure: uniform pyramid bins vs. vocabulary-guided pyramid bins]

Code for PMK: http://people.csail.mit.edu/jjl/libpmk/

Page 202: AAAI08 tutorial: visual object recognition

Matching smoothness & local geometry

• Solving for linear assignment means (non-overlapping) features can be matched independently, ignoring relative geometry.

• One alternative: simply expand feature vectors to include spatial information before matching.

[ f1,…,f128, xa, ya ]
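Expanding a descriptor with its image position could look like the sketch below. Scaling the coordinates by the image size and the `spatial_weight` parameter are assumptions for illustration; the slide only states that position is appended.

```python
def augment_with_position(descriptor, x, y, width, height, spatial_weight=1.0):
    """Return descriptor + [x, y], with coordinates scaled to [0, 1].

    spatial_weight (hypothetical) trades off appearance against position
    when the augmented vectors are compared.
    """
    return list(descriptor) + [spatial_weight * x / width,
                               spatial_weight * y / height]

augmented = augment_with_position([0.1, 0.2], x=50, y=25, width=100, height=100)
```

A larger weight makes matches between distant image positions more costly; a weight of zero recovers purely appearance-based matching.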

Page 203: AAAI08 tutorial: visual object recognition

Spatial pyramid match kernel

• First quantize descriptors into words, then do one pyramid match per word in image coordinate space.

[Lazebnik, Schmid & Ponce, CVPR 2006]
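A rough sketch of matching per visual word in image coordinates. It assumes features are given as (word, x, y) with x and y already scaled to [0, 1), a grid that halves in resolution at each step, and weights that halve with it. These choices and the function names are illustrative, not the exact weighting of the cited paper.

```python
from collections import Counter

def grid_histogram(features, cells):
    """Count features per (word, column, row) on a cells x cells grid."""
    hist = Counter()
    for word, x, y in features:
        hist[(word, int(x * cells), int(y * cells))] += 1
    return hist

def spatial_pyramid_match(A, B, levels=3):
    """Weighted count of same-word features that fall into the same grid cell."""
    score, prev = 0.0, 0
    for i in range(levels):
        cells = 2 ** (levels - 1 - i)          # finest grid first, single cell last
        ha, hb = grid_histogram(A, cells), grid_histogram(B, cells)
        matches = sum(min(count, hb[key]) for key, count in ha.items())
        score += (matches - prev) / 2 ** i     # coarser matches count less
        prev = matches
    return score

A = [(0, 0.10, 0.10), (1, 0.60, 0.60)]
B = [(0, 0.15, 0.12), (1, 0.90, 0.90)]
score = spatial_pyramid_match(A, B)
```

Because the word label is part of the histogram key, features only match other features of the same word.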

Page 204: AAAI08 tutorial: visual object recognition

Matching smoothness & local geometry

• Use correspondence to estimate parameterized transformation, regularize to enforce smoothness

Shape context matching [Belongie, Malik, & Puzicha 2001]

Code: http://www.eecs.berkeley.edu/Research/Projects/CS/vision/shape/sc_digits.html

Page 205: AAAI08 tutorial: visual object recognition

Matching smoothness & local geometry

• Let matching cost include term to penalize distortion between pairs of matched features.

[Figure: feature pairs (i, j) and (i', j') in the template and query images, with offset vectors R_ij and S_i'j'. Figure credit: Alex Berg]

Approximate for efficient solutions: Berg & Malik, CVPR 2005; Leordeanu & Hebert, ICCV 2005

Page 206: AAAI08 tutorial: visual object recognition

Matching smoothness & local geometry

• Compare “semi-local” features: consider configurations or neighborhoods and co-occurrence relationships

– Hyperfeatures [Agarwal & Triggs, ECCV 2006]

– Correlograms of visual words [Savarese, Winn, & Criminisi, CVPR 2006]

– Proximity distribution kernel [Ling & Soatto, ICCV 2007]

– Feature neighborhoods [Sivic & Zisserman, CVPR 2004]

– Tiled neighborhood [Quack, Ferrari, Leibe, van Gool ICCV 2007]

Page 207: AAAI08 tutorial: visual object recognition

Matching smoothness & local geometry

• Learn or provide explicit object-specific shape model [Next in the tutorial: part-based models]

[Figure: shape model over parts x1, …, x6]

Page 208: AAAI08 tutorial: visual object recognition

Summary

• Local features are a useful, flexible representation

– Invariance properties - typically built into the descriptor

– Distinctive, especially helpful for identifying specific textured objects

– Breaking image into regions/parts gives tolerance to occlusions and clutter

– Mapping to visual words forms discrete tokens from image regions

• Efficient methods available for

– Indexing patches or regions

– Comparing distributions of visual words

– Matching features

Page 209: AAAI08 tutorial: visual object recognition

Outline

1. Detection with Global Appearance & Sliding Windows

2. Local Invariant Features: Detection & Description

3. Specific Object Recognition with Local Features

― Coffee Break ―

4. Visual Words: Indexing, Bags of Words Categorization

5. Matching Local Feature Sets

6. Part-Based Models for Categorization

7. Current Challenges and Research Directions


Page 212: AAAI08 tutorial: visual object recognition

Recognition of Object Categories

• We no longer have exact correspondences…

• On a local level, we can still detect similar parts.

• Represent objects by their parts

⇒ Bag-of-features

• How can we improve on this?

– Encode structure

Slide credit: Rob Fergus

Page 213: AAAI08 tutorial: visual object recognition

Part-Based Models

• Fischler & Elschlager 1973

• Model has two components

– parts (2D image fragments)

– structure (configuration of parts)

Page 214: AAAI08 tutorial: visual object recognition

Different Connectivity Structures

[Figure: model connectivity structures and their recognition complexity, from [Carneiro & Lowe, ECCV’06]]

• Fergus et al. ’03; Fei-Fei et al. ’03: O(N^6)

• Leibe et al. ’04, ’08; Crandall et al. ’05; Fergus et al. ’05: O(N^2)

• Crandall et al. ’05: O(N^3)

• Felzenszwalb & Huttenlocher ’05: O(N^2)

• Bouchard & Triggs ’05

• Carneiro & Lowe ’06

• Csurka ’04; Vasconcelos ’00

Page 215: AAAI08 tutorial: visual object recognition

Spatial Models Considered Here

Fully connected shape model (parts x1, …, x6)

– e.g. Constellation Model

– Parts fully connected

– Recognition complexity: O(N^P)

– Method: Exhaustive search

“Star” shape model (parts x1, …, x6)

– e.g. ISM

– Parts mutually independent

– Recognition complexity: O(NP)

– Method: Gen. Hough Transform

Slide credit: Rob Fergus

Page 216: AAAI08 tutorial: visual object recognition

Constellation Model

• Joint model for appearance and shape

– Gaussian shape pdf

– Gaussian part appearance pdf

– Gaussian relative scale pdf (over log(scale))

– Prob. of detection (0.8, 0.75, 0.9 in the illustrated example)

Page 217: AAAI08 tutorial: visual object recognition

Constellation Model

Clutter model

– Uniform shape pdf

– Gaussian appearance pdf

– Uniform relative scale pdf (over log(scale))

– Poisson pdf on # detections

Page 218: AAAI08 tutorial: visual object recognition

Constellation Model: Learning Procedure

• Goal: Find regions & their location, scale & appearance

• Initialize model parameters

• Use EM and iterate to convergence

– E-step: Compute assignments for which regions are foreground/background

– M-step: Update model parameters

• Trying to maximize likelihood – consistency in shape & appearance
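The E-step/M-step alternation can be shown on a toy problem. This is not the constellation model itself: it assumes a single 1-D Gaussian "foreground" and a uniform "background" on a known interval, and every name below is hypothetical.

```python
import math

def em_foreground_background(xs, lo, hi, iters=50):
    """Toy EM: 1-D Gaussian foreground vs. uniform background on [lo, hi]."""
    # Initialize model parameters.
    mu, sigma, prior = sum(xs) / len(xs), (hi - lo) / 4.0, 0.5
    resp = []
    for _ in range(iters):
        # E-step: soft assignment of each observation to the foreground.
        resp = []
        for x in xs:
            fg = prior * math.exp(-0.5 * ((x - mu) / sigma) ** 2) \
                / (sigma * math.sqrt(2 * math.pi))
            bg = (1.0 - prior) / (hi - lo)
            resp.append(fg / (fg + bg))
        # M-step: update the parameters from the soft assignments.
        total = sum(resp)
        mu = sum(r * x for r, x in zip(resp, xs)) / total
        var = sum(r * (x - mu) ** 2 for r, x in zip(resp, xs)) / total
        sigma = max(math.sqrt(var), 1e-3)      # guard against a collapsing variance
        prior = total / len(xs)
    return mu, sigma, prior, resp

xs = [4.9, 5.0, 5.1, 5.05, 4.95, 0.5, 9.5]    # five consistent values, two outliers
mu, sigma, prior, resp = em_foreground_background(xs, lo=0.0, hi=10.0)
```

The consistent values end up assigned to the foreground and the two outliers to the background, which mirrors the "consistency" criterion in the last bullet.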

Page 219: AAAI08 tutorial: visual object recognition


Example: Motorbikes


Page 220: AAAI08 tutorial: visual object recognition


Example: Motorbikes (2)


Page 221: AAAI08 tutorial: visual object recognition


Example: Spotted Cats


Page 222: AAAI08 tutorial: visual object recognition

Discussion: Constellation Model

• Advantages

– Works well for many different object categories

– Can adapt well to categories where

    – Shape is more important

    – Appearance is more important

– Everything is learned from training data

– Weakly-supervised training possible

• Disadvantages

– Model contains many parameters that need to be estimated

– Cost increases exponentially with increasing number of parameters

⇒ Fully connected model restricted to small number of parts.

Page 223: AAAI08 tutorial: visual object recognition

Implicit Shape Model (ISM)

• Basic ideas

– Learn an appearance codebook

– Learn a star-topology structural model

    – Features are considered independent given obj. center

• Algorithm: probabilistic Gen. Hough Transform

– Exact correspondences → Prob. match to object part

– NN matching → Soft matching

– Feature location on obj. → Part location distribution

– Uniform votes → Probabilistic vote weighting

– Quantized Hough array → Continuous Hough space

[Figure: star-shaped model over parts x1, …, x6]

Page 224: AAAI08 tutorial: visual object recognition

Codebook Representation

• Extraction of local object features

– Interest Points (e.g. Harris detector)

– Sparse representation of the object appearance

• Collect features from whole training set

• Example: [Figure]

Page 225: AAAI08 tutorial: visual object recognition

Gen. Hough Transform with Local Features

• For every feature, store possible “occurrences”

– Object identity

– Pose

– Relative position

• For new image, let the matched features vote for possible object positions

Page 226: AAAI08 tutorial: visual object recognition

Implicit Shape Model - Representation

[Figure: training images (+ reference segmentation); appearance codebook; spatial occurrence distributions over (x, y, s) + local figure-ground labels]

• Learn appearance codebook

– Extract local features at interest points

– Agglomerative clustering ⇒ codebook

• Learn spatial distributions

– Match codebook to training images

– Record matching positions on object

Page 227: AAAI08 tutorial: visual object recognition

Implicit Shape Model - Recognition

Interest Points → Matched Codebook Entries → Probabilistic Voting → 3D Voting Space (continuous; x, y, s)

• Image feature f (at location ℓ) → interpretation (codebook match) C_i, with probability $p(C_i \mid f)$

• Codebook match C_i → object position (o_n, x), with probability $p(o_n, x \mid C_i, \ell)$

$p(o_n, x \mid f, \ell) = \sum_i p(o_n, x \mid C_i, \ell)\, p(C_i \mid f)$

[Leibe04, Leibe08]
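The voting step can be sketched in 2-D as follows. It assumes each codebook entry stores a list of offsets to the object center, treats the stored occurrences of an entry as equally likely, and finds the strongest hypothesis with a coarse accumulator. The data layout and function names are hypothetical; scale is left out for brevity.

```python
from collections import defaultdict

def cast_votes(features, occurrences):
    """Let matched features vote for the object center.

    features:    list of (x, y, {codebook_id: p(C_i | f)})
    occurrences: {codebook_id: [(dx, dy), ...]} offsets from part to object center
    Returns a list of weighted votes (cx, cy, weight).
    """
    votes = []
    for x, y, matches in features:
        for cid, p_match in matches.items():
            offsets = occurrences.get(cid, [])
            for dx, dy in offsets:
                # p(o_n, x | C_i, l) taken as uniform over stored occurrences (assumption).
                votes.append((x + dx, y + dy, p_match / len(offsets)))
    return votes

def strongest_cell(votes, cell=10):
    """Sum vote weights per accumulator cell; return the best cell and its score."""
    acc = defaultdict(float)
    for cx, cy, weight in votes:
        acc[(int(cx // cell), int(cy // cell))] += weight
    return max(acc.items(), key=lambda item: item[1])

occurrences = {0: [(10, 0)], 1: [(-10, 0), (0, 20)]}
features = [(40, 50, {0: 1.0}), (60, 50, {1: 1.0})]
best_cell, best_score = strongest_cell(cast_votes(features, occurrences))
```

Two features with different codebook matches agree on the center near (50, 50), so that cell collects the most weight.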

Page 228: AAAI08 tutorial: visual object recognition

Implicit Shape Model - Recognition

Interest Points → Matched Codebook Entries → Probabilistic Voting → 3D Voting Space (continuous; x, y, s) → Backprojection of Maxima → Backprojected Hypotheses

[Leibe04, Leibe08]

Page 229: AAAI08 tutorial: visual object recognition

Example: Results on Cows

[Figure sequence: original image; interest points; matched patches; prob. votes; 1st hypothesis; 2nd hypothesis; 3rd hypothesis]

Page 236: AAAI08 tutorial: visual object recognition

Scale Invariant Voting

• Scale-invariant feature selection

– Scale-invariant interest points

– Rescale extracted patches

– Match to constant-size codebook

• Generate scale votes

– Scale as 3rd dimension in voting space

– Search for maxima in 3D voting space

[Figure: search window in the (x, y, s) voting space]

Page 237: AAAI08 tutorial: visual object recognition

Scale Voting: Efficient Computation

Scale votes → Binned accum. array → Candidate maxima → Refinement (MSME)

• Mean-Shift formulation for refinement

– Scale-adaptive balloon density estimator
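The refinement of a candidate maximum can be sketched with a plain mean-shift iteration. As a simplification it uses a flat kernel with one fixed bandwidth over (x, y, s), not the scale-adaptive balloon estimator named above; the function name and vote layout are hypothetical.

```python
import math

def mean_shift_mode(votes, start, bandwidth, iters=100, tol=1e-6):
    """Shift a candidate position to the weighted mean of nearby votes until it settles.

    votes: list of ((x, y, s), weight); start: candidate maximum (x, y, s).
    """
    pos = start
    for _ in range(iters):
        inside = [(p, w) for p, w in votes if math.dist(p, pos) <= bandwidth]
        total = sum(w for _, w in inside)
        if total == 0:
            return pos                          # no votes in the window
        new = tuple(sum(w * p[k] for p, w in inside) / total
                    for k in range(len(pos)))
        if math.dist(new, pos) < tol:
            return new
        pos = new
    return pos

votes = [((10, 10, 1.0), 1.0), ((11, 10, 1.0), 1.0),
         ((10, 11, 1.0), 2.0), ((50, 50, 2.0), 1.0)]
mode = mean_shift_mode(votes, start=(9, 9, 1.0), bandwidth=3.0)
```

Starting from the coarse candidate taken from the binned array keeps the number of mean-shift runs small.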

Page 238: AAAI08 tutorial: visual object recognition

Detection Results

• Qualitative Performance

– Recognizes different kinds of objects

– Robust to clutter, occlusion, noise, low contrast

Page 239: AAAI08 tutorial: visual object recognition

Figure-Ground Segregation

• Problem extensively studied in Psychophysics

• Experiments with ambiguous figure-ground stimuli

• Results:

– Evidence that object recognition can and does operate before figure-ground organization

– Interpreted as Gestalt cue familiarity.

M.A. Peterson, “Object Recognition Processes Can and Do Operate Before Figure-Ground Organization”, Cur. Dir. in Psych. Sc., 3:105-111, 1994.

Page 240: AAAI08 tutorial: visual object recognition

ISM – Top-Down Segmentation

Interest Points → Matched Codebook Entries → Probabilistic Voting → 3D Voting Space (continuous; x, y, s) → Backprojection of Maxima → Backprojected Hypotheses → p(figure) Probabilities → Segmentation

[Leibe04, Leibe08]

Page 241: AAAI08 tutorial: visual object recognition

Segmentation: Probabilistic Formulation

• Influence of patch on object hypothesis (vote weight)

$p(f, \ell \mid o_n, x) = \frac{\sum_i p(o_n, x \mid C_i, \ell)\, p(C_i \mid f)\, p(f, \ell)}{p(o_n, x)}$

• Backprojection to features f and pixels $\mathbf{p}$:

$p(\mathbf{p} = \text{figure} \mid o_n, x) = \sum_{\mathbf{p} \in (f, \ell)} p(\mathbf{p} = \text{figure} \mid f, \ell, o_n, x)\, p(f, \ell \mid o_n, x)$

– first factor: segmentation information

– second factor: influence on object hypothesis

[Leibe04, Leibe08]
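The per-pixel sum can be sketched as an accumulation over the patches that support a hypothesis. It assumes each contributing patch comes with a stored figure-ground mask and a precomputed influence weight; the tuple layout and function name are hypothetical.

```python
def figure_probability(width, height, patches):
    """Accumulate p(figure) and p(ground) maps for one object hypothesis.

    patches: list of (x0, y0, mask, influence), where mask is a 2-D list of
    per-pixel figure probabilities and influence is the patch's vote weight.
    """
    p_fig = [[0.0] * width for _ in range(height)]
    p_gnd = [[0.0] * width for _ in range(height)]
    for x0, y0, mask, influence in patches:
        for dy, row in enumerate(mask):
            for dx, m in enumerate(row):
                x, y = x0 + dx, y0 + dy
                if 0 <= x < width and 0 <= y < height:
                    # segmentation information times influence on the hypothesis
                    p_fig[y][x] += m * influence
                    p_gnd[y][x] += (1.0 - m) * influence
    return p_fig, p_gnd

patches = [(0, 0, [[1.0, 0.0]], 0.5), (1, 0, [[1.0]], 0.25)]
p_fig, p_gnd = figure_probability(width=2, height=1, patches=patches)
```

Comparing the two maps per pixel then yields a segmentation for the hypothesis.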

Page 242: AAAI08 tutorial: visual object recognition

Segmentation

• Interpretation of p(figure) map

– per-pixel confidence in object hypothesis

– Use for hypothesis verification

[Figure: original image, p(figure) map, p(ground) map, and resulting segmentation]

[Leibe04, Leibe08]

Page 243: AAAI08 tutorial: visual object recognition


Example Results: Motorbikes


Page 244: AAAI08 tutorial: visual object recognition

Example Results: Cows

• Training

– 112 hand-segmented images

• Results on novel sequences:

Single-frame recognition - No temporal continuity used!

[Leibe04, Leibe08]

Page 245: AAAI08 tutorial: visual object recognition

Example Results: Chairs

[Figure: dining room chairs; office chairs]

Page 246: AAAI08 tutorial: visual object recognition

Inferring Other Information: Part Labels

[Figure: training; test; output]

[Thomas07]

Page 247: AAAI08 tutorial: visual object recognition

Inferring Other Information: Part Labels (2)

[Thomas07]

Page 248: AAAI08 tutorial: visual object recognition

Inferring Other Information: Depth Maps

“Depth from a single image”

[Thomas07]

Page 249: AAAI08 tutorial: visual object recognition

Application for Pedestrian Detection

• Estimating Articulation

• Rotation-Invariant Detection

[Leibe, Seemann, Schiele, CVPR’05]

[Mikolajczyk, Leibe, Schiele, CVPR’06]

[Figure labels: θ_q, φ, d_q; θ, φ, d]

Page 250: AAAI08 tutorial: visual object recognition

Perceptual and Sensory Augmented Computing

Vis

ua

l O

bje

ct

Re

co

gn

itio

n T

uto

ria

l

Outline

1. Detection with Global Appearance & Sliding Windows

2. Local Invariant Features: Detection & Description

3. Specific Object Recognition with Local Features

― Coffee Break ―

Perceptual and Sensory Augmented Computing

Vis

ua

l O

bje

ct

Re

co

gn

itio

n T

uto

ria

l

54K. Grauman, B. Leibe

― Coffee Break ―

4. Visual Words: Indexing, Bags of Words Categorization

5. Matching Local Features

6. Part-Based Models for Categorization

7. Current Challenges and Research Directions

Page 251: AAAI08 tutorial: visual object recognition

Visual Object Recognition

Bastian Leibe
Computer Vision Laboratory, ETH Zurich

Kristen Grauman
Department of Computer Sciences, University of Texas at Austin

Chicago, 14.07.2008

Page 252: AAAI08 tutorial: visual object recognition

Outline

1. Detection with Global Appearance & Sliding Windows

2. Local Invariant Features: Detection & Description

3. Specific Object Recognition with Local Features

― Coffee Break ―

4. Visual Words: Indexing, Bags of Words Categorization

5. Matching Local Features

6. Part-Based Models for Categorization

7. Current Challenges and Research Directions

Highlight of some research topics not covered in the main tutorial

Page 253: AAAI08 tutorial: visual object recognition

Benchmark Data

• What degree of difficulty do current datasets have?

Page 254: AAAI08 tutorial: visual object recognition

Example: Caltech-101

A dataset that has been about mastered…

Images from Caltech-101: a 101-way multi-class classification problem

Page 255: AAAI08 tutorial: visual object recognition

Example: Caltech-256

Images from Caltech-256: a 256-way multi-class recognition problem

Page 256: AAAI08 tutorial: visual object recognition

Example: Pascal Visual Object Classes Challenge

Pascal VOC 2007:

Binary detection problems

http://pascallin.ecs.soton.ac.uk/challenges/VOC/

Page 257: AAAI08 tutorial: visual object recognition

Example: LabelMe

http://labelme.csail.mit.edu/

Page 258: AAAI08 tutorial: visual object recognition

Current challenges & ongoing research

• Multi-cue integration

• Finer level categorization

• View invariant recognition

• Unsupervised category discovery

• Learning from noisily labeled images

• Integration of segmentation and recognition

• Learning with text and images/video

• Use of video

• Context and scene layout

Page 259: AAAI08 tutorial: visual object recognition

Multi-cue integration

• Single cues often not sufficient.

• Integrate multiple local and global cues.
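
One simple way to picture cue integration is late fusion: each cue produces its own confidence score, and the scores are combined by a weighted average. The sketch below is only an illustration under that assumption; the cue names, the weights, and the function `fuse_cue_scores` are hypothetical and are not taken from the tutorial.

```python
def fuse_cue_scores(cue_scores, weights):
    """Late fusion: weighted average of per-cue confidence scores.

    cue_scores: dict mapping a cue name to a score in [0, 1].
    weights:    dict mapping the same cue names to non-negative weights
                (hypothetical values; in practice they would be learned).
    """
    total_weight = sum(weights[cue] for cue in cue_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[cue] * score for cue, score in cue_scores.items())
    return weighted_sum / total_weight


# Illustrative scores for one candidate window (made-up numbers).
scores = {"local_shape": 0.9, "global_color": 0.4, "texture": 0.6}
weights = {"local_shape": 2.0, "global_color": 1.0, "texture": 1.0}
fused = fuse_cue_scores(scores, weights)
```

A weak cue can then be outvoted by stronger ones, which is the basic motivation for not relying on a single cue.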

Page 260: AAAI08 tutorial: visual object recognition

Multi-Category Discrimination

• Distinguish similar categories.

• Need to look at specific details!

Page 261: AAAI08 tutorial: visual object recognition

Multi-Aspect Recognition

• Detectors for different viewpoints ⇒ How can this be improved?

Page 262: AAAI08 tutorial: visual object recognition

Multi-Aspect Recognition

[Thomas et al., CVPR’06]

[Hoiem, Rother, Winn, CVPR’07]

Page 263: AAAI08 tutorial: visual object recognition

Multi-Aspect Recognition

[Rothganger et al., CVPR’03]

[Savarese & Fei-Fei, ICCV’07]

Page 264: AAAI08 tutorial: visual object recognition

Unsupervised, semi-supervised category discovery

Topic models for images

• Probabilistic Latent Semantic Analysis (pLSA)

• Latent Dirichlet Allocation (LDA)

[Figure: graphical models for pLSA and LDA; recoverable labels: “face”, “beach”, w, z, c, π, N, D]

Sivic et al. ICCV 2005, Fei-Fei et al. ICCV 2005

Figure credit: Fei-Fei Li

Page 265: AAAI08 tutorial: visual object recognition

Unsupervised, semi-supervised category discovery

Clustering cluttered images

Learning from noisy keyword-based image search results

Grauman & Darrell, CVPR 2006

Fergus et al. ECCV 2004, ICCV 2005

Li & Fei-Fei, CVPR 2007

Page 266: AAAI08 tutorial: visual object recognition

Learning with text and images/video

Berg, Berg, Edwards, & Forsyth, NIPS 2006

Barnard et al. JMLR 2003

Gupta et al. ECML 2008

Page 267: AAAI08 tutorial: visual object recognition

Integrating segmentation + recognition

Kumar et al. CVPR 2005

Borenstein & Ullman, ECCV 2002

Kannan, Winn, & Rother, NIPS 2006

Tu, Chen, Yuille, Zhu, ICCV 2003

Page 268: AAAI08 tutorial: visual object recognition

Role of context, understanding scene layout

Antonio Torralba, IJCV 2003

Page 269: AAAI08 tutorial: visual object recognition

Role of context, understanding scene layout

[Figure panels: Image, World]

Hoiem, Efros, & Hebert, CVPR 2006

Page 270: AAAI08 tutorial: visual object recognition

Integration with Scene Geometry

• Goal: Find the ground plane

  - Restrict object location

  - Assume Gaussian size prior

  ⇒ Significantly reduced search space
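
The effect of a ground plane constraint with a Gaussian size prior can be sketched as follows. This is a minimal illustration, not the formulation used in the cited work: it assumes a pinhole camera with zero pitch and roll above a flat ground plane, and the function names, the detection format, the pedestrian mean height of 1.7 m, the standard deviation of 0.1 m, and the threshold are all hypothetical choices.

```python
import math


def object_height_m(box_bottom_row, box_height_px, horizon_row, camera_height_m):
    """Real-world height of an object standing on a flat ground plane.

    Assumes a pinhole camera with zero pitch and roll and image rows that
    increase downward. A ground point at depth z then projects to row
    horizon_row + f * camera_height / z, so the focal length f cancels when
    the pixel height is converted back to metres.
    """
    rows_below_horizon = box_bottom_row - horizon_row
    if rows_below_horizon <= 0:
        return None  # box bottom at or above the horizon: not on the ground
    return box_height_px * camera_height_m / rows_below_horizon


def size_prior(height_m, mean_m=1.7, sigma_m=0.1):
    """Unnormalized Gaussian size prior, equal to 1.0 at the mean height."""
    return math.exp(-0.5 * ((height_m - mean_m) / sigma_m) ** 2)


def filter_detections(detections, horizon_row, camera_height_m, min_prior=0.05):
    """Keep only detections that stand on the ground with a plausible size."""
    kept = []
    for det in detections:
        height = object_height_m(det["bottom"], det["height"],
                                 horizon_row, camera_height_m)
        if height is not None and size_prior(height) >= min_prior:
            kept.append(det)
    return kept
```

Candidates whose box bottom lies above the horizon, or whose implied height is far from the prior mean, are discarded before any expensive verification, which is one way the search space shrinks.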

[Figure labels: Structure-from-Motion, Dense stereo; search corridor in the Hough volume (axes x, y, s)]

Page 271: AAAI08 tutorial: visual object recognition

Extensions

• Combination with 3D Geometry

• Mobile Pedestrian Detection

[Leibe, Cornelis, Cornelis, Van Gool, CVPR’07]

[Ess, Leibe, Van Gool, ICCV’07]

Page 272: AAAI08 tutorial: visual object recognition

Detections Using Ground Plane Constraints

Left camera, 1175 frames

[Leibe et al. CVPR’07]

Page 273: AAAI08 tutorial: visual object recognition

Extensions: Tracking-by-Detection

• Spacetime trajectory analysis

  - Link up detections to form physically plausible ST trajectories

  - Select set of ST trajectories that best explain the data
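
The two steps above can be illustrated with a deliberately simplified sketch. It is a stand-in, not the method of the cited paper: linking is done greedily with a nearest-neighbour rule and a maximum per-frame displacement as the only plausibility check, and "best explain the data" is reduced to keeping trajectories with enough supporting detections. The function names and parameters are hypothetical.

```python
import math


def link_detections(frames, max_step):
    """Greedily link per-frame detections into space-time trajectories.

    frames:   list of lists of (x, y) positions, one inner list per frame.
    max_step: largest displacement between consecutive frames that is still
              treated as physically plausible (hypothetical gate).
    Returns a list of trajectories, each a list of (frame_index, x, y).
    """
    finished, active = [], []
    for t, detections in enumerate(frames):
        unused = list(detections)
        still_active = []
        for track in active:
            _, px, py = track[-1]
            best = min(unused,
                       key=lambda d: math.hypot(d[0] - px, d[1] - py),
                       default=None)
            if best is not None and math.hypot(best[0] - px, best[1] - py) <= max_step:
                track.append((t, best[0], best[1]))
                unused.remove(best)
                still_active.append(track)
            else:
                finished.append(track)  # no plausible continuation
        # Every unmatched detection starts a new trajectory.
        still_active.extend([[(t, x, y)] for x, y in unused])
        active = still_active
    return finished + active


def select_trajectories(tracks, min_length):
    """Crude selection: keep trajectories supported by enough detections."""
    return [track for track in tracks if len(track) >= min_length]
```

A full system would replace the greedy rule with a motion model and the length threshold with a global selection criterion, but the structure (link, then select) is the same.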

[Leibe et al. CVPR’07]

Page 274: AAAI08 tutorial: visual object recognition

Dynamic Scene Analysis Results

[Leibe et al. CVPR’07]

Page 275: AAAI08 tutorial: visual object recognition

Extensions (2)

• Combination with 3D Reconstruction

[Cornelis, Leibe, Cornelis, Van Gool, 3DPVT’06]

Page 276: AAAI08 tutorial: visual object recognition

Textured 3D Model

• Run-times

  - SfM + Bundle adjustment: 27-30 fps on CPU

  - Dense reconstruction: 36 fps on GPU

Original 3D Reconstruction

[Cornelis, Cornelis, Van Gool, CVPR’06]

Page 277: AAAI08 tutorial: visual object recognition

Improved 3D City Model

Enhancing your driving experience…

Original 3D Reconstruction

[Cornelis, Leibe, Cornelis, Van Gool, 3DPVT’06]

Page 278: AAAI08 tutorial: visual object recognition

Putting It All Together…

[Figure: graphical model of the combined system; recoverable labels: π, πd, oi, di, I, D, H, H1, H2, Q, S, V, VT; axes x, y, s and x, t, z]

Page 279: AAAI08 tutorial: visual object recognition

Mobile Pedestrian Tracking

[Ess, Leibe, Schindler, Van Gool, CVPR’08]

Page 280: AAAI08 tutorial: visual object recognition

Mobile Tracking Through Crowds

[Ess, Leibe, Schindler, Van Gool, CVPR’08]

Page 281: AAAI08 tutorial: visual object recognition

Extension: Recovering Articulations

• Idea: Only perform articulated tracking where it’s easy!

• Multi-person tracking

  - Solves hard data association problem

• Articulated tracking

  - Only on individual “tracklets” between occlusions
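
The notion of a tracklet between occlusions can be made concrete with a small sketch. It assumes that the multi-person tracker already provides, for one person, a per-frame flag saying whether that person is occluded; the function name and the flag representation are hypothetical.

```python
def split_into_tracklets(occluded):
    """Split one person's trajectory into unoccluded tracklets.

    occluded: list of booleans, one per frame, True where the person is
              occluded (assumed to come from the multi-person tracker).
    Returns a list of (start, end) frame ranges with end exclusive, one per
    maximal run of unoccluded frames.
    """
    tracklets = []
    start = None
    for frame, is_occluded in enumerate(occluded):
        if not is_occluded and start is None:
            start = frame  # a new unoccluded run begins
        elif is_occluded and start is not None:
            tracklets.append((start, frame))  # the run ends at an occlusion
            start = None
    if start is not None:
        tracklets.append((start, len(occluded)))
    return tracklets
```

Articulated tracking would then be run separately on each returned range, so it never has to bridge an occlusion.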

[Gammeter, Ess, Jaeggli, Schindler, Leibe, Van Gool, ECCV’08]

Page 282: AAAI08 tutorial: visual object recognition

Articulated Multi-Person Tracking

• Multi-person tracking

  - Recovers trajectories and solves data association

  - Estimates 3D walking direction and speed

  - Detects occlusion events

[Gammeter, Ess, Jaeggli, Schindler, Leibe, Van Gool, ECCV’08]

Page 283: AAAI08 tutorial: visual object recognition

Articulated Tracking under Egomotion

[Gammeter, Ess, Jaeggli, Schindler, Leibe, Van Gool, ECCV’08]

Page 284: AAAI08 tutorial: visual object recognition

Page 285: AAAI08 tutorial: visual object recognition

Summary

• Visual recognition is a challenging and very active research area.

• We’ve covered some basic models and representations that have been shown to be effective, and highlighted some ongoing issues.

• See tutorial website for slides, links, references: http://www.vision.ee.ethz.ch/~bleibe/teaching/tutorial-aaai08/

Thank you!

K. Grauman, B. Leibe