On Visual Recognition

Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley

Page 1: On Visual Recognition

Computer Vision Group, University of California, Berkeley

On Visual Recognition

Jitendra Malik UC Berkeley

Page 2: On Visual Recognition

From Pixels to Perception

[Figure: an image labeled at several levels. Scene: outdoor, wildlife. Regions: Tiger, Grass, Water, Sand. Parts and details: head, eye, legs, tail, back, shadow, mouse.]

Page 3: On Visual Recognition

Object Category Recognition

Page 4: On Visual Recognition

Defining Categories

• What is a “visual category”?

– Not semantic

– Working hypothesis: Two instances of the same category must have “correspondence” (i.e. one can be morphed into the other)

• e.g. Four-legged animals

– Biederman’s estimate of 30,000 basic visual categories

Page 5: On Visual Recognition

Facts from Biological Vision

• Timing

• Abstraction/Generalization

• Taxonomy and Partonomy

Page 6: On Visual Recognition

Detection can be very fast

• On a task of judging animal vs no animal, humans can make mostly correct saccades in 150 ms (Kirchner & Thorpe, 2006)

– Comparable to the synaptic delays along the retina, LGN, V1, V2, V4, IT pathway.

– Doesn’t rule out feedback, but shows that feedforward processing alone is very powerful

Page 7: On Visual Recognition

As Soon as You Know It Is There, You Know What It Is

[Figure: accuracy (corrected for guessing) vs. exposure [ms], 0 to 200 ms, one curve per task: detection (obj vs. texture), categorization (car vs. obj), identification (jeep vs. car).]

Grill-Spector & Kanwisher, Psychological Science, 2005

Page 8: On Visual Recognition

Abstraction/Generalization

• Configurations of oriented contours

• Considerable tolerance to small deformations

Page 9: On Visual Recognition

Attneave’s Cat (1954)

Line drawings convey most of the information

Page 10: On Visual Recognition

Taxonomy and Partonomy

• Taxonomy: e.g. cats are in the family Felidae, which in turn is in the class Mammalia

– Recognition can be at multiple levels of categorization, or be identification at the level of specific individuals, as in faces.

• Partonomy: Objects have parts, they have subparts and so on. The human body contains the head, which in turn contains the eyes.

• These notions apply equally well to scenes and to activities.

• Psychologists have argued that there is a “basic level” at which categorization is fastest (Eleanor Rosch et al.).

• In a partonomy, each level contributes useful information for recognition.

Page 12: On Visual Recognition

Matching with Exemplars

• Use exemplars as templates

• Correspond features between query and exemplar

• Evaluate similarity score

[Figure: a query image matched against a database of templates]

Best matching template is a helicopter
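The three steps above (exemplars as templates, feature correspondence, similarity score) can be sketched as a nearest-exemplar search. This is only an illustrative sketch: the 2D feature vectors, the `match_cost` and `best_template` helpers, and the nearest-feature correspondence rule are all assumptions made for the example, not the matching procedure used in the talk.

```python
import numpy as np

def match_cost(query_feats, exemplar_feats):
    """Mean distance from each query feature to its nearest exemplar feature."""
    # Pairwise Euclidean distances, shape (n_query, n_exemplar).
    diffs = query_feats[:, None, :] - exemplar_feats[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    # Correspond each query feature with its closest exemplar feature.
    return dists.min(axis=1).mean()

def best_template(query_feats, templates):
    """Return the label of the lowest-cost template and all costs."""
    costs = {label: match_cost(query_feats, feats)
             for label, feats in templates.items()}
    return min(costs, key=costs.get), costs

# Toy database: each template is a small set of made-up 2D feature vectors.
templates = {
    "helicopter": np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.5]]),
    "car": np.array([[5.0, 5.0], [6.0, 5.0], [7.0, 5.5]]),
}
query = np.array([[0.1, 0.0], [1.1, 0.1], [1.9, 0.4]])

label, costs = best_template(query, templates)
print(label, costs)
```

A lower cost means a better match, so the similarity score here is simply the negated cost; a real system would use richer descriptors and a one-to-one correspondence.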

Page 13: On Visual Recognition

3D objects using multiple 2D views

View selection algorithm from Belongie, Malik & Puzicha (2001)

Page 14: On Visual Recognition

Error vs. Number of Views

Page 15: On Visual Recognition

Three Big Ideas

• Correspondence based on local shape/appearance descriptors

• Deformable Template Matching

• Machine learning for finding discriminative features

Page 17: On Visual Recognition

Comparing Pointsets

Page 18: On Visual Recognition

Shape Context

Count the number of points inside each bin, e.g.:

Count = 4

Count = 10

...

Compact representation of distribution of points relative to each point

(Belongie, Malik & Puzicha, 2001)
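The bin counting described above can be sketched as a log-polar histogram around one reference point. This is a minimal sketch under stated assumptions: the bin layout (5 radial by 12 angular bins), the inner and outer radii, the normalization by mean pairwise distance, and the chi-squared comparison are choices made for the example, and the function names are hypothetical.

```python
import numpy as np

def shape_context(points, index, n_r=5, n_theta=12):
    """Log-polar histogram of the other points relative to points[index]."""
    others = np.delete(points, index, axis=0)
    vecs = others - points[index]
    r = np.linalg.norm(vecs, axis=1)
    theta = np.arctan2(vecs[:, 1], vecs[:, 0]) % (2 * np.pi)

    # Normalize radii by the mean pairwise distance for scale invariance.
    all_d = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    mean_d = all_d[np.triu_indices(len(points), k=1)].mean()
    r = r / mean_d

    # Log-spaced radial bin edges (inner/outer radii are assumed values).
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    r_bin = np.searchsorted(r_edges, r, side="right") - 1
    r_bin = np.maximum(r_bin, 0)  # very close points go in the innermost bin
    t_bin = np.minimum((theta / (2 * np.pi) * n_theta).astype(int), n_theta - 1)

    hist = np.zeros((n_r, n_theta), dtype=int)
    inside = r_bin < n_r          # points beyond the outer radius are dropped
    np.add.at(hist, (r_bin[inside], t_bin[inside]), 1)
    return hist

def chi2_cost(h1, h2):
    """Chi-squared distance between two histograms after normalization."""
    p = h1.ravel() / max(h1.sum(), 1)
    q = h2.ravel() / max(h2.sum(), 1)
    denom = p + q
    mask = denom > 0
    return 0.5 * np.sum((p[mask] - q[mask]) ** 2 / denom[mask])

rng = np.random.default_rng(0)
pts = rng.random((20, 2))          # stand-in for points sampled on a contour
h = shape_context(pts, 0)
print(h.shape, h.sum())
```

Computing one histogram per point on each shape and comparing them with `chi2_cost` gives a cost matrix from which point correspondences can be chosen.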

Page 19: On Visual Recognition

Shape Context

Page 20: On Visual Recognition

Geometric Blur (Local Appearance Descriptor)

Geometric Blur Descriptor

• Compute sparse channels from image

• Extract a patch in each channel

• Apply spatially varying blur and sub-sample

(Idealized signal)

Descriptor is robust to small affine distortions

Berg & Malik '01
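The idea of a spatially varying blur followed by sub-sampling can be illustrated on a one-dimensional idealized signal. This is a rough sketch, not the published descriptor: the linear blur schedule `sigma = alpha * |offset| + beta`, the parameter values, the sample offsets, and the helper names are assumptions chosen for the example.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def geometric_blur_1d(signal, center, offsets, alpha=0.5, beta=1.0):
    """Sample a sparse 1D signal around `center`, blurring more at larger offsets."""
    descriptor = []
    for off in offsets:
        sigma = alpha * abs(off) + beta       # blur grows linearly with distance
        radius = int(np.ceil(3 * sigma))
        kernel = gaussian_kernel(sigma, radius)
        pos = center + off
        # Weighted average around the sample position (zero outside the signal).
        idx = np.arange(pos - radius, pos + radius + 1)
        valid = (idx >= 0) & (idx < len(signal))
        descriptor.append(np.sum(signal[idx[valid]] * kernel[valid]))
    return np.array(descriptor)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Idealized sparse signal: a few impulses, plus a slightly stretched copy.
signal = np.zeros(101)
signal[[50, 60, 80]] = 1.0
stretched = np.zeros(101)
stretched[[50, 61, 83]] = 1.0
offsets = np.array([-30, -10, 0, 10, 30])

# Geometric blur: the two descriptors stay similar despite the stretch.
a = geometric_blur_1d(signal, 50, offsets)
b = geometric_blur_1d(stretched, 50, offsets)

# Nearly unblurred sampling for comparison: the stretch breaks the match.
a_sharp = geometric_blur_1d(signal, 50, offsets, alpha=0.0, beta=0.3)
b_sharp = geometric_blur_1d(stretched, 50, offsets, alpha=0.0, beta=0.3)

print(cosine(a, b), cosine(a_sharp, b_sharp))
```

Because a small stretch moves distant features further than nearby ones, blurring in proportion to distance from the center is what keeps the two descriptors close.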

Page 21: On Visual Recognition

Three Big Ideas

• Correspondence based on local shape/appearance descriptors

• Deformable Template Matching

• Machine learning for finding discriminative features

Page 22: On Visual Recognition

Modeling shape variation in a category

• D’Arcy Thompson: On Growth and Form, 1917

– studied transformations between shapes of organisms

Page 23: On Visual Recognition

Matching Example

model target

Page 24: On Visual Recognition

Handwritten Digit Recognition

Test error rates, grouped by training set size:

• MNIST 60 000:

– linear: 12.0%

– 40 PCA + quad: 3.3%

– 1000 RBF + linear: 3.6%

– K-NN: 5%

– K-NN (deskewed): 2.4%

– K-NN (tangent dist.): 1.1%

– SVM: 1.1%

– LeNet 5: 0.95%

• MNIST 600 000 (distortions):

– LeNet 5: 0.8%

– SVM: 0.8%

– Boosted LeNet 4: 0.7%

• MNIST 20 000:

– K-NN, Shape Context matching: 0.63%

Page 26: On Visual Recognition

EZ-Gimpy Results

• 171 of 192 images correctly identified: 92%

[Figure: example EZ-Gimpy images with the words horse, smile, canvas, spade, join, here]

Page 27: On Visual Recognition

Three Big Ideas

• Correspondence based on local shape/appearance descriptors

• Deformable Template Matching

• Machine learning for finding discriminative features

Page 28: On Visual Recognition

Discriminative learning (Frome, Singer, Malik, 2006)

weights on patch features in training images

distance functions from training images to any other images

browsing, retrieval, classification

83/400, 79/400

Page 29: On Visual Recognition

triplets

• learn from relative similarity

image i, image j, image k

want:

image-to-image distances based on feature-to-image distances

compare image-to-image distances

Page 30: On Visual Recognition

focal image version

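image i (focal), image j, image k

[Figure: feature-to-image distances from the focal image's patch features: 0.3 and 0.4 to image j, 0.8 and 0.2 to image k.]

x_ijk = d_ik - d_ij = (..., 0.8, 0.2, ...) - (..., 0.3, 0.4, ...) = (..., 0.5, -0.2, ...)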
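The arithmetic on this slide can be checked directly. The sketch below uses only the two visible entries of each distance vector; the weight vector `w` is a hypothetical value chosen for illustration, and reading image j as the more similar image and image k as the less similar one is an assumption about the triplet's meaning.

```python
import numpy as np

# Feature-to-image distances from focal image i (the two visible entries).
d_ij = np.array([0.3, 0.4])   # focal image i to image j (assumed: similar)
d_ik = np.array([0.8, 0.2])   # focal image i to image k (assumed: dissimilar)

x_ijk = d_ik - d_ij           # one difference vector per triplet

# Hypothetical non-negative weights on the focal image's two patch features.
w = np.array([1.0, 0.5])

D_ij = w @ d_ij               # image-to-image distance from i to j
D_ik = w @ d_ik               # image-to-image distance from i to k
satisfied = bool(w @ x_ijk > 0)   # same condition as D_ik > D_ij

print(x_ijk, D_ij, D_ik, satisfied)
```

Writing the comparison of two image-to-image distances as a single inner product with `x_ijk` is what lets each triplet act as one linear constraint on the weights.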

Page 31: On Visual Recognition

large-margin formulation

• slack variables like soft-margin SVM

• w constrained to be positive

• L2 regularization
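The optimization itself is not recoverable from the slide text, so the following is only a sketch of one way the three listed ingredients could fit together: a hinge loss standing in for the slack variables, an L2 penalty on w, and a projection that keeps w non-negative. The objective, the projected subgradient solver, the toy triplet vectors, and the values of `C`, `lr`, and `epochs` are all assumptions, not the authors' formulation or solver.

```python
import numpy as np

def learn_weights(X, C=1.0, lr=0.01, epochs=500):
    """Projected subgradient descent on 0.5*||w||^2 + C * sum(hinge), w >= 0.

    Each row of X is a triplet difference x_ijk = d_ik - d_ij,
    and the target margin is w @ x_ijk >= 1.
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        margins = X @ w
        violated = margins < 1.0                 # triplets with non-zero slack
        grad = w - C * X[violated].sum(axis=0)   # L2 term plus hinge subgradient
        w = np.maximum(w - lr * grad, 0.0)       # project onto w >= 0
    return w

# Toy triplet difference vectors (made up; the first row echoes the slide's x_ijk).
X = np.array([
    [0.5, -0.2],
    [0.6, -0.1],
    [0.4, 0.1],
    [0.7, -0.3],
])

w = learn_weights(X)
print(w, X @ w)
```

In this toy data the second feature mostly makes the dissimilar image look closer, so the non-negativity constraint drives its weight to zero while the first feature carries the distance.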

Page 32: On Visual Recognition

Caltech-101 [Fei-Fei et al. 04]

• 102 classes, 31-300 images/class

Page 33: On Visual Recognition

retrieval example

query image

retrieval results:

Page 34: On Visual Recognition

Caltech 101 classification results

(see Manik Varma’s talks for the best yet)

Page 35: On Visual Recognition

15 training/class, 63.2%

Page 36: On Visual Recognition

Conclusion

• Correspondence based on local shape/appearance descriptors

• Deformable Template Matching

• Machine learning for finding discriminative features

• Integrating Perceptual Organization and Recognition