Review: Intro to recognition



Page 1: Review: Intro to recognition

Review: Intro to recognition
• Recognition tasks
• Machine learning approach: training, testing, generalization
• Example classifiers
  • Nearest neighbor
  • Linear classifiers

Page 2: Review: Intro to recognition

Image features
• Spatial support:
  • Pixel or local patch
  • Segmentation region
  • Bounding box
  • Whole image

Page 3: Review: Intro to recognition

Image features
• We will focus mainly on global image features for whole-image classification tasks
  • GIST descriptors
  • Bags of features
  • Spatial pyramids

Page 4: Review: Intro to recognition

GIST descriptors
• Oliva & Torralba (2001)

http://people.csail.mit.edu/torralba/code/spatialenvelope/
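The slide only names the descriptor, so as a rough illustration of the idea (not Oliva & Torralba's released code), the sketch below builds a GIST-style feature: filter the grayscale image with a small Gabor filter bank and average the response energy over a coarse spatial grid. The bank size, kernel parameters, and the 4x4 grid are illustrative defaults, not values from the slide.

```python
# Rough sketch of a GIST-style descriptor: convolve with a small Gabor bank
# and average response energy over a coarse grid. All parameters here are
# illustrative defaults, not the authors' settings.
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(size, wavelength, theta, sigma):
    """Real-valued Gabor kernel with orientation theta (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot**2 + y_rot**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_rot / wavelength)
    return envelope * carrier

def gist_like_descriptor(gray, wavelengths=(4, 8), orientations=8, grid=4):
    """Average Gabor response magnitude inside each cell of a grid x grid layout."""
    gray = gray.astype(float)
    features = []
    for wavelength in wavelengths:
        for k in range(orientations):
            theta = np.pi * k / orientations
            kernel = gabor_kernel(31, wavelength, theta, sigma=wavelength)
            response = np.abs(fftconvolve(gray, kernel, mode="same"))
            h, w = response.shape
            for i in range(grid):
                for j in range(grid):
                    cell = response[i * h // grid:(i + 1) * h // grid,
                                    j * w // grid:(j + 1) * w // grid]
                    features.append(cell.mean())
    # 2 wavelengths x 8 orientations x 16 cells = 256-dimensional feature
    return np.array(features)
```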

Page 5: Review: Intro to recognition

Bags of features

Page 6: Review: Intro to recognition

Origin 1: Texture recognition

• Texture is characterized by the repetition of basic elements or textons

• For stochastic textures, it is the identity of the textons, not their spatial arrangement, that matters

Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

Page 7: Review: Intro to recognition

Origin 1: Texture recognition

[Figure: example textures represented as histograms over a universal texton dictionary]

Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

Page 8: Review: Intro to recognition

Origin 2: Bag-of-words models
• Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)

Page 9: Review: Intro to recognition

Origin 2: Bag-of-words models

US Presidential Speeches Tag Cloud: http://chir.ag/projects/preztags/

• Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)


Page 12: Review: Intro to recognition

Bag-of-features steps
1. Extract local features
2. Learn “visual vocabulary”
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of “visual words”
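A minimal end-to-end sketch of these four steps, assuming OpenCV (cv2), NumPy, and scikit-learn are available; the helper names, the use of SIFT for step 1, and the vocabulary size k = 200 are illustrative choices, not something prescribed by the slides.

```python
# Minimal sketch of the four bag-of-features steps. Function names, the
# choice of SIFT descriptors, and k = 200 are illustrative.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def extract_local_features(image_paths):
    """Step 1: detect interest regions and compute a descriptor per patch."""
    sift = cv2.SIFT_create()
    per_image = []
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(gray, None)
        per_image.append(desc if desc is not None else np.zeros((0, 128), np.float32))
    return per_image

def learn_vocabulary(per_image_descriptors, k=200):
    """Step 2: cluster a pooled set of descriptors into k visual words."""
    pooled = np.vstack(per_image_descriptors)
    return KMeans(n_clusters=k, n_init=4, random_state=0).fit(pooled)

def bag_of_words_histogram(descriptors, vocabulary, k=200):
    """Steps 3-4: assign each descriptor to its nearest word, then histogram."""
    words = vocabulary.predict(descriptors)
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / max(hist.sum(), 1.0)   # normalized word frequencies
```

The resulting normalized histograms are what would be fed to the classifiers from the first slide (e.g. nearest neighbor or a linear classifier).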

Page 13: Review: Intro to recognition

1. Local feature extraction

• Regular grid or interest regions
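For the regular-grid option, one possible sketch (again assuming OpenCV) is to place keypoints on a fixed lattice and compute descriptors at those locations rather than at detected interest points; the step and patch sizes below are arbitrary.

```python
# Sketch of dense sampling on a regular grid; step and patch sizes are
# arbitrary illustrative values.
import cv2

def dense_sift(gray, step=8, patch_size=16):
    """Compute SIFT descriptors at fixed grid locations instead of detected keypoints."""
    keypoints = [cv2.KeyPoint(float(x), float(y), float(patch_size))
                 for y in range(patch_size // 2, gray.shape[0] - patch_size // 2, step)
                 for x in range(patch_size // 2, gray.shape[1] - patch_size // 2, step)]
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.compute(gray, keypoints)
    return descriptors
```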

Page 14: Review: Intro to recognition

1. Local feature extraction
• Detect patches
• Normalize patch
• Compute descriptor

Slide credit: Josef Sivic

Page 15: Review: Intro to recognition

1. Local feature extraction

Slide credit: Josef Sivic

Page 16: Review: Intro to recognition

2. Learning the visual vocabulary

Slide credit: Josef Sivic

Page 17: Review: Intro to recognition

2. Learning the visual vocabulary

Clustering

Slide credit: Josef Sivic

Page 18: Review: Intro to recognition

2. Learning the visual vocabulary

Clustering

Slide credit: Josef Sivic

Visual vocabulary

Page 19: Review: Intro to recognition

Review: K-means clustering
• Want to minimize the sum of squared Euclidean distances between features x_i and their nearest cluster centers m_k:

D(X, M) = \sum_{k=1}^{K} \sum_{i \in \text{cluster } k} \| x_i - m_k \|^2

• Algorithm:
  • Randomly initialize K cluster centers
  • Iterate until convergence:
    • Assign each feature to the nearest center
    • Recompute each cluster center as the mean of all features assigned to it
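The procedure on this slide translates almost line for line into NumPy; the sketch below uses a simplistic initialization and stopping rule rather than anything prescribed by the slide.

```python
# Bare-bones NumPy version of the algorithm on this slide: assign each
# feature to its nearest center, then recompute each center as the mean of
# its assigned features. Initialization and stopping rule are simplistic.
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]   # random init
    for _ in range(n_iters):
        # Assignment step: index of the nearest center for every feature x_i.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center m_k becomes the mean of its assigned features.
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(K)])
        if np.allclose(new_centers, centers):   # converged
            break
        centers = new_centers
    return centers, labels
```

Both steps can never increase the objective D(X, M) above, which is why the iteration converges.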

Page 20: Review: Intro to recognition

Example codebook

Source: B. Leibe

Appearance codebook

Page 21: Review: Intro to recognition

Another codebook

Appearance codebook…

Source: B. Leibe

Page 22: Review: Intro to recognition

Bag-of-features steps
1. Extract local features
2. Learn “visual vocabulary”
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of “visual words”

Page 23: Review: Intro to recognition

Visual vocabularies: Details
• How to choose vocabulary size?
  • Too small: visual words not representative of all patches
  • Too large: quantization artifacts, overfitting
  • Right size is application-dependent
• Improving efficiency of quantization
  • Vocabulary trees (Nister and Stewenius, 2005)
• Improving vocabulary quality
  • Discriminative/supervised training of codebooks
  • Sparse coding, non-exclusive assignment to codewords
• More discriminative bag-of-words representations
  • Fisher Vectors (Perronnin et al., 2007), VLAD (Jegou et al., 2010); a VLAD sketch follows below
• Incorporating spatial information
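As one concrete example of the more discriminative representations mentioned above, here is a sketch of a VLAD-style encoding: instead of counting word frequencies, it accumulates the residuals between descriptors and their assigned centroids and then normalizes. The normalization steps follow common practice rather than any single paper's exact recipe.

```python
# Sketch of a VLAD-style encoding: accumulate residuals between each local
# descriptor and its nearest codebook centroid, then normalize. The power
# and L2 normalization follow common practice, not one exact recipe.
import numpy as np

def vlad_encode(descriptors, centroids):
    k, d = centroids.shape
    # Assign each descriptor to its nearest centroid.
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assignments = dists.argmin(axis=1)
    # Accumulate residuals per centroid.
    vlad = np.zeros((k, d))
    for j in range(k):
        assigned = descriptors[assignments == j]
        if len(assigned):
            vlad[j] = (assigned - centroids[j]).sum(axis=0)
    vlad = vlad.ravel()
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))   # power normalization
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad       # L2 normalization
```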

Page 24: Review: Intro to recognition

Bags of features for action recognition

Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, IJCV 2008.

Space-time interest points

Page 25: Review: Intro to recognition

Bags of features for action recognition

Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, IJCV 2008.

Page 26: Review: Intro to recognition

Spatial pyramids

level 0

Lazebnik, Schmid & Ponce (CVPR 2006)

Page 27: Review: Intro to recognition

Spatial pyramids

level 0 level 1

Lazebnik, Schmid & Ponce (CVPR 2006)

Page 28: Review: Intro to recognition

Spatial pyramids

level 0 level 1 level 2

Lazebnik, Schmid & Ponce (CVPR 2006)
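A sketch of the three-level pyramid shown above: compute a bag-of-words histogram inside every cell of the 1x1, 2x2, and 4x4 grids and concatenate the weighted results. It assumes keypoint positions and their visual-word labels are already available, and the level weights are illustrative (finer levels count more) rather than the exact constants from the paper.

```python
# Sketch of a three-level spatial pyramid: one bag-of-words histogram per
# cell of the 1x1, 2x2, and 4x4 grids, concatenated with level weights.
# Keypoint positions and word labels are assumed given; weights are illustrative.
import numpy as np

def spatial_pyramid(positions, words, image_size, vocab_size, levels=3):
    """positions: (N, 2) array of (x, y); words: (N,) visual-word indices."""
    h, w = image_size
    pyramid = []
    for level in range(levels):                      # level 0, 1, 2
        cells = 2 ** level                           # 1x1, 2x2, 4x4
        weight = 2.0 ** (level - (levels - 1))       # finer levels weighted more
        for i in range(cells):
            for j in range(cells):
                in_cell = ((positions[:, 1] >= i * h / cells) & (positions[:, 1] < (i + 1) * h / cells) &
                           (positions[:, 0] >= j * w / cells) & (positions[:, 0] < (j + 1) * w / cells))
                hist = np.bincount(words[in_cell], minlength=vocab_size).astype(float)
                pyramid.append(weight * hist)
    feature = np.concatenate(pyramid)
    total = feature.sum()
    return feature / total if total > 0 else feature
```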

Page 29: Review: Intro to recognition

Results: Scene category dataset

Multi-class classification results (100 training images per class)

Page 30: Review: Intro to recognition

Results: Caltech101 dataset

Multi-class classification results (30 training images per class)