Visual Codebook - University of...
Visual Codebook
Tae-Kyun Kim
Sidney Sussex College
Visual Words

Visual words are the base elements used to describe an image.

Interest points are detected from an image: corners, blob detectors, the SIFT detector.

Image patches are represented by a descriptor: SIFT (Scale-Invariant Feature Transform) or raw pixel intensity.
[Figure: an image patch mapped to a descriptor vector of dimension D, or raw pixels]
Building a Visual Codebook (Dictionary)
Visual words (real-valued vectors) can be compared using the Euclidean distance:

d(x, y) = ||x - y|| = sqrt( sum_i (x_i - y_i)^2 )

These vectors are divided into groups of similar vectors, i.e. clustered, to form a codebook.
[Figure: clustered descriptor space; the cluster centres form the codebook of visual codewords]
K-means Clustering

Two steps are repeated until no vector changes membership:

1. Compute a cluster centre for each cluster as the mean of the cluster members.
2. Reassign each data point to the cluster whose centre is nearest.

The cluster centres (mean values) then form a visual dictionary.
[Figure: the K cluster centres as K codewords]
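The two steps above can be sketched in Python (a minimal illustration, not the lecture's demo code; the data, K, and iteration cap are made up):

```python
import numpy as np

def kmeans_codebook(descriptors, K, iters=20, seed=0):
    """Build a visual codebook: cluster descriptors and return the K means."""
    rng = np.random.default_rng(seed)
    # Initialise centres with K randomly chosen descriptors.
    centres = descriptors[rng.choice(len(descriptors), K, replace=False)]
    for _ in range(iters):
        # Step 2: assign each descriptor to its nearest centre (Euclidean distance).
        d = np.linalg.norm(descriptors[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 1: recompute each centre as the mean of its members.
        new = np.array([descriptors[labels == k].mean(axis=0) if (labels == k).any()
                        else centres[k] for k in range(K)])
        if np.allclose(new, centres):  # no vector changed membership
            break
        centres = new
    return centres, labels

# Toy data: 100 random 8-D "descriptors", clustered into 5 codewords.
X = np.random.default_rng(1).normal(size=(100, 8))
codebook, labels = kmeans_codebook(X, K=5)
```

The returned `codebook` rows are the visual dictionary; `labels` records which codeword each descriptor belongs to.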
Histogram of Visual Words
Nearest Neighbour Matching
Every visual word is compared with the codewords and assigned to the nearest codeword.

Histogram bins are the codewords; each bin counts the number of words assigned to that codeword.
[Figure: histogram of visual words; frequency over the K codeword bins]
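The assignment-and-count step can be sketched as follows (array shapes are hypothetical):

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantise descriptors against the codebook and build a K-bin histogram."""
    # Distance from every descriptor to every codeword.
    d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    nearest = d.argmin(axis=1)                # nearest-neighbour matching
    K = len(codebook)
    hist = np.bincount(nearest, minlength=K)  # bin k counts words assigned to codeword k
    return hist / hist.sum()                  # normalised histogram

rng = np.random.default_rng(0)
h = bow_histogram(rng.normal(size=(50, 8)),   # 50 descriptors from one image
                  rng.normal(size=(10, 8)))   # codebook of K = 10 codewords
```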
[Pipeline diagram. Learning: feature detection & representation -> codewords dictionary -> image representation -> category models (and/or) classifiers. Recognition: image representation -> categorisation -> category decision.]
Categorisation

Histogram matching of a pair of images P and Q is

HM(P, Q) = sum_k min(P_k, Q_k)

Based on the histogram match kernel (HMK), we can use various classifiers such as kNN, SVM and Naive Bayes classifiers, or pLSA and LDA.
[Figure: kNN classifier; the histograms P and Q are compared and the class (Class 1 ... Class C) is assigned by nearest-neighbour matching -> category decision.]
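The histogram match kernel and the kNN step can be sketched together (Python rather than the lecture's Matlab; the training histograms below are made up):

```python
import numpy as np

def hist_intersection(P, Q):
    """Histogram match kernel: sum of bin-wise minima of two normalised histograms."""
    return np.minimum(P, Q).sum()

def knn_classify(query, train_hists, train_labels, k=1):
    """Assign the class whose training histograms match the query best."""
    sims = np.array([hist_intersection(query, h) for h in train_hists])
    top = sims.argsort()[::-1][:k]              # k most similar histograms
    vals, counts = np.unique(train_labels[top], return_counts=True)
    return vals[counts.argmax()]                # majority vote

train = np.array([[0.8, 0.1, 0.1],             # class 0 exemplar
                  [0.1, 0.8, 0.1]])            # class 1 exemplar
labels = np.array([0, 1])
pred = knn_classify(np.array([0.7, 0.2, 0.1]), train, labels)
# pred is 0: the query intersects the first histogram more (0.9 vs 0.4)
```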
Summary of K-means Codebook

The histogram is 1-D: a highly compact and robust representation of an image.

The histogram representation greatly facilitates modelling images and their categories.

Codeword assignment (the quantisation process) by K-means is time-consuming.

K-means is an unsupervised learning method.
Case Study: Video (Action) Categorisation by RF Codebook
In collaboration with Tsz Ho Yu
Demo video: Action categorisation
[Pipeline diagram. Learning: interest point detection (FASTEST corner) -> descriptor/codebook (spatio-temporal semantic texton forest, giving a bag of semantic textons) -> classification (kNN classifier). Recognition: category decision. The codebook is extremely fast and powerfully discriminative.]
Relation to the Previous

                 Previous (images)           This case study (videos)
Input            Images                      Videos
Interest point   Corners                     3D corners
Descriptor       Patches (SIFT)              Space-time volumes (raw pixels)
Codebook         K-means                     Randomised decision forests
Representation   Histogram of visual words   Histogram of visual words
Classifier       kNN classifier              kNN classifier
Space-Time Volumes

An image (2D) gives a patch; a video (3D) gives a cuboid.
Data Set and Interest Point
Action data set
http://www.nada.kth.se/cvap/actions/
25 subjects, 2,391 sequences: the largest public action DB.

Six actions: walking, jogging, running, boxing, hand waving, hand clapping; four scenarios (s1-s4).
Space-‐Time Interest Point
In Dollar et al. 2005, separable linear filters are applied. The response function is

R = (I * g * h_ev)^2 + (I * g * h_od)^2

where g(x, y; sigma) is a 2D Gaussian smoothing kernel applied spatially, and h_ev / h_od are a quadrature pair of 1D Gabor filters applied temporally:

h_ev(t; tau, omega) = -cos(2*pi*t*omega) * exp(-t^2 / tau^2)
h_od(t; tau, omega) = -sin(2*pi*t*omega) * exp(-t^2 / tau^2)
For multiple scales, the detector is run over a set of spatial and temporal scales.

Interest points are detected as local maxima of the response function.
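The response function above can be sketched as follows (parameter values are illustrative; omega = 4/tau follows the suggestion in Dollar et al. 2005):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, convolve1d

def dollar_response(video, sigma=2.0, tau=2.5):
    """R = (I*g*h_ev)^2 + (I*g*h_od)^2 for a video of shape (T, H, W).

    g is a 2D spatial Gaussian; h_ev / h_od are a quadrature pair of 1D
    temporal Gabor filters with omega = 4/tau.
    """
    omega = 4.0 / tau
    t = np.arange(-int(3 * tau), int(3 * tau) + 1)
    h_ev = -np.cos(2 * np.pi * t * omega) * np.exp(-t**2 / tau**2)
    h_od = -np.sin(2 * np.pi * t * omega) * np.exp(-t**2 / tau**2)
    # Spatial smoothing only: sigma in y, x and none along the time axis.
    smoothed = gaussian_filter(video.astype(float), sigma=(0, sigma, sigma))
    ev = convolve1d(smoothed, h_ev, axis=0)  # temporal filtering
    od = convolve1d(smoothed, h_od, axis=0)
    return ev**2 + od**2

R = dollar_response(np.random.default_rng(0).random((30, 32, 32)))
```

Interest points would then be taken as local maxima of `R` over space, time, and scale.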
Matlab Demo: Space-Time Interest Points
Background: Randomised Decision Forest
Binary Decision Trees

[Tree diagram: numbered split nodes and leaf nodes. A feature vector v is routed by split functions f_n(v) compared against thresholds t_n (go left if f_n(v) < t_n), reaching a leaf that stores a classification P_n(c) over categories c.]

Slide credits to J. Shotton
Toy Learning Example

Feature vectors are x, y coordinates: v = [x, y]^T. Split functions are lines with parameters a, b: f_n(v) = ax + by. The threshold t_n determines the intercept. There are four classes: purple, blue, red, green.

1. Try several lines, chosen at random.
2. Keep the line that best separates the data (information gain).
3. Recurse.

Slide credits to J. Shotton
Generalisation

[Figure: a single deep tree overfits the training data, while averaging over trees t_1, t_2, ..., t_T gives a reasonably smooth decision function.]
Randomised Tree Learning

Recursive algorithm: the set I_n of training examples that reach node n is split into a left set I_l and a right set I_r. Features f and thresholds t are chosen at random, and the pair that maximises the gain in information

Gain = E(I_n) - (|I_l| / |I_n|) E(I_l) - (|I_r| / |I_n|) E(I_r)

is kept, where E denotes the entropy of the class distribution at a node.
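The randomised split selection can be sketched as follows (a toy sketch: the data, number of trials, and split family are all made up for illustration):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def best_random_split(V, labels, n_trials=50, seed=0):
    """Try random (feature, threshold) splits on the set I_n reaching a node;
    keep the one that maximises the information gain."""
    rng = np.random.default_rng(seed)
    base, n = entropy(labels), len(labels)
    best = (None, None, -np.inf)
    for _ in range(n_trials):
        f = rng.integers(V.shape[1])                   # random feature
        t = rng.uniform(V[:, f].min(), V[:, f].max())  # random threshold
        left = V[:, f] < t
        if left.all() or not left.any():
            continue  # degenerate split: one side empty
        # Gain = E(I_n) - |I_l|/|I_n| E(I_l) - |I_r|/|I_n| E(I_r)
        gain = base - (left.sum() / n) * entropy(labels[left]) \
                    - ((~left).sum() / n) * entropy(labels[~left])
        if gain > best[2]:
            best = (f, t, gain)
    return best

# Two classes separable along the first coordinate.
V = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
y = np.array([0, 0, 1, 1])
f, t, gain = best_random_split(V, y)
```

Tree learning then recurses on `I_l` and `I_r` with the chosen split, stopping at a depth limit or when a node is pure.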
A Forest of Trees: Learning

A forest is an ensemble of decision trees t_1, ..., t_T.

Bagging (Bootstrap AGGregatING): each training set DS_1, ..., DS_T is drawn from DS uniformly at random with replacement, so examples may be duplicated or missing.
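The bagging step can be sketched by drawing index sets (sizes are illustrative):

```python
import numpy as np

def bootstrap_sets(n_examples, T, seed=0):
    """Bagging: draw T training sets from DS uniformly at random with
    replacement, so each DS_t may duplicate some examples and miss others."""
    rng = np.random.default_rng(seed)
    return [rng.integers(n_examples, size=n_examples) for _ in range(T)]

# Index sets DS_1 .. DS_5 into a training set DS of 100 examples.
sets = bootstrap_sets(100, T=5)
```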
A Forest of Trees: Recognition

A forest is an ensemble of decision trees. A feature vector v is passed down every tree, and classification averages the leaf distributions:

P(c | v) = (1/T) * sum_t P_t(c | v)

[Diagram: v routed through the split nodes of trees t_1, ..., t_T to leaf nodes holding class distributions over c.]

Slide credits to J. Shotton
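The averaging step is a one-liner; a minimal sketch with three hypothetical per-tree leaf distributions:

```python
import numpy as np

def forest_posterior(tree_posteriors):
    """Forest classification: P(c|v) = (1/T) * sum_t P_t(c|v)."""
    return np.mean(tree_posteriors, axis=0)

# Leaf distributions over 4 classes that v reached in trees t_1, t_2, t_3.
P_t = np.array([[0.7, 0.1, 0.1, 0.1],
                [0.5, 0.3, 0.1, 0.1],
                [0.6, 0.2, 0.1, 0.1]])
P = forest_posterior(P_t)
c = P.argmax()  # predicted category
```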
Fast Evaluation

[Figure: a flat codebook compares v with every codeword, while a tree routes v through only a few split nodes: about a 5 times speed-up in the example shown.]
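The source of the speed-up can be made concrete by counting comparisons (the codebook size below is made up):

```python
import math

def comparisons_flat(K):
    """Flat codebook: the input is compared with every one of the K codewords."""
    return K

def comparisons_tree(K):
    """Balanced binary tree over K leaves: one comparison per level."""
    return math.ceil(math.log2(K))

# e.g. K = 1024 codewords: 1024 comparisons flat vs 10 down a tree.
speedup = comparisons_flat(1024) / comparisons_tree(1024)
```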
Demo video: Fast evaluation
Random Forest Codebook
Video Cuboid Features
Video pixel i gives a cuboid p_i (13x13x19 pixels in the experiments).
Tree split function: if f(p_i) > threshold, go to Left; else go to Right.
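A split on a raw-pixel cuboid can be sketched as below. The slides do not specify the split family f, so the voxel-difference test here is an assumption, not taken from the lecture:

```python
import numpy as np

def split_goes_left(p, split, threshold):
    """Tree split on a cuboid p (e.g. 13x13x19 raw pixels).
    If f(p) > threshold go to Left, else go to Right.
    f is taken as the difference of two voxel values: one plausible
    choice, assumed here for illustration."""
    (x1, y1, t1), (x2, y2, t2) = split
    f = float(p[x1, y1, t1]) - float(p[x2, y2, t2])
    return f > threshold

p = np.random.default_rng(0).random((13, 13, 19))
left = split_goes_left(p, ((0, 0, 0), (6, 6, 9)), threshold=0.0)
```

Such tests are cheap: each node reads only a couple of voxels, which is why the forest codebook is so fast to evaluate.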
Randomised Forest Codebook

[Diagram: each cuboid descends every tree t_1, ..., t_T and is assigned the index of the leaf node it reaches; the leaf nodes of all trees serve as the codewords.]
Example Cuboids
Histogram of Randomised Forest Codewords

[Figure: each cuboid is passed down trees t_1, ..., t_T; the histogram bins are the leaf-node indices of all trees (frequency vs node index), and each bin counts the cuboids reaching that leaf.]

The result is an extremely fast, powerfully discriminative codebook.
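Building the histogram over leaf indices can be sketched as follows (the numbers of cuboids, trees, and leaves are made up; a fixed leaf count per tree is assumed for simplicity):

```python
import numpy as np

def forest_histogram(leaf_indices, leaves_per_tree):
    """Concatenate per-tree leaf histograms: each cuboid contributes one
    count per tree, at the leaf node it reaches.

    leaf_indices: (n_cuboids, T) array of leaf ids, one column per tree."""
    n, T = leaf_indices.shape
    hists = [np.bincount(leaf_indices[:, t], minlength=leaves_per_tree)
             for t in range(T)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()  # normalised over all bins

# 40 cuboids from a video, 3 trees, 8 leaves per tree -> 24 bins.
idx = np.random.default_rng(0).integers(0, 8, size=(40, 3))
h = forest_histogram(idx, leaves_per_tree=8)
```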
Matlab Demo: Random Forest Codebook
Action Categorisation Result
Action categorisation accuracy (%) on the KTH dataset: 6 types of actions, 25 subjects, 4 scenarios, by a leave-one-out protocol.
Matlab Demo: Action Categorisation
Take-home Message

Use of a visual codebook greatly facilitates image and video categorisation tasks. The RF codebook is extremely fast, as each tree uses only a small number of features.

cf. the K-means codebook has a flat structure: it compares an input with every codeword, which is time-consuming.

The RF codebook is also a powerful discriminative codebook: it combines multiple decision trees, giving good generalisation.

cf. K-means is an unsupervised learning method.
Questions?
Action Recognition Result

Action categorisation accuracy (%) on the KTH dataset: 6 types of actions, 25 subjects, 4 scenarios, by a leave-one-out protocol. Note the quantisation effect!
Pyramid Match Kernel

PMK acts on the semantic texton histograms: it matches P and Q in the learned hierarchical histogram space, and deeper node matches are more important.

[Figure: tree with split nodes and leaf nodes at depths d = 1, 2, 3; the increased similarity at each depth d is scaled by a depth weight.]
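A pyramid match over per-depth histograms can be sketched as below. The slides only say "depth weight"; the specific weighting w_d = 2^(d - D), which doubles the credit at each deeper level, is an assumption for illustration:

```python
import numpy as np

def pyramid_match(P_levels, Q_levels):
    """Pyramid match over per-depth histograms (level 0 = shallowest,
    later levels = deeper/finer nodes of the hierarchy).

    I_d is the histogram intersection at depth d; the increase
    I_d - I_{d+1} counts matches first found at depth d, weighted so
    that deeper (finer) matches count more."""
    D = len(P_levels)
    I = [np.minimum(P, Q).sum() for P, Q in zip(P_levels, Q_levels)]
    I.append(0.0)                  # no matches below the deepest level
    K = 0.0
    for d in range(D):
        w = 2.0 ** (d + 1 - D)     # depth weight: deeper matches worth more (assumed)
        K += w * (I[d] - I[d + 1]) # increased similarity at this depth
    return K

# Two identical 2-level pyramids: all mass matches at the deepest level.
K = pyramid_match([np.array([2.0]), np.array([1.0, 1.0])],
                  [np.array([2.0]), np.array([1.0, 1.0])])
```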
Categorisation

Histogram matching of a pair of videos P and Q is

HM(P, Q) = sum_k min(P_k, Q_k)

Based on the histogram match kernel (HMK), we can use various classifiers such as kNN, SVM and Naive Bayes classifiers, or pLSA and LDA.

[Figure: kNN classifier; NN classification of the histograms of P and Q over Class 1 ... Class M -> category decision.]
Action Recognition Result

Action categorisation accuracy (%) on the KTH dataset: 6 types of actions, 25 subjects, 4 scenarios, by a leave-one-out protocol.