Novel Approaches to Natural Scene Categorization
Amit Prabhudesai
Roll No. 04307002
M.Tech Thesis Defence
Under the guidance of
Prof. Subhasis Chaudhuri
Indian Institute of Technology, Bombay
Natural Scene Categorization – p.1/32
Overview of topics to be covered
• Natural Scene Categorization: Challenges
• Our contribution
  ◦ Qualitative visual environment description
    • Portable, real-time system to aid the visually impaired
    • System has peripheral vision!
  ◦ Model-based approaches
    • Use of stochastic models to capture semantics
    • pLSA and maximum entropy models
• Conclusions and Future Work
Natural Scene Categorization
• Interesting application of a CBIR system
• Images from a broad image domain: diverse and often ambiguous
• Bridging the semantic gap
• Grouping scenes into semantically meaningful categories could aid further retrieval
• Efficient schemes for grouping images into semantic categories
Qualitative Visual Environment Retrieval
[Figure: an omnidirectional view partitioned into sectors (LT, RT, LB, RB, FR) labelled Sky, Building, Woods, Lawn and Water body, with sample points P1-P3]

• Use of omnidirectional images
• Challenges
  ◦ Unstructured environment
  ◦ No prior learning (unlike navigation/localization)
• Target application and objective
  ◦ Wearable computing community, emphasis on visually challenged people
  ◦ Real-time operation
Qualitative Visual Environment System: Overview
• Environment representation
• Environment retrieval
  ◦ View partitioning
  ◦ Feature extraction
  ◦ Node annotation
  ◦ Dynamic node annotation
  ◦ Real-time operation
• Results
System Overview (contd.)
• Environment representation
  ◦ Image database containing images belonging to 6 classes: Lawns (L), Woods (W), Buildings (B), Water-bodies (H), Roads (R) and Traffic (T)
  ◦ Moderately large intra-class variance (in the feature space) in images of each category
  ◦ Description relative to the person using the system: e.g., 'to left of', 'in the front', etc.
  ◦ Topological relationships indicated by a graph
  ◦ Each node annotated by an identifier associated with a class
System Overview (contd.)
• Environment Retrieval
  ◦ View Partitioning

[Figure: the omnicam view partitioned into sectors (LT, RT, LB, RB, FR, BS, XX) along the forward and backward directions, shown alongside its graphical representation]

  ◦ Feature Extraction
    • Feature invariant to scaling, viewpoint, illumination changes, and geometric warping introduced by omnicam images
    • Colour histogram selected as the feature for performing CBIR
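The feature-extraction step can be sketched as follows; the choice of 8 bins per channel and histogram intersection as the similarity measure are illustrative assumptions, since the slides fix neither the quantization nor the comparison metric.

```python
import numpy as np

def colour_histogram(image, bins=8):
    """Normalized RGB colour histogram of an image.
    `bins` per channel is an assumed quantization, not from the thesis."""
    # image: H x W x 3 uint8 array; flatten to a list of RGB triples
    hist, _ = np.histogramdd(
        image.reshape(-1, 3).astype(float),
        bins=(bins, bins, bins),
        range=((0, 256), (0, 256), (0, 256)),
    )
    hist = hist.ravel()
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; a common CBIR measure for colour histograms."""
    return np.minimum(h1, h2).sum()
```

The compact feature vector (512 entries for 8 bins per channel) is what makes the later real-time linear scan over precomputed database histograms feasible.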
System Overview (contd.)
• Environment Retrieval
  ◦ Node annotation
    • Objective: robust retrieval against illumination changes and intra-class variations
    • Solution: annotation decided by a simple voting scheme
  ◦ Dynamic node annotation
    • Temporal evolution of graph Gn with time tn
    • Complete temporal evolution of the graph given by G, obtained by concatenating the subgraphs Gn, i.e., G = {G1, G2, . . . , Gk, . . .}
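The slides only say "a simple voting scheme", so the following is one plausible sketch: retrieve the k best database matches for a view and label the node by majority vote. The value k=5 and histogram intersection as the similarity are assumptions for illustration.

```python
import numpy as np
from collections import Counter

def annotate_node(query_hist, database, k=5):
    """Label a view by a majority vote over its k best database matches.
    database: list of (class_label, precomputed_histogram) pairs.
    k and the similarity measure are illustrative assumptions."""
    sim = lambda h1, h2: np.minimum(h1, h2).sum()  # histogram intersection
    ranked = sorted(database, key=lambda lh: sim(query_hist, lh[1]), reverse=True)
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0]
```

Voting over several neighbours, rather than taking the single best match, is what buys robustness against illumination changes and intra-class variation.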
System Overview (contd.)
• Environment Retrieval
  ◦ Real-time operation
    • Colour histogram: compact feature vector
    • Pre-computed histograms of all the database images
    • Linear time complexity (O(N)): on a P-IV 2.0 GHz, ∼100 ms for a single omnicam image
  ◦ Portable, low-cost system for the visually impaired
    • Modest hardware and software requirements
    • Easily put together using off-the-shelf components
System Overview (contd.)
• Results
◦ Cylindrical concentric mosaics
System Overview (contd.)
• Results
◦ Still omnicam image
System Overview (contd.)
• Results
  ◦ Omnivideo sequence

[Figure: frame-by-frame class labels (W, B, X, R, L) along the forward and backward directions of the omnivideo sequence, plotted against the frame index n]
Analyzing our results
• System accuracy: close to 70% – this is not enough!
• Some scenes are inherently ambiguous!
• Often the second-best class is the correct class
• Limitations
  1. Limited discriminating power of the global colour histogram (GCH)
  2. Local colour histogram (LCH) based on tiling cannot be used
  3. Each frame analyzed independently
• Possible solutions
  1. Adding memory to the system
  2. Clustering scheme before computing the similarity measure
Method I. Adding memory to the system
• System uses only the current observation in labeling
• Good idea to use all observations up to the current one
• Desired: a recursive implementation to calculate the posterior (should be able to do it in real-time!)
• Hidden Markov Model: parameter estimation using Kevin Murphy's HMM toolkit
• Challenges
  1. Estimation of the transition matrix – a possible solution is to use limited classes
  2. Enormous training data required
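The recursive posterior the slide asks for is the standard HMM filtering update, sketched below. The transition matrix A and the per-frame likelihoods (e.g. from histogram matching) are placeholders; the thesis estimates the actual model with Kevin Murphy's toolkit rather than hand-rolling the filter.

```python
import numpy as np

def filter_step(prior, A, likelihood):
    """One recursive update of P(class_t | observations up to t).
    prior: posterior from the previous frame, shape (K,)
    A: class-transition matrix, A[i, j] = P(class j | class i)
    likelihood: P(current observation | class), shape (K,)
    All quantities here are illustrative, not taken from the thesis."""
    predicted = A.T @ prior           # propagate through the transition model
    posterior = predicted * likelihood  # weight by the current observation
    return posterior / posterior.sum()
```

Because each step reuses only the previous posterior, the cost per frame is constant, which is what makes real-time operation plausible.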
Adding memory. . . (Results)
• Improved confidence in the results; however, negligible improvement in the accuracy
• Reasons for poor performance
  ◦ Limited number of transitions in categories (as opposed to locations)
  ◦ Typical training data for HMMs is thousands of labels: difficult to collect such vast data
• Limitation: makes the system dependent on the training sequence
Method II. Preclustering the image
• Presence of clutter, images from a broad domain
• Premise: the part of the image indicative of the semantic category forms a distinct part in the feature space

Some test images belonging to the 'Water-bodies' category

• Possible solution: segment out the clutter in the scene
Preclustering the image. . .
• K-means clustering of the image
• Use only pixels from the largest cluster to compute the colour histogram

Results of K-means clustering on the test images

• Results
  ◦ Accuracy improves significantly – for the 'Water-bodies' class, improvement from 25% to about 72%
• Limitation: what about, say, a traffic scene?!
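The preclustering step above can be sketched as plain Lloyd-style K-means on RGB pixels, keeping only the dominant cluster. The value k=3, the iteration count, and the RGB feature space are illustrative choices, not specified on the slide.

```python
import numpy as np

def dominant_cluster_pixels(image, k=3, iters=10, seed=0):
    """K-means on RGB pixels; returns the pixels of the largest cluster,
    which would then feed the colour histogram. k, iters and the seed
    are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    pixels = image.reshape(-1, 3).astype(float)
    # initialize centres from randomly chosen pixels
    centres = pixels[rng.choice(len(pixels), k, replace=False)]
    for _ in range(iters):
        # assign each pixel to its nearest centre (squared distance)
        d = ((pixels[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # move each centre to the mean of its assigned pixels
        for j in range(k):
            if (labels == j).any():
                centres[j] = pixels[labels == j].mean(0)
    largest = np.bincount(labels, minlength=k).argmax()
    return pixels[labels == largest]
```

The traffic-scene objection on the slide is precisely where this breaks down: when no single cluster corresponds to the semantic content, the largest cluster may be clutter.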
Model-based approaches
• Stochastic models used to learn semantic concepts from training images
• Use of normal perspective images
• Use of local image features
• Two models examined
  1. probabilistic Latent Semantic Analysis (pLSA)
  2. Maximum entropy models
• Use of the 'bag of words' approach
Bag of words approach
• Local features more robust to occlusions and spatial variations
• Image represented as a collection of local patches
• Image patches are members of a learned (visual) vocabulary
• Positional relationships not considered!
• Data representation by a co-occurrence matrix
• Notation
  ◦ D = {d1, . . . , dN}: corpus of documents
  ◦ W = {w1, . . . , wM}: dictionary of words
  ◦ Z = {z1, . . . , zK}: (latent) topic variables
  ◦ N = {n(w, d)}: co-occurrence table
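Given the notation above, building the co-occurrence table n(w, d) is straightforward once each patch has been mapped to its nearest codebook entry; the list-of-word-indices input format is our assumption.

```python
import numpy as np

def cooccurrence_matrix(images_as_words, vocab_size):
    """Build the n(w, d) table for the bag-of-words representation.
    images_as_words: one list of visual-word indices per image (document).
    vocab_size: size of the learned visual vocabulary (125 in the thesis)."""
    N = np.zeros((vocab_size, len(images_as_words)), dtype=int)
    for d, words in enumerate(images_as_words):
        for w in words:
            N[w, d] += 1
    return N
```

This table, which discards all positional relationships by construction, is the only input both pLSA and the maximum entropy model see.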
pLSA model . . .
• Generative model
  ◦ select a document d with probability P(d)
  ◦ select a latent class z with probability P(z|d)
  ◦ select a word w with probability P(w|z)
• Joint observation probability
  P(d, w) = P(d) P(w|d), where P(w|d) = Σ_{z∈Z} P(w|z) P(z|d)
• Modeling assumptions
  1. Observation pairs (d, w) generated independently
  2. Conditional independence assumption: P(w, d|z) = P(w|z) P(d|z)
pLSA model . . .
• Model fitting
  ◦ Maximize the log-likelihood function L = Σ_{d∈D} Σ_{w∈W} n(d, w) log P(d, w)
  ◦ Equivalent to minimizing the KL divergence between the empirical distribution and the model
  ◦ EM algorithm to learn model parameters
• Evaluating the model on unseen test images
  ◦ P(w|z) and P(z|d) learned from the training dataset
  ◦ 'Fold-in' heuristic for categorization: learned factors P(w|z) are kept fixed, mixing coefficients P(z|d_test) are estimated using the EM iterations
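One EM iteration for pLSA can be sketched in matrix form as below; the shapes and variable names are ours, and the updates follow the standard derivation (E-step computes P(z|d, w), M-step re-estimates P(w|z) and P(z|d) from the weighted counts).

```python
import numpy as np

def plsa_em_step(N, Pw_z, Pz_d):
    """One EM iteration of pLSA on the co-occurrence table N (words x docs).
    Pw_z: P(w|z), shape (W, K); Pz_d: P(z|d), shape (K, D).
    A compact illustrative form of the standard updates."""
    # E-step: P(z|d,w) ∝ P(w|z) P(z|d), for every (w, d) pair
    Pz_dw = Pw_z[:, None, :] * Pz_d.T[None, :, :]     # shape (W, D, K)
    Pz_dw /= Pz_dw.sum(axis=2, keepdims=True) + 1e-12
    # M-step: re-estimate factors from the expected counts n(w,d) P(z|d,w)
    weighted = N[:, :, None] * Pz_dw
    Pw_z_new = weighted.sum(axis=1)                    # (W, K)
    Pw_z_new /= Pw_z_new.sum(axis=0, keepdims=True) + 1e-12
    Pz_d_new = weighted.sum(axis=0).T                  # (K, D)
    Pz_d_new /= Pz_d_new.sum(axis=0, keepdims=True) + 1e-12
    return Pw_z_new, Pz_d_new
```

The fold-in heuristic reuses exactly this loop on a test image, except that Pw_z is frozen and only the P(z|d_test) column is updated; the random initialization of these factors is the source of the convergence problems analyzed later.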
pLSA model . . .
• Details of the experiment to evaluate the model
  ◦ 5 categories: houses, forests, mountains, streets and beaches
  ◦ Image dataset: COREL photo CDs, images from internet search engines, and personal image collections
  ◦ 100 images of each category
  ◦ Modifications in Rob Fergus's code for the experiments
  ◦ 128-dim SIFT feature used to represent a patch
  ◦ Visual codebook with 125 entries
• Image annotation: z = arg max_i P(z_i | d_test)
pLSA model. . . Results
• 50 runs of the experiment, with random partitioning on each run
• Vastly different accuracy on different runs: best case ∼46%, worst case 5%
• Analysis of the results
  ◦ The confusion matrix gives us further insights
  ◦ Most of the labeling errors occur between houses and streets
  ◦ Ambiguity between mountains and forests
Results using the pLSA model
Figure: Some images that were wrongly annotated by our system
Results of the pLSA model . . .
• Comparison with the naive Bayes classifier

Figure: Confusion matrices for the pLSA and naive Bayes models

• 10-fold cross-validation test on the same dataset: mean accuracy ∼66%
Analysis of our results
• Reasons for poor performance
  ◦ Model convergence!
  ◦ Local optima problem in the EM algorithm
  ◦ The optimum value of the objective function depends on the initialized values
  ◦ We initialize the algorithm randomly at each run!
• Possible solution: deterministic annealing EM (DAEM) algorithm
• Even with DAEM, no guarantee of converging to the globally optimal solution
Maximum entropy models
• Maximum entropy prefers a uniform distribution when no data are available
• The best model is the one that:
  1. Is consistent with the constraints imposed by the training data
  2. Makes as few assumptions as possible
• Training dataset: {(x1, y1), (x2, y2), . . . , (xN, yN)}, where xi represents an image and yi represents a label
• Predicate functions
  ◦ Unigram predicate: co-occurrence statistics of a word and a label
    f_{v1,LABEL}(x, y) = 1 if y = LABEL and v1 ∈ x, and 0 otherwise
Maximum entropy models . . .
• Notation
  ◦ f: predicate function
  ◦ p̃(x, y): empirical distribution of the observed pairs
  ◦ p(y|x): stochastic model to be learnt
• Model fitting: the expected value of the predicate function w.r.t. the stochastic model should equal the expected value of the predicate measured from the training data
• Constrained optimization problem
  Maximize H(p) = −Σ_{x,y} p̃(x) p(y|x) log p(y|x)
  s.t. Σ_{x,y} p̃(x, y) f(x, y) = Σ_{x,y} p̃(x) p(y|x) f(x, y)
• Solution: p(y|x) = (1/Z(x)) exp(Σ_{i=1}^{k} λ_i f_i(x, y))
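Given trained weights λ_i, evaluating the exponential-form solution p(y|x) above reduces to a softmax over weighted predicate sums. The sketch below, including the `unigram` predicate constructor, is illustrative; in the thesis the weights come from Zhang Le's toolkit rather than being set by hand.

```python
import numpy as np

def unigram(word, label):
    """Unigram predicate f_{word,label}(x, y): fires (returns 1) exactly
    when the candidate label is `label` and `word` occurs in the image x."""
    return lambda x, y: 1 if (y == label and word in x) else 0

def maxent_predict(x_words, y_labels, lambdas, predicates):
    """Evaluate p(y|x) = exp(Σ_i λ_i f_i(x, y)) / Z(x) over candidate labels.
    x_words: visual-word indices of the image; lambdas: trained weights."""
    scores = np.array([
        sum(l * f(x_words, y) for l, f in zip(lambdas, predicates))
        for y in y_labels
    ])
    expd = np.exp(scores - scores.max())   # subtract max for numerical stability
    return expd / expd.sum()               # normalize by Z(x)
```

Note that Z(x) is a per-image normalizer, so prediction never needs the full joint distribution, only the predicate sums for each candidate label.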
Results for the maximum entropy model
• Same dataset, feature and codebook as used for the pLSA experiment
• Evaluation using Zhang Le's maximum entropy toolkit
• 25-fold cross-validation accuracy: ∼70%
• The second-best label is often the correct label: accuracy improves to 85%

Figure: Confusion matrices for the maximum entropy and naive Bayes models
A comparative study
Method                    # of categories   Training # per category   Perf. (%)
Maximum entropy                  5                    50                 70
pLSA                             5                    50                 46
Naive Bayes classifier           5                    50                 66
Fei-Fei                         13                   100                 64
Vogel                            6                  ∼100               89.3
Vogel                            6                  ∼100               67.2
Oliva                            8               250-300                 89

Table: A performance comparison with other studies reported in the literature.
Future Work
• Further investigations into the pLSA model
• The issue of model convergence
• The DAEM algorithm is not the ideal solution
• Using a richer feature set, e.g., a bank of Gabor filters
• For maximum entropy models, ways to define predicates that will capture semantic information better
THANK YOU