Matching Words and Pictures · 2008. 3. 25. · know exactly the right search terms to get useful...
Transcript of Matching Words and Pictures · 2008. 3. 25. · know exactly the right search terms to get useful...
Matching Words and Pictures
Rongzhou Shen, Danyi Feng, Qing Xia
March 4, 2008
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 1 / 54
Introduction
Text and images are separately ambiguous
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 2 / 54
Introduction
Text and images are separately ambiguous
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 2 / 54
Introduction
Text and images are separately ambiguous
Figure: Ubuntu 7.10 with Compiz Fusion
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 2 / 54
Introduction
Text and images are separately ambiguous
However:They tend not to be when jointly displayed
Figure: Ubuntu 7.10 with Compiz Fusion
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 2 / 54
Outline
Applications
Data Preprocessing
Annotation Models
Correspondence Models
Model Integration
Evaluation
Experiments
Conclusion
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 3 / 54
Applications
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 4 / 54
Applications - Real World
Browsing and searching
Identify faces, naked people, pedestrians or cars
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 5 / 54
Applications - Real World
Browsing and searching
Identify faces, naked people, pedestrians or cars
However, there is a large disparity between user needs and whattechnology supplies
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 5 / 54
Applications - User Requests
A request to a stock photo library
Pretty girl doing something active, sporty in a summery setting, beach -not wearing lycra, exercise clothes - more relaxed in tee-shirt. Feature isabout deodorant so girl should look active - not sweaty but happy, healthy,carefree - nothing too posed or set up - nice and natural looking.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 6 / 54
Applications - User Requests
A request to a stock photo library
Pretty girl doing something active, sporty in a summery setting, beach -not wearing lycra, exercise clothes - more relaxed in tee-shirt. Feature isabout deodorant so girl should look active - not sweaty but happy, healthy,carefree - nothing too posed or set up - nice and natural looking.
So, what do users request from an image matching application?
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 6 / 54
Applications - User Requests
Images both by object kinds and identities
Images by what they depict and by what they are about
Queries based on image histograms, texture, overall appearance, etc.are vanishingly uncommon
Text associated with images is extremely useful in practice
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 7 / 54
Applications - User Requests
Images both by object kinds and identities
Images by what they depict and by what they are about
Queries based on image histograms, texture, overall appearance, etc.are vanishingly uncommon
Text associated with images is extremely useful in practice
Given the user requests, how do we fulfil these requests?
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 7 / 54
Linking Text and Images
BooleanQueries: Does the text link to the image? YES or NO
ProbabilityModels: Does the text link to the image? 83.74% possibility
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 8 / 54
Linking Text and Images
BooleanQueries: Does the text link to the image? YES or NO
ProbabilityModels: Does the text link to the image? 83.74% possibility
A Probability Model would be more beneficial: one doesn’t need toknow exactly the right search terms to get useful results.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 8 / 54
Annotation and Correspondence
One possible way of using probability models is to predict text givenimages.
Annotation Predict annotations of entire images using all informationpresent.
Correspondence Associate particular words with particular imagesubstructures.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 9 / 54
Annotation and Correspondence
One possible way of using probability models is to predict text givenimages.
Annotation Predict annotations of entire images using all informationpresent.
Correspondence Associate particular words with particular imagesubstructures.
So, what specific models are there and how are Annotation andCorrespondence done?
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 9 / 54
Data Preprocessing
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 10 / 54
Input Preprocessing
Each image is segmented using Normalised Cuts (Shi and Malik,2000).
Represent 8 largest regions in each image by computing a set of 40features for each region.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 11 / 54
Input Preprocessing
Each image is segmented using Normalised Cuts (Shi and Malik,2000).
Represent 8 largest regions in each image by computing a set of 40features for each region.
What are the features?
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 11 / 54
Region features
Size Portion of the image covered by the region
Position Coordinates of the region center of mass normalized by theimage dimensions.
Color The average and standard deviation of (R,G,B), (L,a,b) and(r=R/(R+G+B), g=G/(R+G+B)) over the region
Texture The average and variance of 16 filter responses
Shape The ratio of the area to the perimeter squared, the momentof inertia and the ratio of the region area to that of itsconvex hull
We will refer to a region, together with the features, as a blob.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 12 / 54
Annotation Models
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 13 / 54
Annotation Models - Introduction
Multi-Modal Hierarchical Aspect Model
Mixture of Multi-Modal Latent Dirichlet Allocation
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 14 / 54
Multi-Modal Hierarchical Aspect Models
Figure: Multi-modal extension of a hierarchical model for text.
Image regions are generated using a Gaussian distribution.
Words are generated using a multinomial distribution.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 15 / 54
Multi-Modal Hierarchical Aspect Models
A document is modeled by a sum over the clusters, weighted by theprobability that the document is in the cluster.
p(D|d) =∑
c
p(c)∏
w∈W
[∑
l
p(w |l , c)p(l |d)]Nw
Nw,d
∏
b∈B
[∑
l
p(b|l , c)p(l |d)]Nb
Nb,d
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 16 / 54
Multi-Modal Hierarchical Aspect Models
p(D|d) =∑
c
p(c)∏
w∈W
[∑
l
p(w |l , c)p(l |d)]Nw
Nw,d
∏
b∈B
[∑
l
p(b|l , c)p(l |d)]Nb
Nb,d
l, c Levels, Clusters
w Words in document d
b The image regions in document d
D The set of observations for document
W, B The set of words and blobs for the document D = W ∪ BNw
Nw,d, Nb
Nb,dNormalisation constants for differing numbers of words and
blobs in each image.
We will refer to this as model I-0 in the results later on.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 17 / 54
Multi-Modal Hierarchical Aspect Models
We also experimented with allowing a cluster dependent level structurewhere p(l |d) is replaced with p(l |c , d), and refered this as model I-1:
p(D|d) =∑
c
p(c)∏
w∈W
[∑
l
p(w |l , c)p(l |c , d)]Nw
Nw,d
∏
b∈B
[∑
l
p(b|l , c)p(l |c , d)]Nb
Nb,d
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 18 / 54
Multi-Modal Hierarchical Aspect Models
We also experimented with allowing a cluster dependent level structurewhere p(l |d) is replaced with p(l |c , d), and refered this as model I-1:
p(D|d) =∑
c
p(c)∏
w∈W
[∑
l
p(w |l , c)p(l |c , d)]Nw
Nw,d
∏
b∈B
[∑
l
p(b|l , c)p(l |c , d)]Nb
Nb,d
Both model I-0 and model I-1 are not generative, they are specificto the documents in the training set.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 18 / 54
Multi-Modal Hierarchical Aspect Models
Two alternatives for generalization:
Marginalize out the training data
Estimate the mixing weights using a cluster specific averagecomputed during training (model I-2).
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 19 / 54
Multi-Modal Hierarchical Aspect Models
Two alternatives for generalization:
Marginalize out the training data
Estimate the mixing weights using a cluster specific averagecomputed during training (model I-2).
p(D) =∑
c
p(c)∏
w∈W
[∑
l
p(w |l , c)p(l |c)]Nw
Nw,d
∏
b∈B
[∑
l
p(b|l , c)p(l |c)]Nb
Nb,d
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 19 / 54
Multi-Modal Hierarchical Aspect Models
Model Fitting All models are fit using EM algorithm.
Image Based Word Prediction
p(w |B) ∝ p(w ,B)
∝∑
c
p(c)p(w |c)p(B |c)
=∑
c
p(c)[∑
l
p(w |l , c)p(l |c)]∏
b∈B
[∑
l
p(b|l , c)p(l |c)]Nb
Nb,d
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 20 / 54
Mixture of Multi-Modal Latent Dirichlet Allocation
Latent Dirichlet Allocation(LDA)
A generative probabilistic model for independent collections of data whereeach collection is modelled by a randomly generated mixture over latentfactors.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 21 / 54
Mixture of Multi-Modal Latent Dirichlet Allocation
Latent Dirichlet Allocation(LDA)
A generative probabilistic model for independent collections of data whereeach collection is modelled by a randomly generated mixture over latentfactors.
Example
Text modelling: A LDA model with two topics - CAT, DOGCAT: milk, meow, kittenDOG: puppy, bark, boneA document is generated by picking a distribution over topics, and giventhis distribution, picking the topic of each specific word. Then words aregenerated given their topics.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 21 / 54
Mixture of Multi-Modal Latent Dirichlet Allocation
1 Choose one of J mixture components c ∼ Multinomial(η).
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 22 / 54
Mixture of Multi-Modal Latent Dirichlet Allocation
1 Choose one of J mixture components c ∼ Multinomial(η).
2 Conditioned on the mixture component, choose a mixture over Kfactors, θ ∼ Dir(αc).
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 22 / 54
Mixture of Multi-Modal Latent Dirichlet Allocation
1 Choose one of J mixture components c ∼ Multinomial(η).
2 Conditioned on the mixture component, choose a mixture over Kfactors, θ ∼ Dir(αc).
3 For each of the N words:1 Choose one of K factors zn ∼ Multinomial(θ).2 Choose one of V words wn from p(wn|zn, c , β), the conditional
probability of wn given the mixture component and latent factor.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 22 / 54
Mixture of Multi-Modal Latent Dirichlet Allocation
1 Choose one of J mixture components c ∼ Multinomial(η).
2 Conditioned on the mixture component, choose a mixture over Kfactors, θ ∼ Dir(αc).
3 For each of the N words:1 Choose one of K factors zn ∼ Multinomial(θ).2 Choose one of V words wn from p(wn|zn, c , β), the conditional
probability of wn given the mixture component and latent factor.
4 For each of the M blobs:1 Choose a factor sm ∼ Multinomial(θ).2 Choose a blob bm from p(bm|sm, c , µ, Σ), a multivariate Gaussian
distribution with diagonal covariance, conditioned on the factor sm andthe mixture component c .
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 22 / 54
Correspondence Models
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 23 / 54
Correspondence Models - Introduction
It’s natural to want to build models that can predict words for specificimage regions rather than for a whole image, there are several simple ways:
The simplest : Discrete Data Translation
Hierarchical Clustering Model
MoM-LDA
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 24 / 54
Discrete Data Translation
It is inspired by machine translation: a lexicon links discrete objects (wordsin one language) to discrete objects(words in the other language)
Use K-mean to Vector-quantize representations of image regions Eachregion then get a single label( blob token).
Predict words :construct a joint probability table linking word tokento blob token
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 25 / 54
Correspondence from a Hierarchical Clustering Model
Hierarchical clustering models encode this correspondence throughCo-occurrence
Co-occurrence example
The word ”tiger” always co-occurs with an orange stripy region and neverotherwise.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 26 / 54
Correspondence from a Hierarchical Clustering Model
Hierarchical clustering models encode this correspondence throughCo-occurrence
Co-occurrence example
The word ”tiger” always co-occurs with an orange stripy region and neverotherwise.
Word prediction
p(w |b) ∝∑
c
p(c)∑
l
p(l)p(w |l , c)p(b|l , c) (Region only)
p(w |b) ∝∑
c
p(c |B)∑
l
p(l)p(w |l , c)p(b|l , c) (Region cluster)
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 26 / 54
Integrating correspondence and
Hierarchical clustering
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 27 / 54
Integrating Correspondence and Hierarchical Cluster
Problem in translating from image region to word is similar with machinetranslation.
Problem in translating from English to French
sun → soleilBut if the surrounding words are computer relatedsun → sun
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 28 / 54
Integrating Correspondence and Hierarchical Cluster
Problem in translating from image region to word is similar with machinetranslation.
Problem in translating from English to French
sun → soleilBut if the surrounding words are computer relatedsun → sun
Solution
Build explicit correspondence information into existing hierarchicalclustering model
1 Linking Word Emission and Region Emission Probabilities withMixture Weights
2 Paired Word and Region Emission at Nodes
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 28 / 54
Linking Word Emission and Region Emission Probabilities
with Mixture Weights
To implement this strategy by having the vertical mixture weights for theregions carries over to that for words.
p(D|d) =∑
c
p(c)∏
w∈W
[∑
l
p(w |l , c)p(l |B , c , d)]Nw
Nw,d
∏
b∈B
[∑
l
p(b|l , c)p(l |d)]Nb
Nb,d
wherep(l |B , c , d) ∝
∑
b∈B
p(l |b, c , d)
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 29 / 54
Linking Word Emission and Region Emission Probabilities
with Mixture Weights
To implement this strategy by having the vertical mixture weights for theregions carries over to that for words.
p(D|d) =∑
c
p(c)∏
w∈W
[∑
l
p(w |l , c)p(l |B , c , d)]Nw
Nw,d
∏
b∈B
[∑
l
p(b|l , c)p(l |d)]Nb
Nb,d
wherep(l |B , c , d) ∝
∑
b∈B
p(l |b, c , d)
Dependence Models
This model is Model D-0p(l |c , d) =⇒ p(l |d)get D-1 cluster dependent level distributionsp(l) =⇒ p(l |d) get D-2 drop the dependency on training set
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 29 / 54
Paired Word and Region Emission at Nodes
This method further tightens the relationship between the regions andwords. Then the observed words and regions are emitted in pairsD = {(w , b)}
p(D|d) =∑
c
p(c)∏
(w ,b)∈D
[∑
l
p((w , b)|l , c)p(l |d)]
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 30 / 54
Paired Word and Region Emission at Nodes
This method further tightens the relationship between the regions andwords. Then the observed words and regions are emitted in pairsD = {(w , b)}
p(D|d) =∑
c
p(c)∏
(w ,b)∈D
[∑
l
p((w , b)|l , c)p(l |d)]
Correspondence Models
This model is Model C-0Model I-1 and I-2 are similarly modified to get C-1 and C-2
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 30 / 54
Paired Word and Region Emission at Nodes
This method further tightens the relationship between the regions andwords. Then the observed words and regions are emitted in pairsD = {(w , b)}
p(D|d) =∑
c
p(c)∏
(w ,b)∈D
[∑
l
p((w , b)|l , c)p(l |d)]
Correspondence Models
This model is Model C-0Model I-1 and I-2 are similarly modified to get C-1 and C-2
The correspondence is not provide in training data, we need toestimate correspondence as part of the training process
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 30 / 54
Estimate the correspondence
All methods for computing the correspondence assume that the probabilitythat a word and a segment correspond can be estimated by the probabilitythat they are emitted from the same node.Using w ⇔ b to denote that the word,w , and the region, b, correspond, incase C-0
p(w ⇔ b) ≈∑
c
p(c)∑
l
p((w , b)|l , c)p(l |d)
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 31 / 54
Correspondence Issues
Issues
The choice of correspondence model, Should there be a one-one mapbetween regions and words(Usually impossible.), there is no option ofdeciding that a region corresponds to no word.
Solution
A special word NULL
Fertility: one to many
Refusal to predict: when the annotation with the highest probabilitygiven the region has too low a probability
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 32 / 54
Evaluation
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 33 / 54
Evaluation
Measuring Annotation Performance:Compare the words predicted by models with words actually presentfor held-out data
Measuring Correspondence Performance:Examine whether we predict appropriate words for each particularregion
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 34 / 54
Evaluation-Annotation Models
Empirical word frequency
Relative to prediction performance
Matching the performance empirical density
A sensible indicator
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 35 / 54
Evaluation-Annotation Models
Empirical word frequency
Quality of word posterior distribution
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 36 / 54
Evaluation-Annotation Models
Empirical word frequency
Quality of word posterior distribution
Estimate the Kullback-Leibler divergence between predictivedistribution q(w |B),and target distribution p(w).
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 36 / 54
Evaluation-Annotation Models
Empirical word frequency
Quality of word posterior distribution
Estimate the Kullback-Leibler divergence between predictivedistribution q(w |B),and target distribution p(w).
E(model)KL =
∑w∈vocabulary p(w)log p(w)
q(w |B
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 36 / 54
Evaluation-Annotation Models
Empirical word frequency
Quality of word posterior distribution
Estimate the Kullback-Leibler divergence between predictivedistribution q(w |B),and target distribution p(w).
E(model)KL =
∑w∈vocabulary p(w)log p(w)
q(w |B
= constant − 1K
∑w∈observed logq(w |B).
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 36 / 54
Evaluation-Annotation Models
Empirical word frequency
Quality of word posterior distribtion
Goodness of the model:
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 37 / 54
Evaluation-Annotation Models
Empirical word frequency
Quality of word posterior distribtion
Goodness of the model: loss function
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 37 / 54
Evaluation-Annotation Models
Empirical word frequency
Quality of word posterior distribtion
Goodness of the model: loss function
Tradintinoal zero-one loss is highly misleading
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 37 / 54
Evaluation-Annotation Models
Empirical word frequency
Quality of word posterior distribtion
Goodness of the model: loss function
Tradintinoal zero-one loss is highly misleading
EmodelNS = r/n − w/(N − n)
N the vocabulary sizen the number of actual words for the imager the number of words predicted correctly
w the number of words predicted incorrectly
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 37 / 54
Evaluation-Annotation Models
Empirical word frequency
Quality of word posterior distribtion
Goodness of the model: loss function
Tradintinoal zero-one loss is highly misleading
EmodelNS = r/n − w/(N − n)
N the vocabulary sizen the number of actual words for the imager the number of words predicted correctly
w the number of words predicted incorrectly
EmodelPR = r/n, based on the n best words.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 37 / 54
Evaluation-Annotation Models
Empirical word frequency
Quality of word posterior distribtion
E(model)KL = constant − 1
K
∑w∈observed logq(w |B).
Goodness of the model:loss function
EmodelNS = r/n − w/(N − n)
EmodelPR = r/n, based on the n best words.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 38 / 54
Evaluation-Annotation Models
Empirical word frequency
Quality of word posterior distribtion
E(model)KL = constant − 1
K
∑w∈observed logq(w |B).
Goodness of the model:loss function
EmodelNS = r/n − w/(N − n)
EmodelPR = r/n, based on the n best words.
EKL = 1K
∑data (E empirical
KL − EmodelKL )
ENS = 1K
∑data (E empirical
NS − EmodelNS )
EPR = 1K
∑data (E empirical
PR − EmodelPR )
Negtive → worse; positive → better
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 38 / 54
Evaluation-Correspondence Models
Manual correspondence scoring
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 39 / 54
Evaluation-Correspondence Models
Manual correspondence scoring
Limit the size of the dataset
Contain significant noise
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 39 / 54
Evaluation-Correspondence Models
Manual correspondence scoring
Limit the size of the dataset
Contain significant noise
Assumption
A method that cannot predict annotations accurately is unlikely to predictcorrespondence well.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 39 / 54
Evaluation-Correspondence Models
Manual correspondence scoring
Limit the size of the dataset
Contain significant noise
Using annotation as a proxy.
Assumption
A method that cannot predict annotations accurately is unlikely to predictcorrespondence well.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 39 / 54
Evaluation-Correspondence Models
Using annotation as a proxy
Image based methods
Region based methods
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 40 / 54
Evaluation-Correspondence Models
Using annotation as a proxy
Image based methodsCompute the annotation in the natural way.
Region based methods
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 40 / 54
Evaluation-Correspondence Models
Using annotation as a proxy
Image based methodsCompute the annotation in the natural way.
Region based methodsConsider each region seperately.Use image based annotation method to computre region wordposteriors.Marginalize out the correspondence of each region from the model,and test it as an annotation model.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 40 / 54
Experiments
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 41 / 54
Experiments
Use images from 160 CD’s from Corel image data set.
75% training sets, 25% “standard” held out sets.
Exclude words occur less than 20 times in test set.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 42 / 54
Experiments
Annotation Results.
Correspondence Results
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 42 / 54
Experiments
Annotation Results.
KL-divergence score measure
Normalized score measure
Comparison of models using different scores
Correspondence Results
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 42 / 54
Annotation Results
KL-divergence score measure
Normalized score measure.
Comparison of models using different scores.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 43 / 54
Annotation Results
KL-divergence score measure
Check for overfitting.Number of iteration.
Normalized score measure.
Comparison of models using different scores.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 43 / 54
Annotation Results
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 44 / 54
Annotation Results
KL-divergence score measure
Check for overfitting:Number of iteration:
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 45 / 54
Annotation Results
KL-divergence score measure
Check for overfitting: not a big problemNumber of iteration:
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 45 / 54
Annotation Results
KL-divergence score measure
Check for overfitting: not a big problemNumber of iteration: performance on training set improved withnumber of iteratrions.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 45 / 54
Annotation Results
KL-divergence score measure
Check for overfitting: not a big problemNumber of iteration: performance on training set improved withnumber of iteratrions.
Normalized score measure
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 45 / 54
Annotation Results
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 46 / 54
Annotation Results
KL-divergence score measure
Check for overfitting: not a big problemNumber of iteration: performance on training set improved withnumber of iteratrions.
Normalized score measure
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 47 / 54
Annotation Results
KL-divergence score measure
Check for overfitting: not a big problemNumber of iteration: performance on training set improved withnumber of iteratrions.
Normalized score measureGo from zero to some peak, and then to drop down to zero again.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 47 / 54
Annotation Results
KL-divergence score measure
Check for overfitting: not a big problemNumber of iteration: performance on training set improved withnumber of iteratrions.
Normalized score measureGo from zero to some peak, and then to drop down to zero again.
Comparison of models using different scores.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 47 / 54
Annotation Results
Annotation results are compared between models.Measure PR:
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 48 / 54
Annotation Results
Annotation results are compared between models.Measure NS:
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 48 / 54
Annotation Results
Annotation results are compared between models.Measure KL:
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 48 / 54
Annotation Results
KL-divergence score measure
Normalized score measure
Comparison of models using different scores.
Words are generallyassociated with pieces of images.Methods using clustering are reliant on having images which are closeto the training data.The result for the MoM-LDA model are worth nothing.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 49 / 54
Correspondence Result
Study the keyword prediction error,E empiricalPR − Emodel
PR
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 50 / 54
Correspondence Result
Study the keyword prediction error,E empiricalPR − Emodel
PR
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 50 / 54
Correspondence Result
Study the keyword prediction error,E empiricalPR − Emodel
PR
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 50 / 54
Correspondence Result
Study the keyword prediction error,E empiricalPR − Emodel
PR
Paired word-blob emission approach improves correspondenceperformance over annotation performance.
For both correspondence and annotation, linear-D-0-region-onlymethod appears to be the best overall choice.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 50 / 54
Discussion
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 51 / 54
Discussion
The Corel data set is simple
annotations are nounsvocabulary is small
Performance is almost certainly affected by the correspondence modelused
Have little information about the effect of supervision
Large scale evaluation of correspondence models is genuimely difficultthe key issue seems to be the entropy of the labels.
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 52 / 54
Thank you!
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 53 / 54
Any Questions?
Rongzhou Shen, Danyi Feng, Qing Xia () Matching Words and Pictures March 4, 2008 54 / 54