Flickr Distance
ACM Multimedia 2008
Lei Wu, Xian-Sheng Hua, Nenghai Yu, Wei-Ying Ma, Shipeng Li
Microsoft Research Asia; University of Science and Technology of China
October 28, 2008
2
Multimedia Information Retrieval: Indexing, Ranking, Clustering, ……, Recommendation, Annotation
3
Image Similarity/Distance
Concept Similarity/Distance
Annotation, Indexing, Ranking, Clustering, ……, Recommendation
4
Image Similarity/Distance
Concept Similarity/Distance
Image Similarity/Distance
5
Image Similarity/Distance
Numerous efforts have been made.
Concept Similarity/Distance
Concept Similarity/Distance
Image Similarity/Distance
6
Concept Similarity/Distance: more and more used, but not well studied.
Example concepts: Olympic, Sports, Cat, Tiger, Paw
Image Similarity/Distance: numerous efforts have been made.
7
WordNet Distance
Google Distance
Tag Concurrence Distance
WordNet Distance
8
WordNet: 150,000 words
WordNet Distance: quite a few methods exist to compute it in WordNet. The basic idea is to measure the length of the path between two words.
Pros: Built by human experts, so close to human perception
Cons: Coverage is limited and difficult to extend
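The path-length idea can be illustrated with a minimal sketch. It assumes a tiny hand-made "is-a" taxonomy (`TOY_HYPERNYMS` is hypothetical and is not real WordNet data) and uses a plain breadth-first search; actual WordNet-based measures come in several variants that the slides do not specify.

```python
from collections import deque

# Hypothetical toy taxonomy (NOT real WordNet data): child -> parent links.
TOY_HYPERNYMS = {
    "cat": "feline",
    "tiger": "feline",
    "feline": "animal",
    "dog": "canine",
    "canine": "animal",
    "animal": "entity",
    "airplane": "vehicle",
    "vehicle": "entity",
}


def build_graph(hypernyms):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, parent in hypernyms.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def path_length(graph, source, target):
    """Edges on the shortest path, or None if a word is not covered."""
    if source not in graph or target not in graph:
        return None  # limited coverage: the word is missing from the ontology
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


graph = build_graph(TOY_HYPERNYMS)
print(path_length(graph, "cat", "tiger"))     # siblings under "feline"
print(path_length(graph, "cat", "airplane"))  # connected only through "entity"
print(path_length(graph, "cat", "flickr"))    # word outside the toy ontology
```

The `None` result for an uncovered word mirrors the coverage limitation listed under Cons.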
Google Distance
9
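Normalized Google Distance (NGD)
Reflects the concurrency (co-occurrence) of two words in Web documents
Defined as $NGD(x, y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log N - \min\{\log f(x), \log f(y)\}}$, where $f(x)$ is the number of pages containing $x$, $f(x, y)$ the number of pages containing both $x$ and $y$, and $N$ the total number of pages indexed
Pros: Easy to get and huge coverage
Cons: Only reflects concurrency in textual documents. Not really concept distance (semantic relationship)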
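A minimal sketch of the standard NGD computation from page counts. The function name and the counts used in the example are hypothetical; obtaining real counts from a search engine is outside the scope of the sketch.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """NGD from page counts.

    fx, fy: pages containing each term; fxy: pages containing both;
    n: total number of indexed pages (all values here are assumed inputs).
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf  # the two terms never appear together
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Hypothetical counts, chosen only to exercise the formula.
print(normalized_google_distance(1_000, 10_000, 100, 1_000_000))
```

Two terms that always appear together get distance 0, and the value grows as the joint count shrinks relative to the individual counts.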
10
Concept Pairs         Google Distance
Airplane – Dog        0.2562
Football – Soccer     0.1905
Horse – Donkey        0.2147
Airplane – Airport    0.3094
Car – Wheel           0.3146
Tag Concurrence Distance
11
Reflects how frequently two tags occur in the same images
Based on the same idea as NGD
Mostly sparse (> 95% of the entries in the similarity matrix are zero)
Pros: Images are taken into account
Cons: a) Tags are sparse, so visual concurrency is not well reflected; b) Training data is difficult to get
Figure: similarity matrix for 500 tags; similarity matrix for 50 tags
Image Tag Concurrence Distance (Qi, Hua, et al. ACMMM07)
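The sparsity problem can be seen even on a toy collection. The sketch below counts tag co-occurrences and reports the fraction of tag pairs that never appear together; the three tag sets are the example images shown later in the deck, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Tag sets of the three example images shown in the slides.
IMAGES = [
    {"bear", "fur", "grass", "tree"},
    {"polar bear", "water", "sea"},
    {"polar bear", "fighting", "usa"},
]


def cooccurrence_counts(images):
    """Count, for every unordered tag pair, the images that carry both tags."""
    pair_counts = Counter()
    for tags in images:
        for a, b in combinations(sorted(tags), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def zero_fraction(images):
    """Fraction of distinct tag pairs that never co-occur in any image."""
    vocabulary = sorted(set().union(*images))
    total_pairs = len(vocabulary) * (len(vocabulary) - 1) // 2
    nonzero_pairs = len(cooccurrence_counts(images))
    return 1 - nonzero_pairs / total_pairs


print(zero_fraction(IMAGES))
```

Here "bear" and "polar bear" never share an image, so a tag-only measure has no evidence that they are related.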
12
Concept Pairs         Google Distance    Tag Concurrence Distance
Airplane – Dog        0.2562             0.8532
Football – Soccer     0.1905             0.1739
Horse – Donkey        0.2147             0.4513
Airplane – Airport    0.3094             0.1833
Car – Wheel           0.3146             0.9617
Different Concept Relationships
13
Synonymy: different words but the same meaning (table tennis – ping-pong)
Visually Similar: similar things or things of the same type (horse – donkey)
Meronymy: part and the whole (car – wheel)
Concurrency: exist at the same scene/place (airplane – airport)
14
Concept Distance
Mine from ontology: WordNet distance is good, but coverage is too low
Mine from text documents: Google distance’s coverage is very high, but it is for the text domain
Mine from image tags: Image tag concurrence distance implicitly uses image information, but tags are too sparse
15
Can we mine concept distance from image content?
Some Facts
16
Semantic concept distance is based on human cognition
80% of human cognition comes from visual information
There are around 2.8 billion photos on Flickr (as of Sep 2008)
On average, each Flickr image has around 8 tags
Goal: to mine concept distance from a large tagged image collection based on image content
Example image tags: (bear, fur, grass, tree); (polar bear, water, sea); (polar bear, fighting, usa)
Overview of Flickr Distance
17
Concept A: Airplane
Concept B: Airport
Concept Model A
Concept Model B
Flickr Distance (A, B)
Flickr Distance: 0.5151, 0.0315, 0.4231, 0.0576, 0.0708
18
Flickr Distance is able to cover the four different semantic relationships: Synonymy, Visually Similar, Meronymy, and Concurrency
What We Need
19
R1: A Good Image Collection
Large
High coverage, especially on daily life
With tags
What We Need
20
R2: A Good Concept Representation or Model
Based on image content
Can cover wider concept relationships
Can handle a large concept set
Concept Models: Discriminative (SVM, Boosting, …) vs. Generative; Global Feature vs. Local Feature; w/o Spatial Relation (Bag-of-Words: pLSA, LDA, …) vs. w/ Spatial Relation (2D HMM, MRF, …)
What We Need
21
Concept Models: Discriminative (SVM, Boosting, …) vs. Generative; Global Feature vs. Local Feature; w/o Spatial Relation (Bag-of-Words, …) vs. w/ Spatial Relation (2D HMM, MRF, …)
VLM – Visual Language Model
Spatial-relation sensitive
Efficient
Can handle object variations
Statistical Language Model
22
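Example sentence: I am talking about statistical language model.
Unigram Model: $P(w_x \mid w_1 w_2 \cdots w_n) = P(w_x)$
Bigram Model: $P(w_x \mid w_1 w_2 \cdots w_n) = P(w_x \mid w_{x-1})$
Trigram Model: $P(w_x \mid w_1 w_2 \cdots w_n) = P(w_x \mid w_{x-1} w_{x-2})$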
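As a minimal sketch of the bigram case, the conditional probability of a word given its predecessor can be estimated by counting. The sketch assumes plain maximum-likelihood counts with no smoothing (the slides do not say how the probabilities are estimated), and the toy corpus is hypothetical, built around the example sentence.

```python
from collections import Counter


def train_bigram(tokens):
    """Return prob(word, prev) estimating P(word | prev) from raw counts."""
    # Only tokens that have a successor can act as a predecessor.
    predecessor_counts = Counter(tokens[:-1])
    bigram_counts = Counter(zip(tokens[:-1], tokens[1:]))

    def prob(word, prev):
        if predecessor_counts[prev] == 0:
            return 0.0  # unseen context; a real model would smooth here
        return bigram_counts[(prev, word)] / predecessor_counts[prev]

    return prob


# Hypothetical toy corpus based on the example sentence.
corpus = (
    "i am talking about statistical language model "
    "i am talking about visual language model"
).split()
prob = train_bigram(corpus)
print(prob("statistical", "about"))
print(prob("model", "language"))
```

A trigram model follows the same pattern with a two-word context as the key.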
Visual Language Model (VLM)
23
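Unigram Model: $P(w_{xy} \mid w_{11} w_{12} \cdots w_{mn}) = P(w_{xy})$
Bigram Model: $P(w_{xy} \mid w_{11} w_{12} \cdots w_{mn}) = P(w_{xy} \mid w_{x-1,y})$
Trigram Model: $P(w_{xy} \mid w_{11} w_{12} \cdots w_{mn}) = P(w_{xy} \mid w_{x-1,y}, w_{x,y-1})$
Visual Word Generation: Image Patch → Patch Gradient → Texture Histogram → Hashing → Visual Word (e.g. 1 0 0 1 0 0 1 0)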
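The trigram visual language model conditions each visual word on two neighbours in the patch grid, $w_{x-1,y}$ and $w_{x,y-1}$. A minimal counting sketch follows; it assumes each image is already a rectangular grid of integer visual-word ids, uses no smoothing, and the function name and toy grid are hypothetical.

```python
from collections import Counter


def train_visual_trigram(grids):
    """Return prob(word, prev_x, prev_y) estimating
    P(w[x][y] | w[x-1][y], w[x][y-1]) from raw counts."""
    context_counts = Counter()
    trigram_counts = Counter()
    for grid in grids:
        for x in range(1, len(grid)):
            for y in range(1, len(grid[x])):
                context = (grid[x - 1][y], grid[x][y - 1])
                context_counts[context] += 1
                trigram_counts[(context, grid[x][y])] += 1

    def prob(word, prev_x, prev_y):
        context = (prev_x, prev_y)
        if context_counts[context] == 0:
            return 0.0  # unseen context; no smoothing in this sketch
        return trigram_counts[(context, word)] / context_counts[context]

    return prob


# Hypothetical checkerboard of two visual words.
toy_grid = [
    [1, 2, 1, 2],
    [2, 1, 2, 1],
    [1, 2, 1, 2],
]
prob = train_visual_trigram([toy_grid])
print(prob(1, 2, 2))
```

Unlike a bag-of-words histogram, these counts change when the same visual words are rearranged, which is what makes the model sensitive to spatial relations.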
Performance of VLM
24
Comparison on Image Categorization: Caltech 8 categories / 5097 images
Method        Accuracy (%)   Training Time (sec/image)
pLSA (BOW)    59             1.11
LDA (BOW)     64             2.44
2D MHMM       88             0.44
SVM           90             0.84
VLM           90             0.14
Latent-Topic VLM (1)
25
Why Latent-Topic
Latent-Topic VLM: Visual variations of a concept are taken as latent topics
$P(w_{xy} \mid w_{x-1,y}, w_{x,y-1}, d_j^C) = \sum_{k=1}^{K} P(w_{xy} \mid w_{x-1,y}, w_{x,y-1}, z_k^C)\, P(z_k^C \mid d_j^C)$
$C$: a concept
$d_j^C$: the $j$-th image in concept $C$
$z_k^C$: the $k$-th latent topic of concept $C$
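In the latent-topic model, the trigram probability for an image is a weighted sum of per-topic trigram probabilities, with the image's topic proportions as weights. A minimal sketch of that sum, with hypothetical names and made-up numbers for two topics:

```python
def mixture_probability(topic_trigram_probs, topic_weights, word, context):
    """Sum over topics of P(word | context, z_k) * P(z_k | image).

    topic_trigram_probs: one dict per topic mapping (context, word) -> probability.
    topic_weights: P(z_k | image) for the same topics, summing to 1.
    """
    return sum(
        weight * probs.get((context, word), 0.0)
        for probs, weight in zip(topic_trigram_probs, topic_weights)
    )


# Made-up values: two latent topics of one concept, one context (2, 2).
context = (2, 2)
topics = [
    {(context, 1): 0.9, (context, 2): 0.1},  # topic z_1
    {(context, 1): 0.2, (context, 2): 0.8},  # topic z_2
]
weights = [0.75, 0.25]  # P(z_1 | image), P(z_2 | image)

print(mixture_probability(topics, weights, 1, context))
print(mixture_probability(topics, weights, 2, context))
```

Because every topic distribution sums to 1 for a given context and the weights sum to 1, the mixture is again a valid distribution over visual words.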
Latent-Topic VLM (2)
26
Latent-Topic VLM Training
Solved by the EM algorithm. The objective function is to maximize the joint distribution of the concept $C$ and its visual word arrangement $A_w$:
maximize $p(A_w, C) = \prod_{d_j^C \in C} \prod_{x,y} P(w_{xy} \mid w_{x-1,y}, w_{x,y-1}, d_j^C)$
E-step: Estimate the posteriors of the hidden topics
M-step: Maximize the likelihood of the visual arrangement
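The alternation between the two steps can be sketched with a simplified, pLSA-style EM. This is an assumption-laden stand-in, not the authors' update equations (which the transcript does not preserve): each trigram is treated as a single opaque token, `counts[d][token]` holds how often it occurs in image `d`, and all names and data are hypothetical.

```python
import math
import random


def normalize(values):
    """Scale non-negative values to sum to 1 (uniform if all are zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [value / total for value in values]


def em_latent_topics(counts, num_topics, iterations=30, seed=0):
    """Fit P(token | z_k) and P(z_k | image) by EM; also return the
    log-likelihood measured at the start of every iteration."""
    rng = random.Random(seed)
    vocab = sorted({token for image in counts for token in image})
    index = {token: i for i, token in enumerate(vocab)}
    p_w_z = [normalize([rng.random() + 0.1 for _ in vocab]) for _ in range(num_topics)]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(num_topics)]) for _ in counts]
    history = []
    for _ in range(iterations):
        acc_w_z = [[0.0] * len(vocab) for _ in range(num_topics)]
        acc_z_d = [[0.0] * num_topics for _ in counts]
        log_likelihood = 0.0
        for d, image in enumerate(counts):
            for token, n in image.items():
                w = index[token]
                joint = [p_w_z[k][w] * p_z_d[d][k] for k in range(num_topics)]
                total = sum(joint)
                log_likelihood += n * math.log(total)
                for k in range(num_topics):
                    posterior = joint[k] / total  # E-step: P(z_k | image, token)
                    acc_w_z[k][w] += n * posterior
                    acc_z_d[d][k] += n * posterior
        # M-step: re-estimate both distributions from the expected counts.
        p_w_z = [normalize(row) for row in acc_w_z]
        p_z_d = [normalize(row) for row in acc_z_d]
        history.append(log_likelihood)
    return vocab, p_w_z, p_z_d, history


# Hypothetical token counts for four images of one concept.
TOY_COUNTS = [
    {"a": 4, "b": 3},
    {"a": 3, "b": 4, "c": 1},
    {"c": 5, "d": 2},
    {"c": 3, "d": 4},
]
vocab, p_w_z, p_z_d, history = em_latent_topics(TOY_COUNTS, num_topics=2)
print(history[0], history[-1])
```

The recorded log-likelihood should never decrease from one iteration to the next, which is a convenient sanity check for any EM implementation.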
Performance of LT-VLM
27
Comparison on Image Categorization: Caltech 8 categories / 5097 images
Method        Accuracy (%)   Training Time (sec/image)
pLSA (BOW)    59             1.11
LDA (BOW)     64             2.44
2D MHMM       88             0.44
SVM           90             0.84
VLM           90             0.14
LT-VLM        94             0.24
Flickr Distance
28
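Kullback–Leibler (KL) divergence (topic distance): good, but not symmetric
$D_{KL}(P_{z_i^{C_1}} \mid P_{z_j^{C_2}}) = \sum_{l} P_{z_i^{C_1}}(l) \log \frac{P_{z_i^{C_1}}(l)}{P_{z_j^{C_2}}(l)}$
Jensen–Shannon (JS) divergence (topic distance): better, as it is symmetric. Moreover, the square root of the JS divergence is a metric, and so is Flickr Distance
$D_{JS}(P_{z_i^{C_1}} \mid P_{z_j^{C_2}}) = \frac{1}{2} D_{KL}(P_{z_i^{C_1}} \mid M) + \frac{1}{2} D_{KL}(P_{z_j^{C_2}} \mid M)$, with $M = \frac{1}{2}\left(P_{z_i^{C_1}} + P_{z_j^{C_2}}\right)$
Flickr Distance (concept distance):
$D_{Flickr}(C_1, C_2) = \sum_{i=1}^{K} \sum_{j=1}^{K} P(z_i^{C_1} \mid C_1)\, P(z_j^{C_2} \mid C_2)\, D_{JS}(P_{z_i^{C_1}} \mid P_{z_j^{C_2}})$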
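A minimal sketch of the two topic distances and of the weighted sum over topic pairs. It assumes each latent topic is given as a plain probability vector over the same trigram vocabulary; the function names and the numbers are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL divergence of p from q; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Symmetric Jensen-Shannon divergence via the midpoint distribution M."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def concept_distance(topics_a, weights_a, topics_b, weights_b):
    """Sum of topic-pair JS divergences, weighted by both topic priors."""
    return sum(
        wa * wb * js_divergence(pa, pb)
        for pa, wa in zip(topics_a, weights_a)
        for pb, wb in zip(topics_b, weights_b)
    )


# Made-up topic distributions for two concepts with K = 2 topics each.
concept_a = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
concept_b = [[0.6, 0.3, 0.1], [0.1, 0.1, 0.8]]
print(concept_distance(concept_a, [0.5, 0.5], concept_b, [0.5, 0.5]))
```

Using the midpoint distribution keeps every logarithm finite even when one topic assigns zero probability to a trigram, which is one practical reason to prefer JS over raw KL here.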
Procedure of Flickr Distance
29
Concept A: Airplane; Concept B: Airport
→ Tag search in Flickr
→ LT-VLM
→ Concept Model A; Concept Model B
→ Jensen-Shannon Divergence
→ Flickr Distance (A, B)
Experiments
30
Evaluation: objective evaluation; subjective evaluation
Applications: concept clustering; image annotation; tag recommendation
Experiments - Configurations
31
Images: 6,400,000 from Flickr
Concepts: 130,000,000 different tags; 10,000,000 filtered tags; 1,000 randomly selected tags
Comparison: Normalized Google Distance (NGD); Tag Concurrence Distance (TCD); Flickr Distance (FD)
Eva1: Subjective Evaluation
32
Ground-Truth: 12 persons were asked to score the semantic correlation of each concept pair. Average scores are taken as ground-truth.
Evaluate Accuracy of “Relative Distance Pairs”
Step 1: Find all distance pairs D(a,b) and D(c,d)
Step 2: Check whether the order of D(a,b) and D(c,d) is consistent with ground-truth
Figure: Correct Rate of NGD, TCD, and FD (bar chart, axis from 0.47 to 0.57)
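The two evaluation steps can be sketched as an order-agreement rate. The sketch assumes both the measured distances and the ground truth are dictionaries keyed by concept pair, with larger values meaning less related; ties in the ground truth are skipped. The function name and all numbers are hypothetical.

```python
from itertools import combinations


def relative_order_accuracy(distance, ground_truth):
    """Share of (pair, pair) comparisons whose order matches the ground truth."""
    pairs = sorted(set(distance) & set(ground_truth))
    correct = total = 0
    for first, second in combinations(pairs, 2):
        truth_diff = ground_truth[first] - ground_truth[second]
        if truth_diff == 0:
            continue  # the ground truth gives no order for this comparison
        measured_diff = distance[first] - distance[second]
        total += 1
        if measured_diff * truth_diff > 0:
            correct += 1
    return correct / total if total else 0.0


# Made-up values: the measure swaps the order of the first two pairs.
truth = {("airplane", "airport"): 0.1, ("car", "wheel"): 0.2, ("airplane", "dog"): 0.9}
measured = {("airplane", "airport"): 0.3, ("car", "wheel"): 0.2, ("airplane", "dog"): 0.5}
print(relative_order_accuracy(measured, truth))
```

Only the ordering matters, so measures on different scales can be compared against the same ground truth.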
Eva2: Objective Evaluation
33
Ground-Truth: WordNet Distance. Only 497 concepts (overlap of WordNet and the 1000 concepts)
Evaluate Accuracy of “Relative Distance Pairs”
Step 1: Find all distance pairs D(a,b) and D(c,d)
Step 2: Check whether the order of D(a,b) and D(c,d) is consistent with ground-truth
Figure: Correct Rate of NGD, TCD, and FD (bar chart, axis from 0.45 to 0.54)
App1: Concept Clustering
34
Concept Clustering: 23 concepts; 3 groups – (1) outer space, (2) animal and (3) sports
Normalized Google Distance
Group 1: bears, horses, moon, space
Group 2: bowling, dolphin, donkey, Saturn, sharks, snake, softball, spiders, turtle, Venus, whale, wolf
Group 3: baseball, basketball, football, golf, soccer, tennis, volleyball
Tag Concurrence Distance
Group 1: moon, space, Venus, whale
Group 2: baseball, donkey, softball, wolf
Group 3: basketball, bears, bowling, dolphin, football, golf, horses, Saturn, sharks, soccer, spiders, tennis, turtle, volleyball
Flickr Distance
Group 1: moon, Saturn, space, Venus
Group 2: bears, dolphin, donkey, golf, horses, sharks, spiders, tennis, whale, wolf
Group 3: baseball, basketball, football, snake, soccer, bowling, softball, volleyball
App2: Image Annotation
35
Based on an approach using concept relations: Dual Cross-Media Relevance Model (DCMRM, J. Liu et al. ACMMM 2007)
On 79 concepts / 79,000 images
The number of correctly annotated keywords at the first N words
N             1     2     3     4
NGD-DCMRM     55    212   212   301
TC-DCMRM      53    186   193   310
FD-DCMRM      57    354   423   960
Figure: Total number of correct keywords for NGD-DCMRM, TC-DCMRM, and FD-DCMRM (bar chart)
App3: Tag Recommendation
36
To Improve Tagging Quality: eliminating tag incompleteness, noise, and ambiguity
500 images / 10 recommended tags per image
Precision @ 10: NGD 0.652; Tag Concurrence Distance 0.665; Flickr Distance 0.758
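Precision @ k is the share of the top k recommended tags that are judged relevant. The sketch below pairs it with one plausible way to recommend tags from a concept distance: rank candidates by their average distance to the tags already on the image. That ranking rule is an assumption (the slides do not describe the recommendation procedure), and the names and distances are hypothetical.

```python
def recommend_tags(existing, candidates, distance, k):
    """Rank candidate tags by mean distance to the existing tags (assumed rule)."""
    def mean_distance(candidate):
        return sum(distance[frozenset((candidate, tag))] for tag in existing) / len(existing)

    ranked = sorted((c for c in candidates if c not in existing), key=mean_distance)
    return ranked[:k]


def precision_at_k(recommended, relevant, k):
    """Fraction of the first k recommended tags that are relevant."""
    top = recommended[:k]
    return sum(1 for tag in top if tag in relevant) / k


# Made-up distances between the existing tag and three candidates.
toy_distance = {
    frozenset(("airplane", "airport")): 0.06,
    frozenset(("airplane", "sky")): 0.10,
    frozenset(("airplane", "dog")): 0.50,
}
suggested = recommend_tags(["airplane"], ["dog", "sky", "airport"], toy_distance, k=2)
print(suggested, precision_at_k(suggested, {"airport", "sky"}, k=2))
```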
Discussion
37
Why can VLM divergence estimate concept distance?
Why does FD work well even when tags are not complete?
Computer: room patterns, computer patterns, other patterns
TV: room patterns, TV patterns, other patterns
Office: room patterns, screen patterns, other patterns
VLM: distribution of trigrams
38
If we find similar patterns in the images associated with different concepts, the corresponding concept relationships can be discovered.
Computer Office
Summary
39
A novel approach to discover semantic relationships from image content
Based on real-life images from the Web
Based on collective intelligence from grassroots
A distance more consistent with human perception
A measurement more effective in many applications
Flickr Distance
Future Work
40
Flickr Distance as a Service.
Thank You
41
Backup
42
TagNet
43
TagNet – Visual Concept Net
Can be used in many applications: knowledge representation; concept learning; multimedia retrieval; ...
$G = \langle V, E, W \rangle$
$v \in V$: concept (node)
$e \in E$: semantic relationship (edge)
$w \in W$: Flickr Distance (weight)
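A weighted graph of this kind can be held in a small adjacency map. The sketch below is only an illustration of the data structure: the class and method names are hypothetical, and the edge weights are made-up values rather than measured Flickr Distances.

```python
class TagNet:
    """Undirected weighted graph: nodes are concepts, weights are distances."""

    def __init__(self):
        self.adjacency = {}  # concept -> {neighbouring concept: distance}

    def add_edge(self, concept_a, concept_b, distance):
        self.adjacency.setdefault(concept_a, {})[concept_b] = distance
        self.adjacency.setdefault(concept_b, {})[concept_a] = distance

    def nearest(self, concept, k=1):
        """The k neighbours with the smallest distance to the concept."""
        neighbours = self.adjacency.get(concept, {})
        return sorted(neighbours, key=neighbours.get)[:k]


# Made-up weights for illustration.
net = TagNet()
net.add_edge("airplane", "airport", 0.06)
net.add_edge("airplane", "dog", 0.50)
net.add_edge("airplane", "sky", 0.10)
print(net.nearest("airplane", k=2))
```

Nearest-neighbour lookups of this kind are what applications such as tag recommendation or query suggestion would draw on.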
TagNet
44
Visualization: the bigger the distance, the longer the edge
Drawn using a tool called NetDraw, provided by the International Network for Social Network Analysis
Outline
Motivation
Overview
Visual Language Model
Flickr Distance Calculation
Evaluations and Applications
45
Semantic Relationship Is Important
46
Many efforts on using semantic relationships
GJ Qi et al. Correlative Multi-Label Video Annotation. ACM MM 2007.
R. Datta et al. Image Retrieval: Ideas, Influences and the Trends of the New Age. ACM Computing Surveys, 2008.
L. Leslie et al. Annotation of Paintings with High-Level Semantic Concepts Using Transductive Inference and Ontology-based Concept Disambiguation. ACM MM 2007.
J. Yu et al. Semantic Subspace Projection and Its Application in Image Retrieval. IEEE T CSVT 2008.
Applications of semantic relationships: natural language processing; object detection; concept detection; multimedia retrieval
Text vs. Image
48
Text: Word; Grammar; 1-dim dependence; Statistical Language Model
Image: Visual word; Visual grammar; 2-dim dependence; Visual Language Model
Visual Word Generation
49
Typical methods: SIFT + Clustering/PCA
Our method: Patch + Texture Direction Histogram + Hashing
Efficient, low-dimension, and rotation-invariant
Only needs 1/20 of the computation of the SIFT feature
Pipeline: Image Patch → Patch Gradient → Texture Histogram → Hashing → Visual Word (e.g. 1 0 0 1 0 0 1 0)
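The patch-to-word pipeline can be sketched as follows. This is a rough illustration under several assumptions that the slides do not confirm: gradients come from central differences, the histogram has 8 orientation bins weighted by gradient magnitude, and the hash sets a bit for every bin above the histogram mean. The sketch makes no attempt at rotation invariance, and the function name is hypothetical.

```python
import math


def visual_word(patch, bins=8):
    """Hash a grayscale patch (list of rows) to a bit list and an integer code."""
    histogram = [0.0] * bins
    height, width = len(patch), len(patch[0])
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = patch[r][c + 1] - patch[r][c - 1]  # horizontal gradient
            gy = patch[r + 1][c] - patch[r - 1][c]  # vertical gradient
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            index = min(int(angle / (2 * math.pi) * bins), bins - 1)
            histogram[index] += magnitude
    mean = sum(histogram) / bins
    bits = [1 if value > mean else 0 for value in histogram]
    code = 0
    for bit in bits:
        code = (code << 1) | bit  # pack the bits into one integer id
    return bits, code


# Synthetic 5x5 patches: a horizontal ramp, a steep ramp, and a flat patch.
horizontal = [[c for c in range(5)] for r in range(5)]
steep = [[c + 2 * r for c in range(5)] for r in range(5)]
flat = [[7] * 5 for _ in range(5)]
print(visual_word(horizontal), visual_word(steep), visual_word(flat))
```

Patches with different dominant texture directions land on different codes, so the integer can serve directly as a visual word id without a clustering step.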
Performance of VLM
50
Comparison on Image Categorization: Caltech 8 categories / 5097 images (L. Wu, et al. MIR 2007/T-MM 2008)
Method        Accuracy (%)   Training Time (sec/image)
pLSA (BOW)    59             1.11
LDA (BOW)     64             2.44
2D MHMM       88             0.44
SVM           90             0.84
VLM           90             0.14
LT-VLM        94             0.24
Eva1: Objective Evaluation
51
Ground-Truth: WordNet Distance. Only 497 concepts (overlap of WordNet and the 1000 concepts)
Evaluate Accuracy of “Relative Distance Pairs”
Step 1: Find all concept triples (A,B,C)
Step 2: Get 6 distance pairs for each triple (consider asymmetry)
Step 3: Compute the correct ratio of each distance pair in terms of order (not value), compared with the ground-truth distance pair
Example (NGD vs. Ground-Truth on a triple A, B, C): (AB, AC) x; (AB, BC) √; (AC, BC) √
Future Work
53
Scalability: large-scale testing; TagNet as a service
Other data: “PicNet Distance” based on a different dataset / optimizing the dataset; integrating text/tag concurrency distance and Flickr Distance
Concept modeling: handling scale variations (multiple-resolution); new models
More applications: tag ranking; query suggestions