What Are the High-Level Concepts with Small Semantic Gaps?
CS 4763 Multimedia System, Spring 2008
Outline
Introduction
LCSS: a lexicon of high-level concepts with small semantic gaps
Framework of LCSS Construction
Methodologies
Experimental results
Conclusion
Introduction
Recent years have witnessed the rapid development of Multimedia Information Retrieval (MIR).
The semantic gap is still a fundamental barrier.
--The difference between the expressive power of low-level features and that of high-level semantic concepts.
To reduce the semantic gap, concept-based multimedia search has been introduced:
1) Select a concept lexicon that is relatively easy for computers to understand
2) Collect training data
3) Model high-level semantic concepts
Problem
Concept lexicon selection is usually reduced to manual selection or ignored entirely in previous work.
e.g. Caltech 101, Caltech 256, PASCAL
--Implicitly favor relatively “easy” concepts
MediaMill challenge concept data (101 terms) and Large-Scale Concept Ontology for Multimedia (LSCOM) (1,000 concepts)
--Manually annotated concept lexica on broadcast news video from the TRECVID benchmark
All of these lexica ignore the differences in semantic gap among concepts.
No automatic selection is performed.
Concepts with smaller semantic gaps are likely to be better modeled and retrieved than concepts with larger ones.
Very little research exists on quantitative analysis of the semantic gap.
What are the well-defined semantic concepts for learning?
How can they be found automatically?
Objective
Automatically construct a lexicon of high-level concepts with small semantic gaps (LCSS).
Two properties of LCSS:
1) Concepts are commonly used.
--The words occur frequently in the descriptions of real-world images.
2) Concepts are expected to be visually and semantically consistent.
--Images of these concepts have smaller semantic gaps (easier to model for retrieval and annotation).
Idea
Web images have rich textual features
--filename, title, alt text, and surrounding text.
The input titles and comments are good semantic descriptions of the images.
These textual features are much closer to the semantics of the images than visual features are.
Framework of LCSS Construction
[Figure: framework diagram. Web image crawling with visual and text indexing (images on the World Wide Web, database, visual index system, surrounding-text index system) -> confidence map built from text similarities t_sim(Ix, Ij), t_sim(Ix, Ik) -> re-rank based on confidence score -> construct content and context sparse similarity matrix -> Affinity Propagation clustering -> text-based keyword extraction -> ranked concepts lexicon.]
Data Collection
2.4 million web images from photo forums
--Photo.net, Photosig.com, etc.
Photos have high quality and rich textual information
64-dimensional global visual feature
--color moments, color correlogram, and color-texture moments
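The slides name color moments as one component of the 64-dimensional feature but do not define it. As a rough illustration only, the sketch below assumes the common definition (mean, standard deviation, and skewness per color channel); the function name, the toy pixel list, and the 9-value output are assumptions and do not reproduce the 64-dimensional feature used in the talk.

```python
import math

def color_moments(pixels):
    """Return mean, standard deviation and skewness for each channel.

    pixels: list of (r, g, b) tuples, a toy stand-in for a decoded image.
    """
    n = len(pixels)
    feature = []
    for channel in zip(*pixels):
        mean = sum(channel) / n
        var = sum((v - mean) ** 2 for v in channel) / n
        third = sum((v - mean) ** 3 for v in channel) / n
        std = math.sqrt(var)
        # Signed cube root keeps the skewness in the same unit as the pixels.
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        feature.extend([mean, std, skew])
    return feature

# Hypothetical 4-pixel image: two red pixels and two blue pixels.
toy_image = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 255)]
moments = color_moments(toy_image)
print(moments)
```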
Nearest Neighbor Confidence Score (NNCS)
NNCS(I_x) = \frac{1}{K} \sum_{i=1}^{K} sim\_text(I_x, I_i)
where I_1, ..., I_K are the K nearest neighbors of image I_x.
The higher the NNCS value, the smaller the semantic gap.
Calculate the NNCS for all 2.4 million images with K=500.
Select the 36,231 candidate images with the top NNCS values
--a subset is used because of the relatively large data size and the memory cost of the Affinity Propagation clustering algorithm.
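A minimal sketch of the NNCS computation on toy data. It assumes that neighbors are found by Euclidean distance between visual feature vectors and that text similarity is a bag-of-words cosine; the slides specify neither, so `text_sim`, the dictionary layout, and the six toy images are all hypothetical.

```python
import math
from collections import Counter

def text_sim(a, b):
    """Cosine similarity between two bag-of-words texts (an assumed choice)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nncs(x, images, k):
    """Average text similarity between image x and its k nearest visual neighbors."""
    others = [i for i in range(len(images)) if i != x]
    others.sort(key=lambda i: math.dist(images[x]["visual"], images[i]["visual"]))
    neighbors = others[:k]
    return sum(text_sim(images[x]["text"], images[i]["text"]) for i in neighbors) / k

# Hypothetical data: three visually close "sunset" photos and three
# visually close photos whose titles share no words.
images = [
    {"visual": (0.0, 0.0), "text": "red sunset"},
    {"visual": (0.1, 0.0), "text": "sunset beach"},
    {"visual": (0.0, 0.1), "text": "golden sunset"},
    {"visual": (5.0, 5.0), "text": "my trip"},
    {"visual": (5.1, 5.0), "text": "untitled"},
    {"visual": (5.0, 5.1), "text": "red car"},
]
scores = [nncs(x, images, k=2) for x in range(len(images))]
# Keep the images with the highest scores as clustering candidates.
candidates = sorted(range(len(images)), key=lambda x: scores[x], reverse=True)[:3]
print(scores, candidates)
```

In this toy set the visually similar sunset photos also have similar titles, so they score high and are kept; the second group scores zero and is dropped.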
Clustering Using Affinity Propagation
Fast on large-scale data sets and requires no prior information (e.g., the number of clusters).
36,231 × 36,231 content-context similarity matrix (CCSM)
[Figure: example CCSM. Labels: P; images Ii, Ij, Ik; visual features V1-V5; text features T1-T5. The sample matrix entries shown are negative values.]
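To make the clustering step concrete, here is a small plain-Python sketch of the Affinity Propagation message-passing updates (responsibility and availability) from Frey and Dueck. It is not the implementation used in the talk: the toy one-dimensional points, the negative squared distance as similarity, the median preference, and the damping and iteration settings are all assumptions, and the real CCSM combines content and context similarities that the slides do not spell out.

```python
import statistics

def affinity_propagation(s, damping=0.5, iterations=200):
    """Plain-Python Affinity Propagation on a square similarity matrix s.

    s[k][k] holds the preference of point k. Returns the exemplar index
    chosen by each point.
    """
    n = len(s)
    r = [[0.0] * n for _ in range(n)]  # responsibilities
    a = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            for k in range(n):
                best_other = max(a[i][j] + s[i][j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best_other)
        # a(i,k) = min(0, r(k,k) + sum of positive r(j,k), j not in {i,k})
        # a(k,k) = sum of positive r(j,k), j != k
        for k in range(n):
            positive = [max(0.0, r[j][k]) for j in range(n)]
            total = sum(positive) - positive[k]
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, r[k][k] + total - positive[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: a[i][k] + r[i][k]) for i in range(n)]

# Toy 1-D "images"; similarity is the negative squared distance (assumed).
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
n = len(points)
sim = [[-(points[i] - points[k]) ** 2 for k in range(n)] for i in range(n)]
# Preference on the diagonal: median of the off-diagonal similarities.
preference = statistics.median(
    sim[i][k] for i in range(n) for k in range(n) if i != k
)
for i in range(n):
    sim[i][i] = preference
exemplars = affinity_propagation(sim)
print(exemplars)
```

The number of clusters is never passed in; it emerges from the preference value, which is the property the slide highlights. The dense n × n lists also show why memory becomes a concern at 36,231 images.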
Text-based Keyword Extraction (TBKE)
The set of related keywords of cluster C_i is denoted as W_i.
The relevance score of a keyword k_j to cluster C_i is defined as:
r\_Score(k_j, C_i) = \begin{cases} 0, & k_j \notin W_i \text{ or } W_i = \emptyset \\ \dfrac{occurrence(k_j, C_i)}{\ln(|W_i| + 1)}, & \text{otherwise} \end{cases}
For each keyword k_j, its relevance score to the whole cluster pool C is:
Score(k_j) = \sum_{C_i \in C} r\_Score(k_j, C_i)
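The two scores can be sketched as follows. This assumes that occurrence(k_j, C_i) is the number of times the keyword appears in the texts of the cluster's images and that W_i is the set of distinct words in those texts; both readings are assumptions, as are the function names and the three toy clusters.

```python
import math
from collections import Counter

def r_score(keyword, cluster_texts):
    """Relevance of a keyword to one cluster (texts of the cluster's images)."""
    counts = Counter(w for text in cluster_texts for w in text.lower().split())
    # Zero when the keyword set W_i is empty or does not contain the keyword.
    if not counts or keyword not in counts:
        return 0.0
    return counts[keyword] / math.log(len(counts) + 1)

def score(keyword, clusters):
    """Relevance of a keyword to the whole cluster pool."""
    return sum(r_score(keyword, texts) for texts in clusters)

# Hypothetical clusters, each given as the list of its image titles.
clusters = [
    ["red sunset", "sunset beach", "golden sunset"],
    ["red rose", "pink rose"],
    ["my trip"],
]
vocabulary = {w for texts in clusters for t in texts for w in t.split()}
ranking = sorted(vocabulary, key=lambda k: score(k, clusters), reverse=True)
print(ranking[:3])
```

Dividing by ln(|W_i| + 1) penalizes clusters whose texts are spread over many different words, so a keyword counts for more when its cluster is textually focused.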
LCSS
Table 1. Top 50 keywords in the LCSS lexicon
Category         | Concepts
Scene/Landscape  | sunset, sky, beach, garden, lake, sunflow, water, firework, cloud, moon, sunrise, mountain, city, river, snow, rain, home, island
Object           | flower, rose, butterfly, tree, bee, candle, bridge, leaf, eye, tulip, orchid, house, peacock, window, glass, bird, rock
Color            | blue, red, yellow, green, pink, purple, orange, dark, golden
Season           | fall, spring, summer, autumn
Others           | small, wild
Experiments
For each keyword w, 500 titled photos containing this keyword were randomly selected.
The confidence value decreases as the keyword's rank goes down.
This shows that images labeled with the top-ranked words have higher confidence values.
Figure 1: Distribution of average confidence value. The x-axis shows the top 15 keywords in LCSS (from left to right): sunset, flower, blue, red, rose, yellow, green, sky, pink, butterfly, tree, beach, garden, water, cloud. The y-axis shows the confidence score (ticks from 0 to 0.1).
Experiments
Apply the lexicon to the University of Washington (UW) dataset
(1109 images, 5 labeled ground truth annotations, 350 unique words)
Refine the annotation results obtained by the search-based image annotation (SBIA) algorithm.
Figure 2: Image annotation refinement scenario.
[Diagram: input image -> visual feature extraction -> search engine (visual index system, surrounding-text index system, keyword search) -> retrieved results (e.g. "Red rose", "Blooming rose", "Last red rose", "One more rose") -> annotation (e.g. Rose, Red, Flower, ...) -> annotation refinement through lexicon relevance mapping (words and ranks), using annotation relevance reranking OR annotation pruning -> final annotation.]
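The slides name two refinement options, annotation relevance reranking and annotation pruning, without giving their exact rules. The sketch below is one plausible reading, stated as an assumption: pruning drops candidate words that are not among the top s lexicon entries, and reranking reorders candidates by their lexicon rank, leaving out-of-lexicon words at the end. The function names and the toy annotation list are hypothetical.

```python
def prune(annotations, lexicon, s):
    """Keep only the annotation words found among the top-s lexicon entries."""
    allowed = set(lexicon[:s])
    return [w for w in annotations if w in allowed]

def rerank(annotations, lexicon, s):
    """Order annotation words by lexicon rank; unknown words go last."""
    rank = {w: i for i, w in enumerate(lexicon[:s])}
    # sorted() is stable, so words outside the lexicon keep their original order.
    return sorted(annotations, key=lambda w: rank.get(w, s))

# Lexicon ordered as the top keywords in Figure 1; annotations are made up.
lexicon = ["sunset", "flower", "blue", "red", "rose", "yellow"]
annotations = ["last", "rose", "red", "blooming", "flower"]

pruned = prune(annotations, lexicon, s=6)
reranked = rerank(annotations, lexicon, s=6)
print(pruned)
print(reranked)
```

Under this reading, pruning can only shorten the list, while reranking changes which words survive once the list is cut to the top m annotations.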
Experiments
The refinement clearly improves the precision and recall of the original annotation as the lexicon size s becomes larger.
The performance of the refinement remains stable when s is equal to or larger than 100.
Figure 3: Annotation precision and recall for different lexicon sizes s.
[Two plots against s from 0 to 200: precision (axis ticks 0.09 to 0.14) and recall (axis ticks 0.08 to 0.15). Curves: Phrase_rerank, Phrase_prune, Term_rerank, Term_prune.]
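For reference, precision and recall of an annotation can be computed as below. This is a sketch under the assumption that both are measured per image against the ground-truth words and then averaged over images; the slides do not state the exact protocol, and the sample data is made up.

```python
def precision_recall(predicted, truth):
    """Precision and recall of predicted annotation words for one image."""
    hits = len(set(predicted) & set(truth))
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall

def average_precision_recall(results):
    """Mean precision and recall over (predicted, truth) pairs."""
    pairs = [precision_recall(p, t) for p, t in results]
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n

# Hypothetical image: 4 predicted words, 5 ground-truth words, 3 in common.
predicted = ["flower", "red", "rose", "last"]
truth = ["rose", "flower", "garden", "red", "leaf"]
p, r = precision_recall(predicted, truth)
print(p, r)
```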
Experiments
When m ranges from 3 to 7, the precision and recall of the refined annotation improve the most.
The precision and recall of annotation pruning remain unchanged, especially when m is larger than 7.
--Most of the top 7 annotation words fall into the LCSS.
Figure 4: Annotation precision and recall for different values of m.
[Two plots against m from 1 to 10: precision (axis ticks 0.07 to 0.16) and recall (axis ticks 0.02 to 0.18). Curves: Original_phrase, Phrase_rerank, Phrase_prune, Original_term, Term_rerank, Term_prune.]
Experiments
LSCOM: Large-Scale Concept Ontology for Multimedia.
WordNet: a very large lexical database of English. (100,303)
LSCOM and WordNet do not improve the annotation precision.
--Many correct annotations are not included in these two lexica and are therefore pruned.
Figure 5: Annotation precision and recall of different lexica.
[Two plots against m from 1 to 10: precision (axis ticks 0.04 to 0.16) and recall (axis ticks 0 to 0.18). Curves: Original_phrase, LCSS, LSCOM, WordNet.]
Conclusion
Quantitatively study and formulate the semantic gap problem.
Propose a novel framework to automatically select visually and semantically consistent concepts.
LCSS is the first lexicon of concepts with small semantic gaps
--a great help for data collection and concept modeling.
--a candidate pool of semantic concepts for image annotation, annotation refinement, and rejection.
Potential applications in query optimization and MIR.
Future Work
Investigate different features to construct feature-based lexica
--Texture feature
--Shape feature
--SIFT feature
Open questions
--How many semantic concepts are necessary?
--Which features are good for image retrieval with a specific concept?
--……
Reference
C. G. Snoek, M. Worring, J. C. van Gemert, J. M. Geusebroek, and A. W. Smeulders. “The challenge problem for automated detection of 101 semantic concepts in multimedia.” Proc. of ACM Multimedia, 2006.
C. M. Naphade, J. R. Smith, J. Tesic, and S. F. Chang, et al. “Large-scale concept ontology for multimedia.” IEEE MultiMedia, 13(3):86–91, 2006.
B. J. Frey and D. Dueck. “Clustering by passing messages between data points.” Science, 315:972–976, 2007.
X. J. Wang, L. Zhang, F. Jing, and W. Y. Ma. “AnnoSearch: image auto-annotation by search.” Proc. of IEEE Conf. CVPR, New York, June 2006.
C. Wang, F. Jing, L. Zhang, and H. J. Zhang. “Scalable search-based image annotation of personal images.” Proc. of the 8th ACM International Workshop on Multimedia Information Retrieval, Santa Barbara, CA, USA, 2006.