Multimedia Data Management M - unibo.it
ALMA MATER STUDIORUM - UNIVERSITÀ DI BOLOGNA
Multimedia Data Management M
Second cycle degree programme (LM) in Computer Engineering, University of Bologna
Semantic Multimedia Data Annotation
Home page: http://www-db.disi.unibo.it/courses/MDM/
Electronic version: 6.01.MultimediaDataAnnotation.pdf
Electronic version: 6.01.MultimediaDataAnnotation-2p.pdf
I. Bartolini
Outline
- Semantic gap in MM data retrieval
- Automatic annotation of MM data
- Imagination case study
2 2 I. Bartolini Multimedia Data Management M
Till now we have mostly focused on efficiency aspects, i.e., “How to efficiently execute a MM query?”
It is now time to also consider the effectiveness of the MM data retrieval process, which includes everything related to the user expectations! Effectiveness in terms of:
- quality of the result objects
- availability of simple but powerful tools, able to smooth the processes of query formulation/personalization and result interpretation
Back to our main problem: MM data annotation
The semantic gap problem
Characterizing the object content by means of low-level features (e.g., color, texture, and shape of an image) represents a completely automatic solution to MM data retrieval.
However, low-level features are not always able to properly characterize the semantic content of objects: e.g., two images may be evaluated as “similar” even if their semantic content is completely different.
This is due to the semantic gap existing between the user’s subjective notion of similarity and the one according to which a low-level features-based retrieval system evaluates two objects as similar; it prevents reaching 100% precision results.
Possible solution
(Semi-)automatically provide a semantic characterization (e.g., by means of keywords or tags) for each object, able to capture its content (e.g., [sky, cheetah] vs. [sky, eagle])
Combine visual features with tags, taking the best of the two approaches [LSD+06, LZL+07, DJL+08]
Automatically inferring semantics for MM objects
Automatic object annotation requires user intervention:
1) Relevance feedback: exploiting user feedback to understand which objects are really relevant to the query… we will see how very soon
2) Learning: the system is trained by means of a set of objects that are manually annotated by the user (training phase). Exploiting the training set, the system is able to predict labels for uncaptioned objects: the test object is compared to the training objects, and the labels associated to the “best” objects are proposed for labeling (labeling & testing phases)
[Figure: an uncaptioned image is matched against the user-annotated DB images, e.g., obtaining the labels sky, rock, ground, desert, Monument Valley]
Learning approaches
- Classification: MM object annotation as a problem of object classification (e.g., using Bayesian networks, SVM, K-means clustering, etc.) in which each label is treated as a distinct class.
“Given the feature vector of a new object I, what is the probability that I belongs to class w?”
- Graph-based: MM object annotation as a problem of graph exploration. A graph is a representation of a set of objects connected by links: the interconnected objects are represented by vertices/nodes, and the links that connect vertices are called edges. Nodes represent images and their attributes (e.g., corresponding regions, associated labels, etc.); edges are node-attribute relations.
“Starting the navigation from the node representing the new image I, what is the probability to cross the node w?”
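As a minimal illustration of the learning idea above (the labels of the best-matching training objects are proposed for a new, uncaptioned object), here is a sketch with toy 2-D feature vectors and a nearest-neighbour rule; the data and the choice of k are invented for the example, and a real system would use richer features and classifiers such as SVMs or Bayesian networks:

```python
import math

# Toy training set: (feature vector, manually assigned labels).
train = [
    ([0.9, 0.1], ["sky", "eagle"]),
    ([0.8, 0.2], ["sky", "cheetah"]),
    ([0.1, 0.9], ["grass", "deer"]),
]

def predict_labels(features, k=2):
    # compare the test object to the training objects ...
    nearest = sorted(train, key=lambda item: math.dist(features, item[0]))[:k]
    labels = []
    for _, tags in nearest:          # ... and propose the labels of the k "best" ones
        for t in tags:
            if t not in labels:
                labels.append(t)
    return labels
```

For a query vector close to the “sky” training images, the sky-related labels are proposed first, while unrelated labels are never suggested.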
Peculiarities of semantic annotation (1)
Automatically inferring semantics for MM objects is still an open challenge. Image annotation techniques differ in:
1. What “annotation” means
- Enriching images with a set of tags/labels
- Providing a rich semantic description through the concepts of a full-fledged RDF ontology
An ontology is a formal framework for representing knowledge; it names and defines the types, properties, and interrelationships of the entities in a specific domain (entities are conceptualizations of phenomena).
Resource Description Framework (RDF) is the W3C reference language for ontologies. It is based on triples (subject–predicate–object): the subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object.
2. What kind of tags/concepts are provided
- General-purpose systems
- Systems tailored to discover only specific concepts/classes
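The triple structure can be sketched with plain tuples; the identifiers below (img:123, onto:depicts, onto:Cheetah, …) are made up for the example, not taken from any real ontology:

```python
# RDF-style triples: (subject, predicate, object). The subject denotes
# the resource; the predicate relates it to the object.
triples = [
    ("img:123", "rdf:type", "onto:Photograph"),
    ("img:123", "onto:depicts", "onto:Cheetah"),
    ("img:123", "onto:depicts", "onto:Sky"),
    ("onto:Cheetah", "rdfs:subClassOf", "onto:Feline"),
]

# A simple pattern query: everything that img:123 depicts.
depicted = [o for s, p, o in triples if s == "img:123" and p == "onto:depicts"]
```

A full-fledged RDF store would of course add typed literals, namespaces, and a query language (SPARQL), but the triple pattern matching above is the core idea.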
Peculiarities of semantic annotation (2)
3. Annotation “granularity”
- Single keyword: e.g., for images, at the image level
- Multiple keywords: e.g., for images, at the image level or at the region level
Image annotation problem
How current commercial systems tackle the problem:
- The image search extensions of Google and Yahoo consider the original Web context (e.g., file name, title, surrounding text) to support keyword-based search
- Microsoft’s Photo Gallery, Google Picasa, and Yahoo’s Flickr rely on user-provided tags or labels
- Apple iPhoto uses meta-data and user-provided annotations
- Google Similar Images (Labs) allows users to search for images using pictures rather than words (i.e., to find other images that look like the selected one)
Imagination case study [BC08a]
Imagination: IMAGe (semI-)automatic anNotATION
- Images as sets of regions
- Labels are tags associated at the image level
- Graph-based approach (à la PageRank)
- 3 levels of graph objects: images; regions with low-level features (i.e., color and texture); tags assigned to images
- plus K-NN links computed on region similarities
“Given a new image, provide tags that are affine to the image and semantically correlated to each other”
Intuitive example
[Figure: the DB images I1, I2, I3, segmented into regions and tagged (I1: deer, grass, bush; I2: bear, rock, grass, water, ground; I3: rock, bear, grass, water, ground); a new image Iq has unknown tags ?, …, ?]
Mixed Media Graph (MMG) construction
[Figure: the graph GMMG links the DB images I1, I2, I3 (tagged deer, grass, bush / bear, rock, grass, water, ground / rock, bear, grass, water, ground) and the new image Iq through region nodes R1–R16 and tag nodes T1–T7 (deer, grass, bush, bear, rock, water, ground)]
Random Walk with Restart (RWR) [PYF+04]
[Figure: a random walk on GMMG starting from the query node Iq, crossing region nodes (R8, R16), image nodes (I1, I2, I3), and tag nodes, with a “restart!” jump back to Iq]
- restart at the query node (with probability p)
- randomly walk to one link (with probability 1-p)
- For each tag node a relative frequency (i.e., the affinity) is computed, approximating the steady-state probability
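The two RWR steps above can be sketched directly; the graph below is a made-up fragment of an MMG (query image Iq, regions R16 and R8, DB image I2, tag T2) chosen only to show the mechanics:

```python
import random

# Toy undirected adjacency: node -> list of linked nodes.
graph = {
    "Iq": ["R16"], "R16": ["Iq", "R8"], "R8": ["R16", "I2"],
    "I2": ["R8", "T2"], "T2": ["I2"],
}

def rwr(graph, query, p=0.8, steps=20_000, seed=0):
    rng = random.Random(seed)
    counts = dict.fromkeys(graph, 0)
    node = query
    for _ in range(steps):
        if rng.random() < p:
            node = query                    # restart at the query node
        else:
            node = rng.choice(graph[node])  # randomly walk to one link
        counts[node] += 1
    # relative visit frequencies approximate the steady-state probabilities
    return {n: c / steps for n, c in counts.items()}
```

Tag nodes are then ranked by their visit frequency (the affinity); nodes far from the query node are rarely reached because of the restarts.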
Why tags correlation?
MMG + RWR heavily relies on the NN edges involving the new image (i.e., on low-level features). If a region of the new image is highly similar to a region of GMMG which, however, has some unrelated terms, this might easily lead to such tags being highly scored! (uncorrelated, or even contradictory, tags)
Tags correlation is modeled as the “co-occurrence of pairs of tags” within the DB images, e.g., with the PSimRank algorithm (Fogaras et al. 2005)
[Figure: for a new image, MMG + RWR predicts the tags grass, deer, sheep, horse, cow, ground, some of which are mutually uncorrelated]
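A minimal sketch of tag correlation as pairwise co-occurrence over the DB images, using the toy annotations of the running example (PSimRank itself is considerably more elaborate than this counting):

```python
from collections import Counter
from itertools import combinations

# Tag sets of the toy DB images (I1, I2, I3 from the running example).
db_tags = [
    {"deer", "grass", "bush"},
    {"bear", "rock", "grass", "water", "ground"},
    {"rock", "bear", "grass", "water", "ground"},
]

# Count every unordered pair of tags appearing together in an image.
pair_counts = Counter()
for tags in db_tags:
    pair_counts.update(frozenset(p) for p in combinations(sorted(tags), 2))

def co_occurrence(a, b):
    # fraction of DB images in which the two tags appear together
    return pair_counts[frozenset((a, b))] / len(db_tags)
```

Pairs that never co-occur (e.g., deer and bear above) get score 0 and can be flagged as suspicious when both are proposed for the same new image.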
Analyzing correlations of tags
Link analysis on a sub-graph of GMMG to find highly-correlated tags:
- the bipartite graph GT
- the second-order bipartite graph G2T
[Figure: from GMMG (images I1, I2, I3; regions R1–R13; tags T1–T7: deer, grass, bush, bear, rock, water, ground), the image–tag bipartite sub-graph GT is extracted]
Intuitive example
[Figure: the bipartite graph GT links the images I1, I2 to the tags tiger, water, grass, rock, bear (nodes T1–T5); the second-order bipartite graph G2T has image-pair nodes (I1,I1), (I1,I2), (I2,I2) and tag-pair nodes such as (water, tiger), (grass, water), (rock, grass), (bear, rock); each node u has its set of edges E(u)]
“an edge between nodes (Ii,Ij) and (Tr,Ts) is added iff the two edges (Ii,Tr) and (Ij,Ts) (equivalently, (Ii,Ts) and (Ij,Tr)) are in GT”
PSimRank algorithm (Fogaras et al. 2005)
[Figure: the second-order graph G2T again, with each node u and its set of edges E(u); each pair of tags receives a similarity score in [0,1]]
A similarity score is computed for each tag-pair node of G2T: “two nodes are similar if they are referenced by similar nodes”
- two tags are similar if they are present in similar images
- two images are similar if they contain similar tags
The process is independent of the query node (it runs off-line)
Putting it all together
The PT tags with the highest steady-state probability returned by the MMG + RWR step are reduced by considering tags correlation. We model the problem as an instance of the Maximum Weight Clique Problem [BBP+99]: an edge is added between two nodes if their correlation score exceeds a threshold c.
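A brute-force sketch of the Maximum Weight Clique step over a small candidate tag set; the affinity and correlation scores below are invented (a real instance would use the RWR affinities as node weights and the PSimRank scores for the edges, and an exact algorithm such as [BBP+99] rather than this enumeration):

```python
from itertools import combinations

# Toy scores: node weights = tag affinities, edges = correlations above c.
affinity = {"grass": 0.9, "deer": 0.7, "sheep": 0.3, "rock": 0.5}
corr = {("grass", "deer"): 0.8, ("grass", "rock"): 0.6, ("deer", "rock"): 0.1}
c = 0.5  # correlation threshold for adding an edge

def correlated(a, b):
    return corr.get((a, b), corr.get((b, a), 0.0)) > c

def best_clique(tags):
    best, best_w = (), 0.0
    for r in range(1, len(tags) + 1):
        for cand in combinations(tags, r):
            # a clique: every pair of candidate tags must be correlated
            if all(correlated(a, b) for a, b in combinations(cand, 2)):
                w = sum(affinity[t] for t in cand)
                if w > best_w:
                    best, best_w = cand, w
    return best
```

Here best_clique(["grass", "deer", "sheep", "rock"]) keeps grass and deer and drops the tags that are weakly correlated with them.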
Semantic-based image retrieval (1)
How to compare two images based on their annotations (i.e., their tags)? Exact text matching:
- simple
- not flexible
[Figure: the query “flower” retrieves images tagged flower, but misses those tagged only petunia]
Semantic-based image retrieval (2)
Is it possible to perform better? Yes! By exploiting the semantic relations of keywords belonging to a preexistent taxonomy or ontology (e.g., WordNet [Mil95]) and applying a “fuzzy” text matching:
- still simple
- more flexible
- more accurate
[Figure: in the taxonomy Seed Plant → … → Flowering Plant → Flower → {Petunia, Orchid, …, Poppy}, the query “flower” now also matches petunia images: “Petunia is a flower!”]
Semantic relations
Among the possible semantic relations, hyponymy/hypernymy (also indicated as subset/superset, or the ISA relation) defines a hierarchy:
[Figure: mammal → {bear, feline, canine}; bear → {brown bear, black bear}; feline → cat; canine → {dog, fox}: “A bear is a mammal”]
Semantic similarity
Problem: quantify the similarity between two terms of the hierarchy (e.g., brown bear and feline)

Sim(t1, t2) = 2 * level(common-father) / (level(t1) + level(t2))

[Figure: terms t1, t2, t3 in the mammal hierarchy (mammal → {bear, feline, canine}; bear → {brown bear, black bear}; feline → cat; canine → {dog, fox}), with the levels level(t1), level(t2), and level(common-father)]
The similarity between a term of the hierarchy and the terms belonging to its sub-tree is equal to one (Sim=1)
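The formula can be sketched directly on the example hierarchy; the absolute levels used here (mammal at level 6) are illustrative, since they depend on the depth of the full taxonomy above mammal:

```python
# Toy ISA hierarchy as parent links, with illustrative absolute levels.
parent = {
    "bear": "mammal", "feline": "mammal", "canine": "mammal",
    "brown bear": "bear", "black bear": "bear",
    "cat": "feline", "dog": "canine", "fox": "canine",
}
level = {"mammal": 6, "bear": 7, "feline": 7, "canine": 7,
         "brown bear": 8, "black bear": 8, "cat": 8, "dog": 8, "fox": 8}

def ancestors(t):
    chain = [t]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def sim(t1, t2):
    a1 = ancestors(t1)
    common = next(t for t in ancestors(t2) if t in a1)  # lowest common father
    if common in (t1, t2):   # a term vs. its own sub-tree: similarity 1
        return 1.0
    return 2 * level[common] / (level[t1] + level[t2])
```

For instance, sim("cat", "dog") uses their common father mammal: 2*6 / (8+8) = 0.75, while sim("brown bear", "bear") = 1 because brown bear lies in the sub-tree of bear.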
Semantic relaxation
What happens if I am looking for “bear” images but neither “bear” images nor images of specialized terms of bear are present in the DB? To avoid an empty semantic result, semantic relaxation is defined: a weight for each level of the hierarchy (starting from the query term) represents the percentage of semantic relaxation the user is willing to accept.
[Figure: weights W0=100, W6=68, W7=55, W8=42 assigned along the path ROOT → … → animal → mammal → carnivore → bear]
Example
Keyword: “Bear” (Level=8); Relaxation: 42%
- Brown bear (Level=9): Sim=1
- Black bear (Level=9): Sim=1
- Carnivore (Level=7): Sim=0.933
- Feline (Level=8): Sim=0.875
- Canine (Level=8): Sim=0.875
- Cat (Level=9): Sim=0.824 = 2*7 / (8+9)
- Dog (Level=9): Sim=0.824
Combining visual features with tags
- Content-based image retrieval (query by example (QBE) paradigm)
- plus semantic-based image retrieval (e.g., query by keyword paradigm): “I am looking for flower images…”
[Figure: a query combining an example flower image with the keyword “flower”]
Practical example (1)
Keyword: “Flower”; semantic content: Seed Plant → … → Flowering Plant → Flower → {Petunia, Orchid, …, Poppy}
[Figure: retrieved images ranked by content similarity (Cont Sim = 0.628, 0.486, 0.431, 0.392, 0.347, 0.339), each with semantic similarity Sem Sim = 1]
Practical example (2)
Keyword: “Flower”; Relaxation = 92%
[Figure: taxonomy with Seed Plant and Vascular Plant above both the Flowering Plant → Flower → {Petunia, Orchid, …, Poppy} branch and the Ligneous Plant → … → Bush → … → Rose branch; level weights: level 3 weight=90, level 4 weight=83, level 5 weight=68, level 6 weight=55, level 7 weight=42]
Sem Sim = 1 for images in the Flower sub-tree, while
Sem Sim(“Flower”, “Rose”) = 2*level(“Vascular Plant”) / (level(“Flower”) + level(“Rose”)) = 0.5
Integration policies
Results for a query image, ranked 1) by semantic similarity (S), then 2) by content similarity (C):
1. S:1 C:0.628
2. S:1 C:0.431
3. S:1 C:null
4. S:1 C:null
5. S:1 C:null
6. S:1 C:null
7. S:1 C:null
8. S:1 C:null
9. S:0.5 C:null
10. S:0.5 C:null
11. S:null C:0.486
12. S:null C:0.392
(objects with Content similarity = null keep their semantic rank; objects with Semantic similarity = null are ranked last, by content similarity)
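The ranking policy shown in the list (semantic similarity first, content similarity as tie-breaker, null scores last) can be sketched as follows; the result records are toy values taken from a few rows of the list:

```python
# Each result carries a semantic score (sem) and a content score (cont);
# None stands for the "null" entries of the list.
results = [
    {"id": 11, "sem": None, "cont": 0.486},
    {"id": 2, "sem": 1.0, "cont": 0.431},
    {"id": 9, "sem": 0.5, "cont": None},
    {"id": 1, "sem": 1.0, "cont": 0.628},
]

def rank_key(r):
    # missing scores sort after any real score in [0, 1]
    sem = r["sem"] if r["sem"] is not None else -1.0
    cont = r["cont"] if r["cont"] is not None else -1.0
    return (-sem, -cont)  # descending S, then descending C

ranked = sorted(results, key=rank_key)
```

Sorting these four records reproduces their relative order in the list: the two S:1 objects first (ordered by C), then S:0.5, then the object with no semantic score.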