Multimedia Data Management M - unibo.it
ALMA MATER STUDIORUM - UNIVERSITÀ DI BOLOGNA
Multimedia Data Management M
Second cycle degree programme (LM) in Computer Engineering, University of Bologna
Semantic Multimedia Data Annotation
Home page: http://www-db.disi.unibo.it/courses/MDM/
Electronic version: 6.01.MultimediaDataAnnotation.pdf
Electronic version: 6.01.MultimediaDataAnnotation-2p.pdf
I. Bartolini
Outline
- Semantic gap in MM data retrieval
- Automatic annotation of MM data
- Imagination case study
2 2 I. Bartolini Multimedia Data Management M
Till now we have mostly focused on efficiency aspects, i.e., “How to efficiently execute a MM query?”
It is now time to also consider the effectiveness of the MM data retrieval process, which includes everything related to the user expectations! Effectiveness in terms of:
- quality of the result objects
- availability of simple but powerful tools, able to smooth the processes of query formulation/personalization and result interpretation
Back to our main problem: MM data annotation
The semantic gap problem
Characterizing the object content by means of low-level features (e.g., color, texture, and shape of an image) represents a completely automatic solution to MM data retrieval.
However, low-level features are not always able to properly characterize the semantic content of objects: e.g., two images may be evaluated as “similar” even if their semantic content is completely different.
This is due to the semantic gap existing between the user’s subjective notion of similarity and the one according to which a low-level features-based retrieval system evaluates two objects as similar; it prevents reaching 100% precision results.
Possible solution
(Semi-)automatically provide a semantic characterization (e.g., by means of keywords or tags) for each object, able to capture its content (e.g., [sky, cheetah] vs. [sky, eagle])
Combine visual features with tags, taking the best of the two approaches [LSD+06, LZL+07, DJL+08]
Automatically inferring semantics for MM objects
Automatic object annotation requires user intervention:
1) Relevance feedback: exploiting user feedback to understand which objects are really relevant to the query… we will see how very soon
2) Learning: the system is trained by means of a set of objects that are manually annotated by the user (training phase). Exploiting the training set, the system is able to predict labels for uncaptioned objects: the test object is compared to the training objects, and the labels associated to the “best” objects are proposed for labeling (labeling & testing phases)
[Figure: an uncaptioned image is matched against the user-annotated DB images, e.g., obtaining the labels sky, rock, ground, desert, Monument Valley]
Learning approaches
- Classification: MM object annotation as a problem of object classification (e.g., using Bayesian networks, SVM, K-means clustering, etc.) in which each label is treated as a distinct class.
“Given the feature vector of a new object I, what is the probability that I belongs to class w?”
- Graph-based: MM object annotation as a problem of graph exploration. A graph is a representation of a set of objects connected by links: the interconnected objects are represented by vertices/nodes, and the links that connect vertices are called edges. Nodes represent images and their attributes (e.g., corresponding regions, associated labels, etc.); edges are node-attribute relations.
“Starting the navigation from the node representing the new image I, what is the probability to cross the node w?”
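As a minimal illustration of the learning idea above (the labels of the best-matching training objects are proposed for a new, uncaptioned object), here is a sketch with toy 2-D feature vectors and a nearest-neighbour rule; the data and the choice of k are invented for the example, and a real system would use richer features and classifiers such as SVMs or Bayesian networks:

```python
import math

# Toy training set: (feature vector, manually assigned labels).
train = [
    ([0.9, 0.1], ["sky", "eagle"]),
    ([0.8, 0.2], ["sky", "cheetah"]),
    ([0.1, 0.9], ["grass", "deer"]),
]

def predict_labels(features, k=2):
    # compare the test object to the training objects ...
    nearest = sorted(train, key=lambda item: math.dist(features, item[0]))[:k]
    labels = []
    for _, tags in nearest:          # ... and propose the labels of the k "best" ones
        for t in tags:
            if t not in labels:
                labels.append(t)
    return labels
```

For a query vector close to the “sky” training images, the sky-related labels are proposed first, while unrelated labels are never suggested.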
Peculiarities of semantic annotation (1)
Automatically inferring semantics for MM objects is still an open challenge. Image annotation techniques differ in:
1. What “annotation” means
- Enriching images with a set of tags/labels
- Providing a rich semantic description through the concepts of a full-fledged RDF ontology
An ontology is a formal framework for representing knowledge; it names and defines the types, properties, and interrelationships of the entities in a specific domain (entities are conceptualizations of phenomena).
Resource Description Framework (RDF) is the W3C reference language for ontologies. It is based on triples (subject–predicate–object): the subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object.
2. What kind of tags/concepts are provided
- General-purpose systems
- Systems tailored to discover only specific concepts/classes
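The triple structure can be sketched with plain tuples; the identifiers below (img:123, onto:depicts, onto:Cheetah, …) are made up for the example, not taken from any real ontology:

```python
# RDF-style triples: (subject, predicate, object). The subject denotes
# the resource; the predicate relates it to the object.
triples = [
    ("img:123", "rdf:type", "onto:Photograph"),
    ("img:123", "onto:depicts", "onto:Cheetah"),
    ("img:123", "onto:depicts", "onto:Sky"),
    ("onto:Cheetah", "rdfs:subClassOf", "onto:Feline"),
]

# A simple pattern query: everything that img:123 depicts.
depicted = [o for s, p, o in triples if s == "img:123" and p == "onto:depicts"]
```

A full-fledged RDF store would of course add typed literals, namespaces, and a query language (SPARQL), but the triple pattern matching above is the core idea.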
Peculiarities of semantic annotation (2)
3. Annotation “granularity”
- Single keyword: e.g., for images, at the image level
- Multiple keywords: e.g., for images, at the image level or at the region level
Image annotation problem
How current commercial systems tackle the problem:
- The image search extensions of Google and Yahoo consider the original Web context (e.g., file name, title, surrounding text) to support keyword-based search
- Microsoft’s Photo Gallery, Google Picasa, and Yahoo’s Flickr rely on user-provided tags or labels
- Apple iPhoto uses meta-data and user-provided annotations
- Google Similar Images (Labs) allows users to search for images using pictures rather than words (i.e., to find other images that look like the selected one)
Imagination case study [BC08a]
Imagination: IMAGe (semI-)automatic anNotATION
- Images as sets of regions
- Labels are tags associated at the image level
- Graph-based approach (à la PageRank)
- 3 levels of graph objects: images; regions with low-level features (i.e., color and texture); tags assigned to images
- plus K-NN links computed on region similarities
“Given a new image, provide tags that are affine to the image and semantically correlated to each other”
Intuitive example
[Figure: the DB images I1, I2, I3, segmented into regions and tagged (I1: deer, grass, bush; I2: bear, rock, grass, water, ground; I3: rock, bear, grass, water, ground); a new image Iq has unknown tags ?, …, ?]
Mixed Media Graph (MMG) construction
[Figure: the graph GMMG links the DB images I1, I2, I3 (tagged deer, grass, bush / bear, rock, grass, water, ground / rock, bear, grass, water, ground) and the new image Iq through region nodes R1–R16 and tag nodes T1–T7 (deer, grass, bush, bear, rock, water, ground)]
Random Walk with Restart (RWR) [PYF+04]
[Figure: a random walk on GMMG starting from the query node Iq, crossing region nodes (R8, R16), image nodes (I1, I2, I3), and tag nodes, with a “restart!” jump back to Iq]
- restart at the query node (with probability p)
- randomly walk to one link (with probability 1-p)
- For each tag node a relative frequency (i.e., the affinity) is computed, approximating the steady-state probability
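The two RWR steps above can be sketched directly; the graph below is a made-up fragment of an MMG (query image Iq, regions R16 and R8, DB image I2, tag T2) chosen only to show the mechanics:

```python
import random

# Toy undirected adjacency: node -> list of linked nodes.
graph = {
    "Iq": ["R16"], "R16": ["Iq", "R8"], "R8": ["R16", "I2"],
    "I2": ["R8", "T2"], "T2": ["I2"],
}

def rwr(graph, query, p=0.8, steps=20_000, seed=0):
    rng = random.Random(seed)
    counts = dict.fromkeys(graph, 0)
    node = query
    for _ in range(steps):
        if rng.random() < p:
            node = query                    # restart at the query node
        else:
            node = rng.choice(graph[node])  # randomly walk to one link
        counts[node] += 1
    # relative visit frequencies approximate the steady-state probabilities
    return {n: c / steps for n, c in counts.items()}
```

Tag nodes are then ranked by their visit frequency (the affinity); nodes far from the query node are rarely reached because of the restarts.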
Why tags correlation?
MMG + RWR heavily relies on the NN edges involving the new image (i.e., on low-level features). If a region of the new image is highly similar to a region of GMMG which, however, has some unrelated terms, this might easily lead to such tags being highly scored! (uncorrelated, or even contradictory, tags)
Tags correlation is modeled as the “co-occurrence of pairs of tags” within the DB images, e.g., with the PSimRank algorithm (Fogaras et al. 2005)
[Figure: for a new image, MMG + RWR predicts the tags grass, deer, sheep, horse, cow, ground, some of which are mutually uncorrelated]
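A minimal sketch of tag correlation as pairwise co-occurrence over the DB images, using the toy annotations of the running example (PSimRank itself is considerably more elaborate than this counting):

```python
from collections import Counter
from itertools import combinations

# Tag sets of the toy DB images (I1, I2, I3 from the running example).
db_tags = [
    {"deer", "grass", "bush"},
    {"bear", "rock", "grass", "water", "ground"},
    {"rock", "bear", "grass", "water", "ground"},
]

# Count every unordered pair of tags appearing together in an image.
pair_counts = Counter()
for tags in db_tags:
    pair_counts.update(frozenset(p) for p in combinations(sorted(tags), 2))

def co_occurrence(a, b):
    # fraction of DB images in which the two tags appear together
    return pair_counts[frozenset((a, b))] / len(db_tags)
```

Pairs that never co-occur (e.g., deer and bear above) get score 0 and can be flagged as suspicious when both are proposed for the same new image.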
Analyzing correlations of tags
Link analysis on a sub-graph of GMMG to find highly-correlated tags:
- the bipartite graph GT
- the second-order bipartite graph G2T
[Figure: from GMMG (images I1, I2, I3; regions R1–R13; tags T1–T7: deer, grass, bush, bear, rock, water, ground), the image–tag bipartite sub-graph GT is extracted]
Intuitive example
[Figure: the bipartite graph GT links the images I1, I2 to the tags tiger, water, grass, rock, bear (nodes T1–T5); the second-order bipartite graph G2T has image-pair nodes (I1,I1), (I1,I2), (I2,I2) and tag-pair nodes such as (water, tiger), (grass, water), (rock, grass), (bear, rock); each node u has its set of edges E(u)]
“an edge between nodes (Ii,Ij) and (Tr,Ts) is added iff the two edges (Ii,Tr) and (Ij,Ts) (equivalently, (Ii,Ts) and (Ij,Tr)) are in GT”
PSimRank algorithm (Fogaras et al. 2005)
[Figure: the second-order graph G2T again, with each node u and its set of edges E(u); each pair of tags receives a similarity score in [0,1]]
A similarity score is computed for each tag-pair node of G2T: “two nodes are similar if they are referenced by similar nodes”
- two tags are similar if they are present in similar images
- two images are similar if they contain similar tags
The process is independent of the query node (it runs off-line)
Putting it all together
The PT tags with the highest steady-state probability returned by the MMG + RWR step are reduced by considering tags correlation. We model the problem as an instance of the Maximum Weight Clique Problem [BBP+99]: an edge is added between two nodes if their correlation score exceeds a threshold c.
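A brute-force sketch of the Maximum Weight Clique step over a small candidate tag set; the affinity and correlation scores below are invented (a real instance would use the RWR affinities as node weights and the PSimRank scores for the edges, and an exact algorithm such as [BBP+99] rather than this enumeration):

```python
from itertools import combinations

# Toy scores: node weights = tag affinities, edges = correlations above c.
affinity = {"grass": 0.9, "deer": 0.7, "sheep": 0.3, "rock": 0.5}
corr = {("grass", "deer"): 0.8, ("grass", "rock"): 0.6, ("deer", "rock"): 0.1}
c = 0.5  # correlation threshold for adding an edge

def correlated(a, b):
    return corr.get((a, b), corr.get((b, a), 0.0)) > c

def best_clique(tags):
    best, best_w = (), 0.0
    for r in range(1, len(tags) + 1):
        for cand in combinations(tags, r):
            # a clique: every pair of candidate tags must be correlated
            if all(correlated(a, b) for a, b in combinations(cand, 2)):
                w = sum(affinity[t] for t in cand)
                if w > best_w:
                    best, best_w = cand, w
    return best
```

Here best_clique(["grass", "deer", "sheep", "rock"]) keeps grass and deer and drops the tags that are weakly correlated with them.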
Semantic-based image retrieval (1)
How to compare two images based on their annotations (i.e., their tags)? Exact text matching:
- simple
- not flexible
[Figure: the query “flower” retrieves images tagged flower, but misses those tagged only petunia]
Semantic-based image retrieval (2)
Is it possible to perform better? Yes! By exploiting the semantic relations of keywords belonging to a preexistent taxonomy or ontology (e.g., WordNet [Mil95]) and applying a “fuzzy” text matching:
- still simple
- more flexible
- more accurate
[Figure: in the taxonomy Seed Plant → … → Flowering Plant → Flower → {Petunia, Orchid, …, Poppy}, the query “flower” now also matches petunia images: “Petunia is a flower!”]
Semantic relations
Among the possible semantic relations, hyponymy/hypernymy (also indicated as subset/superset, or the ISA relation) defines a hierarchy:
[Figure: mammal → {bear, feline, canine}; bear → {brown bear, black bear}; feline → cat; canine → {dog, fox}: “A bear is a mammal”]
Semantic similarity
Problem: quantify the similarity between two terms of the hierarchy (e.g., brown bear and feline)

Sim(t1, t2) = 2 * level(common-father) / (level(t1) + level(t2))

[Figure: terms t1, t2, t3 in the mammal hierarchy (mammal → {bear, feline, canine}; bear → {brown bear, black bear}; feline → cat; canine → {dog, fox}), with the levels level(t1), level(t2), and level(common-father)]
The similarity between a term of the hierarchy and the terms belonging to its sub-tree is equal to one (Sim=1)
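The formula can be sketched directly on the example hierarchy; the absolute levels used here (mammal at level 6) are illustrative, since they depend on the depth of the full taxonomy above mammal:

```python
# Toy ISA hierarchy as parent links, with illustrative absolute levels.
parent = {
    "bear": "mammal", "feline": "mammal", "canine": "mammal",
    "brown bear": "bear", "black bear": "bear",
    "cat": "feline", "dog": "canine", "fox": "canine",
}
level = {"mammal": 6, "bear": 7, "feline": 7, "canine": 7,
         "brown bear": 8, "black bear": 8, "cat": 8, "dog": 8, "fox": 8}

def ancestors(t):
    chain = [t]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def sim(t1, t2):
    a1 = ancestors(t1)
    common = next(t for t in ancestors(t2) if t in a1)  # lowest common father
    if common in (t1, t2):   # a term vs. its own sub-tree: similarity 1
        return 1.0
    return 2 * level[common] / (level[t1] + level[t2])
```

For instance, sim("cat", "dog") uses their common father mammal: 2*6 / (8+8) = 0.75, while sim("brown bear", "bear") = 1 because brown bear lies in the sub-tree of bear.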
Semantic relaxation
What happens if I am looking for “bear” images but neither “bear” images nor images of specialized terms of bear are present in the DB? To avoid an empty semantic result, semantic relaxation is defined: a weight for each level of the hierarchy (starting from the query term) represents the percentage of semantic relaxation the user is willing to accept.
[Figure: weights W0=100, W6=68, W7=55, W8=42 assigned along the path ROOT → … → animal → mammal → carnivore → bear]
Example
Keyword: “Bear” (Level=8); Relaxation: 42%
- Brown bear (Level=9): Sim=1
- Black bear (Level=9): Sim=1
- Carnivore (Level=7): Sim=0.933
- Feline (Level=8): Sim=0.875
- Canine (Level=8): Sim=0.875
- Cat (Level=9): Sim=0.824 = 2*7 / (8+9)
- Dog (Level=9): Sim=0.824
Combining visual features with tags
- Content-based image retrieval (query by example (QBE) paradigm)
- plus semantic-based image retrieval (e.g., query by keyword paradigm): “I am looking for flower images…”
[Figure: a query combining an example flower image with the keyword “flower”]
Practical example (1)
Keyword: “Flower”; semantic content: Seed Plant → … → Flowering Plant → Flower → {Petunia, Orchid, …, Poppy}
[Figure: retrieved images ranked by content similarity (Cont Sim = 0.628, 0.486, 0.431, 0.392, 0.347, 0.339), each with semantic similarity Sem Sim = 1]
Practical example (2)
Keyword: “Flower”; Relaxation = 92%
[Figure: taxonomy with Seed Plant and Vascular Plant above both the Flowering Plant → Flower → {Petunia, Orchid, …, Poppy} branch and the Ligneous Plant → … → Bush → … → Rose branch; level weights: level 3 weight=90, level 4 weight=83, level 5 weight=68, level 6 weight=55, level 7 weight=42]
Sem Sim = 1 for images in the Flower sub-tree, while
Sem Sim(“Flower”, “Rose”) = 2*level(“Vascular Plant”) / (level(“Flower”) + level(“Rose”)) = 0.5
Integration policies
Results for a query image, ranked 1) by semantic similarity (S), then 2) by content similarity (C):
1. S:1 C:0.628
2. S:1 C:0.431
3. S:1 C:null
4. S:1 C:null
5. S:1 C:null
6. S:1 C:null
7. S:1 C:null
8. S:1 C:null
9. S:0.5 C:null
10. S:0.5 C:null
11. S:null C:0.486
12. S:null C:0.392
(objects with Content similarity = null keep their semantic rank; objects with Semantic similarity = null are ranked last, by content similarity)
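The ranking policy shown in the list (semantic similarity first, content similarity as tie-breaker, null scores last) can be sketched as follows; the result records are toy values taken from a few rows of the list:

```python
# Each result carries a semantic score (sem) and a content score (cont);
# None stands for the "null" entries of the list.
results = [
    {"id": 11, "sem": None, "cont": 0.486},
    {"id": 2, "sem": 1.0, "cont": 0.431},
    {"id": 9, "sem": 0.5, "cont": None},
    {"id": 1, "sem": 1.0, "cont": 0.628},
]

def rank_key(r):
    # missing scores sort after any real score in [0, 1]
    sem = r["sem"] if r["sem"] is not None else -1.0
    cont = r["cont"] if r["cont"] is not None else -1.0
    return (-sem, -cont)  # descending S, then descending C

ranked = sorted(results, key=rank_key)
```

Sorting these four records reproduces their relative order in the list: the two S:1 objects first (ordered by C), then S:0.5, then the object with no semantic score.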