Flickr Distance

ACM Multimedia 2008

Lei Wu, Xian-Sheng Hua, Nenghai Yu, Wei-Ying Ma, Shipeng Li

Microsoft Research Asia; University of Science and Technology of China

October 28, 2008

Multimedia Information Retrieval

Indexing, Ranking, Clustering, ……, Recommendation, Annotation

Image Similarity/Distance

Concept Similarity/Distance

Annotation, Indexing, Ranking, Clustering, ……, Recommendation

Image Similarity/Distance

Numerous efforts have been made.

Concept Similarity/Distance

More and more used, but not well studied.

Olympic, Sports, Cat, Tiger, Paw

http://www.cnzozo.com/updata/allimg/200802/1_21083536.jpg



WordNet Distance

Google Distance

Tag Concurrence Distance

WordNet Distance

WordNet: 150,000 words

WordNet Distance
  Quite a few methods exist to compute it in WordNet
  The basic idea is to measure the length of the path between two words

Pros and Cons
  Pros: Built by human experts, so close to human perception
  Cons: Coverage is limited and difficult to extend

http://www.princeton.edu/main/

Google Distance

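Normalized Google Distance (NGD)
  Reflects the concurrency of two words in Web documents
  Defined as

\[ \mathrm{NGD}(x, y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log N - \min\{\log f(x), \log f(y)\}} \]

  where f(x) and f(y) are the numbers of pages containing x and y, f(x, y) is the number of pages containing both, and N is the total number of indexed pages

Pros and Cons
  Pros: Easy to get and huge coverage
  Cons: Only reflects concurrency in textual documents; not really concept distance (semantic relationship)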

http://www.english.uva.nl/

http://www.uva.nl/
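The NGD definition can be illustrated with a minimal Python sketch. The function name and the hit counts below are hypothetical; real counts would come from a search engine, which is not queried here, and the sketch assumes the total page count n is larger than both term counts.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """NGD from page counts.

    fx, fy: number of pages containing each term
    fxy:    number of pages containing both terms
    n:      total number of indexed pages (assumed larger than fx and fy)
    """
    if fxy == 0:
        # The two terms never occur together.
        return float("inf")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Hypothetical counts, for illustration only.
example = normalized_google_distance(fx=100, fy=100, fxy=10, n=10_000)
```

Two terms that always appear together get a distance of 0, and the distance grows as the joint count shrinks relative to the individual counts.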

Concept Pairs          Google Distance
Airplane – Dog         0.2562
Football – Soccer      0.1905
Horse – Donkey         0.2147
Airplane – Airport     0.3094
Car – Wheel            0.3146


Image Tag Concurrence Distance (Qi, Hua, et al. ACMMM07)
  Reflects how frequently two tags occur in the same images
  Based on the same idea as NGD
  Mostly sparse (> 95% of the entries in the similarity matrix are zero)

Pros and Cons
  Pros: Images are taken into account
  Cons:
    a) Tags are sparse, so visual concurrency is not well reflected
    b) Training data is difficult to get

[Figures: similarity matrix for 500 tags; similarity matrix for 50 tags]
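A rough sketch of a tag concurrence distance, assuming it applies the NGD formula to image counts instead of page counts. That reading follows from "based on the same idea as NGD" and is an assumption, not the cited authors' exact definition. The function name and the tiny tagged collection are hypothetical.

```python
import math


def tag_concurrence_distance(images, tag_a, tag_b):
    """NGD-style distance computed from tag sets of images.

    images: list of sets of tags, one set per image (hypothetical data layout)
    """
    n = len(images)
    fa = sum(1 for tags in images if tag_a in tags)
    fb = sum(1 for tags in images if tag_b in tags)
    fab = sum(1 for tags in images if tag_a in tags and tag_b in tags)
    if fab == 0:
        # Sparse case: the tags never share an image.
        return float("inf")
    denom = math.log(n) - min(math.log(fa), math.log(fb))
    if denom == 0:
        # Both tags are on every image.
        return 0.0
    return (max(math.log(fa), math.log(fb)) - math.log(fab)) / denom


# Hypothetical tagged images.
images = [
    {"airplane", "airport"},
    {"airplane", "sky"},
    {"dog"},
    {"airport", "airplane"},
]
```

With a sparse tag matrix most pairs fall into the "never share an image" branch, which is the weakness noted under Cons.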

Concept Pairs          Google Distance    Tag Concurrence Distance
Airplane – Dog         0.2562             0.8532
Football – Soccer      0.1905             0.1739
Horse – Donkey         0.2147             0.4513
Airplane – Airport     0.3094             0.1833
Car – Wheel            0.3146             0.9617

Different Concept Relationships

Synonymy: different words but the same meaning
  table tennis – ping-pong

Visually Similar: similar things or things of the same type
  horse – donkey

Meronymy: part and the whole
  car – wheel

Concurrency: exist at the same scene/place
  airplane – airport


Concept Distance

Mine from ontology
  WordNet distance is good, but coverage is too low

Mine from text documents
  Google distance’s coverage is very high, but it is for the text domain

Mine from image tags
  Image tag concurrence distance implicitly uses image information, but tags are too sparse

Can we mine concept distance from image content?

Some Facts

Semantic concept distance is based on human cognition

80% of human cognition comes from visual information

There are around 2.8 billion photos on Flickr (as of Sep 08)

On average, each Flickr image has around 8 tags

To mine concept distance from a large tagged image collection based on image content

Example image tags: (bear, fur, grass, tree); (polar bear, water, sea); (polar bear, fighting, usa)

http://www.flickr.com/photos/72833233@N00/471179761/

http://www.flickr.com/photos/tut99/302629092/

Overview of Flickr Distance

Concept A: Airplane

Concept B: Airport

Concept Model A

Concept Model B

Flickr Distance (A, B)


http://www.flickr.com/photos/yonemore2/2949146598/

http://www.flickr.com/photos/rayearth/2949122250/

Concept Pairs          Flickr Distance
Airplane – Dog         0.5151
Football – Soccer      0.0315
Horse – Donkey         0.4231
Airplane – Airport     0.0576
Car – Wheel            0.0708

Flickr Distance is able to cover the four different semantic relationships: Synonymy, Visually Similar, Meronymy, and Concurrency

What We Need

R1: A Good Image Collection
  Large
  High coverage, especially on daily life
  With tags

What We Need

R2: A Good Concept Representation or Model
  Based on image content
  Can cover wider concept relationships
  Can handle a large concept set

Concept Models
  Discriminative: SVM, Boosting, …
  Generative
    Global Feature
    Local Feature
      w/o Spatial Relation: Bag-of-Words (pLSA, LDA), …
      w/ Spatial Relation: 2D HMM, MRF, …

What We Need

VLM – Visual Language Model
  Spatial-relation sensitive
  Efficient
  Can handle object variations

Statistical Language Model

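I am talking about statistical language model.

Unigram Model

\[ P(w_x \mid w_1 w_2 \cdots w_n) = P(w_x) \]

Bigram Model

\[ P(w_x \mid w_1 w_2 \cdots w_n) = P(w_x \mid w_{x-1}) \]

Trigram Model

\[ P(w_x \mid w_1 w_2 \cdots w_n) = P(w_x \mid w_{x-2} w_{x-1}) \]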
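As a small illustration of the bigram case, the sketch below estimates P(w_x | w_{x-1}) by counting adjacent word pairs. It is a plain maximum-likelihood estimate without smoothing; the function name is hypothetical and the example sentence is the one shown on the slide.

```python
from collections import Counter


def bigram_probabilities(tokens):
    """Maximum-likelihood estimate of P(w_x | w_{x-1}) from one token sequence."""
    # How often each word appears as the preceding word.
    context_counts = Counter(tokens[:-1])
    # How often each (previous word, current word) pair appears.
    pair_counts = Counter(zip(tokens[:-1], tokens[1:]))
    return {
        (prev, cur): count / context_counts[prev]
        for (prev, cur), count in pair_counts.items()
    }


tokens = "i am talking about statistical language model".split()
probs = bigram_probabilities(tokens)
```

The unigram and trigram cases differ only in how much preceding context is used as the dictionary key.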

Visual Language Model (VLM)

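Unigram Model

\[ P(w_{xy} \mid w_{11} w_{12} \cdots w_{mn}) = P(w_{xy}) \]

Bigram Model

\[ P(w_{xy} \mid w_{11} w_{12} \cdots w_{mn}) = P(w_{xy} \mid w_{x-1,y}) \]

Trigram Model

\[ P(w_{xy} \mid w_{11} w_{12} \cdots w_{mn}) = P(w_{xy} \mid w_{x-1,y}, w_{x,y-1}) \]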
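The visual trigram can be sketched the same way as a text n-gram, with the context taken from the two neighbouring patches w_{x-1,y} and w_{x,y-1}. The sketch below is a plain counting estimate over one grid of visual words. The function name, the grid layout (grid[x][y]), and the choice to skip border cells are assumptions for illustration, not the authors' training procedure.

```python
from collections import Counter


def visual_trigram_probabilities(grid):
    """Estimate P(w_xy | w_{x-1,y}, w_{x,y-1}) from a 2-D grid of visual words.

    grid[x][y] is the visual word at position (x, y). Cells in the first row
    or first column lack one of the two neighbours and are skipped here.
    """
    trigram_counts = Counter()
    context_counts = Counter()
    for x in range(1, len(grid)):
        for y in range(1, len(grid[x])):
            context = (grid[x - 1][y], grid[x][y - 1])
            trigram_counts[(context, grid[x][y])] += 1
            context_counts[context] += 1
    return {
        key: count / context_counts[key[0]]
        for key, count in trigram_counts.items()
    }


# Hypothetical 3x3 grid of visual-word ids (a checkerboard).
grid = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
model = visual_trigram_probabilities(grid)
```

Because the context is a pair of spatial neighbours, the resulting distribution is sensitive to how patches are arranged, not only to which patches occur.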

Visual Word Generation: Image Patch → Patch Gradient → Texture Histogram → Hashing → Visual Word (e.g. 1 0 0 1 0 0 1 0)


Performance of VLM

Comparison on Image Categorization
Caltech 8 categories / 5097 images

Method        Accuracy (%)   Training Time (sec/image)
pLSA (BOW)    59             1.11
LDA (BOW)     64             2.44
2D MHMM       88             0.44
SVM           90             0.84
VLM           90             0.14

Latent-Topic VLM (1)

Why Latent-Topic

Latent-Topic VLM
  Visual variations of a concept are taken as latent topics

\[ P(w_{xy} \mid w_{x-1,y}, w_{x,y-1}, d_j^C) = \sum_{k=1}^{K} P(w_{xy} \mid w_{x-1,y}, w_{x,y-1}, z_k^C)\, P(z_k^C \mid d_j^C) \]

C: a concept
d_j^C: the j-th image in concept C
z_k^C: the k-th latent topic of concept C

Latent-Topic VLM (2)

Latent-Topic VLM Training
  Solved by the EM algorithm
  The objective function is to maximize the joint distribution of the concept and its visual word arrangement A_w

\[ \text{maximize}\quad p(A_w, C) = \prod_{d_j^C \in C} \; \prod_{w_{xy} \in d_j^C} P(w_{xy} \mid w_{x-1,y}, w_{x,y-1}, d_j^C) \]

E-step: estimate the posteriors of the hidden topics

M-step: maximize the likelihood of the visual arrangement

Performance of LT-VLM

Method        Accuracy (%)   Training Time (sec/image)
pLSA (BOW)    59             1.11
LDA (BOW)     64             2.44
2D MHMM       88             0.44
SVM           90             0.84
VLM           90             0.14
LT-VLM        94             0.24


Flickr Distance

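Kullback–Leibler (KL) divergence (topic distance)
  Good, but not symmetric

\[ D_{KL}\big(P_{z_i^{C_1}} \,\|\, P_{z_j^{C_2}}\big) = \sum_{l} P_{z_i^{C_1}}(l) \log \frac{P_{z_i^{C_1}}(l)}{P_{z_j^{C_2}}(l)} \]

Jensen–Shannon (JS) divergence (topic distance)
  Better, as it is symmetric
  And the square root of JS divergence is a metric, so is Flickr Distance

\[ D_{JS}\big(P_{z_i^{C_1}} \,\|\, P_{z_j^{C_2}}\big) = \frac{1}{2} D_{KL}\big(P_{z_i^{C_1}} \,\|\, M\big) + \frac{1}{2} D_{KL}\big(P_{z_j^{C_2}} \,\|\, M\big), \qquad M = \frac{1}{2}\big(P_{z_i^{C_1}} + P_{z_j^{C_2}}\big) \]

Flickr Distance (concept distance)

\[ D_{Flickr}(C_1, C_2) = \sum_{i=1}^{K} \sum_{j=1}^{K} P(z_i^{C_1} \mid C_1)\, P(z_j^{C_2} \mid C_2)\, D_{JS}\big(P_{z_i^{C_1}} \,\|\, P_{z_j^{C_2}}\big) \]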
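The three formulas can be sketched in a few lines of Python. This is a minimal illustration, assuming each latent topic is given as a probability vector over a shared index set l (for example, visual-word trigrams) and each concept supplies topic weights P(z | C). The function names and toy vectors are hypothetical, and the weighted sum follows the formula as reconstructed above.

```python
import math


def kl_divergence(p, q):
    """KL divergence between two discrete distributions (natural log)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded by log 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def flickr_distance(topics_a, weights_a, topics_b, weights_b):
    """Weighted sum of JS divergences over all topic pairs of two concepts.

    topics_x:  list of topic distributions for a concept
    weights_x: P(z | C) for each topic of that concept
    """
    return sum(
        wa * wb * js_divergence(pa, pb)
        for pa, wa in zip(topics_a, weights_a)
        for pb, wb in zip(topics_b, weights_b)
    )


# Toy example: concept A has two topics, concept B has one.
topics_a, weights_a = [[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5]
topics_b, weights_b = [[1.0, 0.0]], [1.0]
distance = flickr_distance(topics_a, weights_a, topics_b, weights_b)
```

The mixture M is positive wherever either input is positive, so the JS divergence stays finite even when the two topics do not overlap, which is not true of the KL divergence alone. Taking math.sqrt of js_divergence gives the metric mentioned on the slide.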

Procedure of Flickr Distance

Concept A: Airplane; Concept B: Airport
  → Tag search in Flickr
  → LT-VLM → Concept Model A, Concept Model B
  → Jensen-Shannon Divergence → Flickr Distance (A, B)



Experiments

Evaluation
  Objective evaluation
  Subjective evaluation

Applications
  Concept clustering
  Image annotation
  Tag recommendation

Experiments - Configurations

Images
  6,400,000 from Flickr

Concepts
  130,000,000 different tags
  10,000,000 filtered tags
  1,000 randomly-selected tags

Comparison
  Normalized Google Distance (NGD)
  Tag Concurrence Distance (TCD)
  Flickr Distance (FD)

Eva1: Subjective Evaluation

Ground-Truth
  12 persons are asked to score the semantic correlation of each concept pair
  Average scores are taken as ground-truth

Evaluate Accuracy of “Relative Distance Pairs”
  Step 1: Find all distance pairs D(a,b) and D(c,d)
  Step 2: Check whether the order of D(a,b) and D(c,d) is consistent with ground-truth

[Figure: Correct Rate of NGD, TCD, and FD]
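The two evaluation steps can be sketched as follows. This is a minimal illustration, assuming both the measured distance and the ground truth are given as dictionaries from concept pairs to distances (larger means less related). Human correlation scores would first have to be converted to that convention, for example by negating them. The function name and the sample values are hypothetical, and skipping ground-truth ties is also an assumption.

```python
from itertools import combinations


def relative_order_accuracy(pred, truth):
    """Fraction of distance pairs D(a,b), D(c,d) whose order matches ground truth."""
    concept_pairs = sorted(pred.keys() & truth.keys())
    agree = total = 0
    for p, q in combinations(concept_pairs, 2):
        truth_diff = truth[p] - truth[q]
        if truth_diff == 0:
            # Ground truth gives no order for this pair; skipped by assumption.
            continue
        total += 1
        if (pred[p] - pred[q]) * truth_diff > 0:
            agree += 1
    return agree / total if total else 0.0


# Hypothetical distances, for illustration only.
truth = {("a", "b"): 1.0, ("c", "d"): 3.0, ("e", "f"): 2.0}
pred = {("a", "b"): 0.10, ("c", "d"): 0.50, ("e", "f"): 0.05}
rate = relative_order_accuracy(pred, truth)
```

Only the order of the two distances is compared, not their values, so distance measures with different scales can be scored against the same ground truth.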

Eva2: Objective Evaluation

Ground-Truth
  WordNet Distance
  Only 497 concepts (overlap of WordNet and the 1000 concepts)

Evaluate Accuracy of “Relative Distance Pairs”
  Step 1: Find all distance pairs D(a,b) and D(c,d)
  Step 2: Check whether the order of D(a,b) and D(c,d) is consistent with ground-truth

[Figure: Correct Rate of NGD, TCD, and FD]

App1: Concept Clustering

Concept Clustering
  23 concepts; 3 groups: (1) outer space, (2) animal and (3) sports

Normalized Google Distance
  Group 1: bears, horses, moon, space
  Group 2: bowling, dolphin, donkey, Saturn, sharks, snake, softball, spiders, turtle, Venus, whale, wolf
  Group 3: baseball, basketball, football, golf, soccer, tennis, volleyball

Tag Concurrence Distance
  Group 1: moon, space, Venus, whale
  Group 2: baseball, donkey, softball, wolf
  Group 3: basketball, bears, bowling, dolphin, football, golf, horses, Saturn, sharks, soccer, spiders, tennis, turtle, volleyball

Flickr Distance
  Group 1: moon, Saturn, space, Venus
  Group 2: bears, dolphin, donkey, golf, horses, sharks, spiders, tennis, whale, wolf
  Group 3: baseball, basketball, football, snake, soccer, bowling, softball, volleyball

App2: Image Annotation

Based on an approach using concept relation
  Dual Cross-Media Relevance Model (DCMRM, J. Liu et al. ACMMM 2007)
  On 79 concepts / 79,000 images

The number of correctly annotated keywords at the first N words

N             1     2     3     4
NGD-DCMRM     55    212   212   301
TC-DCMRM      53    186   193   310
FD-DCMRM      57    354   423   960

App3: Tag Recommendation

To Improve Tagging Quality
  Eliminating tag incompletion, noises, and ambiguity
  500 images / 10 recommended tags per image

Precision @ 10
  NGD                        0.652
  Tag Concurrence Distance   0.665
  Flickr Distance            0.758
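For reference, Precision @ 10 can be computed as sketched below. The function names and sample tags are hypothetical, and the averaging over images is an assumption about how the per-image scores were aggregated.

```python
def precision_at_k(recommended, relevant, k=10):
    """Share of the first k recommended tags that are judged relevant."""
    top = recommended[:k]
    return sum(1 for tag in top if tag in relevant) / k


def mean_precision_at_k(recommendations, relevants, k=10):
    """Average precision@k over a collection of images."""
    scores = [
        precision_at_k(rec, rel, k)
        for rec, rel in zip(recommendations, relevants)
    ]
    return sum(scores) / len(scores)


# Hypothetical recommendations for two images, with k=2 to keep it short.
recommendations = [["sky", "car"], ["dog", "grass"]]
relevants = [{"sky"}, {"dog", "grass"}]
score = mean_precision_at_k(recommendations, relevants, k=2)
```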

Discussion

Why can VLM divergence estimate concept distance?

Why does FD work well even when tags are not complete?

VLM: distribution of trigrams
  Computer: room patterns, computer patterns, other patterns
  TV: room patterns, TV patterns, other patterns
  Office: room patterns, screen patterns, other patterns

http://www.flickr.com/photos/jeffjackson/2534497145/

http://www.flickr.com/photos/laffy4k/43946201/

http://www.flickr.com/photos/stefan_ledwina/2465450062/

http://www.flickr.com/photos/paladin27/78770424/

http://www.flickr.com/photos/arkworld/119232029/

http://www.flickr.com/photos/williamhook/1983337986/

http://www.flickr.com/photos/jessmanea/1495816731/

http://www.flickr.com/photos/marcel-more/133384597/

http://www.flickr.com/photos/4durt/2236463256/

http://www.flickr.com/photos/trancemist/1493362222/


http://www.flickr.com/photos/kparrish/499881861/

If we find similar patterns in the images associated with different concepts, the corresponding concept relationships can be discovered.

Computer – Office









Summary

A novel approach to discover semantic relationships from image content
  based on real-life images from the Web
  based on collective intelligence from grassroots

A distance more consistent with human perception

A measurement more effective in many applications

Flickr Distance

Future Work

Flickr Distance as a Service.

Thank You

Backup

TagNet

TagNet – Visual Concept Net
  Can be used in many applications
    Knowledge representation
    Concept learning
    Multimedia retrieval
    ...

\[ G = \langle V, E, W \rangle \]

v ∈ V: concept (node)
e ∈ E: semantic relationship (edge)
w ∈ W: Flickr Distance (weight)
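A graph of this form can be held in a plain adjacency dictionary, as in the sketch below. The function names are hypothetical, the edges are treated as undirected on the assumption that the distance is symmetric, and the three weights are the Flickr Distance values reported earlier in the talk.

```python
def build_tagnet(distances):
    """Build an undirected weighted graph from {(concept_a, concept_b): distance}."""
    graph = {}
    for (a, b), weight in distances.items():
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


def nearest_concepts(graph, concept, n=3):
    """Neighbours of a concept, closest (smallest distance) first."""
    neighbours = graph[concept]
    return sorted(neighbours, key=neighbours.get)[:n]


tagnet = build_tagnet({
    ("airplane", "airport"): 0.0576,
    ("airplane", "dog"): 0.5151,
    ("car", "wheel"): 0.0708,
})
closest = nearest_concepts(tagnet, "airplane", n=1)
```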

TagNet

Visualization
  The bigger the distance, the longer the edge
  Using a tool called NetDraw provided by the International Network for Social Network Analysis

Outline
  Motivation
  Overview
  Visual Language Model
  Flickr Distance Calculation
  Evaluations and Applications

Semantic Relationship Is Important

Many efforts on using semantic relationships
  GJ Qi et al. Correlative Multi-Label Video Annotation. ACM MM 2007.
  R. Datta et al. Image Retrieval: Ideas, Influences and the Trends of the New Age. ACM Computing Surveys, 2008.
  L. Leslie et al. Annotation of Paintings with High-Level Semantic Concepts Using Transductive Inference and Ontology-based Concept Disambiguation. ACM MM 2007.
  J. Yu et al. Semantic Subspace Projection and Its Application in Image Retrieval. IEEE T CSVT 2008.

Applications of semantic relationships
  Natural language processing
  Object detection
  Concept detection
  Multimedia retrieval


Text vs. Image

Text: Word; Grammar; 1-dim dependence; Statistical Language Model

Image: Visual word; Visual grammar; 2-dim dependence; Visual Language Model


Visual Word Generation

Typical methods
  SIFT + Clustering/PCA

Our method
  Patch + Texture Direction Histogram + Hashing
  Efficient, low-dimension, and rotation-invariant
  Only needs 1/20 of the computation of the SIFT feature

Image Patch → Patch Gradient → Texture Histogram → Hashing → Visual Word (e.g. 1 0 0 1 0 0 1 0)
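The slide does not specify how the histogram is hashed, so the following is only a rough sketch of one way such a pipeline could look, not the authors' method. It assumes central-difference gradients, an 8-bin direction histogram weighted by gradient magnitude, a circular shift so the dominant direction comes first (an approximation of rotation invariance), and a threshold at the mean bin value to obtain a binary code. All names and parameters are hypothetical.

```python
import math


def patch_to_visual_word(patch, bins=8):
    """Map a gray-level patch (2-D list) to an integer visual word."""
    hist = [0.0] * bins
    for r in range(1, len(patch) - 1):
        for c in range(1, len(patch[r]) - 1):
            # Central-difference gradient at an interior pixel.
            gx = patch[r][c + 1] - patch[r][c - 1]
            gy = patch[r + 1][c] - patch[r - 1][c]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += magnitude
    # Shift the histogram so that the dominant direction is the first bin.
    start = hist.index(max(hist))
    hist = hist[start:] + hist[:start]
    # Hash: one bit per bin, set when the bin exceeds the mean bin value.
    mean = sum(hist) / bins
    bits = [1 if h > mean else 0 for h in hist]
    return int("".join(str(b) for b in bits), 2)


# Hypothetical 4x4 patches: a horizontal ramp and the same ramp rotated by 90 degrees.
horizontal = [[c for c in range(4)] for _ in range(4)]
vertical = [[r for _ in range(4)] for r in range(4)]
word_h = patch_to_visual_word(horizontal)
word_v = patch_to_visual_word(vertical)
```

In this sketch the two ramps map to the same visual word because the histogram is re-aligned before thresholding. A real implementation would need finer handling of arbitrary rotation angles and of noise.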


Performance of VLM

Comparison on Image Categorization
Caltech 8 categories / 5097 images (L. Wu, et al. MIR 2007/T-MM 2008)

Method        Accuracy (%)   Training Time (sec/image)
pLSA (BOW)    59             1.11
LDA (BOW)     64             2.44
2D MHMM       88             0.44
SVM           90             0.84
VLM           90             0.14
LT-VLM        94             0.24


Eva1: Objective Evaluation

Ground-Truth
  WordNet Distance
  Only 497 concepts (overlap of WordNet and the 1000 concepts)

Evaluate Accuracy of “Relative Distance Pairs”
  Step 1: Find all concept triples (A, B, C)
  Step 2: Get 6 distance pairs for each triple (consider asymmetry)
  Step 3: Compute the correct ratio of each distance pair in terms of order (not value), compared with the ground-truth distance pair

Example (NGD vs. Ground-Truth on a triple A, B, C): (AB, AC) ×; (AB, BC) √; (AC, BC) √


Future Work

Scalability
  Large-scale testing
  TagNet as a service

Other data
  “PicNet Distance” based on different dataset / Optimizing dataset
  Integrating text/tag concurrency distance and Flickr Distance

Concept modeling
  Handling scale variations (multiple-resolution)
  New models

More applications
  Tag ranking
  Query suggestions
