Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf ·...
Transcript of Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf ·...
![Page 1: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/1.jpg)
Picture Tags and World Knowledge
learning tag rela4ons from visual seman4c sources
Lexing Xie, Xuming He
Australian Na4onal University NICTA
Oct 2013 @ ACM MM
![Page 2: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/2.jpg)
2
An image is worth 1000 words … Which 1000? How are they related?
#barcelona #4bidabo #montjuïc #catalunya #catalonia #hdr #nikon #nikond90 #d90 #18200mmf3556gvr #clouds #nuvols #cityscape #people #gent #view #sight
hWp://www.flickr.com/photos/bcnbits/3663297224/
![Page 3: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/3.jpg)
Two related goals • Which 1000?
• How are they related?
3
Why we tag [Ames, Naaman, 2007] Events and places from Flickr tags [RaWenbury, Good, Naaman, 2007] ClassTag: Tagà Wikipedia à Wordnet [Overell, Sigurbjornsson, van Zwol 2009] Can all tags be used for search? [Bischoff et al 2008]
Knowledge graph 1.0: [Cyc 1984-‐] [WordNet mid-‐1980s-‐ now] [ConceptNet 2003-‐now] Data-‐driven knowledge graphs: NELL @CMU1010-‐, Google Knowledge Graph, Yahoo! Clickstream Graph Vispedia, Perona 2010-‐
![Page 4: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/4.jpg)
Plan for this talk
• A vision for mul4media knowledge graph
• A novel connec4on across three data sources • Quan4fying visual tags from co-‐ocurrence • A network inference model on tag rela4onships
• Evalua4on of rela4on learning and tagging 4
![Page 5: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/5.jpg)
How to find 1000s of tags?
5
ConceptNet
Millions of photos with tags
14 M images, 21K illustrated synsets 5.1M with Flickr tags, 13,288 synsets
Crowd-‐sourced commonse knowledge 22K Concepts, 450K rela4ons
What can help quan4fy the “visual”-‐ness of tags?
How are tags related?
![Page 6: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/6.jpg)
Mapping tags to words
6
English Dic4onary 80,431 words, 32,606 canonical forms (lemmas)
Flickr Tags ~3 million
21,465 terms in ConceptNet
aardvark abacus abalone abandon abandonment abate abatement abaWoir abbe abbess …
zonked zoo zookeeper zoological zoology zoom zucchini zwieback zydeco zymurgy
20,366 words
7,323 words
![Page 7: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/7.jpg)
Observa4on: Visual Synsets vs Tags
7
13,288 visual synsets
7,323 common
words
Which tags are visually descrip4ve? log(coocurrence) over 5.1M images Y
“chocolate”: 1.67
“travel”: 0.91
p(w)
p(w|t)
![Page 8: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/8.jpg)
8
Food vs Places
Tag frequency Tag frequency
![Page 9: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/9.jpg)
An interac4ve visualiza4on of tag space
9
Tag frequency
Tag inform
a4vene
ss
#photos in synset
Tag% in synsets
hWp://cecs.anu.edu.au/~xlx/proj/tag-‐explore
![Page 10: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/10.jpg)
Talk Roadmap
10
• Goals for MM knowledge graph • A novel connec4on across three data sources • Quan4fying visual tags from co-‐ocurrence • Learning tag rela4onships – The Inverse Concept Rank Model – How good are the learned rela4ons?
• Image tagging evalua4ons
![Page 11: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/11.jpg)
Which tags are related?
aquarium, shark aquarium, hammerhead aquarium, georgia aquarium, fish aquarium, atlanta atlanta, shark atlanta, georgia atlanta, hammerhead shark, underwater fish, water …
n01494475 hammerhead.n.03
11
![Page 12: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/12.jpg)
A random walk model for tagging
12
apple
market
green
red leaves
basket
tomato
[Griffiths et al. “Google and the mind”, 2004] [AbboW, Austerweil and Griffiths, NIPS2012]
![Page 13: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/13.jpg)
A model for a forward random walk
G W
Personalized pagerank vector
Pagerank vector
a.k.a Google v1.0
13
![Page 14: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/14.jpg)
PageRank for photo tags
14
G W Z
…
Random walk with restart
Personalized pagerank vectors
![Page 15: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/15.jpg)
Network Inference for Tag Rela4ons
15
G W Z
…
Observe Z, infer G
![Page 16: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/16.jpg)
Loss func4on between model and observa4on
The Inverse Concept Rank Problem G Z
Normalized co-‐occurrence
Concept rela4ons ICR
Prior knowledge
Regularizer: prefer sparse G
![Page 17: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/17.jpg)
Evalua4on: Learning New Rela4ons
ConceptNet4: 51K rela4ons
ConceptNet 5 69K rela4ons
Ini4alize
evalua4on target
17
![Page 18: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/18.jpg)
Which tags are related?
aquarium, shark aquarium, hammerhead aquarium, georgia aquarium, fish aquarium, atlanta atlanta, shark atlanta, georgia atlanta, hammerhead shark, underwater fish, water …
n01494475 hammerhead.n.03
Already in ConceptNet Learned by ICR Supressed by ICR
18
![Page 19: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/19.jpg)
“MatchBox” model for picture tagging
19
Image feature
Tag feature
~ R XT UT V Y Tag latent factor Image latent
factor
• R par4ally known: Matrix comple4on • R unknown : low-‐rank linear classifiers • Y = I , low-‐rank linear classifiers
![Page 20: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/20.jpg)
Picture tagging evalua4on
20
• NUS-‐Wide Flickr dataset – 279K total, 63/81 tags found in ImageNet
• 177-‐D objectbank features • Tes4ng:
– 1K images with >= 5 tags – fullset ~100K+ images, ~1.2 tags each
![Page 21: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/21.jpg)
Top returns for a few ‘unknown’ tags
21
![Page 22: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/22.jpg)
Using ICR in picture tagging
22
0.0715
0.072
0.0725
0.073
0.0735
0.074
0.0745
0.075
0.0755
0.076
mAP-‐all
0.0126
0.0128
0.013
0.0132
0.0134
0.0136
0.0138
0.014
0.0142
mAP-‐boWom43
cooc
CN5 *ICR
![Page 23: Picture(Tags(and(World(Knowledge(users.cecs.anu.edu.au/~xlx/proj/tagnet/mm2013-tagnet.pdf · Picture(Tags(and(World(Knowledge((learning(tag(relaons(from(visual(seman4c(sources ( Lexing(Xie,](https://reader034.fdocuments.net/reader034/viewer/2022042300/5ecb6807c757de52494be22d/html5/thumbnails/23.jpg)
Summary • Sta4s4cal analysis reveals photo tag proper4es • Inverse random walk algorithm for learning rela4ons • hWp://cecs.anu.edu.au/~xlx/proj/tagnet
• Also ask me about:
– Mul4media-‐Hard problems [w. Ayman + Snoek] – Social Affinity Filtering: Rich Features > Models [COSN’13]
hWps://code.google.com/p/social-‐recommenda4on/
– CSS2013: Computa4onal Social Science, Founda4ons and Fron4ers hWp://cecs.anu.edu.au/~xlx/teaching/css2013/
– Analyzing Images in Microblogs
23 Get in touch: [email protected] hWp://cecs.anu.edu.au/~xlx/