Dmitry Ustalov — TagBag: Annotating a Foreign Language Lexical Resource with Pictures
Annotating a Foreign Language Lexical Resource with Pictures
Dmitry Ustalov
IMM UB RAS / UrFU
Yekaterinburg, Russia
Outline
• Introduction
• Related Work
• Approach
• Evaluation
• Results
• Discussion
• Conclusion
Introduction
• Mapping images to word senses matters for several applications:
  • multimedia search,
  • text illustration,
  • quality assessment.
• It is also interesting to assess the Yet Another RussNet lexical resource (Braslavski et al., 2014).
Related Work
• PicNet, a proprietary resource (Mihalcea & Leong, 2008).
• ImageNet annotates WordNet with pictures & bounding boxes (Deng et al., 2009).
  • Intersection with WordNet.ru is negligible.
• ImageCLEF creates software and datasets for image indexing (Müller et al., 2010).
Related Work: Flickr
• Single-query image retrieval (Reiter et al., 2007).
• Semantic Web-based approach (Trojahn et al., 2008).
• Wikipedia-based approach (Stampouli et al., 2010).
• Flickr tags with visual saliency of images (Jiang et al., 2014).
Problem
Given an annotated image I, a bilingual dictionary B, and a lexical resource S, produce a mapping I_s.
“cat”, “tomcat”, “kitten” → «кот, кошка, котёночек» (Russian: “tomcat”, “cat”, “little kitten”)
TagBag: Assumptions
• Most image tags are nouns.
• Tags may be polysemous, and redundant tags may be present.
  • Is “crane” «журавль» (the bird) or «кран» (the machine)?
• The image has a “main” object.
TagBag
• Tag. Initialize an empty vector.
  • Iterate over the image tags and retrieve all the translations for each tag.
  • Add each occurrence of a translation to its dimension.
• Bag. Prune that vector.
  • Remove the low-frequency dimensions using the cut-off value.
  • Return the resulting vector.
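The two steps above can be sketched in Python. This is a minimal illustration rather than the author's implementation: the function names `tag_step` and `bag_step`, the toy dictionary, and the rule that a dimension survives when its count is at least the cut-off are all assumptions made for this example.

```python
from collections import Counter


def tag_step(tags, dictionary):
    """Tag: count every translation of every image tag."""
    vector = Counter()
    for image_tag in tags:
        # Tags missing from the dictionary contribute nothing.
        for translation in dictionary.get(image_tag, []):
            vector[translation] += 1
    return vector


def bag_step(vector, cutoff):
    """Bag: keep only dimensions whose count reaches the cut-off (assumed rule)."""
    return {word: count for word, count in vector.items() if count >= cutoff}


# Toy bilingual dictionary, invented for illustration only.
toy_dictionary = {
    "cat": ["кот", "кошка"],
    "tomcat": ["кот"],
    "kitten": ["котёнок", "кошка"],
    "crane": ["журавль", "кран"],
}

pruned = bag_step(
    tag_step(["cat", "tomcat", "kitten", "crane"], toy_dictionary),
    cutoff=2,
)
print(pruned)
```

In this toy run the translations supported by only one tag, including both readings of “crane”, fall below the cut-off, which is how redundant tags can help suppress a polysemous one.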
TagBag: Pseudocode
Evaluation
• The present approach is pretty simple. Let’s evaluate it empirically.
• Take the top 1500 English nouns and search for Flickr photos.
  http://www.talkenglish.com/Vocabulary/Top-1500-Nouns.aspx
• Get V. K. Mueller’s dictionary.
  http://ustalov.imm.uran.ru/pub/mueller.tar.gz
Experimental Setup
• Yet Another RussNet (CC BY-SA).
  http://russianword.net/
• Similarity measures:
  • cosine similarity,
  • Jaccard index.
• Ask three annotators to submit judgements.
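The two similarity measures named above can be sketched as follows. The slides do not say how a synset is represented when it is compared with the pruned tag vector, so this example assumes a synset is a plain set of words (a binary vector) and that the best-scoring synset is selected. The names `cosine`, `jaccard`, `best_synset` and the toy data are hypothetical.

```python
import math


def cosine(vector, synset):
    """Cosine similarity between a weighted word vector and a binary synset vector."""
    norm_vector = math.sqrt(sum(weight * weight for weight in vector.values()))
    norm_synset = math.sqrt(len(synset))
    if norm_vector == 0 or norm_synset == 0:
        return 0.0
    dot = sum(weight for word, weight in vector.items() if word in synset)
    return dot / (norm_vector * norm_synset)


def jaccard(vector, synset):
    """Jaccard index between the vector's words and the synset's words (weights ignored)."""
    words = set(vector)
    union = words | synset
    return len(words & synset) / len(union) if union else 0.0


def best_synset(vector, synsets, measure):
    """Return the name of the synset that scores highest under the given measure."""
    return max(synsets, key=lambda name: measure(vector, synsets[name]))


# Toy data, invented for illustration only.
toy_vector = {"кот": 2, "кошка": 2}
toy_synsets = {
    "cat": {"кот", "кошка", "котёночек"},
    "crane": {"журавль"},
}

print(best_synset(toy_vector, toy_synsets, cosine))
print(best_synset(toy_vector, toy_synsets, jaccard))
```

Cosine similarity uses the translation counts as weights, while the Jaccard index only looks at which words are present, so the two can in principle rank synsets differently.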
призрак, тень, намёк (ghost, shadow, hint)
https://www.flickr.com/photos/127324269@N03/16217604730
труд, работа, занятие (labour, work, occupation)
https://www.flickr.com/photos/79304587@N07/16192772090
мужчина, парень, юноша (man, guy, youth)
https://www.flickr.com/photos/94029069@N03/15797009873
футбол (football)
https://www.flickr.com/photos/113780395@N05/15789001293
пища, провизия, питание, корм (food, provisions, nourishment, fodder)
https://www.flickr.com/photos/80972943@N00/16396295195
Results
• The accuracy is moderately high and the agreement level is good.
• Both measures demonstrate the same performance.
http://ustalov.imm.uran.ru/pub/tagbag-aist.tar.gz
Discussion
• Some mappings are identical under both similarity measures, and 13 of these 43 mappings are wrong.
• Three sources of errors:
  • sloppy image tags (7 of 13),
  • actual mapping errors (3 of 13),
  • batch uploads (3 of 13).
Conclusion
• TagBag is an unsupervised approach for mapping images to synsets.
• The performance depends both on image tags and ontology bias.
• Visual saliency and spam filtering may increase the quality.
Further Work
Thank you!
Dmitry Ustalov
a post-graduate student @ IMM UB RAS, Yekaterinburg, Russia.
https://ustalov.name/
[email protected]
The present work is supported by the Russian Foundation for the Humanities, project no. 13-04-12020.