ImageNet: A Large-Scale Hierarchical Image Database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li and Li Fei-Fei

Dept. of Computer Science, Princeton University, USA

CVPR 2009

You Zhou

Datasets in Computer Vision

UIUC Cars (2004): S. Agarwal, A. Awan, D. Roth

FERET Faces (1998): P. Phillips, H. Wechsler, J. Huang, P. Rauss

CMU/VASC Faces (1998): H. Rowley, S. Baluja, T. Kanade

MNIST digits (1998-10): Y. LeCun & C. Cortes

KTH human action (2004): I. Laptev & B. Caputo

Sign Language (2008): P. Buehler, M. Everingham, A. Zisserman

Segmentation (2001): D. Martin, C. Fowlkes, D. Tal, J. Malik

COIL Objects (1996): S. Nene, S. Nayar, H. Murase

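WordNet
• WordNet is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.
• WordNet as an ontology: synsets are linked by semantic relations such as hypernymy (is-a), which organizes nouns into a hierarchy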
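To make synsets and the is-a hierarchy concrete, here is a minimal Python sketch. The `Synset` class and its fields are hypothetical, and the chain below is a hand-simplified excerpt of WordNet's noun hierarchy (the real one has more intermediate levels), not data loaded from WordNet itself.

```python
from dataclasses import dataclass


@dataclass
class Synset:
    """Hypothetical stand-in for a WordNet noun synset: one concept, several synonyms."""
    name: str                         # illustrative identifier, e.g. "german_shepherd"
    lemmas: list[str]                 # synonyms that all express this concept
    hypernym: "Synset | None" = None  # parent concept (is-a link)

    def hypernym_path(self) -> list[str]:
        """Follow is-a links from this synset up to the root."""
        path, node = [], self
        while node is not None:
            path.append(node.name)
            node = node.hypernym
        return path


# Hand-written, simplified excerpt; NOT loaded from WordNet, levels are skipped.
entity = Synset("entity", ["entity"])
animal = Synset("animal", ["animal", "animate being", "beast"], entity)
dog = Synset("dog", ["dog", "domestic dog", "Canis familiaris"], animal)
sheepdog = Synset("shepherd_dog", ["shepherd dog", "sheepdog", "sheep dog"], dog)
german_shepherd = Synset(
    "german_shepherd",
    ["German shepherd", "German shepherd dog", "German police dog", "Alsatian"],
    sheepdog,
)

print(german_shepherd.hypernym_path())
```

All four lemmas of `german_shepherd` name the same concept, which is what makes a synset a single node rather than four.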

ImageNet
• An image database organized according to the WordNet hierarchy, in which each node of the hierarchy is depicted by hundreds to thousands of images
• Knowledge ontology: taxonomy (is-a) and partonomy (part-of)
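A minimal sketch of how such a node could be represented, assuming each node stores its own image URLs and its is-a children. `ImageNetNode`, its fields and the example.com URLs are hypothetical placeholders, not ImageNet's actual storage format.

```python
from dataclasses import dataclass, field


@dataclass
class ImageNetNode:
    """Hypothetical node: one WordNet synset plus the images that depict it."""
    synset: str
    image_urls: list[str] = field(default_factory=list)
    children: list["ImageNetNode"] = field(default_factory=list)

    def add_child(self, child: "ImageNetNode") -> "ImageNetNode":
        """Attach a more specific (is-a) synset below this one."""
        self.children.append(child)
        return child

    def subtree_image_count(self) -> int:
        """Images attached to this node plus all of its descendants."""
        return len(self.image_urls) + sum(c.subtree_image_count() for c in self.children)


# Placeholder URLs; a real node would hold hundreds to thousands of images.
dog = ImageNetNode("dog", [f"https://example.com/dog/{i}.jpg" for i in range(3)])
shepherd = dog.add_child(ImageNetNode("shepherd_dog", ["https://example.com/sd/0.jpg"]))
shepherd.add_child(
    ImageNetNode("german_shepherd", [f"https://example.com/gs/{i}.jpg" for i in range(5)])
)

print(dog.subtree_image_count())  # 3 + 1 + 5 = 9
```

Counting through the subtree follows the taxonomy: every German shepherd image is also an image of a dog.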

Collect Candidate Images
• For each synset, the queries are the set of WordNet synonyms
• Accuracy of Internet image search results: about 10%
– To get 500-1000 clean images, about 10K candidate images are needed
• Query expansion
– Synonyms: German shepherd, German police dog, German shepherd dog, Alsatian
– Appending words from ancestors: sheepdog, dog
• Multiple languages
– Italian, Dutch, Spanish, Chinese
• More search engines
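The arithmetic and the expansion steps above can be sketched in Python. This assumes a flat 10% precision for search results; the helper names `candidates_needed` and `expand_queries` are hypothetical, and the ancestor words and the Italian translation are supplied by hand rather than looked up.

```python
import math


def candidates_needed(clean_target: int, precision: float) -> int:
    """Candidates to download so that about clean_target survive cleaning."""
    return math.ceil(clean_target / precision)


def expand_queries(synonyms, ancestor_words, translations=None):
    """Hypothetical helper: build the search queries for one synset."""
    queries = list(synonyms)
    # Append a word from an ancestor synset to each synonym, e.g. "Alsatian dog".
    for syn in synonyms:
        for word in ancestor_words:
            queries.append(f"{syn} {word}")
    # Add queries in other languages, if any were provided.
    if translations:
        queries.extend(translations)
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(queries))


print(candidates_needed(1000, 0.10))  # 10000
qs = expand_queries(["German shepherd", "Alsatian"], ["sheepdog", "dog"], ["pastore tedesco"])
print(qs)
```

Each expanded query returns its own set of hits, which is how the pool grows toward the roughly 10K candidates a synset needs.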

Clean Candidate Images
• Rely on humans to verify each candidate image for a given synset
• An estimated 19 years of work
• No graduate student would want to do this project
• Amazon Mechanical Turk (AMT)
– 300 images: $0.02
– 14,197,122 images: $946
– 10 repetitions: $9,460
• July 2008 - April 2010: 11 million images, 15,000+ synsets
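The AMT figures follow from a simple rate calculation. This sketch assumes the slide's flat rate of $0.02 per 300 images; `amt_cost` is a hypothetical helper.

```python
def amt_cost(num_images: int, repetitions: int = 1,
             dollars_per_batch: float = 0.02, images_per_batch: int = 300) -> float:
    """Estimated AMT labeling cost at a flat rate per batch of images."""
    return num_images / images_per_batch * dollars_per_batch * repetitions


one_pass = amt_cost(14_197_122)
ten_passes = amt_cost(14_197_122, repetitions=10)
print(f"${one_pass:,.2f} for one pass, ${ten_passes:,.2f} for 10 passes")
```

This gives about $946.47 and $9,464.75; the slide rounds these to $946 and $9,460.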

HIT Design
• HIT (Human Intelligence Task)
• Application
• Qualification test
• Start tasks
– Learn about the keyword: Wiki, Google
– Definition quiz: a multiple-choice question about the keyword
– Choose the images that fit the keyword (Yes or No)
– Pass cheating detection
• Feedback
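The slide does not say how cheating is detected. One common approach, assumed here, is to mix images with known answers (gold images) into each HIT and reject workers who disagree with them too often. `passes_cheating_check` and the 0.8 threshold are hypothetical.

```python
def passes_cheating_check(answers: dict[str, bool], gold: dict[str, bool],
                          min_accuracy: float = 0.8) -> bool:
    """Accept a worker's HIT only if it agrees with the known-answer (gold) images.

    answers: image id -> the worker's Yes (True) / No (False)
    gold:    image id -> correct answer for gold images mixed into the HIT
    """
    checked = [img for img in gold if img in answers]
    if not checked:
        return False  # nothing to verify against, so do not trust the HIT
    correct = sum(answers[img] == gold[img] for img in checked)
    return correct / len(checked) >= min_accuracy


gold = {"g1": True, "g2": False, "g3": False, "g4": True, "g5": True}
always_yes = {img: True for img in ["a", "b", *gold]}  # clicks Yes on everything
careful = {**gold, "a": True, "b": False}              # answers gold images correctly

print(passes_cheating_check(always_yes, gold), passes_cheating_check(careful, gold))
```

A worker who clicks Yes on everything matches only 3 of the 5 gold answers here and is rejected.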

Quality Control System
• Human users make mistakes
• Not all users follow the instructions
• Users do not always agree with each other
– Subtle or confusing synsets, e.g. Burmese cat
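Disagreement can be handled by collecting several votes per image and stopping once the evidence is strong enough. The sketch below assumes independent workers with a fixed, synset-specific accuracy, a 0.5 prior and a 0.95 confidence threshold. It is a simplified Bayesian stand-in, not the exact quality-control algorithm used for ImageNet, and the function names are hypothetical.

```python
import math
from typing import Iterable


def posterior_positive(yes: int, no: int, worker_accuracy: float) -> float:
    """P(image truly depicts the synset | votes), assuming a 0.5 prior and
    independent workers who are each correct with prob. worker_accuracy."""
    log_odds = (yes - no) * math.log(worker_accuracy / (1 - worker_accuracy))
    return 1 / (1 + math.exp(-log_odds))


def label_image(votes: Iterable[bool], worker_accuracy: float,
                threshold: float = 0.95, max_votes: int = 10):
    """Consume votes one at a time until the posterior is confident either way.

    Returns (label, votes_used); label is None if still undecided at max_votes.
    """
    yes = no = 0
    for vote in votes:
        if vote:
            yes += 1
        else:
            no += 1
        p = posterior_positive(yes, no, worker_accuracy)
        if p >= threshold:
            return True, yes + no
        if p <= 1 - threshold:
            return False, yes + no
        if yes + no >= max_votes:
            break
    return None, yes + no


print(label_image([True] * 10, worker_accuracy=0.9))  # easy synset
print(label_image([True] * 10, worker_accuracy=0.7))  # confusing synset
```

With unanimous Yes votes, an easy synset (assumed worker accuracy 0.9) is settled after 2 votes, while a confusing one such as Burmese cat (assumed accuracy 0.7) needs 4.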

Properties of ImageNet
• Scale
– 14,197,122 images, 21,841 synsets indexed
• Hierarchy
– Densely populated semantic hierarchy
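A quick check of the scale figures: dividing images by synsets gives the average number of images per node. This is only an average; it says nothing about how evenly images are spread across synsets.

```python
num_images = 14_197_122
num_synsets = 21_841

avg = num_images / num_synsets
print(f"{avg:.1f} images per synset on average")  # about 650
```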

Properties of ImageNet
• Accuracy
• Diversity
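The slide lists diversity without defining it. One proxy, assumed here: average all images of a synset pixel by pixel. A diverse set gives a blurry, nearly uniform average that compresses well, while near-identical images give a sharp average that does not. This sketch uses `zlib` in place of a lossless image codec and random toy images; `average_image` and `diversity_score` are hypothetical helpers.

```python
import random
import zlib


def average_image(images: list[list[int]]) -> bytes:
    """Pixel-wise mean of equally sized 8-bit grayscale images."""
    n = len(images)
    return bytes(round(sum(px) / n) for px in zip(*images))


def diversity_score(images: list[list[int]]) -> int:
    """Compressed size of the average image (smaller = blurrier = more diverse).

    zlib stands in for a lossless image codec here.
    """
    return len(zlib.compress(average_image(images), 9))


rng = random.Random(0)
size = 64 * 64
one_pattern = [rng.randrange(256) for _ in range(size)]
identical = [one_pattern] * 100                                          # no diversity
varied = [[rng.randrange(256) for _ in range(size)] for _ in range(100)]  # high diversity

print(diversity_score(identical), diversity_score(varied))
```

Real photos are not random noise, so the absolute sizes mean little; only the comparison between synsets measured the same way is informative.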

ImageNet Applications
• Non-parametric object recognition
– NN-voting + noisy ImageNet
– NN-voting + clean ImageNet
– Naive-Bayes Nearest-Neighbor (NBNN)
– NBNN-100
• Tree-based image classification
• Automatic object localization
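NBNN, listed above, classifies an image by comparing each of its local descriptors directly with all descriptors of each class, without quantizing them into a codebook. A minimal brute-force sketch, assuming toy 2-D points in place of real local features such as SIFT; `nbnn_classify` is a hypothetical name.

```python
from typing import Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nbnn_classify(query: list[Vector], class_descriptors: dict[str, list[Vector]]) -> str:
    """For each class C, sum ||d - NN_C(d)||^2 over the query's descriptors d,
    then return the class with the smallest total (brute-force nearest neighbors)."""
    def image_to_class(descs: list[Vector]) -> float:
        return sum(min(sq_dist(d, c) for c in descs) for d in query)

    return min(class_descriptors, key=lambda name: image_to_class(class_descriptors[name]))


# Toy descriptors pooled from each class's training images.
classes = {
    "dog": [(0.0, 0.0), (1.0, 0.0)],
    "cat": [(5.0, 5.0), (6.0, 5.0)],
}
query = [(0.2, 0.1), (0.9, 0.2)]
print(nbnn_classify(query, classes))
```

Brute force is fine for a sketch; at ImageNet scale the nearest-neighbor search would need an approximate index.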

Pros and Cons
• Pros:
– Large dataset as a training resource
– Benchmarking
– Open: download of original images, URLs, features, object attributes, API
• Cons:
– Imperfect matching between the physical world, WordNet and ImageNet
– Counterword
– Only one tag per image