Human abilities Presented By Mahmoud Awadallah 1.

Page 1: Human abilities Presented By Mahmoud Awadallah 1.

Human abilities

Presented By Mahmoud Awadallah

1

Page 2: Human abilities Presented By Mahmoud Awadallah 1.

What do we perceive in a glance of a real-world scene?

Bryan Russell

Page 3: Human abilities Presented By Mahmoud Awadallah 1.

Motivation

• Much can be recognized quickly

• Investigate the early computations of an image

• Analyze real-world, complicated scenes

Page 4: Human abilities Presented By Mahmoud Awadallah 1.
Page 5: Human abilities Presented By Mahmoud Awadallah 1.

Stimuli: outdoor images

Page 6: Human abilities Presented By Mahmoud Awadallah 1.

Stimuli: outdoor images

Page 7: Human abilities Presented By Mahmoud Awadallah 1.

Stimuli: indoor images

Page 8: Human abilities Presented By Mahmoud Awadallah 1.

Stimuli: indoor images

Page 9: Human abilities Presented By Mahmoud Awadallah 1.
Page 10: Human abilities Presented By Mahmoud Awadallah 1.
Page 11: Human abilities Presented By Mahmoud Awadallah 1.
Page 12: Human abilities Presented By Mahmoud Awadallah 1.

Experiment specifications

• 5 naïve scorers

• 105 attributes assessed for each description

• 2 scoring fields for each attribute:

– whether the attribute is described

– if yes, whether it is accurate

Page 13: Human abilities Presented By Mahmoud Awadallah 1.

Computation of score

Attribute: building, Image: 52, PT: 500 ms

Subject                 1     2     3
Correctly described?    Yes   No    Yes

Score: 0.67

For image 52, normalize by max score across all PT
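
The scoring rule on this slide can be sketched as follows. This is only an illustration under two assumptions: that the raw score is the fraction of subjects whose description of the attribute was judged correct, and that normalization divides by the largest raw score for the same image over all presentation times (PT). The function names and the scores at the other PTs are invented for the example and do not come from the slides.

```python
def raw_score(judgements):
    """Fraction of subjects judged to have described the attribute correctly."""
    return sum(judgements) / len(judgements)


def normalize_across_pt(scores_by_pt):
    """Divide each score by the maximum score over all presentation times."""
    top = max(scores_by_pt.values())
    if top == 0:
        # Nobody described the attribute at any PT; avoid dividing by zero.
        return {pt: 0.0 for pt in scores_by_pt}
    return {pt: score / top for pt, score in scores_by_pt.items()}


# Slide example: attribute "building", image 52, PT = 500 ms, subjects 1-3.
score_500 = raw_score([True, False, True])
print(round(score_500, 2))  # 0.67

# Hypothetical raw scores for the same image at shorter PTs (assumed values).
scores = {27: 0.0, 107: 1 / 3, 500: score_500}
normalized = normalize_across_pt(scores)
print(normalized)
```

After normalization the best PT for the image always maps to 1.0, so curves for different images become comparable.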

Page 14: Human abilities Presented By Mahmoud Awadallah 1.

How the scorers perform

Building attribute

Page 15: Human abilities Presented By Mahmoud Awadallah 1.
Page 16: Human abilities Presented By Mahmoud Awadallah 1.

The “content” of a single fixation

Animate objects

Page 17: Human abilities Presented By Mahmoud Awadallah 1.

The “content” of a single fixation

Inanimate objects

Page 18: Human abilities Presented By Mahmoud Awadallah 1.

The “content” of a single fixation

Scene

Page 19: Human abilities Presented By Mahmoud Awadallah 1.

The “content” of a single fixation

Social events

Page 20: Human abilities Presented By Mahmoud Awadallah 1.

Outdoor vs. indoor bias

Page 21: Human abilities Presented By Mahmoud Awadallah 1.

Outdoor vs. indoor bias

Page 22: Human abilities Presented By Mahmoud Awadallah 1.

Summary plots

Page 23: Human abilities Presented By Mahmoud Awadallah 1.

Summary plots

Page 24: Human abilities Presented By Mahmoud Awadallah 1.
Page 25: Human abilities Presented By Mahmoud Awadallah 1.

Sensory vs. object/scene

Page 26: Human abilities Presented By Mahmoud Awadallah 1.

Sensory vs. object/scene

Page 27: Human abilities Presented By Mahmoud Awadallah 1.

Sensory vs. object/scene

Page 28: Human abilities Presented By Mahmoud Awadallah 1.

Correlation of object/scene perception

Page 29: Human abilities Presented By Mahmoud Awadallah 1.

Scene vs. objects

Page 30: Human abilities Presented By Mahmoud Awadallah 1.

Conclusions

• Outdoor scene bias

• Less information needed for shape/sensory recognition

• Weak correlation between scene and object perception

Page 31: Human abilities Presented By Mahmoud Awadallah 1.

80 million tiny images: a large dataset for non-parametric object and scene recognition

Page 32: Human abilities Presented By Mahmoud Awadallah 1.

A.I. for the postmodern world:

• All questions have already been answered… many times, in many ways

• Google is dumb, the “intelligence” is in the data

Page 33: Human abilities Presented By Mahmoud Awadallah 1.

How about visual data?

• The key question here in this paper is: How big does the image dataset need to be to robustly perform recognition using simple nearest-neighbor schemes?

• Complex classification methods don’t extend well

• Can we use a simple classification method?

Page 34: Human abilities Presented By Mahmoud Awadallah 1.

Past and future of image datasets in computer vision

[Figure: number of pictures (log scale, 10^0 to 10^20) vs. time. Marked points: Lena, a dataset in one picture (1972); COREL, 40,000 images (1996); 2 billion images (2007); 2020?. Also marked: Human Click Limit (all humanity taking one picture/second during 100 years).]

Slide by Antonio Torralba

Page 35: Human abilities Presented By Mahmoud Awadallah 1.

How big is Flickr?

Credit: Franck_Michel (http://www.flickr.com/photos/franckmichel/)

100M photos updated daily

6B photos as of August 2011!

• ~3B public photos

Page 36: Human abilities Presented By Mahmoud Awadallah 1.

How Annotated is Flickr? (tag search)

Party – 23,416,126

Paris – 11,163,625

Pittsburgh – 1,152,829

Chair – 1,893,203

Violin – 233,661

Trashcan – 31,200

Page 37: Human abilities Presented By Mahmoud Awadallah 1.

Noisy Output from Image Search Engines

Page 38: Human abilities Presented By Mahmoud Awadallah 1.

Thumbnail Collection Project

Collected 80M images

http://people.csail.mit.edu/torralba/tinyimages

Page 39: Human abilities Presented By Mahmoud Awadallah 1.

Thumbnail Collection Project

Collect images for ALL objects

• List obtained from WordNet

• 75,378 non-abstract nouns in English

Page 40: Human abilities Presented By Mahmoud Awadallah 1.

Web image dataset

79.3 million images

Collected using image search engines

List of nouns taken from WordNet

Save all images in 32x32 resolution

Page 41: Human abilities Presented By Mahmoud Awadallah 1.

How Much is 80M Images?

One feature-length movie:

• 105 min = 151K frames @ 24 FPS

For 80M images, watch 530 movies

How do we store this?

• 1k * 80M = 80 GB

• Actual storage: 760GB
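
The arithmetic on this slide can be checked in a few lines. This is a minimal sketch that takes "1k" to mean 1,000 bytes per image, which is an assumption; the variable names are illustrative only.

```python
FPS = 24
MOVIE_MINUTES = 105
N_IMAGES = 80_000_000

# Frames in one feature-length movie.
frames_per_movie = MOVIE_MINUTES * 60 * FPS
print(frames_per_movie)  # 151200

# Number of movies needed to see 80M frames (the slide rounds to about 530).
movies = N_IMAGES / frames_per_movie
print(round(movies))  # 529

# Storage at an assumed 1,000 bytes per image, in gigabytes.
storage_gb = N_IMAGES * 1_000 / 1e9
print(storage_gb)  # 80.0
```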

Page 42: Human abilities Presented By Mahmoud Awadallah 1.

Powers of 10

Number of images on my hard drive: 10^4

Number of images seen during my first 10 years: 10^8 (3 images/second * 60 * 60 * 16 * 365 * 10 = 630,720,000)

Number of images seen by all humanity: 10^20

106,456,367,669 humans [1] * 60 years * 3 images/second * 60 * 60 * 16 * 365 ≈ 4 × 10^20

[1] from http://www.prb.org/Articles/2002/HowManyPeopleHaveEverLivedonEarth.aspx

Number of photons in the universe: 10^88

Number of all 8-bit 32x32 images: 10^7373

256^(32*32*3) ~ 10^7373
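
These orders of magnitude can be reproduced directly. The sketch below assumes 16 waking hours per day, as in the slide's own formula, and three 8-bit color channels per pixel. Evaluated this way, the exponent for the number of 32x32 images comes out near 7398, slightly above the figure quoted on the slide.

```python
import math

# Seconds awake per year: 16 hours a day, 365 days.
seconds_awake_per_year = 60 * 60 * 16 * 365

# Images seen in the first 10 years at 3 images/second.
first_ten_years = 3 * seconds_awake_per_year * 10
print(first_ten_years)  # 630720000

# Images seen by all humanity: everyone who ever lived, 60 years each.
humans_ever = 106_456_367_669
all_humanity = humans_ever * 60 * 3 * seconds_awake_per_year
print(f"{all_humanity:.2e}")

# Number of distinct 8-bit 32x32 RGB images is 256 ** (32 * 32 * 3);
# work with the base-10 exponent rather than the huge integer itself.
n_values = 32 * 32 * 3
exponent = n_values * math.log10(256)
print(round(exponent))  # 7398
```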

Page 43: Human abilities Presented By Mahmoud Awadallah 1.

Are 32x32 images enough?

Page 44: Human abilities Presented By Mahmoud Awadallah 1.

Are 32x32 images enough?

Page 45: Human abilities Presented By Mahmoud Awadallah 1.

Are 32x32 images enough?

Page 46: Human abilities Presented By Mahmoud Awadallah 1.

Statistics of database of tiny images

Page 47: Human abilities Presented By Mahmoud Awadallah 1.

Lots of Images

A. Torralba, R. Fergus, W. T. Freeman. PAMI 2008

Page 48: Human abilities Presented By Mahmoud Awadallah 1.

Lots of Images

A. Torralba, R. Fergus, W. T. Freeman. PAMI 2008

Page 49: Human abilities Presented By Mahmoud Awadallah 1.

Lots of Images

Page 50: Human abilities Presented By Mahmoud Awadallah 1.

First Attempt

Used SSD++ to find nearest neighbors of query image

• Used first 19 principal components
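
A minimal sketch of this kind of lookup, assuming plain sum of squared differences (SSD) computed on the first 19 principal components. The "++" refinements are not described on the slide and are not reproduced here. The data is random noise standing in for 32x32 color thumbnails, and all function names are hypothetical.

```python
import numpy as np


def fit_pca(images, n_components=19):
    """Return the mean and the top principal directions of flattened images."""
    mean = images.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(images - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(images, mean, components):
    """Project flattened images onto the principal directions."""
    return (images - mean) @ components.T


def nearest_neighbors(query, database, k=5):
    """Indices of the k database rows with the smallest SSD to the query."""
    ssd = np.sum((database - query) ** 2, axis=1)
    return np.argsort(ssd)[:k]


rng = np.random.default_rng(0)
database = rng.random((500, 32 * 32 * 3))  # stand-in for tiny images
mean, components = fit_pca(database)
db_codes = project(database, mean, components)

# Query: a slightly perturbed copy of database image 42.
query = database[42] + 0.01 * rng.standard_normal(32 * 32 * 3)
q_code = project(query[None, :], mean, components)[0]
neighbors = nearest_neighbors(q_code, db_codes, k=3)
print(neighbors)
```

Comparing 19-dimensional codes instead of 3,072 raw values is what makes a brute-force scan over tens of millions of images plausible.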

Page 51: Human abilities Presented By Mahmoud Awadallah 1.

SSD says these are not similar

?

Page 52: Human abilities Presented By Mahmoud Awadallah 1.

Another similarity measure

Page 53: Human abilities Presented By Mahmoud Awadallah 1.

WordNet Voting Scheme

Ground truth

One image – one vote

Page 54: Human abilities Presented By Mahmoud Awadallah 1.

Classification at Multiple Semantic Levels

Votes:

Animal 6
Person 33
Plant 5
Device 3
Administrative 4
Others 22

Votes:

Living 44
Artifact 9
Land 3
Region 7
Others 10
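
One way to read the two vote tables: each neighbor casts one vote for its own label and for every ancestor of that label, so counts accumulate at the higher semantic levels (Animal 6 + Person 33 + Plant 5 = Living 44). The sketch below assumes that reading. The small `PARENT` map is a hypothetical stand-in for the WordNet hierarchy and covers only four of the labels on the slide.

```python
from collections import Counter

# Hypothetical child -> parent links; the real hierarchy comes from WordNet.
PARENT = {
    "animal": "living",
    "person": "living",
    "plant": "living",
    "device": "artifact",
}


def ancestors(label):
    """Return the label followed by all of its ancestors."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def tally(neighbor_labels):
    """One image, one vote: each neighbor votes for its label and ancestors."""
    votes = Counter()
    for label in neighbor_labels:
        votes.update(ancestors(label))
    return votes


# Neighbor labels matching part of the first vote table on the slide.
labels = ["animal"] * 6 + ["person"] * 33 + ["plant"] * 5 + ["device"] * 3
votes = tally(labels)
print(votes["person"], votes["living"], votes["artifact"])  # 33 44 3
```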

Page 55: Human abilities Presented By Mahmoud Awadallah 1.
Page 56: Human abilities Presented By Mahmoud Awadallah 1.

Person Recognition

23% of all images in dataset contain people

Wide range of poses: not just frontal faces

Page 57: Human abilities Presented By Mahmoud Awadallah 1.

Person Recognition – Test Set

• 1016 images from Altavista using “person” query

• High res and 32x32 available

• Disjoint from 79 million tiny images

Page 58: Human abilities Presented By Mahmoud Awadallah 1.
Page 59: Human abilities Presented By Mahmoud Awadallah 1.
Page 60: Human abilities Presented By Mahmoud Awadallah 1.

Person Recognition

Task: person in image or not?

(c) shows the recall-precision curves for all 1018 images gathered from Altavista, and (d) shows curves for the subset of 173 images where people occupy at least 20% of the image
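
For readers unfamiliar with recall-precision curves, the sketch below shows how one is traced for a person / no-person task. It assumes each test image has a confidence score (for instance, the share of its nearest neighbors voting "person") and a ground-truth label. The scores and labels are made up, and the function name is hypothetical.

```python
import numpy as np


def precision_recall(scores, labels):
    """Precision and recall after each image, ranked by decreasing score."""
    order = np.argsort(-np.asarray(scores))
    hits = np.asarray(labels, dtype=bool)[order]
    true_positives = np.cumsum(hits)
    precision = true_positives / np.arange(1, len(hits) + 1)
    recall = true_positives / hits.sum()
    return precision, recall


# Made-up scores and ground truth (1 = person present) for five test images.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]
precision, recall = precision_recall(scores, labels)
for p, r in zip(precision, recall):
    print(f"recall={r:.2f} precision={p:.2f}")
```

Each point on the curve corresponds to one threshold on the score: lowering the threshold raises recall and usually lowers precision.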

Page 61: Human abilities Presented By Mahmoud Awadallah 1.

Scene classification

yellow = 7,900 image training set; red = 790,000 images; blue = 79,000,000 images

Page 62: Human abilities Presented By Mahmoud Awadallah 1.
Page 63: Human abilities Presented By Mahmoud Awadallah 1.

What If we have Labels…