Iccv2009 recognition and learning object categories p2 c03 - objects and annotations
![Page 1: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/1.jpg)
Pictures and Words
![Page 2: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/2.jpg)
Vision and language in human brain
Vision: V1, LOC, FFA, PPA
Language: Broca's area, Wernicke's area
![Page 3: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/3.jpg)
Vision and language in human brain
figure modified from: http://www.colorado.edu/intphys/Class/IPHY3730
![Page 4: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/4.jpg)
Vision and language in human brain
figure modified from: http://www.colorado.edu/intphys/Class/IPHY3730
(Translation: “This is not a pipe.”)
?
![Page 5: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/5.jpg)
![Page 6: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/6.jpg)
![Page 7: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/7.jpg)
Fei-Fei, Iyer, Koch, Perona, JoV, 2007
What can you see in a glance of a scene?
![Page 8: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/8.jpg)
PT = 27ms: This was a picture with some dark sploches in it. Yeah. . .that's about it. (Subject: KM)
PT = 40ms: I think I saw two people on a field. (Subject: RW)
PT = 67ms: Outdoor scene. There were some kind of animals, maybe dogs or horses, in the middle of the picture. It looked like they were running in the middle of a grassy field. (Subject: IV)
PT = 107ms: two people, whose profile was toward me. looked like they were on a field of some sort and engaged in some sort of sport (their attire suggested soccer, but it looked like there was too much contact for that). (Subject: AI)
PT = 500ms: Some kind of game or fight. Two groups of two men? The foregound pair looked like one was getting a fist in the face. Outdoors seemed like because i have an impression of grass and maybe lines on the grass? That would be why I think perhaps a game, rough game though, more like rugby than football because they pairs weren't in pads and helmets, though I did get the impression of similar clothing. maybe some trees? in the background. (Subject: SM)
Fei-Fei, Iyer, Koch, Perona, JoV, 2007
![Page 9: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/9.jpg)
Section outline
• Early “pictures and words” work
• Content-based retrieval
• Beyond nouns, towards total scene annotation
![Page 10: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/10.jpg)
“Pictures and words”
• Barnard, Duygulu, de Freitas, Forsyth, Blei, Jordan, Matching words and pictures, JMLR, 2003
• Duygulu, Barnard, de Freitas, Forsyth, Object Recognition as Machine Translation: Learning a lexicon for a fixed image vocabulary, ECCV, 2002
• Blei & Jordan, Modeling annotated data, ACM SIGIR, 2003
• Chang, Goh, Sychay, & Wu, Soft annotation using Bayes point machines, IEEE Transactions on Circuits and Systems for Video Technology, 2003
• Goh, Chang, & Cheng, Ensemble of SVM-based classifiers for annotation, 2003
• ….
![Page 11: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/11.jpg)
Barnard et al. JMLR, 2003
• Images are composed of multimodal “concepts”.
• Images are clustered based on priors over concepts.
• Learning determines localized concept models from global annotations.
– Addresses the correspondence problem
– One possible assumption: concept models simultaneously generate both a word and a blob
Figure labels: sun; sun, sky, water, waves
Slide courtesy of Kobus Barnard (1 hour ago!)
![Page 12: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/12.jpg)
Barnard et al. JMLR, 2003
Figure labels: sun; sun, sky, water, waves
Slide courtesy of Kobus Barnard (1 hour ago!)
• A generative model for assembling image data sets from multimodal clusters
– Choose an image cluster by p(c)
– Choose multimodal concept clusters using p(s|c)
– From each multimodal cluster, sample a Gaussian for blob features, p(b|s), and a multinomial for words, p(w|s)
– (Skip with some probability to account for mismatched numbers of words and blobs)
– For a given correspondence*
$$p(\{w \leftrightarrow b\}) = \sum_{c} p(c) \prod_{(w,b)} \sum_{l} p(w \mid l)\, p(b \mid l)\, p(l \mid c)$$
where l indexes the multimodal concept clusters (written s in the steps above).
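The correspondence likelihood on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes one-dimensional blob features, a fixed word/blob pairing, and made-up toy parameters. The function names `gaussian_pdf` and `correspondence_likelihood` and all numbers are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian (assumption: real blob features are multivariate)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def correspondence_likelihood(pairs, p_c, p_l_given_c, p_w_given_l, blob_params):
    """Sum over image clusters c, product over (word, blob) pairs, sum over concepts l."""
    total = 0.0
    for c, prior in enumerate(p_c):
        product = 1.0
        for word, blob in pairs:
            inner = 0.0
            for l, p_l in enumerate(p_l_given_c[c]):
                mean, var = blob_params[l]
                # p(w | l) * p(b | l) * p(l | c)
                inner += p_w_given_l[l].get(word, 0.0) * gaussian_pdf(blob, mean, var) * p_l
            product *= inner
        total += prior * product
    return total

# Toy parameters (hypothetical): two image clusters, two concept clusters.
p_c = [0.5, 0.5]
p_l_given_c = [[0.9, 0.1], [0.1, 0.9]]
p_w_given_l = [{"sun": 0.8, "water": 0.2}, {"sun": 0.1, "water": 0.9}]
blob_params = [(1.0, 0.25), (-1.0, 0.25)]  # (mean, variance) per concept

# "sun" paired with a blob near concept 0, "water" with a blob near concept 1 ...
matched = correspondence_likelihood(
    [("sun", 1.1), ("water", -0.9)], p_c, p_l_given_c, p_w_given_l, blob_params)
# ... versus the same words paired with the opposite blobs.
swapped = correspondence_likelihood(
    [("sun", -0.9), ("water", 1.1)], p_c, p_l_given_c, p_w_given_l, blob_params)
print(matched > swapped)
```

With these toy numbers the pairing that agrees with the concept models scores higher than the swapped pairing, which is the sense in which the model addresses the correspondence problem.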
![Page 13: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/13.jpg)
Barnard et al. JMLR, 2003
![Page 14: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/14.jpg)
Section outline
• Early “pictures and words” work
• Content-based retrieval
• Beyond nouns, towards total scene annotation
![Page 15: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/15.jpg)
Content-based retrieval
Example tags: Rose, Flower, Petals, Australian Floribunda Rose, Love, Corolla
Example tags: Tower, France, Eiffel Tower, Paris, Elegance, Symmetry
Slide courtesy of Ritendra Datta, Jia Li, James Z. Wang
![Page 16: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/16.jpg)
Literature – MANY!!!
• A. W. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain, Content-Based Image Retrieval at the End of the Early Years, IEEE Trans. Pattern Analysis and Machine Intelligence, 22(12):1349-1380, 2000.
• R. Datta, D. Joshi, J. Li, and J. Z. Wang, Image Retrieval: Ideas, Influences, and Trends of the New Age, ACM Computing Surveys, vol. 40, no. 2, pp. 5:1-60, 2008.
![Page 17: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/17.jpg)
Try out Alipr (www.alipr.com)
![Page 18: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/18.jpg)
Try out Alipr (www.alipr.com)
![Page 19: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/19.jpg)
Automatic Image Annotation: ALIP
Slide courtesy of Ritendra Datta, Jia Li, James Z. Wang
![Page 20: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/20.jpg)
Automatic Image Annotation: ALIP
Slide courtesy of Ritendra Datta, Jia Li, James Z. Wang
![Page 21: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/21.jpg)
Automatic Image Annotation: ALIP
• Classification results form the basis
• Salient words appearing in the classification are favored more
Annotation Process
Example annotations:
Building, sky, lake, landscape, Europe, tree
Food, indoor, cuisine, dessert
Snow, animal, wildlife, sky, cloth, ice, people
Slide courtesy of Ritendra Datta, Jia Li, James Z. Wang
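One way to read "salient words are favored more" is as a significance test over the top-ranked categories. The sketch below is an assumption for illustration, not something stated on the slides: it scores each candidate word by the hypergeometric probability of seeing it at least as often among the top categories by chance, and prefers words with a smaller probability. The function names, category names, and word lists are hypothetical.

```python
from collections import Counter
from math import comb

def chance_probability(n, m, k, j):
    """P(word appears in >= j of k categories drawn at random from n, when m of the n contain it)."""
    return sum(comb(m, i) * comb(n - m, k - i) for i in range(j, min(m, k) + 1)) / comb(n, k)

def annotate(top_categories, category_words, num_words=3):
    """Rank words from the top-ranked categories; a lower chance probability means more salient."""
    n = len(category_words)
    k = len(top_categories)
    # m: how many categories overall contain each word
    overall = Counter(w for words in category_words.values() for w in set(words))
    # j: how many of the top-ranked categories contain each word
    hits = Counter(w for c in top_categories for w in set(category_words[c]))
    scores = {w: chance_probability(n, overall[w], k, hits[w]) for w in hits}
    return sorted(scores, key=lambda w: (scores[w], w))[:num_words]

# Hypothetical training categories and their annotation words.
category_words = {
    "alps": ["snow", "sky", "landscape"],
    "polar": ["snow", "animal", "ice"],
    "ski": ["snow", "people", "sky"],
    "beach": ["sky", "people", "water"],
    "city": ["sky", "building", "people"],
}
# Suppose the classifier ranked these three categories highest for a query image.
words = annotate(["alps", "polar", "ski"], category_words)
print(words)
```

In this toy setup "snow" ranks first because it appears in all three top categories but in no others, while "sky" is dropped because it appears in almost every category and so carries little information.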
![Page 22: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/22.jpg)
Section outline
• Early “pictures and words” work
• Content-based retrieval
• Beyond nouns, towards total scene annotation
– Propositions
A. Gupta and L. S. Davis, Beyond Nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers, ECCV, 2008
– Objects, scenes, activities
L.-J. Li and L. Fei-Fei. What, where and who? Classifying events by scene and object recognition. ICCV, 2007
L.-J. Li, R. Socher and L. Fei-Fei. Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework. CVPR, 2009
![Page 23: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/23.jpg)
Section outline
• Early “pictures and words” work
• Content-based retrieval
• Beyond nouns, towards total scene annotation
– Propositions
A. Gupta and L. S. Davis, Beyond Nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers, ECCV, 2008
– Objects, scenes, activities
L.-J. Li and L. Fei-Fei. What, where and who? Classifying events by scene and object recognition. ICCV, 2007
L.-J. Li, R. Socher and L. Fei-Fei. Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework. CVPR, 2009
![Page 24: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/24.jpg)
Gupta & Davis, ECCV, 2008
“Beyond nouns”
![Page 25: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/25.jpg)
Gupta & Davis, ECCV, 2008
“Beyond nouns”
![Page 26: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/26.jpg)
Gupta & Davis, ECCV, 2008
![Page 27: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/27.jpg)
Section outline
• Early “pictures and words” work
• Content-based retrieval
• Beyond nouns, towards total scene annotation
– Propositions
A. Gupta and L. S. Davis, Beyond Nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers, ECCV, 2008
– Objects, scenes, activities
L.-J. Li and L. Fei-Fei. What, where and who? Classifying events by scene and object recognition. ICCV, 2007
L.-J. Li, R. Socher and L. Fei-Fei. Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework. CVPR, 2009
![Page 28: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/28.jpg)
What, where and who? Classifying events by scene and object recognition
L-J Li & L. Fei-Fei, ICCV 2007
![Page 29: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/29.jpg)
scene pathway object pathway
event
L.-J. Li & L. Fei-Fei ICCV 2007
“where” pathway
“what” pathway
PFC
![Page 30: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/30.jpg)
scene pathway
“Polo Field”
L.-J. Li & L. Fei-Fei ICCV 2007
Fei-Fei & Perona, CVPR, 2005
![Page 31: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/31.jpg)
object pathway
O= ‘horse’
L.-J. Li & L. Fei-Fei ICCV 2007
L.-J. Li , G. Wang & L. Fei-Fei, CVPR, 2007
G. Wang & L. Fei-Fei, CVPR, 2006
L. Cao & L. Fei-Fei, ICCV, 2007
![Page 32: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/32.jpg)
The 3W stories
what who where
L.-J. Li & L. Fei-Fei ICCV 2007
![Page 33: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/33.jpg)
Classification: class: Polo
Annotation: Athlete, Horse, Grass, Trees, Sky, Saddle
Segmentation: Horse, Sky, Tree, Grass, Athlete
L-J Li , R. Socher & L. Fei-Fei, CVPR, 2009
![Page 34: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/34.jpg)
Total Scene
Our model: a hierarchical representation of the image and its semantic contents
Figure labels: noisy images and tags; initialization; Learning; Generative Model; Recognition
Class: Polo. Tags: Athlete, Horse, Grass, Trees, Sky, Saddle. Segments: Horse, Sky, Tree, Grass, Athlete.
Class: Rock climbing. Tags: Athlete, Mountain, Trees, Rock, Sky, Ascent. Segments: Sky, Athlete, Tree, Mountain, Rock.
Class: Sailing. Tags: Athlete, Sailboat, Trees, Water, Sky, Wind. Segments: Sky, Athlete, Water, Tree, Sailboat.
L-J Li , R. Socher & L. Fei-Fei, CVPR, 2009
![Page 35: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/35.jpg)
Total Scene
Our model: a hierarchical representation of the image and its semantic contents
Figure labels: noisy images and tags; initialization; Learning; Generative Model; Recognition
Class: Polo. Tags: Athlete, Horse, Grass, Trees, Sky, Saddle. Segments: Horse, Sky, Tree, Grass, Athlete.
Class: Rock climbing. Tags: Athlete, Mountain, Trees, Rock, Sky, Ascent. Segments: Sky, Athlete, Tree, Mountain, Rock.
Class: Sailing. Tags: Athlete, Sailboat, Trees, Water, Sky, Wind. Segments: Sky, Athlete, Water, Tree, Sailboat.
Callout: Generative Model
L-J Li , R. Socher & L. Fei-Fei, CVPR, 2009
![Page 36: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/36.jpg)
Total Scene
The model: a hierarchical representation of the image and its semantic contents
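[Figure: graphical model. Variables: C (e.g. Polo), O (e.g. horse), R, X, Z, S, T (e.g. Horse); plate labels: D, N_F, A_r, N_r, N_t. Annotations: “Switch variable” (visible / not visible), “Connector variable”; visual and text components. Example tags: Athlete, Horse, Grass, Trees, Sky, Saddle]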
![Page 37: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/37.jpg)
Total Scene
Our model: a hierarchical representation of the image and its semantic contents
Figure labels: noisy images and tags; initialization; Learning; Generative Model; Recognition
Class: Polo. Tags: Athlete, Horse, Grass, Trees, Sky, Saddle. Segments: Horse, Sky, Tree, Grass, Athlete.
Class: Rock climbing. Tags: Athlete, Mountain, Trees, Rock, Sky, Ascent. Segments: Sky, Athlete, Tree, Mountain, Rock.
Class: Sailing. Tags: Athlete, Sailboat, Trees, Water, Sky, Wind. Segments: Sky, Athlete, Water, Tree, Sailboat.
Callouts: Generative Model; Learning; initialization
L-J Li , R. Socher & L. Fei-Fei, CVPR, 2009
![Page 38: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/38.jpg)
Total Scene
Need some good, initial “guesstimate” of O
[Figure: graphical model with variables C, O, R, X, Z, S, T and plate labels N_F, A_r, N_r, N_t]
Scene/Event images from the Internet
L-J Li , R. Socher & L. Fei-Fei, CVPR, 2009
![Page 39: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/39.jpg)
Total Scene
Scene/Event images from the Internet
Tags: Athlete, Horse, Grass, Tree, Saddle, Wind
Generative Model
Auto-semi-supervised learning: Small # of initialized images + Large # of uninitialized images
L-J Li , R. Socher & L. Fei-Fei, CVPR, 2009
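The idea of combining a small initialized set with a large uninitialized set can be illustrated with a generic self-training loop. The sketch below is only an analogy: it swaps the hierarchical generative model for a nearest-centroid classifier on one-dimensional features, and all names (`self_train`, `centroid`) and numbers are hypothetical.

```python
def centroid(values):
    """Mean of a list of 1-D feature values."""
    return sum(values) / len(values)

def self_train(labeled, unlabeled, rounds=5):
    """labeled: dict mapping label -> seed feature values; unlabeled: list of feature values."""
    centroids = {label: centroid(values) for label, values in labeled.items()}
    assignments = []
    for _ in range(rounds):
        # Pseudo-label every uninitialized example with the nearest current centroid.
        assignments = [min(centroids, key=lambda lab: abs(x - centroids[lab]))
                       for x in unlabeled]
        # Refit each centroid from its seeds plus its pseudo-labeled examples.
        updated = {}
        for label, seeds in labeled.items():
            members = seeds + [x for x, a in zip(unlabeled, assignments) if a == label]
            updated[label] = centroid(members)
        if updated == centroids:
            break  # converged: pseudo-labels no longer change the model
        centroids = updated
    return centroids, assignments

# Small initialized set (one seed per class) and a larger uninitialized set.
seeds = {"polo": [1.0], "sailing": [9.0]}
pool = [0.5, 1.5, 2.0, 8.0, 8.5, 9.5, 10.0]
centroids, assignments = self_train(seeds, pool)
print(centroids, assignments)
```

The seeds fix which cluster corresponds to which label, and the uninitialized pool then refines the class models; the model on the slides plays the same two roles with a far richer representation.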
![Page 40: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/40.jpg)
Total Scene
Our model: a hierarchical representation of the image and its semantic contents
Figure labels: noisy images and tags; initialization; Learning; Generative Model; Recognition
Class: Polo. Tags: Athlete, Horse, Grass, Trees, Sky, Saddle. Segments: Horse, Sky, Tree, Grass, Athlete.
Class: Rock climbing. Tags: Athlete, Mountain, Trees, Rock, Sky, Ascent. Segments: Sky, Athlete, Tree, Mountain, Rock.
Class: Sailing. Tags: Athlete, Sailboat, Trees, Water, Sky, Wind. Segments: Sky, Athlete, Water, Tree, Sailboat.
L-J Li , R. Socher & L. Fei-Fei, CVPR, 2009
![Page 41: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/41.jpg)
8 Event/Scene Classes
Badminton, Bocce, Croquet, Polo, Rock climbing, Rowing, Sailing, Snowboarding
![Page 42: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/42.jpg)
Total Scene
Some sample results
Class: Croquet, Class: Bocce, Class: Snowboarding, Class: Polo
Class: Sailing, Class: Badminton, Class: Rock Climbing, Class: Rowing
L-J Li , R. Socher & L. Fei-Fei, CVPR, 2009
![Page 43: Iccv2009 recognition and learning object categories p2 c03 - objects and annotations](https://reader035.fdocuments.net/reader035/viewer/2022062312/554e731db4c9054a698b4bef/html5/thumbnails/43.jpg)
PT = 27ms: This was a picture with some dark sploches in it. Yeah. . .that's about it. (Subject: KM)
PT = 40ms: I think I saw two people on a field. (Subject: RW)
PT = 67ms: Outdoor scene. There were some kind of animals, maybe dogs or horses, in the middle of the picture. It looked like they were running in the middle of a grassy field. (Subject: IV)
PT = 107ms: two people, whose profile was toward me. looked like they were on a field of some sort and engaged in some sort of sport (their attire suggested soccer, but it looked like there was too much contact for that). (Subject: AI)
PT = 500ms: Some kind of game or fight. Two groups of two men? The foregound pair looked like one was getting a fist in the face. Outdoors seemed like because i have an impression of grass and maybe lines on the grass? That would be why I think perhaps a game, rough game though, more like rugby than football because they pairs weren't in pads and helmets, though I did get the impression of similar clothing. maybe some trees? in the background. (Subject: SM)
Fei-Fei, Iyer, Koch, Perona, JoV, 2007