Towards Total Scene Understanding:
Classification, Annotation and Segmentation in an Automatic Framework
Li-Jia Li, Richard Socher, Li Fei-Fei
[Figure: example image with tags: City, Travel, Pagoda, Sunrise, Sunshine, Sun]
Classification: Weber et al 00; Fergus et al 03; Felzenszwalb et al 04; Fei-Fei et al 05; Sivic et al 05; Bosch et al 06; Oliva et al 01; Lazebnik et al 06
Segmentation: Shi et al 00; Felzenszwalb et al 04; Sali et al 99; Winn et al 05; Kumar et al 05; Cao et al 07; Russell et al 06; Todorovic et al 06
Annotation: Duygulu et al 02; Barnard et al 03; Blei et al 03; Gupta et al 08; Alipr: Li et al 03; Sudderth et al 05
Remark: Approaches in yellow will be used to compare with our model in later experiments.
Total Scene Understanding
Application
Classification, Annotation, Segmentation: mutually beneficial!
[Figure: polo example. Classification: class: Polo. Annotation: Athlete, Horse, Grass, Trees, Sky, Saddle. Segmentation: Horse, Athlete, Sky, Tree, Grass.]
Related Work:
Tu et al 03: Annotation + Segmentation
Li & Fei-Fei 07: Annotation + Classification
Heitz et al 08: Classification + Segmentation
Outline
• Model
• Learning
• Recognition & Experiment
Model
The model is a joint distribution over the random variables of a Visual Component and a Text Component.
Running example: class: Polo; tags: Athlete, Horse, Grass, Trees, Sky, Saddle.
[Figure: graphical model. Nodes: C, O, R, X, Z, S, T. Plates: NF, Ar, Nr, Nt, D.]
• C: class (e.g. class: Polo)
• O: object (e.g. Horse, Athlete)
• R (plate NF): region features: color, location, texture, shape
• X (plate Ar)
• Z: “connector variable”
• S: “switch variable”: visible / not visible
• T: tag (e.g. Horse)
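The roles of the switch and connector variables can be illustrated with a small generative sketch of the Text Component. This is only one plausible reading of the slide: the slides name S ("visible / not visible") and Z ("connector variable") but do not give their distributions, so the Bernoulli switch, the uniform choice of region, the uniform choice among candidate tags, and every name below (`sample_tags`, `p_visible`, `tag_given_object`, `tag_given_class`) are assumptions made for illustration. Treating "Wind" as a not-visible tag is also an assumption.

```python
import random

def sample_tags(object_of_region, tag_given_object, tag_given_class,
                p_visible, n_tags, rng):
    """Sample (S, Z, T) triples for one image (illustrative assumptions only).

    object_of_region: object label O for each region of the image.
    tag_given_object: dict mapping an object to its candidate tags.
    tag_given_class:  candidate tags for words with no visible region.
    """
    triples = []
    for _ in range(n_tags):
        visible = rng.random() < p_visible            # switch variable S
        if visible:
            z = rng.randrange(len(object_of_region))  # connector Z picks a region
            tag = rng.choice(tag_given_object[object_of_region[z]])
        else:
            z = None                                  # no region is connected
            tag = rng.choice(tag_given_class)
        triples.append((visible, z, tag))
    return triples

regions = ["Horse", "Athlete", "Grass"]
tags_of = {"Horse": ["Horse"], "Athlete": ["Athlete"], "Grass": ["Grass"]}
print(sample_tags(regions, tags_of, ["Wind"], 0.8, 4, random.Random(0)))
```

A visible tag is tied to one region through Z, which is what lets annotation and segmentation inform each other; a not-visible tag bypasses the regions entirely.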
Learning
Exact inference is intractable!
Relationship of the random variables:
• Top-down force
• Bottom-up force from visual information
• Bottom-up force from text information
Collapsed Gibbs Sampling (R. Neal, 2000)
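As a rough illustration of collapsed Gibbs sampling, the sketch below resamples one object label per region in a toy model, with the multinomial parameters integrated out under symmetric Dirichlet priors. It is a stand-in for the idea, not the sampler used in this work: the toy model (one visual-word id per region, an LDA-style conditional) and all names (`collapsed_gibbs`, `alpha`, `beta`, `n_sweeps`) are assumptions.

```python
import random

def collapsed_gibbs(images, n_objects, vocab_size,
                    alpha=1.0, beta=0.1, n_sweeps=50, seed=0):
    """Toy collapsed Gibbs sampler over per-region object labels.

    images: list of images, each a list of visual-word ids (one per region).
    Returns the sampled object label of every region.
    """
    rng = random.Random(seed)
    assign = [[rng.randrange(n_objects) for _ in words] for words in images]

    # Count tables that replace the integrated-out parameters.
    img_obj = [[0] * n_objects for _ in images]
    obj_word = [[0] * vocab_size for _ in range(n_objects)]
    obj_total = [0] * n_objects
    for d, words in enumerate(images):
        for w, o in zip(words, assign[d]):
            img_obj[d][o] += 1
            obj_word[o][w] += 1
            obj_total[o] += 1

    for _ in range(n_sweeps):
        for d, words in enumerate(images):
            for i, w in enumerate(words):
                o = assign[d][i]
                # Remove the current label from the counts.
                img_obj[d][o] -= 1
                obj_word[o][w] -= 1
                obj_total[o] -= 1
                # Conditional of this label given all other labels.
                weights = [
                    (img_obj[d][k] + alpha)
                    * (obj_word[k][w] + beta)
                    / (obj_total[k] + vocab_size * beta)
                    for k in range(n_objects)
                ]
                o = rng.choices(range(n_objects), weights=weights)[0]
                # Add the new label back.
                assign[d][i] = o
                img_obj[d][o] += 1
                obj_word[o][w] += 1
                obj_total[o] += 1
    return assign

print(collapsed_gibbs([[0, 0, 1], [2, 3, 3], [0, 1, 1]], n_objects=2, vocab_size=4))
```

In this toy, the first factor of each weight plays the part of a top-down preference (which objects the image already contains) and the second the part of a bottom-up preference (how well the object explains the observed region).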
Scene/Event images from the Internet: there is no object-text correspondence…
Example tags: Athlete, Horse, Grass, Tree, Saddle
Our model builds the correspondence…
However, a big obstacle: many objects always co-occur.
Example tag lists from Internet scene/event images: Athlete, Horse, Grass, Trees, Sky, Saddle; Athlete, Horse, Grass, Ball
One solution: a good initialization of O (e.g. Grass, Athlete, Horse) in the scene/event images from the Internet.
Initializing O: obtain Internet object images for each O.
Initializing O: train an object detector for each O on the object images. Any object detection & segmentation algorithm can be used.
Initialize O in the scene/event images with the trained object detectors, used as a black box for object detection & segmentation.
Example black box: Cao & Fei-Fei, 2007 (nodes θ, C, X, R, O; plates Nr, Ar), shown next to our model.
Auto-semi-supervised learning: small # of initialized images + large # of uninitialized images
Example tag lists of scene/event images: Athlete, Horse, Grass, Tree, Saddle, Wind; Athlete, Rock, Grass, Tree, Sky, Rope; Athlete, Snow, Tree, Sky, Snowboard
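The split between a small initialized set and a large uninitialized set can be sketched as a starting state for the object labels O. This is a hypothetical illustration: the function name, the `detector_labels` dictionary standing in for black-box detector output, and the choice to start uninitialized images from uniformly random labels are all assumptions, since the slides do not say how the uninitialized images begin.

```python
import random

def initial_object_labels(n_regions_per_image, detector_labels, n_objects, rng):
    """Build starting values of O for every region of every image.

    n_regions_per_image: number of regions in each image.
    detector_labels: dict image_index -> list of object ids, available only
        for the small initialized subset (hypothetical detector output).
    Images without detector output start from random labels (an assumption).
    """
    init = []
    for d, n_regions in enumerate(n_regions_per_image):
        if d in detector_labels:
            labels = list(detector_labels[d])
            if len(labels) != n_regions:
                raise ValueError("detector output must label every region")
        else:
            labels = [rng.randrange(n_objects) for _ in range(n_regions)]
        init.append(labels)
    return init

# One initialized image (index 0) and two uninitialized ones.
start = initial_object_labels([2, 3, 4], {0: [1, 0]}, n_objects=3,
                              rng=random.Random(0))
print(start)
```

A sampler would then update the labels of all images together, so the few detector-initialized images can steer the labels of the many uninitialized ones.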
Recognition & Experiment
• Dataset
• Learned Model
• Results
8 Event/Scene Classes: Badminton, Bocce, Croquet, Polo, Rock climbing, Rowing, Sailing, Snowboarding
Remark: Tags are not used during testing.
[Figure: Learned model: O]
[Figure: Learned model: R (Athlete, Grass, Horse)]
[Figure: Learned model: S]
Results: Classification, Annotation, Segmentation
8-way classification: 54%
Annotation comparison: Alipr (Li et al 03), Corr LDA (Blei et al 03)
Effect of top-down class context
[Figure: Horse example. Model w/o top-down class (nodes O, R, X, Z, T, S) vs. full model (nodes C, O, R, X, Z, T, S).]
[Figure: recognition examples]
Class: Rock climbing. Segmentation: Sky, Athlete, Tree, Mountain, Rock. Annotation: Athlete, Mountain, Tree, Rock, Sky, Ascent
Class: Sailing. Segmentation: Sky, Athlete, Water, Tree, Sailboat. Annotation: Athlete, Sailboat, Tree, Water, Sky, Wind
Class: Snowboarding. Segmentation: Tree, Athlete, Snowboard, Snow. Annotation: Athlete, Snowboard, Tree, Snow, Sky, Powder
Thanks to Prof. Silvio Savarese, Juan Carlos Niebles, Chong Wang, Barry Chai, Min Sun, Bangpeng Yao, Hao Su, Jia Deng, the anonymous reviewers, and you.