Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework

Li-Jia Li, Richard Socher, Li Fei-Fei
Date posted: 19-Dec-2015


Tags: City Travel, Pagoda, Sunrise, Sunshine, Sun

Classification: Weber et al 00, Fergus et al 03, Felzenszwalb et al 04, Fei-Fei et al 05, Sivic et al 05, Bosch et al 06, Oliva et al 01, Lazebnik et al 06

Segmentation: Shi et al 00, Felzenszwalb et al 04, Sali et al 99, Winn et al 05, Kumar et al 05, Cao et al 07, Russell et al 06, Todorovic et al 06

Annotation: Duygulu et al 02, Barnard et al 03, Blei et al 03, Gupta et al 08, Alipr: Li et al 03, Sudderth et al 05

Remark: Approaches in yellow will be used to compare with our model in later experiments.


Total Scene Understanding

Application

Classification, Annotation, Segmentation

Mutually beneficial!

Classification, Annotation, Segmentation

Class: Polo

Tags: Athlete, Horse, Grass, Trees, Sky, Saddle

Region labels: Horse, Athlete, Sky, Tree, Grass

Related Work:

Tu et al 03: Annotation, Segmentation

Li & Fei-Fei 07: Annotation, Classification

Heitz et al 08: Classification, Segmentation

Outline

Model

Learning

Recognition & Experiment

[Figure: graphical model with a visual component and a text component; labels C, O, R, X, Z, S, T, D, Nr, NF, Ar, Nt]

Example image: class: Polo; tags: Athlete, Horse, Grass, Trees, Sky, Saddle

Joint distribution of the random variables in the Visual Component and the Text Component.

C: class (class: Polo)

O: object

R: Color, Location, Texture, Shape (NF)

X (Ar)

Z: “Connector variable” (Nr, Nt)

S: “Switch variable”: Visible / Not visible

T: tag (e.g. Horse)
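The model slides describe a joint distribution over a visual component and a text component, linked by a connector variable Z and a switch variable S. One way to make that concrete is a toy generative process. This is only a sketch under loud assumptions: the probability tables, the `P_VISIBLE` constant, the `NON_VISIBLE_TAGS` pool and the function name `sample_image` are all hypothetical, the appearance variables R and X are left out, and the dependency structure shown is one plausible reading of the slides rather than the authors' definition.

```python
import random

# All names and numbers below are hypothetical toy values, not from the slides.
P_OBJECT_GIVEN_CLASS = {
    "polo": {"horse": 0.4, "athlete": 0.3, "grass": 0.3},
    "sailing": {"sailboat": 0.4, "water": 0.4, "sky": 0.2},
}
P_VISIBLE = 0.8                        # assumed chance that a tag is "visible"
NON_VISIBLE_TAGS = ["wind", "ascent"]  # assumed pool for "not visible" tags


def sample_image(rng, n_regions=4, n_tags=3):
    """Sample (class, region objects, tags) from the toy generative process."""
    c = rng.choice(sorted(P_OBJECT_GIVEN_CLASS))           # C: class
    objects = list(P_OBJECT_GIVEN_CLASS[c])
    weights = [P_OBJECT_GIVEN_CLASS[c][o] for o in objects]
    # O: one object label per region, drawn given the class.
    regions = rng.choices(objects, weights=weights, k=n_regions)
    tags = []
    for _ in range(n_tags):
        z = rng.randrange(n_regions)                       # Z: connector picks a region
        visible = rng.random() < P_VISIBLE                 # S: switch
        # T: a visible tag names the connected region's object; otherwise it
        # comes from the non-visible pool (an assumption of this sketch).
        tag = regions[z] if visible else rng.choice(NON_VISIBLE_TAGS)
        tags.append({"tag": tag, "region": z, "visible": visible})
    return c, regions, tags


c, regions, tags = sample_image(random.Random(0))
```

Every visible tag in the sample points back at a specific region through Z, which is the object-text correspondence the later slides say the model builds.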


Learning

Exact inference is intractable!

Relationship of the random variables

[Figure: graphical model with visual and text components; labels C, O, R, X, Z, S, T, Nr, NF, Ar, Nt]

Top-down force

Bottom-up force from visual information

Bottom-up force from text information

Collapsed Gibbs Sampling

(R. Neal, 2000)
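The slides name collapsed Gibbs sampling as the learning procedure. As a minimal sketch of the general technique, and not of the authors' sampler, the following applies it to a plain LDA-style model in which each discrete token in a document gets one latent label and the multinomial parameters are integrated out under symmetric Dirichlet priors. The function name `collapsed_gibbs`, the hyperparameters `alpha` and `beta`, and the toy input are assumptions made for illustration.

```python
import random


def collapsed_gibbs(docs, n_topics, vocab_size, n_iters=50,
                    alpha=0.5, beta=0.1, seed=0):
    """LDA-style collapsed Gibbs sampler over lists of integer word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]           # counts n[d][k]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # counts n[k][w]
    topic_total = [0] * n_topics                         # counts n[k]
    assign = []

    # Random initial assignment of every token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional of this token's topic given all other
                # assignments, with the multinomials integrated out.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return assign, doc_topic, topic_word


toy_docs = [[0, 0, 1, 1], [2, 3, 3, 2], [0, 1, 0, 1]]
assign, doc_topic, topic_word = collapsed_gibbs(toy_docs, n_topics=2, vocab_size=4)
```

Only the count tables are stored, so each update touches a single token; in the slides' model the analogous updates would also combine the top-down and bottom-up forces listed above.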

Scene/Event images from the Internet

There is no object-text correspondence…

Tags: Athlete, Horse, Grass, Tree, Saddle

Our model builds the correspondence…

[Figure: graphical model; labels C, O, R, X, Z, S, T, D, Nr, NF, Ar, Nt]

However, a big obstacle is that many objects always co-occur:

Tags: Athlete, Horse, Grass, Trees, Sky, Saddle

Tags: Athlete, Horse, Grass, Ball

Scene/Event images from the Internet

One solution: a good initialization of O

Scene/Event images from the Internet

Tags: Athlete, Horse, Grass, Trees, Sky, Saddle

Region labels: Grass, Athlete, Horse

Initializing O: obtain Internet images for each O

Object images

Initializing O: train an object detector for each O

Any object detection & segmentation algorithm

Object images; Event/Scene images

Initialize O in the scene image by the trained object detectors

Black box object detection & segmentation: any object detection & segmentation algorithm

Cao & Fei-Fei, 2007

[Figure: Cao & Fei-Fei, 2007 model, labels θ, C, X, R, O, Nr, Ar; Our Model, labels C, O, R, X, Z, S, T, D, Nr, NF, Ar, Nt]

Object images; Event/Scene images

Auto-semi-supervised learning: Small # of initialized images + Large # of uninitialized images

Our Model + Small # of initialized images + Large # of uninitialized images

Scene/Event images, example tag sets:

Athlete, Horse, Grass, Tree, Saddle, Wind

Athlete, Rock, Grass, Tree, Sky, Rope

Athlete, Snow, Tree, Sky, Snowboard
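The split into a small set of detector-initialized images and a large set of uninitialized ones can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: `initialize_objects`, the `detect` callback standing in for the black-box detector, the image dictionary layout, and the choice to start uninitialized regions from a random tag of the same image are all hypothetical.

```python
import random


def initialize_objects(images, detect, n_seed, rng):
    """Give every region a starting object label O.

    images: list of dicts with "regions" (any per-region data) and "tags".
    detect: hypothetical black-box callable, detect(region, tags) -> label.
    n_seed: how many images get detector-based initialization.
    """
    initial = []
    for index, image in enumerate(images):
        seeded = index < n_seed
        if seeded:
            # Small set: labels come from the trained object detectors.
            labels = [detect(region, image["tags"]) for region in image["regions"]]
        else:
            # Large set: no detector output; start from a random tag of the
            # image (an assumption of this sketch) and let sampling refine it.
            labels = [rng.choice(image["tags"]) for _ in image["regions"]]
        initial.append({"labels": labels, "initialized": seeded})
    return initial


# Toy usage with a stub detector that always answers "horse".
toy_images = [
    {"regions": ["r0", "r1"], "tags": ["athlete", "horse", "grass"]},
    {"regions": ["r0", "r1", "r2"], "tags": ["athlete", "snow", "tree"]},
]
start = initialize_objects(toy_images, lambda region, tags: "horse",
                           n_seed=1, rng=random.Random(0))
```

Keeping the detector behind a plain callable mirrors the slides' point that any object detection and segmentation algorithm can fill that role.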

Small # of automatically initialized images

Outline

Model

Learning

Recognition & Experiment
• Dataset
• Learned Model
• Results

8 Event/Scene Classes: Badminton, Bocce, Croquet, Polo, Rock climbing, Rowing, Sailing, Snowboarding

Remark: Tags are not used during testing

[Figure: graphical model; labels C, O, R, X, Z, S, T, D, Nr, NF, Ar, Nt]

Learned model: O

Athlete, Grass, Horse

Learned model: R

Learned model: S

8-way classification: 54%
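To put the 54% figure in context, it helps to compare it with uniform guessing over the 8 classes. The short sketch below does that arithmetic and shows how such an accuracy would be computed. The helper `accuracy` and the variable names are hypothetical, and the chance level of 1/8 assumes a test set balanced across classes, which the slides do not state.

```python
def accuracy(predicted, actual):
    """Fraction of images whose predicted class matches the true class."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


N_CLASSES = 8
chance = 1 / N_CLASSES        # uniform guessing, assuming balanced classes
reported = 0.54               # accuracy stated on the slide
ratio = reported / chance     # how many times better than chance

# Toy check of the helper on made-up labels (not results from the talk).
toy_accuracy = accuracy(["polo", "polo", "sailing"], ["polo", "rowing", "sailing"])
```

Under that balance assumption, chance is 12.5%, so the reported accuracy is a little over four times the chance level.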

Classification, Annotation, Segmentation

Alipr: Li et al 03; Corr LDA: Blei et al 03

Effect of top-down class context

[Figure: two graphical models. Model w/o top-down class: labels O, R, X, Z, T, S. Full Model: labels C, O, R, X, Z, T, S. Example region label: Horse]
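One way to read "top-down class context" is as a class-conditional prior over objects that is combined with bottom-up appearance evidence through Bayes' rule. The sketch below illustrates that reading with made-up numbers. The function `object_posterior` and all probabilities are hypothetical, and treating the model without the class node as a uniform prior is an assumption of this example rather than something the slides specify.

```python
def object_posterior(likelihood, prior):
    """Posterior over object labels: proportional to likelihood times prior."""
    unnormalized = {obj: likelihood[obj] * prior.get(obj, 0.0) for obj in likelihood}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("prior gives zero mass to every candidate object")
    return {obj: value / total for obj, value in unnormalized.items()}


# Toy appearance evidence for one region: slightly favors "rock".
likelihood = {"horse": 0.4, "rock": 0.6}

# Without top-down class context: no preference between objects.
without_class = object_posterior(likelihood, {"horse": 0.5, "rock": 0.5})

# With top-down context from class "polo": horses are far more likely.
with_class = object_posterior(likelihood, {"horse": 0.9, "rock": 0.1})

best_without = max(without_class, key=without_class.get)
best_with = max(with_class, key=with_class.get)
```

With these toy numbers the bottom-up evidence alone picks "rock", while adding the polo prior flips the decision to "horse".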


Class: Rock climbing
Region labels: Sky, Athlete, Tree, Mountain, Rock
Tags: Athlete, Mountain, Tree, Rock, Sky, Ascent

Class: Sailing
Region labels: Sky, Athlete, Water, Tree, Sailboat
Tags: Athlete, Sailboat, Tree, Water, Sky, Wind

Class: Snowboarding
Region labels: Tree, Athlete, Snowboard, Snow
Tags: Athlete, Snowboard, Tree, Snow, Sky, Powder

Thanks to Prof. Silvio Savarese, Juan Carlos Niebles, Chong Wang, Barry Chai, Min Sun, Bangpeng Yao, Hao Su, Jia Deng, and the anonymous reviewers

And you