Download - Fcv poster parikh

Image Binary descrip0ons Rela0ve descrip0ons

not natural, not open, perspec0ve

more natural than tallbuilding; less natural than forest; more open than tallbuilding; less open than coast; more perspec0ve than tallbuilding;

not natural, not open, perspec0ve

more natural than insidecity; less natural than highway; more open than street; less open than coast; more perspec0ve than highway; less

perspec0ve than insidecity natural, open, perspec0ve

more natural than tallbuilding; less natural than mountain; more open than mountain; less perspec0ve than opencountry;

White, not Smiling, VisibleForehead

more White than AlexRodriguez; more Smiling than JaredLeto; less Smiling than ZacEfron; more VisibleForehead than JaredLeto; less

VisibleForehead than MileyCyrus

White, not Smiling, not VisibleForehead

more White than AlexRodriguez; less White than MileyCyrus; less Smiling than HughLaurie; more VisibleForehead than ZacEfron; less

VisibleForehead than MileyCyrus not Young,

BushyEyebrows, RoundFace

more Young than CliveOwen; less Young than ScarleMJohansson; more BushyEyebrows than ZacEfron; less BushyEyebrows than AlexRodriguez;

more RoundFace than CliveOwen; less RoundFace than ZacEfron

2. Learning Rela-ve A0ributes

Categorical (binary) aMributes are restric0ve and can be unnatural

Rela-ve A0ributes Devi Parikh (TTIC) and Kristen Grauman (UT Aus0n)

6. Datasets Proposed idea: Rela-ve A0ributes  Richer communica0on between

humans and machines  Describe images or categories rela0vely

e.g. “dogs are furrier than giraffes”, “find less congested downtown Chicago scene than

 Learn a ranking func0on for each aMribute

1. Main Idea 4. Rela-ve Zero-‐shot Learning Mo-va-on:

Natural Not Natural ?

Smiling Not Smiling ?

Enables new applica-ons  Novel zero-‐shot learning from aMribute

comparisons  Precise automa0cally generated textual

descrip0ons of images

5. Describing Images Rela-vely { }{ },. . .∼},( ){ ,

. . .�For each aMribute

rm(xi) = wTmxiLearn a scoring func0on

∀(i, j) ∈ Om : wTmxi > wT

mxj ∀(i, j) ∈ Sm : wTmxi = wT

mxj

that best sa0sfies constraints:

Supervision is am, Sm:Om:

Max-‐margin learning to rank formula-on

� ∼ �Young: Smiling: …

Learnt rela-ve a0ributes

Rela-ve a0ributes space

 Training: Images from S seen and descrip0ons of U unseen categories

 Tes0ng: Categorize image into N (=S+U) categories

Young: H C S � � Z M �

Unseen categories

Smiling: Z M � Need not use all aMributes  Need not relate to all S

Smiling

Youth

S

H Z C

M

Auto -‐ generate textual descrip-on of:

�Density: … Learnt rela-ve a0ributes

3. Ranking Func-on vs. Binary Classifier Score

min

�1

2||wT

m||22 + C

��ξ2ij +

�γ2ij

��

s.t wTm(xi − xj) ≥ 1− ξij , ∀(i, j) ∈ Om; |wT

m(xi − xj)| ≤ γij , ∀(i, j) ∈ Sm;

ξij ≥ 0; γij ≥ 0

% correctly ordered pairs Classifier Ranker

Outdoor scenes 80% 89%

Celebrity faces 67% 82%

Rela-ve a0ributes space

Density

1/8 dataset

“more dense than , less dense than

C C H H H C F H H M F F I F

“more dense than Highways, less dense than Forests”

”

Binary RelativeOSR TI S HCOMF

natural 0 0 0 0 1 1 1 1 T≺I∼S≺H≺C∼O∼M∼Fopen 0 0 0 1 1 1 1 0 T∼F≺I∼S≺M≺H∼C∼O

perspective 1 1 1 1 0 0 0 0 O≺C≺M∼F≺H≺I≺S≺Tlarge-objects 1 1 1 0 0 0 0 0 F≺O∼M≺I∼S≺H∼C≺Tdiagonal-plane 1 1 1 1 0 0 0 0 F≺O∼M≺C≺I∼S≺H≺Tclose-depth 1 1 1 1 0 0 0 1 C≺M≺O≺T∼I∼S∼H∼FPubFig ACHJ MS VZ

Masculine-looking 1 1 1 1 0 0 1 1 S≺M≺Z≺V≺J≺A≺H≺CWhite 0 1 1 1 1 1 1 1 A≺C≺H≺Z≺J≺S≺M≺VYoung 0 0 0 0 1 1 0 1 V≺H≺C≺J≺A≺S≺Z≺MSmiling 1 1 1 0 1 1 0 1 J≺V≺H≺A∼C≺S∼Z≺MChubby 1 0 0 0 0 0 0 0 V≺J≺H≺C≺Z≺M≺S≺A

Visible-forehead 1 1 1 0 1 1 1 0 J≺Z≺M≺S≺A∼C∼H∼VBushy-eyebrows 0 1 0 1 0 0 0 0 M≺S≺Z≺V≺H≺A≺C≺JNarrow-eyes 0 1 1 0 0 0 1 1 M≺J≺S≺A≺H≺C≺V≺ZPointy-nose 0 0 1 0 0 0 0 1 A≺C≺J∼M∼V≺S≺Z≺HBig-lips 1 0 0 0 1 1 0 0 H≺J≺V≺Z≺C≺M≺A≺S

Round-face 1 0 0 0 1 1 0 0 H≺V≺J≺C≺Z≺A≺S≺M

0 1 2 3 4 50

20

40

60

80

Accu

racy

# unseen categories

OSR

0 1 2 3 4 50

20

40

60

Accu

racy

# unseen categories

PubFig

DAP SRA Proposed

1 2 5 150

20

40

60

Accu

racy

# labeled pairs

OSR

1 2 5 150

20

40

60

Accu

racy

# labeled pairs

PubFig

DAP SRA Proposed

6 5 4 3 2 10

20

40

60

Accu

racy

# att to describe unseen

OSR

11109 8 7 6 5 4 3 2 10

20

40

60

Accu

racy

# att to describe unseen

PubFig

DAP SRA Proposed

1 2 30

20

40

60

Accu

racy

Looseness of constraints

OSR

1 2 30

20

40

60

Accu

racy

Looseness of constraints

PubFig

DAP SRA Proposed

1 2 30

20

40

60

80

100

# top choices

% c

orre

ct im

age

in to

p ch

oice

s

BinaryRelative

  Outdoor Scene Recogni-on (OSR): 2688 images, 8 categories: coast (C), forest (F), highway (H), inside-‐city (I), mountain (M), open-‐country (O), street (S) and tall-‐building (T), gist features;

  Public Figure Face (PubFig): 800 images, 8 categories: Alex Rodriguez (A), Clive Owen (C), Hugh Laurie (H), Jared Leto (J), Miley Cyrus (M), ScarleM Johansson (S), Viggo Mortensen (V) and Zac Efron (Z), gist and color features

8. Zero-‐shot Learning Results

Whereas conven0onal Binary descrip-on: “not dense”

Not dense: Dense:

Rela-ve descrip-on:

7. Image Descrip-on Results

wb wm

Number of unseen categories

Amt. of labeled data to learn a0ributes

Amount of descrip-on

Quality of descrip-on

Baselines:   Direct AMribute Predic0on (DAP) [Lampert et al. 2009] (binary)

  Classifier instead of ranker (SRA)

min

�1

2||wT

m||22 + C

��ξ2ij +

�γ2ij

��


m(xi − xj)| ≤ γij , ∀(i, j) ∈ Sm;

ξij ≥ 0; γij ≥ 0

min

�1

2||wT

m||22 + C

��ξ2ij +

�γ2ij

��


m(xi − xj)| ≤ γij , ∀(i, j) ∈ Sm;

ξij ≥ 0; γij ≥ 0

Adapted objec0ve from [Joachims, 2002]

How do learned ranking func0ons

differ from classifier outputs?

Infer image category using max-‐likelihood

More smiling than

Less smiling than

More VisFHead than

Less VisFHead than

More chubby than

Less chubby than

Human subject experiment: Which image is?

OSR PubFig

c(x) = argmax�

m

P (acm|x)

classical recogni0on problem binary ~ rela0ve supervision

<< baseline supervision can give unique ordering on all classes

An aMribute is more discrimina0ve when used rela0vely

Rela0ve aMributes jointly carve out space for unseen category

”

? ? ? Example descrip-ons