Bilinear Classifiers for Visual Recognitionpeople.csail.mit.edu/hpirsiav/papers/Bilinear_nips09...1...

41
1 Bilinear Classifiers for Visual Recognition Computational Vision Lab. University of California Irvine To be presented in NIPS 2009 Hamed Pirsiavash Deva Ramanan Charless Fowlkes


Bilinear Classifiers for Visual Recognition

Computational Vision Lab, University of California, Irvine

To be presented at NIPS 2009

Hamed Pirsiavash, Deva Ramanan, Charless Fowlkes


Introduction

• Linear model for visual recognition

[Figure: the features of a window on the input image form an $n_y \times n_x \times n_f$ array, which is reshaped (:) into a feature vector $x$ of size $n_y n_x n_f \times 1$.]


Introduction

• Linear classifier
  • Learn a template
  • Apply it to all possible windows

Template (model) $w$, feature vector $x$:

$w^T x > 0$
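The "apply the template to all possible windows" step can be sketched in NumPy. This is a minimal illustration rather than the authors' code: the function name `score_windows`, the array layout, and the random data are assumptions made for the example.

```python
import numpy as np

def score_windows(features, template):
    """Return w^T x for every window position of a feature map.

    features: (H, W, nf) feature map of the whole image (hypothetical layout).
    template: (ny, nx, nf) learned template.
    """
    H, W, _ = features.shape
    ny, nx, _ = template.shape
    w = template.reshape(-1)                      # template as a vector
    scores = np.empty((H - ny + 1, W - nx + 1))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            x = features[i:i + ny, j:j + nx, :].reshape(-1)  # window as a vector
            scores[i, j] = w @ x
    return scores

# Illustrative random data, not real image features.
rng = np.random.default_rng(0)
features = rng.standard_normal((10, 8, 4))
template = rng.standard_normal((3, 2, 4))
detections = score_windows(features, template) > 0   # windows with w^T x > 0
```

The explicit double loop is kept for readability; because the score is linear in the window, the same result could be computed with a convolution.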


Introduction

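[Figure: the $n_y \times n_x \times n_f$ feature array of a window is reshaped into a matrix $X$ of size $n_s \times n_f$, where $n_s := n_x n_y$.]

The template $W$ ($n_s \times n_f$) is factored into $W_s$ ($n_s \times d$) and $W_f^T$ ($d \times n_f$):

$W = W_s W_f^T, \quad d \le \min(n_s, n_f)$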


Introduction

• Motivation for bilinear models
  • Reduced rank: fewer parameters
    • Better generalization: reduced over-fitting
    • Run-time efficiency
  • Transfer learning
    • Share a subset of parameters between different but related tasks


Outline

• Introduction
• Sliding window classifiers
• Bilinear model and its motivation
• Extension
• Related work
• Experiments
  • Pedestrian detection
  • Human action classification
• Conclusion


Sliding window classifiers

• Extract some visual features from a spatio-temporal window
  • e.g., histogram of gradients (HOG) in Dalal and Triggs’ method
• Train a linear SVM using annotated positive and negative instances
• Detection: evaluate the model on all possible windows in the space-scale domain
  • Use convolution since the model is linear

Decision rule:

$w^T x > 0$

Training objective:

$\min_w L(w) = \frac{1}{2} w^T w + C \sum_n \max(0, 1 - y_n w^T x_n)$
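The training objective on this slide can be written out directly. The sketch below only evaluates $L(w)$; it is not a solver, and the function name `svm_objective` and the toy data are assumptions for illustration.

```python
import numpy as np

def svm_objective(w, X, y, C):
    """L(w) = 1/2 w^T w + C * sum_n max(0, 1 - y_n w^T x_n).

    X: (N, D) matrix with one feature vector x_n per row.
    y: (N,) labels in {-1, +1}.
    """
    margins = 1.0 - y * (X @ w)                   # 1 - y_n w^T x_n for every n
    return 0.5 * (w @ w) + C * np.sum(np.maximum(0.0, margins))

# Toy data (hypothetical): only the second example violates the margin.
w = np.array([1.0, 0.0])
X = np.array([[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0])
value = svm_objective(w, X, y, C=1.0)             # 0.5 (regularizer) + 0.5 (hinge)
```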


Sliding window classifiers

[Figure: a sample image, its features, and a sample template $W$.]


Bilinear model (Definition)

• Visual data are better modeled as matrices/tensors rather than vectors

• Why not use the matrix structure?


Bilinear model (Definition)

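[Figure: the $n_y \times n_x \times n_f$ feature array is reshaped into a matrix $X$ of size $n_s \times n_f$, with $n_s := n_x n_y$; the model is reshaped the same way into $W$.]

Feature $X$, model $W$:

$\mathrm{Tr}(W^T X) = w^T x$

so the decision rule $w^T x > 0$ becomes $\mathrm{Tr}(W^T X) > 0$.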
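The identity $\mathrm{Tr}(W^T X) = w^T x$ is easy to check numerically. The sketch below assumes small, made-up dimensions and random data; the only requirement is that the features and the template are reshaped in the same order.

```python
import numpy as np

rng = np.random.default_rng(0)
ny, nx, nf = 4, 3, 5                 # small illustrative sizes (assumed)
ns = nx * ny

feat = rng.standard_normal((ny, nx, nf))    # features of one window
templ = rng.standard_normal((ny, nx, nf))   # template of the same shape

# Matrix form: one row per spatial cell, one column per feature.
X = feat.reshape(ns, nf)
W = templ.reshape(ns, nf)

# Vector form: everything flattened in the same order.
x = feat.reshape(-1)
w = templ.reshape(-1)

trace_score = np.trace(W.T @ X)
vector_score = w @ x
```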

Bilinear model (Definition)

• Bilinear model

The template $W$ ($n_s \times n_f$) is factored into $W_s$ ($n_s \times d$) and $W_f^T$ ($d \times n_f$):

$W = W_s W_f^T, \quad d \le \min(n_s, n_f)$

The decision rule was $w^T x > 0$, with the linear score

$f(X) = \mathrm{Tr}(W^T X)$

It is now $f(X) > 0$ with

$f(X) = \mathrm{Tr}(W_f W_s^T X) = \mathrm{Tr}(W_s^T X W_f)$

which is bilinear in $W_f$ and $W_s$.
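A short numerical sketch of the two equivalent forms of the score. The helper name `bilinear_score`, the dimensions, and the random matrices are assumptions for illustration; the point is that $\mathrm{Tr}(W_s^T X W_f)$ can be evaluated without ever forming the full template $W$.

```python
import numpy as np

def bilinear_score(Ws, Wf, X):
    """f(X) = Tr(Ws^T X Wf), computed without forming W = Ws Wf^T."""
    projected = X @ Wf                 # (ns, d): features reduced to d columns
    return np.sum(Ws * projected)      # Tr(Ws^T P) is the elementwise sum of Ws * P

rng = np.random.default_rng(0)
ns, nf, d = 6, 5, 2                    # illustrative sizes with d <= min(ns, nf)
Ws = rng.standard_normal((ns, d))
Wf = rng.standard_normal((nf, d))
X = rng.standard_normal((ns, nf))

W = Ws @ Wf.T                          # full template, rank at most d
linear_form = np.trace(W.T @ X)        # Tr(W^T X)
bilinear_form = bilinear_score(Ws, Wf, X)
```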

Bilinear model (Learning)

• Linear SVM for a given set of training pairs $\{X_n, y_n\}$

$\min_W L(W) = \frac{1}{2}\mathrm{Tr}(W^T W) + C \sum_n \max(0, 1 - y_n \mathrm{Tr}(W^T X_n))$

Here $L(W)$ is the objective function, $W$ is the parameter, $\frac{1}{2}\mathrm{Tr}(W^T W)$ is the regularizer, and the sum over $n$ is labeled as the constraints.

• Substitute $W := W_s W_f^T$


Bilinear model (Learning)

• For the bilinear formulation

$\min L(W_s, W_f) = \frac{1}{2}\mathrm{Tr}(W_f W_s^T W_s W_f^T) + C \sum_n \max(0, 1 - y_n \mathrm{Tr}(W_f W_s^T X_n))$

• Biconvex, so solve by coordinate descent
• By fixing one set of parameters, it’s a typical SVM problem (with a change of basis)
• Use an off-the-shelf SVM solver in the loop
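To make the substitution concrete, the sketch below evaluates both objectives and lets one confirm that the bilinear objective equals the linear one at $W = W_s W_f^T$. It only evaluates the losses and does not perform the coordinate descent; the function names, the `(N, ns, nf)` data layout, and the random data are assumptions.

```python
import numpy as np

def linear_objective(W, Xs, y, C):
    """1/2 Tr(W^T W) + C * sum_n max(0, 1 - y_n Tr(W^T X_n)); Xs is (N, ns, nf)."""
    scores = np.einsum('ij,nij->n', W, Xs)               # Tr(W^T X_n) for every n
    return 0.5 * np.trace(W.T @ W) + C * np.sum(np.maximum(0.0, 1.0 - y * scores))

def bilinear_objective(Ws, Wf, Xs, y, C):
    """1/2 Tr(Wf Ws^T Ws Wf^T) + C * sum_n max(0, 1 - y_n Tr(Wf Ws^T X_n))."""
    reg = 0.5 * np.trace(Wf @ Ws.T @ Ws @ Wf.T)
    scores = np.einsum('id,nij,jd->n', Ws, Xs, Wf)       # Tr(Ws^T X_n Wf) for every n
    return reg + C * np.sum(np.maximum(0.0, 1.0 - y * scores))

rng = np.random.default_rng(0)
N, ns, nf, d = 8, 6, 5, 2                                # illustrative sizes
Xs = rng.standard_normal((N, ns, nf))
y = rng.choice([-1.0, 1.0], size=N)
Ws = rng.standard_normal((ns, d))
Wf = rng.standard_normal((nf, d))

lin = linear_objective(Ws @ Wf.T, Xs, y, C=0.1)
bil = bilinear_objective(Ws, Wf, Xs, y, C=0.1)
```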


Motivation

• Regularization
  • Similar to PCA, but not orthogonal and learned discriminatively and jointly with the template
• Run-time efficiency
  • $d$ convolutions instead of $n_f$

$W = W_s W_f^T, \quad d \le \min(n_s, n_f)$

where $W_f$ is the subspace and $W_s$ is the reduced-dimensional template.
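The run-time argument can be illustrated as follows: project the whole feature map onto the subspace once, then slide a template with $d$ channels instead of $n_f$. This is a sketch under assumed shapes and random data, with a plain loop standing in for the convolutions; `sliding_scores` is a hypothetical helper.

```python
import numpy as np

def sliding_scores(feat, templ):
    """Score every window of feat (H, W, c) against templ (ny, nx, c)."""
    H, Wd, _ = feat.shape
    ny, nx, _ = templ.shape
    out = np.empty((H - ny + 1, Wd - nx + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(feat[i:i + ny, j:j + nx] * templ)
    return out

rng = np.random.default_rng(0)
ny, nx, nf, d = 3, 2, 6, 2                      # illustrative sizes
Ws = rng.standard_normal((ny * nx, d))          # reduced-dimensional template
Wf = rng.standard_normal((nf, d))               # subspace
feat = rng.standard_normal((8, 7, nf))          # feature map of a whole image

# Full template: nf channels per window.
full = sliding_scores(feat, (Ws @ Wf.T).reshape(ny, nx, nf))

# Bilinear route: project once to d channels, then slide the small template.
projected = feat @ Wf                           # (8, 7, d)
reduced = sliding_scores(projected, Ws.reshape(ny, nx, d))
```

Both routes give the same scores, but the second touches only $d$ channels per window after a single projection of the image features.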


Motivation

• Transfer learning
  • Share the subspace $W_f$ between different problems
    • e.g., human detector and cat detector
  • Optimize the summation of all objective functions
  • Learn a good subspace using all data

$W = W_s W_f^T$


Extension

• Multi-linear
  • High-order tensors
  • $L(W_x, W_y, W_f)$ instead of just $L(W_s, W_f)$
• For 1D feature
  • Separable filter for $L(W_x, W_y)$ (Rank = 1)
• Spatio-temporal templates
  • $L(W_x, W_y, W_t, W_f)$
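For the 1D-feature case, a rank-1 template is the outer product of a vertical and a horizontal filter, which is what makes it separable. The sketch below checks this on one window; the names `w_y` and `w_x`, the sizes, and the data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
ny, nx = 5, 4                          # illustrative window size
w_y = rng.standard_normal(ny)          # vertical filter
w_x = rng.standard_normal(nx)          # horizontal filter
X = rng.standard_normal((ny, nx))      # one window with a single feature per cell

W = np.outer(w_y, w_x)                 # rank-1 template
full_score = np.sum(W * X)             # Tr(W^T X), touches ny * nx weights
separable_score = w_y @ X @ w_x        # same score using the two 1D filters
rank = np.linalg.matrix_rank(W)
```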


Related work (Rank restriction)

• Bilinear models
  • Often used to increase flexibility; however, we use them to reduce the number of parameters.
  • Mostly used in generative models like density estimation; we use them in classification.
• Soft rank restriction
  • They used $\mathrm{Tr}(W)$ rather than $\mathrm{Tr}(W^T W)$ in the SVM to regularize on rank
  • Convex, but not easy to solve
  • Decreases the sum of eigenvalues instead of the number of non-zero eigenvalues (rank)
• Wolf et al. (CVPR’07)
  • Used a formulation similar to ours with hard rank restriction
  • Showed results only for soft rank restriction
  • Used it only for one task (didn’t consider multi-task learning)


Related work (Transfer learning)

• Dates back to at least Caruana’s work (1997)
  • We were inspired by their work on multi-task learning
  • Worked on: back-propagation nets and k-nearest neighbor
• Ando and Zhang’s work (2005)
  • Linear model
  • All models share a component in a low-dimensional subspace (transfer)
  • Use the same number of parameters


Experiments: Pedestrian detection

• Baseline: Dalal and Triggs’ spatio-temporal classifier (ECCV’06)
• Linear SVM on features (84 for each 8×8 cell):
  • Histogram of gradient (HOG)
  • Histogram of optical flow
• Made sure that the spatio-temporal classifier is better than the static one by modifying the learning method


Experiments: Pedestrian detection

• Dataset: INRIA motion and INRIA static
  • 3400 video frame pairs
  • 3500 static images
• Typical values: $n_s = 14 \times 6$, $n_f = 84$, $d = 5$ or $10$
• Evaluation
  • Average precision
• Initialize with PCA in feature space
• Ours is 10 times faster
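As a rough worked example, the typical values on this slide give the following parameter counts for the full template $W$ ($n_s \times n_f$) versus the factored pair $W_s$ ($n_s \times d$) and $W_f$ ($n_f \times d$). The helper name `parameter_counts` is hypothetical, and the counts say nothing by themselves about the measured speed-up.

```python
def parameter_counts(ns, nf, d):
    """Return (full, factored) parameter counts for W and for (Ws, Wf)."""
    full = ns * nf              # W is ns x nf
    factored = d * (ns + nf)    # Ws is ns x d, Wf is nf x d
    return full, factored

for d in (5, 10):
    full, factored = parameter_counts(14 * 6, 84, d)
    print(f"d={d}: full={full}, factored={factored}, ratio={full / factored:.1f}")
```

With $n_s = 84$ and $n_f = 84$, the full template has 7056 parameters, while the factored form has 840 for $d = 5$ and 1680 for $d = 10$.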


Experiments: Pedestrian detection

[Figure: precision/recall curve (x-axis: Recall, y-axis: Precision).]

• Bilinear: AP = 0.795
• Baseline: AP = 0.765
• PCA: AP = 0.698



Experiments: Human action classification 1

• 1 vs. all action templates
• Voting:
  • A second SVM on confidence values
• Dataset:
  • UCF Sports Action (CVPR 2008)
  • They obtained 69.2%
  • We got 64.8%, but with:
    • More classes: 12 classes rather than 9
    • Smaller dataset: 150 videos rather than 200
    • Harder evaluation protocol: 2-fold vs. LOOCV
    • 87 training examples rather than 199 in their case


UCF action Results: PCA (0.444)

[Figure: confusion matrix over the 12 action classes, with both axes labeled Dive-Side, Golf-Back, Golf-Front, Golf-Side, Kick-Front, Kick-Side, Ride-Horse, Run-Side, Skate-Front, Swing-Bench, Swing-Side, Walk-Front.]


UCF action Results: Linear (0.518)

[Figure: confusion matrix over the 12 action classes, with both axes labeled Dive-Side, Golf-Back, Golf-Front, Golf-Side, Kick-Front, Kick-Side, Ride-Horse, Run-Side, Skate-Front, Swing-Bench, Swing-Side, Walk-Front.]


UCF action Results: Bilinear (0.648)

[Figure: confusion matrix over the 12 action classes, with both axes labeled Dive-Side, Golf-Back, Golf-Front, Golf-Side, Kick-Front, Kick-Side, Ride-Horse, Run-Side, Skate-Front, Swing-Bench, Swing-Side, Walk-Front.]


UCF action Results

[Figure: three confusion matrices side by side over the 12 action classes (Dive-Side, Golf-Back, Golf-Front, Golf-Side, Kick-Front, Kick-Side, Ride-Horse, Run-Side, Skate-Front, Swing-Bench, Swing-Side, Walk-Front). Panel titles: PCA on features (0.444); Linear (0.518); Bilinear (0.648). The slide also carries the annotation "(Not always feasible)".]



Experiments: Human action classification 2

• Transfer
  • We used only two examples for each of 12 action classes
  • Once trained independently
  • Then trained jointly
    • Shared the subspace
    • Adjusted the C parameter for best result

Transfer

[Diagram: PCA is used to initialize. The per-task templates $W_s^1, W_s^2, \ldots, W_s^m$ are then trained, followed by the subspaces. Independent training learns a separate $W_f^1, W_f^2, \ldots, W_f^m$ per task; joint training learns a single shared $W_f$.]

$W = W_s W_f^T$

$\min_{W_f} \sum_{i=1}^{m} L(W_f, W_s^i)$
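The joint step sums the per-task bilinear objectives while sharing a single $W_f$. The sketch below only evaluates that sum; the function names, the `(N, ns, nf)` data layout, and the use of one common `C` for all tasks are assumptions made for the example, and no optimizer is included.

```python
import numpy as np

def bilinear_objective(Ws, Wf, Xs, y, C):
    """1/2 Tr(Wf Ws^T Ws Wf^T) + C * sum_n max(0, 1 - y_n Tr(Wf Ws^T X_n))."""
    reg = 0.5 * np.trace(Wf @ Ws.T @ Ws @ Wf.T)
    scores = np.einsum('id,nij,jd->n', Ws, Xs, Wf)   # Tr(Ws^T X_n Wf) for every n
    return reg + C * np.sum(np.maximum(0.0, 1.0 - y * scores))

def joint_objective(Wf, Ws_list, data, C):
    """sum_i L(Wf, Ws^i): every task i has its own Ws^i and data, Wf is shared.

    data: list of (Xs, y) pairs, one per task, with Xs of shape (N_i, ns, nf).
    """
    return sum(bilinear_objective(Ws, Wf, Xs, y, C)
               for Ws, (Xs, y) in zip(Ws_list, data))

# Two hypothetical tasks with random data.
rng = np.random.default_rng(0)
ns, nf, d = 6, 5, 2
Wf = rng.standard_normal((nf, d))
Ws_list = [rng.standard_normal((ns, d)) for _ in range(2)]
data = [(rng.standard_normal((n, ns, nf)), rng.choice([-1.0, 1.0], size=n))
        for n in (3, 4)]
total = joint_objective(Wf, Ws_list, data, C=0.1)
```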


Results: Transfer

Average classification rate

                               Coordinate descent   Coordinate descent
                               iteration 1          iteration 2
Independent bilinear (C=.01)   0.222                0.289
Joint bilinear (C=.1)          0.269                0.356


Results: Transfer (for “walking”)

[Figure panels: Iteration 1; Refined at Iteration 2.]


Conclusion

• Introduced multi-linear classifiers
  • Exploit the natural matrix/tensor representation of spatio-temporal data
  • Trained with existing efficient linear solvers
• Shared subspace for different problems
  • A novel form of transfer learning
• Got better performance and about 10X speed-up in run-time compared to the linear classifier
• Easy to apply to most high-dimensional features (instead of dimensionality reduction methods like PCA)
• Simple: ~20 lines of Matlab code


Thanks!


Bilinear model (Learning details)

• Linear SVM for a given set of training pairs $\{X_n, y_n\}$

$\min_W L(W) = \frac{1}{2}\mathrm{Tr}(W^T W) + C \sum_n \max(0, 1 - y_n \mathrm{Tr}(W^T X_n))$

• For the bilinear formulation

$\min L(W_f, W_s) = \frac{1}{2}\mathrm{Tr}(W_f W_s^T W_s W_f^T) + C \sum_n \max(0, 1 - y_n \mathrm{Tr}(W_f W_s^T X_n))$

• It is biconvex, so solve by coordinate descent


Bilinear model (Learning details)

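• Each coordinate descent iteration:

freeze $W_s$:

$\min_{\tilde{W}_f} L(\tilde{W}_f, W_s) = \frac{1}{2}\mathrm{Tr}(\tilde{W}_f^T \tilde{W}_f) + C \sum_n \max(0, 1 - y_n \mathrm{Tr}(\tilde{W}_f^T \tilde{X}_n))$

where $\tilde{W}_f = A_s^{1/2} W_f^T$, $\tilde{X}_n = A_s^{-1/2} W_s^T X_n$, $A_s = W_s^T W_s$

then freeze $W_f$:

$\min_{\tilde{W}_s} L(W_f, \tilde{W}_s) = \frac{1}{2}\mathrm{Tr}(\tilde{W}_s^T \tilde{W}_s) + C \sum_n \max(0, 1 - y_n \mathrm{Tr}(\tilde{W}_s^T \tilde{X}_n))$

where $\tilde{W}_s = W_s A_f^{1/2}$, $\tilde{X}_n = X_n W_f A_f^{-1/2}$, $A_f = W_f^T W_f$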
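The two changes of basis can be checked numerically: after the substitution, both the regularizer and the score take the form of a standard linear SVM in the tilde variables. The sketch below verifies this on random matrices. The helper `sym_power` (a matrix power via eigendecomposition), the sizes, and the data are assumptions, and $W_s^T W_s$ and $W_f^T W_f$ are assumed to be positive definite.

```python
import numpy as np

def sym_power(A, p):
    """A**p for a symmetric positive-definite matrix, via eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * vals**p) @ vecs.T

rng = np.random.default_rng(0)
ns, nf, d = 6, 5, 2                               # illustrative sizes
Ws = rng.standard_normal((ns, d))
Wf = rng.standard_normal((nf, d))
X = rng.standard_normal((ns, nf))

score = np.trace(Wf @ Ws.T @ X)                   # Tr(Wf Ws^T X)
reg = np.trace(Wf @ Ws.T @ Ws @ Wf.T)             # Tr(Wf Ws^T Ws Wf^T)

# Freeze Ws: solve for Wf in the transformed basis.
A_s = Ws.T @ Ws
Wf_tilde = sym_power(A_s, 0.5) @ Wf.T             # d x nf
X_tilde_f = sym_power(A_s, -0.5) @ Ws.T @ X       # d x nf
score_f = np.trace(Wf_tilde.T @ X_tilde_f)
reg_f = np.trace(Wf_tilde.T @ Wf_tilde)

# Freeze Wf: solve for Ws in the transformed basis.
A_f = Wf.T @ Wf
Ws_tilde = Ws @ sym_power(A_f, 0.5)               # ns x d
X_tilde_s = X @ Wf @ sym_power(A_f, -0.5)         # ns x d
score_s = np.trace(Ws_tilde.T @ X_tilde_s)
reg_s = np.trace(Ws_tilde.T @ Ws_tilde)
```

Since each sub-problem is an ordinary linear SVM on the transformed data $\tilde{X}_n$, any standard solver could be used for it and the result mapped back through the inverse transform.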