Loss-based Learning with Weak Supervision

M. Pawan Kumar

Transcript of tutorial slides: mpawankumar.info/tutorials/cvpr2013/slides/CVPR...

Page 1:

Loss-based Learning with Weak Supervision

M. Pawan Kumar

Page 2:

Computer Vision Data

[Plot: Log(Size) vs. Information — Segmentation: ~2,000 images]

Page 3:

Computer Vision Data

[Plot: Log(Size) vs. Information — Segmentation: ~2,000; Bounding Box: ~1 M]

Page 4:

Computer Vision Data

[Plot: Log(Size) vs. Information — Segmentation: ~2,000; Bounding Box: ~1 M; Image-Level (“Car”, “Chair”): > 14 M]

Page 5:

Computer Vision Data

[Plot: Log(Size) vs. Information — Segmentation: ~2,000; Bounding Box: ~1 M; Image-Level: > 14 M; Noisy Label: > 6 B]

Page 6:

Learn with missing information (latent variables)

Detailed annotation is expensive

Sometimes annotation is impossible

Desired annotation keeps changing

Computer Vision Data

Page 7:

•  Two Types of Problems

•  Part I – Annotation Mismatch

•  Part II – Output Mismatch

Outline

Page 8:

Annotation Mismatch

Input x

Annotation y

Latent h

x

y = “jumping”

h

Action Classification

Mismatch between desired and available annotations

Exact value of latent variable is not “important”

Desired output during test time is y

Page 9:

Output Mismatch

Input x

Annotation y

Latent h

x

y = “jumping”

h

Action Classification

Page 10:

Output Mismatch

Input x

Annotation y

Latent h

x

y = “jumping”

h

Action Detection

Mismatch between output and available annotations

Exact value of latent variable is important

Desired output during test time is (y,h)

Page 11:

Part I

Page 12:

•  Latent SVM

•  Optimization

•  Practice

•  Extensions

Outline – Annotation Mismatch

Andrews et al., NIPS 2001; Smola et al., AISTATS 2005; Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009

Page 13:

Weakly Supervised Data

Input x

Output y ∈ {-1,+1}

Hidden h

x

y = +1

h

Page 14:

Weakly Supervised Classification

Feature Φ(x,h)

Joint Feature Vector

Ψ(x,y,h)

x

y = +1

h

Page 15:

Weakly Supervised Classification

Feature Φ(x,h)

Joint Feature Vector

Ψ(x,+1,h) = [ Φ(x,h) ; 0 ]

x

y = +1

h

Page 16:

Weakly Supervised Classification

Feature Φ(x,h)

Joint Feature Vector

Ψ(x,-1,h) = [ 0 ; Φ(x,h) ]

x

y = +1

h
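The block-stacking of Ψ on the last two slides is easy to state in code. A minimal sketch (the helper name `joint_feature` and the NumPy representation are illustrative, not from the tutorial):

```python
import numpy as np

def joint_feature(phi, y):
    """Psi(x,y,h): put Phi(x,h) in the top block for y = +1,
    in the bottom block for y = -1, zeros elsewhere."""
    d = phi.shape[0]
    psi = np.zeros(2 * d)
    if y == +1:
        psi[:d] = phi        # Psi(x,+1,h) = [Phi(x,h); 0]
    else:
        psi[d:] = phi        # Psi(x,-1,h) = [0; Phi(x,h)]
    return psi

print(joint_feature(np.array([1.0, 2.0, 3.0]), +1))   # [1. 2. 3. 0. 0. 0.]
```

With this construction a single weight vector w holds one sub-vector per class, so wTΨ(x,y,h) scores class y through its own block only.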

Page 17:

Weakly Supervised Classification

Feature Φ(x,h)

Joint Feature Vector

Ψ(x,y,h)

Score f : Ψ(x,y,h) → (-∞, +∞)

Optimize score over all possible y and h

x

y = +1

h

Page 18:

Latent SVM

Scoring function: wTΨ(x,y,h), with parameters w

Prediction: y(w),h(w) = argmaxy,h wTΨ(x,y,h)
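The prediction rule above is an exhaustive maximisation over labels and latent values. A toy sketch (the window feature map `phi` and all the numbers are made up for illustration):

```python
import numpy as np

def phi(x, h):
    """Hypothetical feature map: window h of the input vector x."""
    return x[h:h + 2]

def psi(x, y, h):
    """Joint feature vector: Phi in the top block for y = +1, bottom for y = -1."""
    f = phi(x, h)
    z = np.zeros_like(f)
    return np.concatenate([f, z]) if y == +1 else np.concatenate([z, f])

def predict(w, x, hs):
    """y(w), h(w) = argmax over y, h of w^T Psi(x,y,h) -- latent SVM prediction."""
    scores = {(y, h): w @ psi(x, y, h) for y in (-1, +1) for h in hs}
    return max(scores, key=scores.get)

x = np.array([0.0, 1.0, 3.0, 0.5])
w = np.array([1.0, 1.0, -1.0, -1.0])   # made-up parameters
y_hat, h_hat = predict(w, x, hs=[0, 1, 2])
print(y_hat, h_hat)   # 1 1  (predicts y = +1 with latent window h = 1)
```

Note that the latent variable is maximised over jointly with the label at test time, even though it was never annotated.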

Page 19:

Learning Latent SVM

Training data {(xi,yi), i = 1,2,…,n}

Empirical risk minimization

minw Σi Δ(yi, yi(w))

No restriction on the loss function

Annotation mismatch

Page 20:

Learning Latent SVM

Empirical risk minimization

minw Σi Δ(yi, yi(w))

Non-convex

Parameters cannot be regularized

Find a regularization-sensitive upper bound

Page 21:

Learning Latent SVM

Δ(yi, yi(w)) + wTΨ(xi,yi(w),hi(w)) - wTΨ(xi,yi(w),hi(w))

Page 22:

Learning Latent SVM

Δ(yi, yi(w)) + wTΨ(xi,yi(w),hi(w)) - maxhi wTΨ(xi,yi,hi)

y(w),h(w) = argmaxy,h wTΨ(x,y,h)

Page 23:

Learning Latent SVM

minw ||w||2 + C Σi ξi

maxy,h { wTΨ(xi,y,h) + Δ(yi, y) } - maxhi wTΨ(xi,yi,hi) ≤ ξi

Parameters can be regularized

Is this also convex?

Page 24:

Learning Latent SVM

minw ||w||2 + C Σi ξi

maxy,h { wTΨ(xi,y,h) + Δ(yi, y) } - maxhi wTΨ(xi,yi,hi) ≤ ξi

Convex - Convex

Difference of convex (DC) program
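Each slack ξi in this DC program is the gap between two "max of linear" (hence convex) terms. A tiny numeric sketch with enumerated joint features and 0-1 loss (all values illustrative):

```python
import numpy as np

def slack(w, psis, y_true, delta):
    """xi_i = max over (y,h) of [w^T Psi(x,y,h) + Delta(y_true,y)]
             - max over h of w^T Psi(x,y_true,h).
    Non-negative, since the first max includes y = y_true with Delta = 0."""
    loss_aug = max(w @ p + delta(y_true, y) for (y, h), p in psis.items())
    truth = max(w @ p for (y, h), p in psis.items() if y == y_true)
    return loss_aug - truth

psis = {(+1, 0): np.array([1.0, 0.0]),     # Psi(x, y, h) for each (y, h)
        (+1, 1): np.array([0.5, 0.5]),
        (-1, 0): np.array([0.0, 1.0])}
delta = lambda y_true, y: 0.0 if y == y_true else 1.0   # 0-1 loss
print(slack(np.array([1.0, 0.2]), psis, +1, delta))
```

Both maxima are maxima over finitely many linear functions of w, so each is convex; the objective is their difference, which is exactly what CCCP exploits next.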

Page 25:

Recap

Scoring function: wTΨ(x,y,h)

Prediction: y(w),h(w) = argmaxy,h wTΨ(x,y,h)

Learning: minw ||w||2 + C Σi ξi

maxy,h { wTΨ(xi,y,h) + Δ(yi,y) } - maxhi wTΨ(xi,yi,hi) ≤ ξi

Page 26:

•  Latent SVM

•  Optimization

•  Practice

•  Extensions

Outline – Annotation Mismatch

Page 27:

Learning Latent SVM

minw ||w||2 + C Σi ξi

maxy,h { wTΨ(xi,y,h) + Δ(yi, y) } - maxhi wTΨ(xi,yi,hi) ≤ ξi

Difference of convex (DC) program

Page 28:

Concave-Convex Procedure

maxy,h { wTΨ(xi,y,h) + Δ(yi, y) } - maxhi wTΨ(xi,yi,hi)

Linear upper-bound of concave part

Page 29:

Concave-Convex Procedure

maxy,h { wTΨ(xi,y,h) + Δ(yi, y) } - maxhi wTΨ(xi,yi,hi)

Optimize the convex upper bound

Page 30:

Concave-Convex Procedure

maxy,h { wTΨ(xi,y,h) + Δ(yi, y) } - maxhi wTΨ(xi,yi,hi)

Linear upper-bound of concave part

Page 31:

Concave-Convex Procedure

maxy,h { wTΨ(xi,y,h) + Δ(yi, y) } - maxhi wTΨ(xi,yi,hi)

Until Convergence

Page 32:

Concave-Convex Procedure

maxy,h { wTΨ(xi,y,h) + Δ(yi, y) } - maxhi wTΨ(xi,yi,hi)

Linear upper bound?

Page 33:

Linear Upper Bound

Current estimate = wt

hi* = argmaxhi wtTΨ(xi,yi,hi)

-wTΨ(xi,yi,hi*) ≥ - maxhi wTΨ(xi,yi,hi)

Page 34:

CCCP for Latent SVM Start with an initial estimate w0

Update

Update wt+1 as the ε-optimal solution of

min ||w||2 + C∑i ξi

wTΨ(xi,yi,hi*) - wTΨ(xi,y,h) ≥ Δ(yi, y) - ξi

hi* = argmaxhi∈H wtTΨ(xi,yi,hi)

Repeat until convergence
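The loop above can be written as a small driver. This is only a skeleton: `impute` (the h-imputation step) and `solve_convex` (an ε-optimal structural-SVM solver) are assumed callbacks, not part of the tutorial code.

```python
import numpy as np

def cccp(w0, data, impute, solve_convex, max_iter=100, tol=1e-6):
    """CCCP driver for latent SVM (skeleton).
    impute(w, x, y)        -> h* = argmax over h of w^T Psi(x, y, h)
    solve_convex(data, hs) -> (w, objective) of the resulting convex
                              structural-SVM problem, solved to tolerance."""
    w, prev_obj = w0, np.inf
    for _ in range(max_iter):
        h_star = [impute(w, x, y) for (x, y) in data]   # linearize concave part
        w, obj = solve_convex(data, h_star)             # minimize convex bound
        if prev_obj - obj < tol:                        # objective cannot increase
            break
        prev_obj = obj
    return w
```

Because each round minimises an upper bound that touches the true objective at wt, the objective value is non-increasing, which is what makes "repeat until convergence" well defined.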

Page 35:

•  Latent SVM

•  Optimization

•  Practice

•  Extensions

Outline – Annotation Mismatch

Page 36:

Action Classification Input x

Output y = “Using Computer”

PASCAL VOC 2011

80/20 Train/Test Split 5 Folds

Jumping

Phoning

Playing Instrument

Reading

Riding Bike

Riding Horse

Running

Taking Photo

Using Computer

Walking

Train Input xi Output yi

Page 37:

•  0-1 loss function

•  Poselet-based feature vector

•  4 seeds for random initialization

•  Code + Data

•  Train/Test scripts with hyperparameter settings

Setup

http://www.centrale-ponts.fr/tutorials/cvpr2013/

Page 38:

Objective

Page 39:

Train Error

Page 40:

Test Error

Page 41:

Time

Page 42:

•  Latent SVM

•  Optimization

•  Practice
–  Annealing the Tolerance
–  Annealing the Regularization
–  Self-Paced Learning
–  Choice of Loss Function

•  Extensions

Outline – Annotation Mismatch

Page 43:

Start with an initial estimate w0

Update

Update wt+1 as the ε-optimal solution of

min ||w||2 + C∑i ξi

wTΨ(xi,yi,hi*) - wTΨ(xi,y,h) ≥ Δ(yi, y) - ξi

hi* = argmaxhi∈H wtTΨ(xi,yi,hi)

Repeat until convergence

Overfitting in initial iterations

Page 44:

Start with an initial estimate w0

Update wt+1 as the ε'-optimal solution of

min ||w||2 + C∑i ξi

wTΨ(xi,yi,hi*) - wTΨ(xi,y,h) ≥ Δ(yi, y) - ξi

hi* = argmaxhi∈H wtTΨ(xi,yi,hi)

Anneal the tolerance: ε' ← ε'/K, stopping at ε' = ε

Repeat until convergence
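A sketch of the tolerance schedule, assuming the annealing divides ε' by K each outer iteration until the target ε is reached (the exact schedule is a reconstruction from the slide's fragments):

```python
def annealed_tolerances(eps, K, eps0):
    """Yield the tolerance used at each outer CCCP iteration:
    start loose at eps0, tighten by a factor K, finish at eps."""
    e = eps0
    while e > eps:
        yield e
        e /= K
    yield eps

print(list(annealed_tolerances(eps=0.1, K=2, eps0=0.8)))   # [0.8, 0.4, 0.2, 0.1]
```

Solving the early convex problems only loosely keeps the first imputations of hi* from being fit too precisely to a poor w.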

Page 45:

Objective

Page 46:

Objective

Page 47:

Train Error

Page 48:

Train Error

Page 49:

Test Error

Page 50:

Test Error

Page 51:

Time

Page 52:

Time

Page 53:

•  Latent SVM

•  Optimization

•  Practice
–  Annealing the Tolerance
–  Annealing the Regularization
–  Self-Paced Learning
–  Choice of Loss Function

•  Extensions

Outline – Annotation Mismatch

Page 54:

Start with an initial estimate w0

Update

Update wt+1 as the ε-optimal solution of

min ||w||2 + C∑i ξi

wTΨ(xi,yi,hi*) - wTΨ(xi,y,h) ≥ Δ(yi, y) - ξi

hi* = argmaxhi∈H wtTΨ(xi,yi,hi)

Repeat until convergence

Overfitting in initial iterations

Page 55:

Start with an initial estimate w0

Update wt+1 as the ε-optimal solution of

min ||w||2 + C'∑i ξi

wTΨ(xi,yi,hi*) - wTΨ(xi,y,h) ≥ Δ(yi, y) - ξi

hi* = argmaxhi∈H wtTΨ(xi,yi,hi)

Anneal the regularization: C' ← C' × K, stopping at C' = C

Repeat until convergence
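The regularization schedule can be sketched the same way, assuming C' is multiplied by K each outer iteration until it reaches the target C (a reconstruction from the slide's fragments):

```python
def annealed_C(C, K, C0):
    """Yield the C' used at each outer CCCP iteration: start heavily
    regularized (small C'), relax by a factor K, finish at C."""
    c = C0
    while c < C:
        yield c
        c *= K
    yield C

print(list(annealed_C(C=8.0, K=2, C0=1.0)))   # [1.0, 2.0, 4.0, 8.0]
```

A small C' early on keeps ||w|| small, so the first rounds of imputation do not commit to latent values chosen by an overfit w.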

Page 56:

Objective

Page 57:

Objective

Page 58:

Train Error

Page 59:

Train Error

Page 60:

Test Error

Page 61:

Test Error

Page 62:

Time

Page 63:

Time

Page 64:

•  Latent SVM

•  Optimization

•  Practice
–  Annealing the Tolerance
–  Annealing the Regularization
–  Self-Paced Learning
–  Choice of Loss Function

•  Extensions

Outline – Annotation Mismatch

Kumar, Packer and Koller, NIPS 2010

Page 65:

1 + 1 = 2

1/3 + 1/6 = 1/2

eiπ+1 = 0

Math is for losers !!

FAILURE … BAD LOCAL MINIMUM

CCCP for Human Learning

Page 66:

Euler was a Genius!!

SUCCESS … GOOD LOCAL MINIMUM

1 + 1 = 2

1/3 + 1/6 = 1/2

eiπ+1 = 0

Self-Paced Learning

Page 67:

Start with “easy” examples, then consider “hard” ones

Easy vs. Hard

Expensive

Easy for human ≠ Easy for machine

Self-Paced Learning

Simultaneously estimate easiness and parameters

Easiness is a property of data sets, not single instances

Page 68:

Start with an initial estimate w0

Update

Update wt+1 as the ε-optimal solution of

min ||w||2 + C∑i ξi

wTΨ(xi,yi,hi*) - wTΨ(xi,y,h) ≥ Δ(yi, y) - ξi

hi* = argmaxhi∈H wtTΨ(xi,yi,hi)

CCCP for Latent SVM

Page 69:

min ||w||2 + C∑i ξi

wTΨ(xi,yi,hi*) - wTΨ(xi,y,h) ≥ Δ(yi, y, h) - ξi

Self-Paced Learning

Page 70:

min ||w||2 + C∑i viξi

wTΨ(xi,yi,hi*) - wTΨ(xi,y,h) ≥ Δ(yi, y, h) - ξi

vi ∈ {0,1}

Trivial Solution

Self-Paced Learning

Page 71:

Self-Paced Learning

min ||w||2 + C∑i viξi - ∑i vi/K

wTΨ(xi,yi,hi*) - wTΨ(xi,y,h) ≥ Δ(yi, y, h) - ξi

vi ∈ {0,1}

[Images: examples selected at large K, medium K, small K]

Page 72:

Self-Paced Learning

min ||w||2 + C∑i viξi - ∑i vi/K

wTΨ(xi,yi,hi*) - wTΨ(xi,y,h) ≥ Δ(yi, y, h) - ξi

vi ∈ [0,1]

[Images: examples selected at large K, medium K, small K]

Biconvex Problem

Alternating Convex Search

Page 73:

SPL for Latent SVM

Start with an initial estimate w0

Update wt+1 as the ε-optimal solution of

min ||w||2 + C∑i viξi - ∑i vi/K

wTΨ(xi,yi,hi*) - wTΨ(xi,y,h) ≥ Δ(yi, y) - ξi

hi* = argmaxhi∈H wtTΨ(xi,yi,hi)

Decrease K ← K/µ
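With w (and hence each slack ξi) fixed, the SPL objective is linear in each vi, so the v-step of the alternating search has a closed form: select exactly the examples whose slack is small. A sketch (the function name is illustrative):

```python
def select_easy(xis, C, K):
    """v-step of self-paced learning: minimizing C*v_i*xi_i - v_i/K over
    v_i in [0,1] gives v_i = 1 iff C*xi_i < 1/K (the example is 'easy')."""
    return [1.0 if C * xi < 1.0 / K else 0.0 for xi in xis]

print(select_easy([0.0, 0.4, 2.0], C=1.0, K=2))   # [1.0, 1.0, 0.0]
```

Decreasing K (K ← K/µ) raises the threshold 1/(CK), so harder and harder examples enter the training set over the outer iterations.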

Page 74:

Objective

Page 75:

Objective

Page 76:

Train Error

Page 77:

Train Error

Page 78:

Test Error

Page 79:

Test Error

Page 80:

Time

Page 81:

Time

Page 82:

•  Latent SVM

•  Optimization

•  Practice
–  Annealing the Tolerance
–  Annealing the Regularization
–  Self-Paced Learning
–  Choice of Loss Function

•  Extensions

Outline – Annotation Mismatch

Behl, Jawahar and Kumar, In Preparation

Page 83:

Ranking

[Six images shown at ranks 1–6]

Average Precision = 1

Page 84:

Ranking

[Three example rankings of the six images]

Average Precision = 1, Accuracy = 1; Average Precision = 0.92, Accuracy = 0.67; Average Precision = 0.81
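Average precision can be computed directly from the ranked label list. A small sketch; the two non-perfect rankings below are chosen so that they reproduce the slide's 0.92 and 0.81 values, assuming three relevant and three irrelevant images:

```python
def average_precision(ranked_labels):
    """AP of a ranked list of binary labels (1 = relevant):
    mean, over the relevant positions, of precision at that position."""
    hits, precisions = 0, []
    for k, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions)

print(average_precision([1, 1, 1, 0, 0, 0]))   # 1.0
print(average_precision([1, 1, 0, 1, 0, 0]))   # 0.9166... ≈ 0.92
print(average_precision([1, 0, 1, 1, 0, 0]))   # 0.8055... ≈ 0.81
```

Note how AP and top-half accuracy disagree: AP is sensitive to where in the ranking each mistake occurs, not only to how many there are.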

Page 85:

Ranking

During testing, AP is frequently used

During training, a surrogate loss is used

Contradictory to loss-based learning

Optimize AP directly

Page 86:

Results

Statistically significant improvement

Page 87:

Speed – Proximal Regularization

Start with a good initial estimate w0

Update

Update wt+1 as the ε-optimal solution of

min ||w||2 + C∑i ξi + Ct ||w - wt||2

wTΨ(xi,yi,hi*) - wTΨ(xi,y,h) ≥ Δ(yi, y) - ξi

hi* = argmaxhi∈H wtTΨ(xi,yi,hi)

Repeat until convergence

Page 88:

Speed – Cascades

Weiss and Taskar, AISTATS 2010 Sapp, Toshev and Taskar, ECCV 2010

Page 89:

Accuracy – (Self) Pacing

Pacing the sample complexity – NIPS 2010

Pacing the model complexity

Pacing the problem complexity

Page 90:

Building Accurate Systems

Model: 85%   Inference: 5%   Learning: 10%

Learning cannot provide huge gains without a good model

Inference cannot provide huge gains without a good model

Page 91:

•  Latent SVM

•  Optimization

•  Practice

•  Extensions
–  Latent Variable Dependent Loss
–  Max-Margin Min-Entropy Models

Outline – Annotation Mismatch

Yu and Joachims, ICML 2009

Page 92:

Latent Variable Dependent Loss

Δ(yi, yi(w), hi(w)) + wTΨ(xi,yi(w),hi(w)) - wTΨ(xi,yi(w),hi(w))

Page 93:

Latent Variable Dependent Loss

Δ(yi, yi(w), hi(w)) + wTΨ(xi,yi(w),hi(w)) - maxhi wTΨ(xi,yi,hi)

y(w),h(w) = argmaxy,h wTΨ(x,y,h)

Page 94:

Latent Variable Dependent Loss

minw ||w||2 + C Σi ξi

maxy,h { wTΨ(xi,y,h) + Δ(yi, y, h) } - maxhi wTΨ(xi,yi,hi) ≤ ξi

Page 95:

Optimizing Precision@k

Input X = {xi, i = 1, …, n}

Annotation Y = {yi, i = 1, …, n} ∈ {-1,+1}n

Latent H = ranking

Δ(Y*, Y, H) = 1 - Precision@k
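Once the latent ranking H is fixed, this loss is straightforward to evaluate. A minimal sketch (the data layout is illustrative):

```python
def precision_at_k_loss(y_true, ranking, k):
    """Delta(Y*, Y, H) = 1 - Precision@k: the fraction of the top-k items
    in the latent ranking H that are not truly positive."""
    topk = ranking[:k]
    return 1.0 - sum(1 for i in topk if y_true[i] == +1) / k

y_true = [+1, -1, +1, -1]          # ground-truth labels per item
ranking = [0, 1, 2, 3]             # latent H: item indices in ranked order
print(precision_at_k_loss(y_true, ranking, k=2))   # 0.5
```

Because the loss depends on H, this is exactly the latent-variable-dependent setting of the previous slides: the surrogate must penalise the latent ranking as well as the labels.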

Page 96:

•  Latent SVM

•  Optimization

•  Practice

•  Extensions
–  Latent Variable Dependent Loss
–  Max-Margin Min-Entropy (M3E) Models

Outline – Annotation Mismatch

Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012

Page 97:

Running vs. Jumping Classification

Score wTΨ(x,y,h) → (-∞, +∞)

wTΨ(x,y1,h): 0.00 0.00 0.25 0.00 0.25 0.00 0.25 0.00 0.00

Page 98:

Running vs. Jumping Classification

Score wTΨ(x,y,h) → (-∞, +∞)

wTΨ(x,y1,h): 0.00 0.00 0.25 0.00 0.25 0.00 0.25 0.00 0.00

wTΨ(x,y2,h): 0.00 0.24 0.00 0.00 0.00 0.00 0.01 0.00 0.00

Only maximum score used

No other useful cue? Uncertainty in h

Page 99:

M3E

Scoring function: Pw(y,h|x) = exp(wTΨ(x,y,h))/Z(x), where Z(x) is the partition function

Prediction: y(w) = argminy { Hα(Pw(h|y,x)) – log Pw(y|x) }

Hα: Rényi entropy; Pw(y|x): marginalized probability

The minimized quantity is the Rényi entropy of the generalized distribution, Gα(y;x,w)

Page 100:

Rényi Entropy

Gα(y;x,w) = [1/(1-α)] log [ Σh Pw(y,h|x)α / Σh Pw(y,h|x) ]

α = 1: Shannon entropy of the generalized distribution

- Σh Pw(y,h|x) log Pw(y,h|x) / Σh Pw(y,h|x)
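The limiting cases on the next slides can be checked numerically from this definition. A sketch (the masses p are made-up values of Pw(y,h|x); α = 1 is evaluated just off 1 to avoid the 0/0):

```python
import math

def G(alpha, p):
    """Renyi entropy of the generalized distribution with masses p(h):
    G_alpha = 1/(1 - alpha) * log( sum_h p(h)**alpha / sum_h p(h) )."""
    return math.log(sum(q ** alpha for q in p) / sum(p)) / (1.0 - alpha)

p = [0.5, 0.2, 0.05]          # unnormalized masses Pw(y, h | x)
print(G(1.0001, p))           # ≈ 1.091, the Shannon entropy of the generalized dist.
print(G(100.0, p))            # ≈ 0.70, approaching -log max_h p(h) ≈ 0.693
```

As α grows the sum is dominated by the largest mass, which is what recovers the min-entropy case and the latent-SVM prediction on the following slides.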

Page 101:

Rényi Entropy

Gα(y;x,w) = [1/(1-α)] log [ Σh Pw(y,h|x)α / Σh Pw(y,h|x) ]

α → ∞: minimum entropy of the generalized distribution

- maxh log Pw(y,h|x)

Page 102:

Rényi Entropy

Gα(y;x,w) = [1/(1-α)] log [ Σh Pw(y,h|x)α / Σh Pw(y,h|x) ]

α → ∞: minimum entropy of the generalized distribution

- maxh wTΨ(x,y,h) (up to the additive constant log Z(x))

Same prediction as latent SVM

Page 103:

Learning M3E

Training data {(xi,yi), i = 1,2,…,n}

w* = argminw Σi Δ(yi,yi(w))

Highly non-convex in w

Cannot regularize w to prevent overfitting

Page 104:

Learning M3E

Training data {(xi,yi), i = 1,2,…,n}

Δ(yi,yi(w)) = Δ(yi,yi(w)) + Gα(yi(w);xi,w) - Gα(yi(w);xi,w)

Δ(yi,yi(w)) ≤ Δ(yi,yi(w)) + Gα(yi;xi,w) - Gα(yi(w);xi,w)

Δ(yi,yi(w)) ≤ maxy { Δ(yi,y) + Gα(yi;xi,w) - Gα(y;xi,w) }

Page 105: Learning M3E

Training data {(xi,yi), i = 1,2,…,n}

minw ||w||² + C Σi ξi

s.t. Gα(yi;xi,w) + Δ(yi,y) - Gα(y;xi,w) ≤ ξi, for all y

When α tends to infinity, M3E is equivalent to latent SVM. Other values of α can give better results.
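For intuition, the per-sample slack implied by these constraints is ξi = maxy [ Δ(yi,y) + Gα(yi;xi,w) - Gα(y;xi,w) ]. A minimal sketch over a finite label set follows; the entropy values and the 0/1 loss are toy assumptions.

```python
def m3e_slack(y_i, G, delta):
    """xi_i = max_y [ delta(y_i, y) + G[y_i] - G[y] ] over a finite label set.

    G: dict mapping each label y to the entropy G_alpha(y; x_i, w).
    """
    return max(delta(y_i, y) + G[y_i] - G[y] for y in G)

G = {0: 0.4, 1: 0.9}                 # toy entropies G_alpha(y; x_i, w)
delta = lambda a, b: float(a != b)   # 0/1 loss
# y = 0 contributes 0; y = 1 contributes 1 + 0.4 - 0.9 = 0.5
assert abs(m3e_slack(0, G, delta) - 0.5) < 1e-9
```

Note the slack is always non-negative, since y = yi itself contributes Δ(yi,yi) = 0.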

Page 106: Motif Finding Results

Motif + Markov background model (Yu and Joachims, 2009).

Page 107: Part II

Page 108: Output Mismatch

Action Detection: input x, annotation y = "jumping", latent variable h.

There is a mismatch between the desired output and the available annotations: the exact value of the latent variable is important, and the desired output at test time is (y,h).

Page 109: Outline – Output Mismatch

• Problem Formulation
• Dissimilarity Coefficient Learning
• Optimization
• Experiments

Page 110: Weakly Supervised Data

Input x
Output y ∈ {0,1,…,C}
Hidden h

Example: an image x with annotation y = 0 and unobserved latent variable h.

Page 111: Weakly Supervised Detection

Feature Φ(x,h)

Joint Feature Vector Ψ(x,y,h)

Page 112: Weakly Supervised Detection

Feature Φ(x,h)

Joint feature vector for y = 0:

Ψ(x,0,h) = [Φ(x,h); 0; … ; 0]

Page 113: Weakly Supervised Detection

Feature Φ(x,h)

Joint feature vector for y = 1:

Ψ(x,1,h) = [0; Φ(x,h); 0; … ; 0]

Page 114: Weakly Supervised Detection

Feature Φ(x,h)

Joint feature vector for y = C:

Ψ(x,C,h) = [0; … ; 0; Φ(x,h)]
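The block construction of Ψ(x,y,h) above can be sketched directly: copy Φ(x,h) into the block for label y and fill the rest with zeros. The toy feature vector below is an assumption for illustration.

```python
def joint_feature(phi, y, num_labels):
    """Psi(x,y,h): Phi(x,h) in the block for label y, zeros elsewhere.

    phi: the feature Phi(x,h) as a list; num_labels = C + 1 for y in {0,...,C}.
    """
    d = len(phi)
    psi = [0.0] * (num_labels * d)
    psi[y * d:(y + 1) * d] = phi  # copy Phi(x,h) into block y
    return psi

phi = [1.0, 2.0]                      # toy Phi(x,h)
assert joint_feature(phi, 0, 3) == [1.0, 2.0, 0.0, 0.0, 0.0, 0.0]
assert joint_feature(phi, 2, 3) == [0.0, 0.0, 0.0, 0.0, 1.0, 2.0]
```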

Page 115: Weakly Supervised Detection

Feature Φ(x,h), joint feature vector Ψ(x,y,h)

Score f : Ψ(x,y,h) → (-∞, +∞)

Optimize the score over all possible y and h.

Page 116: Linear Model

Scoring function

wTΨ(x,y,h), with parameters w

Prediction

y(w),h(w) = argmaxy,h wTΨ(x,y,h)
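Prediction with this linear model can be sketched by enumerating a small finite latent space. Because Ψ(x,y,h) is block-structured, wTΨ(x,y,h) reduces to the dot product of the y-th block of w with Φ(x,h). The weights and features below are illustrative assumptions.

```python
def predict(w, phi_of_h, num_labels):
    """y(w), h(w) = argmax over (y, h) of wT Psi(x, y, h)."""
    d = len(next(iter(phi_of_h.values())))
    best, best_score = None, float("-inf")
    for y in range(num_labels):
        w_y = w[y * d:(y + 1) * d]  # block of w for label y
        for h, phi in phi_of_h.items():
            score = sum(a * b for a, b in zip(w_y, phi))  # wT Psi(x,y,h)
            if score > best_score:
                best, best_score = (y, h), score
    return best

w = [1.0, 0.0, 0.0, 1.0]                   # two labels, feature dimension 2
phi_of_h = {0: [0.5, 0.1], 1: [0.2, 0.9]}  # toy Phi(x,h) per latent value
assert predict(w, phi_of_h, 2) == (1, 1)   # score 0.9 is the maximum
```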

Page 117: Minimizing General Loss

minw Σi Δ(yi,hi,yi(w),hi(w)) + Σi Δ'(yi,yi(w),hi(w))

The first sum is over supervised samples; the second is over weakly supervised samples, whose latent variable values are unknown.

Page 118: Minimizing General Loss

minw Σi Σhi Δ(yi,hi,yi(w),hi(w)) Pw(hi|xi,yi)

A single distribution is used to achieve two objectives.

Page 119: Outline – Output Mismatch

• Problem Formulation
• Dissimilarity Coefficient Learning
• Optimization
• Experiments

Kumar, Packer and Koller, ICML 2012

Page 120: Problem

Model uncertainty in latent variables.

Model accuracy of latent variable predictions.

Page 121: Solution

Model uncertainty in latent variables. Model accuracy of latent variable predictions.

Use two different distributions for the two different tasks.

Page 122: Solution

Use two different distributions for the two different tasks: Pθ(hi|yi,xi) models the uncertainty in the latent variable hi.

Page 123: Solution

Use two different distributions for the two different tasks: Pθ(hi|yi,xi) models the uncertainty in hi, while Pw(yi,hi|xi) models the accuracy of the prediction (yi(w),hi(w)) against (yi,hi).

Page 124: The Ideal Case

No latent variable uncertainty, correct prediction: Pθ(hi|yi,xi) places all its mass on hi(w), and Pw(yi,hi|xi) predicts the correct output (yi,hi(w)).

Page 125: In Practice

Restrictions in the representation power of the models: Pθ(hi|yi,xi) retains uncertainty, and the prediction (yi(w),hi(w)) can differ from (yi,hi).

Page 126: Our Framework

Minimize the dissimilarity between the two distributions Pw(yi,hi|xi) and Pθ(hi|yi,xi), under a user-defined dissimilarity measure.

Page 127: Our Framework

Minimize Rao's dissimilarity coefficient. The cross term between the two distributions is

Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi)

Page 128: Our Framework

Minimize Rao's dissimilarity coefficient:

Hi(w,θ) - β Σh,h' Δ(yi,h,yi,h') Pθ(h|yi,xi) Pθ(h'|yi,xi)

where Hi(w,θ) = Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi).

Page 129: Our Framework

Minimize Rao's dissimilarity coefficient:

Hi(w,θ) - β Hi(θ,θ) - (1-β) Δ(yi(w),hi(w),yi(w),hi(w))

The last term vanishes, since the loss between a prediction and itself is zero.

Page 130: Our Framework

Minimize Rao's dissimilarity coefficient:

minw,θ Σi [ Hi(w,θ) - β Hi(θ,θ) ]
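For a small finite latent space, the two terms of this objective can be computed directly. The following sketch is illustrative: the distribution Pθ(h|yi,xi), the 0/1 loss, and β below are toy assumptions.

```python
def H_cross(delta, y_i, y_pred, h_pred, p_theta):
    """Hi(w,theta) = sum_h delta(y_i,h, y_i(w),h_i(w)) * Ptheta(h|y_i,x_i)."""
    return sum(delta(y_i, h, y_pred, h_pred) * p for h, p in p_theta.items())

def H_self(delta, y_i, p_theta):
    """Hi(theta,theta) = sum over h,h' of delta(y_i,h, y_i,h') Ptheta(h) Ptheta(h')."""
    return sum(delta(y_i, h, y_i, h2) * p * p2
               for h, p in p_theta.items() for h2, p2 in p_theta.items())

delta = lambda y, h, y2, h2: float((y, h) != (y2, h2))  # 0/1 loss
p_theta = {0: 0.7, 1: 0.3}   # toy Ptheta(h | y_i, x_i)
beta = 0.5
obj = H_cross(delta, 0, 0, 0, p_theta) - beta * H_self(delta, 0, p_theta)
assert abs(obj - 0.09) < 1e-9  # 0.3 - 0.5 * 0.42
```

The self term Hi(θ,θ) rewards Pθ for concentrating its mass, while the cross term pulls Pθ toward the prediction of Pw.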

Page 131: Outline – Output Mismatch

• Problem Formulation
• Dissimilarity Coefficient Learning
• Optimization
• Experiments

Page 132: Optimization

minw,θ Σi [ Hi(w,θ) - β Hi(θ,θ) ]

Initialize the parameters to w0 and θ0
Repeat until convergence:
  Fix w and optimize θ
  Fix θ and optimize w
End
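The loop above can be sketched as generic block-coordinate descent. In this sketch the objective and its exact block updates are toy stand-ins (a separable quadratic), not the actual Hi(w,θ) - β Hi(θ,θ) terms.

```python
def alternate(objective, update_theta, update_w, w, theta,
              tol=1e-8, max_iters=1000):
    """Fix w and optimize theta, then fix theta and optimize w, to convergence."""
    prev = objective(w, theta)
    for _ in range(max_iters):
        theta = update_theta(w, theta)  # fix w, optimize theta
        w = update_w(w, theta)          # fix theta, optimize w
        cur = objective(w, theta)
        if prev - cur < tol:            # objective stopped improving
            break
        prev = cur
    return w, theta

# toy objective f(w, theta) = (w - theta)^2 + theta^2, minimized at (0, 0);
# each update below exactly minimizes f over its own block
f = lambda w, t: (w - t) ** 2 + t ** 2
w, t = alternate(f, lambda w, t: w / 2.0, lambda w, t: t, 1.0, 0.0)
assert abs(w) < 1e-3 and abs(t) < 1e-3
```

Each step can only decrease the objective, so the monotone sequence of values converges; as with the actual problem, only a local minimum is guaranteed.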

Page 133: Optimization of θ

minθ Σi [ Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi) - β Hi(θ,θ) ]

Case I: yi(w) = yi


Page 135: Optimization of θ

minθ Σi [ Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi) - β Hi(θ,θ) ]

Case II: yi(w) ≠ yi

Page 136: Optimization of θ

minθ Σi [ Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi) - β Hi(θ,θ) ]

Case II: yi(w) ≠ yi

Solved by stochastic subgradient descent.

Page 137: Optimization of w

minw Σi Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi)

This is an expected loss, which models the uncertainty in the latent variables. The form of the optimization is similar to latent SVM; if Δ is independent of h, it reduces exactly to latent SVM. It is solved with the Concave-Convex Procedure (CCCP).

Page 138: Outline – Output Mismatch

• Problem Formulation
• Dissimilarity Coefficient Learning
• Optimization
• Experiments

Page 139: Action Detection

Input x, output y = "Using Computer", latent variable h.

Dataset: PASCAL VOC 2011, 60/40 train/test split, 5 folds.

Classes: Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking.

Training uses the input xi and the output yi only.

Page 140: Results – 0/1 Loss

[Bar chart: average test loss of LSVM vs. our method on Folds 1–5; the improvement is statistically significant.]

Page 141: Results – Overlap Loss

[Bar chart: average test loss of LSVM vs. our method on Folds 1–5; the improvement is statistically significant.]

Page 142: Questions?

http://www.centrale-ponts.fr/personnel/pawan