Page 1

Structured Prediction Cascades

Ben Taskar

David Weiss, Ben Sapp, Alex Toshev

University of Pennsylvania

Page 2

Supervised Learning

Learn from

• Regression:

• Binary Classification:

• Multiclass Classification:

• Structured Prediction:

Page 3

Handwriting Recognition

x (handwritten image) → y: ‘structured’

Page 4

Machine Translation

x: ‘Ce n'est pas un autre problème de classification.’

y: ‘This is not another classification problem.’

Page 5

Pose Estimation

x y

© Arthur Gretton

Page 6

Structured Models

space of feasible outputs

scoring function

parts = cliques, productions

Complexity of inference depends on “part” structure
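As a rough illustration of a score that decomposes over parts, the sketch below scores a short chain as a sum of per-position and per-edge terms, then predicts by brute force over the space of feasible outputs. The label set, the two scoring functions, and all numbers are made up for this example; none of them come from the talk.

```python
from itertools import product

# Hypothetical toy model: a chain with two possible labels per position.
LABELS = ["a", "b"]

def unary_score(position, label):
    # Made-up scores: even positions prefer "a", odd positions prefer "b".
    return 1.0 if (position % 2 == 0) == (label == "a") else 0.0

def pairwise_score(prev_label, label):
    # Made-up edge scores: reward switching labels.
    return 0.5 if prev_label != label else 0.0

def score(y):
    """Score of a full output: the sum of the scores of its parts."""
    total = sum(unary_score(i, lab) for i, lab in enumerate(y))
    total += sum(pairwise_score(p, q) for p, q in zip(y, y[1:]))
    return total

def predict(length):
    """Brute-force argmax over all label sequences of the given length."""
    return max(product(LABELS, repeat=length), key=score)

best = predict(3)
```

Brute force enumerates every sequence, so its cost grows exponentially with length. Dynamic programming over the parts avoids that, which is why the cost of inference depends on the part structure.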

Page 7

Supervised Structured Prediction

Learning Prediction

Discriminative estimation of θ

Data

Model:

Intractable/impractical for complex models

Intractable/impractical for complex models

• 3rd order OCR model = 1 mil states * length
• Berkeley English grammar = 4 mil productions * length^3
• Tree-based pose model = 100 mil states * # joints

Page 8

Approximation vs. Computation

• Usual trade-off: approximation vs. estimation
– Complex models need more data
– Error from over-fitting

• New trade-off: approximation vs. computation
– Complex models need more time/memory
– Error from inexact inference

• This talk: enable complex models via cascades

Page 9

An Inspiration: Viola Jones Face Detector

Scanning window at every location, scale and orientation

Page 10

Classifier Cascade

• Most patches are non-face
• Filter out easy cases fast!!
• Simple features first
• Low precision, high recall
• Learned layer-by-layer
• Next layer more complex

C1 → C2 → C3 → … → Cn → Face (each stage can reject a patch as Non-face)
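The early-rejection idea can be sketched in a few lines. This is a minimal illustration, not the Viola-Jones detector: the stage functions, the thresholds, and the toy "patch" (a list of pixel intensities) are hypothetical.

```python
def run_cascade(stages, patch):
    """Return True only if every stage accepts the patch.

    `stages` is a list of (classifier, threshold) pairs ordered from
    cheapest to most expensive; evaluation stops at the first rejection.
    """
    for classify, threshold in stages:
        if classify(patch) < threshold:
            return False  # rejected early as "non-face"
    return True

# Hypothetical stages: a cheap test first, a stricter one second.
stages = [
    (lambda p: sum(p) / len(p), 0.2),  # mean brightness
    (lambda p: max(p) - min(p), 0.5),  # contrast
]

dark_patch = [0.0, 0.1, 0.1]      # rejected by the first stage
flat_patch = [0.4, 0.5, 0.6]      # passes stage one, rejected by stage two
textured_patch = [0.1, 0.9, 0.5]  # accepted by both stages
```

Because most patches fail an early, cheap stage, the expensive stages run on only a small fraction of the input.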

Page 11

Related Work

• Global Thresholding and Multiple-Pass Parsing. J. Goodman, 97
• A maximum-entropy-inspired parser. E. Charniak, 00
• Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. E. Charniak & M. Johnson, 05
• TAG, dynamic programming, and the perceptron for efficient, feature-rich parsing. X. Carreras, M. Collins & T. Koo, 08
• Coarse-to-Fine Natural Language Processing. S. Petrov, 09
• Coarse-to-fine face detection. F. Fleuret & D. Geman, 01
• Robust real-time object detection. P. Viola & M. Jones, 02
• Progressive search space reduction for human pose estimation. V. Ferrari, M. Marin-Jimenez & A. Zisserman, 08

Page 12

What’s a Structured Cascade?

• What to filter?
– Clique assignments

• What are the layers?
– Higher order models
– Higher resolution models

• How to learn filters?
– Novel convex loss
– Simple online algorithm
– Generalization bounds

F1 → F2 → F3 → … → Fn → ‘structured’ (each filter rejects: ??????)

Page 13

Trade-off in Learning Cascades

• Accuracy: Minimize the number of errors incurred by each level

• Efficiency: Maximize the number of filtered assignments at each level

Filter 1 → Filter 2 → … → Filter D → Predict

Page 14

Max-marginals (Sequences)

[Figure: trellis over four sequence positions, each with candidate states a, b, c, d]

Score of an output:

Compute max (*bc*)

Max marginal:
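A minimal sketch of how edge max-marginals could be computed on a chain, assuming the max-marginal of an edge assignment is the best total score over all sequences that contain it (the "max (*bc*)" idea above). The function name, the argument layout, and the toy scores are assumptions made for this example.

```python
def edge_max_marginals(unary, pairwise):
    """Max-marginals of edge assignments in a chain model.

    unary[i][s]    : score of state s at position i
    pairwise[s][t] : score of moving from state s to state t
    Returns mm[i][s][t], the best total score of any sequence with
    state s at position i and state t at position i + 1.
    """
    n, k = len(unary), len(unary[0])
    # fwd[i][s]: best score of a prefix ending in state s at position i
    fwd = [list(unary[0])]
    for i in range(1, n):
        fwd.append([
            unary[i][t] + max(fwd[i - 1][s] + pairwise[s][t] for s in range(k))
            for t in range(k)
        ])
    # bwd[i][s]: best score of everything after position i, given state s at i
    bwd = [[0.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        bwd[i] = [
            max(pairwise[s][t] + unary[i + 1][t] + bwd[i + 1][t] for t in range(k))
            for s in range(k)
        ]
    return [
        [[fwd[i][s] + pairwise[s][t] + unary[i + 1][t] + bwd[i + 1][t]
          for t in range(k)] for s in range(k)]
        for i in range(n - 1)
    ]

# Toy chain: 3 positions, 2 states, made-up scores.
unary = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
pairwise = [[0.0, 1.0], [1.0, 0.0]]
mm = edge_max_marginals(unary, pairwise)
```

One forward and one backward max-sum pass give every edge max-marginal, so the cost matches ordinary Viterbi decoding up to a constant factor.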

Page 15

Filtering with Max-marginals

• Set threshold
• Filter clique assignment if its max-marginal falls below the threshold

[Figure: trellis over four sequence positions, each with candidate states a, b, c, d]

Page 16

Filtering with Max-marginals

• Set threshold
• Filter clique assignment if its max-marginal falls below the threshold

[Figure: trellis over four sequence positions, each with candidate states a, b, c, d]

Remove edge bc

Page 17

Why Max-marginals?

• Valid path guarantee
– Filtering leaves at least one valid global assignment

• Faster inference
– No exponentiation, O(k) vs. O(k log k) in some cases

• Convex estimation
– Simple stochastic subgradient algorithm

• Generalization bounds for error/efficiency
– Guarantees on expected trade-off

Page 18

Choosing a Threshold

• Threshold must be specific to input x
– Max-marginal scores are not normalized

• Keep top-K max-marginal assignments?
– Hard(er) to handle and analyze

• Convex alternative: max-mean-max function:

threshold = α · (max score) + (1 − α) · (mean max marginal)

m = #edges

Page 19

Choosing a Threshold (cont)

[Figure: max marginal scores on an axis, marking the mean max marginal, the max score, and the score of the truth; the range of possible thresholds is indexed by α = efficiency level]

Page 20

Example OCR Cascade

Max marginal score per letter:

letter:  a    b    c     d   e   f    g   h    i    j     k    l    …
score:   144  269  -137  80  52  -42  49  360  -27  -774  368  -24

Mean ≈ 0, Max = 368, α = 0.55, Threshold ≈ 204

Filtered: a c d e f g i j l (144, -137, 80, 52, -42, 49, -27, -774, -24)

Kept: b h k …
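The numbers on this slide can be checked with a short sketch. It assumes the threshold is the α-weighted combination of the max score and the mean max marginal, and it takes Mean ≈ 0 and Max = 368 as stated on the slide rather than recomputing them, since the letters after "l" are elided. The function name is made up for the example.

```python
def max_mean_max_threshold(max_score, mean_max_marginal, alpha):
    """Convex combination of the max score and the mean max marginal."""
    return alpha * max_score + (1 - alpha) * mean_max_marginal

# Max marginal scores as listed on the slide (later letters are elided there).
scores = {"a": 144, "b": 269, "c": -137, "d": 80, "e": 52, "f": -42,
          "g": 49, "h": 360, "i": -27, "j": -774, "k": 368, "l": -24}

# Slide values: Max = 368, Mean ≈ 0, α = 0.55.
threshold = max_mean_max_threshold(max_score=368, mean_max_marginal=0, alpha=0.55)

# Keep only the letters whose max marginal clears the threshold.
kept = [letter for letter, s in scores.items() if s > threshold]
```

With the mean rounded to 0, the threshold comes out near 202, close to the slide's ≈ 204, and the surviving letters are b, h and k, matching the slide.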

Page 21

Example OCR Cascade

b h k

a e n u r

a b d g

a c e g n o u

a e m n u w

b h k

Page 22

Example OCR Cascade

b h k → ▪b ▪h ▪k

a e n u r → ba ha ka be he …

a b d g → aa ab ad ag ea …

a c e g n o u → aa ac ae ag an …

a e m n u w → aa ae am an au …

Page 23

Example OCR Cascade

▪b 1440   ▪h 1558   ▪k 1575

ba 1440   ha 1558   ka 1575   be 1413   he 1553   …

aa 1400   ab 1480   ad 1575   ag 1397   ea 1393   …

aa 1257   ac 1285   ae 1302   ag 1294   an 1356   …

aa 1336   ae 1306   am 1390   an 1346   au 1306   …

Mean ≈ 1412, Max = 1575, α = 0.44, Threshold ≈ 1502

Page 24

Example OCR Cascade

Mean ≈ 1412, Max = 1575, α = 0.44, Threshold ≈ 1502

Filtered:

▪b 1440

ba 1440   be 1413

aa 1400   ab 1480   ag 1397   ea 1393

aa 1257   ac 1285   ae 1302   ag 1294   an 1356

aa 1336   ae 1306   am 1390   an 1346   au 1306

Kept:

▪h 1558   ▪k 1575

ha 1558   ka 1575   he 1553   …

ad 1575   …

Page 25

Example OCR Cascade

Mean ≈ 1412, Max = 1575, α = 0.44, Threshold ≈ 1502

▪h ▪k

ha ka he ke

ad ed

kn

do

ow om

nd

Page 26

Example OCR Cascade

▪h ▪k

ha ka he ke

ad ed

kn

do

ow om

nd

▪▪h ▪▪k

▪ha ▪ka ▪he ▪ke

had kad hed ked

ado edo ndo

dow dom
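The step from surviving bigrams to trigram candidates can be sketched as a join on the shared letter. This is one plausible reading of the slide, not the authors' implementation; the `lift` helper is hypothetical, and the grouping of bigrams by position is an assumption read off the slide layout, using only a subset of the bigrams shown.

```python
def lift(prev_ngrams, next_ngrams):
    """Join n-grams at adjacent positions that agree on their overlap."""
    return [p + q[-1] for p in prev_ngrams for q in next_ngrams
            if p[1:] == q[:-1]]

# A subset of the surviving bigrams from the slide, grouped by assumed position.
second = ["ha", "ka", "he", "ke"]
third = ["ad", "ed"]
fourth = ["do"]
fifth = ["ow", "om"]

trigrams_third = lift(second, third)  # had, kad, hed, ked
trigrams_fifth = lift(fourth, fifth)  # dow, dom
```

Only combinations of surviving lower-order assignments are expanded, so the higher-order state space stays small even though the unfiltered trigram space is much larger.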

Page 27

Example: Pictorial Structure Cascade

State space per cascade level: 10×10×12 → 10×10×24 → 40×40×24 → 80×80×24 → 110×122×24

Parts: H, T, ULA, LLA, URA, LRA (upper arm, lower arm)

k ≈ 320,000, k² ≈ 100 mil

Page 28

Quantifying Loss

• Filter loss: if score(truth) > threshold, all true states are safe

• Efficiency loss: proportion of unfiltered clique assignments
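Both losses are simple enough to write down directly. The sketch below assumes a 0/1 filter loss and treats a clique assignment as unfiltered when its max marginal is strictly above the threshold; the function names and the strict inequality are assumptions made for this example.

```python
def filter_loss(truth_score, threshold):
    """0 if the true output scores above the threshold (all true states safe), else 1."""
    return 0 if truth_score > threshold else 1

def efficiency_loss(max_marginals, threshold):
    """Proportion of clique assignments that survive filtering."""
    kept = sum(1 for s in max_marginals if s > threshold)
    return kept / len(max_marginals)

# Toy values: four clique assignments, two of which clear the threshold.
example_filter_loss = filter_loss(truth_score=10.0, threshold=5.0)
example_efficiency_loss = efficiency_loss([1.0, 2.0, 3.0, 4.0], threshold=2.5)
```

Raising the threshold lowers the efficiency loss but makes a filter mistake more likely, which is the trade-off each cascade level has to balance.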

Page 29

Learning One Cascade Level

• Fix α, solve convex problem for θ

Minimize filter mistakes at efficiency level α

No filter mistakes Margin w/ slack
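One common way to write a problem of this shape is a regularized margin formulation with slack variables. The block below is a sketch under assumed notation, not a transcription of the slide: f is the feature map, t_x(θ, α) the threshold at efficiency level α, ℓ a margin, ξ the slack variables, and λ the regularization parameter.

```latex
\min_{\theta,\ \xi \ge 0} \quad
  \frac{\lambda}{2}\,\|\theta\|^2 + \frac{1}{n}\sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
  \theta^\top f(x^i, y^i) \;\ge\; t_{x^i}(\theta, \alpha) + \ell - \xi_i
  \quad \forall i
```

Setting every ξ_i to zero would require the true output to clear the threshold with a margin on every training example ("no filter mistakes"); the slack terms relax that requirement ("margin w/ slack"). If the threshold is a convex function of θ, each constraint stays convex.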

Page 30

An Online Algorithm

• Stochastic sub-gradient update:

Features of truth

Convex combination: Features of best guess + Average features of max marginal “witnesses”
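A minimal sketch of one such update, assuming a hinge-style loss on the gap between the score of the truth and the threshold. The function signature, the margin, the step size, and the regularization term are assumptions; the three feature vectors are taken as precomputed inputs.

```python
def subgradient_step(theta, f_truth, f_best, f_witness_mean, alpha,
                     score_truth, threshold, margin=1.0, step=0.1, reg=0.0):
    """One stochastic sub-gradient update for a single example.

    If the truth beats the threshold by the margin, only the regularizer
    acts on theta. Otherwise theta also moves toward the features of the
    truth and away from the convex combination
    alpha * f_best + (1 - alpha) * f_witness_mean.
    """
    violated = score_truth < threshold + margin
    new_theta = []
    for w, ft, fb, fw in zip(theta, f_truth, f_best, f_witness_mean):
        grad = reg * w
        if violated:
            grad += alpha * fb + (1 - alpha) * fw - ft
        new_theta.append(w - step * grad)
    return new_theta

# Toy two-dimensional example with made-up feature vectors.
updated = subgradient_step(
    theta=[0.0, 0.0], f_truth=[1.0, 0.0], f_best=[0.0, 1.0],
    f_witness_mean=[0.5, 0.5], alpha=0.5, score_truth=0.0, threshold=0.0,
)
```

In the toy call the margin is violated, so the weight on the truth's feature increases and the weight on the competing feature decreases.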

Page 31

Generalization Bounds

• W.h.p. (1-δ), filtering and efficiency loss observed on training set generalize to new data

Expected loss on true distribution

Page 32

Generalization Bounds

• W.h.p. (1-δ), filtering and efficiency loss observed on training set generalize to new data

Empirical γ-ramp upper bound

[Figure: γ-ramp loss, with values between 0 and 1]

Page 33

Generalization Bounds

• W.h.p. (1-δ), filtering and efficiency loss observed on training set generalize to new data

n number of examples

m number of clique assignments

number of cliques

B

Page 34

Generalization Bounds

• W.h.p. (1-δ), filtering and efficiency loss observed on training set generalize to new data

Similar bound holds for L_e and all α ∈ [0,1]

Page 35

OCR Experiments

Dataset: http://www.cis.upenn.edu/~taskar/ocr

% Error      Character Error   Word Error
1st order    22.5              73.35
2nd order    14.31             50.56
3rd order    12.05             26.17
4th order    7.75              15.54

Page 36

Efficiency Experiments

• POS Tagging (WSJ + CONLL datasets)
– English, Bulgarian, Portuguese

• Compare 2nd order model filters
– Structured perceptron (max-sum)
– SCP w/ α ∈ [0, 0.2, 0.4, 0.6, 0.8] (max-sum)
– CRF log-likelihood (sum-product marginals)

• Tightly controlled
– Use random 40% of training set
– Use development set to fit regularization parameter λ and α for all methods

Page 37

POS Tagging Results: English

[Plot; axis: Error Cap (%) on Development Set; series: Max-sum (SCP), Max-sum (0-1), Sum-product (CRF)]

Page 38

POS Tagging Results: Portuguese

[Plot; axis: Error Cap (%) on Development Set; series: Max-sum (SCP), Max-sum (0-1), Sum-product (CRF)]

Page 39

POS Tagging Results: Bulgarian

[Plot; axis: Error Cap (%) on Development Set; series: Max-sum (SCP), Max-sum (0-1), Sum-product (CRF)]

Page 40

English POS Cascade (2nd order)

                   Full      SCP     CRF      Taglist
Accuracy (%)       96.83     96.82   96.84    ---
Filter Loss (%)    0         0.12    0.024    0.118
Test Time (ms)     173.28    1.56    4.16     10.6
Avg. Num States    1935.7    3.93    11.845   95.39

The cascade is efficient.

DT NN VRB ADJ
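To put the test-time column in perspective, the speedups over the full model can be computed from the table. The dictionary below copies the Test Time (ms) row; nothing else is assumed.

```python
# Test Time (ms) row of the English POS cascade table.
test_time_ms = {"Full": 173.28, "SCP": 1.56, "CRF": 4.16, "Taglist": 10.6}

# Speedup of each method relative to the full (unfiltered) model.
speedup = {name: test_time_ms["Full"] / t for name, t in test_time_ms.items()}

for name, factor in speedup.items():
    print(f"{name}: {factor:.1f}x")
```

On these figures the SCP filter is roughly 111 times faster than the full model, against roughly 42 times for the CRF filter and 16 times for the tag list, at nearly identical accuracy.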

Page 41

Pose Estimation (Buffy Dataset)

method                torso     head     upper arms   lower arms   total
Ferrari et al. 08     --        --       --           --           61.4%
Ferrari et al. 09     --        --       --           --           74.5%
Andriluka et al. 09   98.3%     95.7%    86.8%        51.7%        78.8%
Eichner et al. 09     98.72%    97.87%   92.8%        59.79%       80.1%
CPS (ours)            100.00%   99.57%   96.81%       62.34%       86.31%

“Ferrari score”: percent of parts correct within radius

Page 42

Pose Estimation

Page 43

Cascade Efficiency vs. Accuracy

Page 44

Features: Shape/Segmentation Match

Page 45

Features: Contour Continuation

Page 46

Conclusions

• Novel loss significantly improves efficiency

• Principled learning of accurate cascades

• Deep cascades focus structured inference and allow rich models

• Open questions:
– How to learn cascade structure
– Dealing with intractable models
– Other applications: factored dynamic models, grammars

Page 47

Thanks!

Structured Prediction Cascades, Weiss & Taskar, AISTATS 2010

Cascaded Models for Articulated Pose Estimation, Sapp, Toshev & Taskar, ECCV 2010