Download - Fast Cross-Validation via Sequential Analysistammok.github.io/talks/fastcv.pdf · Title: Fast Cross-Validation via Sequential Analysis Author: Tammo KruegerDanny PankninMikio Braun

Fast Cross-Validation viaSequentialAnalysis

TammoKruegerDannyPanknin

Mikio Braun

Motivation

Fast CV

Algorithm

Meta-parameters

Example Run

Experiments

Test Error

Speed Increase

Conclusion

Fast Cross-Validation via Sequential Analysis

Tammo KruegerDanny Panknin

Mikio Braun

Machine Learning GroupTechnische Universitaet Berlin

16.12.2011 Big Learning Workshop



Mikio Braun

Motivation

Fast CV

Algorithm

Meta-parameters

Example Run

Experiments

Test Error

Speed Increase

Conclusion

Motivation

Cross-validation is an indispensable tool for applied MLbut unfortunately very time consuming

Example fitted Q iteration:

Cross-Validation︷︸︸︷10× 10︸︷︷︸parameter

× 10︸︷︷︸fold

× 50︸︷︷︸max. iter.

× 10︸︷︷︸reps.

= 500, 000 reg. problems

Directly optimizing the error landscape to avoidcalculations difficult due to noise

Our approach: use increasing subsets of the training data

1 smaller subsets → less training time2 more training data → better error estimate3 relative behavior of parameter configurations converges



Mikio Braun

Motivation

Fast CV

Algorithm

Meta-parameters

Example Run

Experiments

Test Error

Speed Increase

Conclusion

Motivation – Main Idea (Average over 500 Reps.)

log(σ)

Cla

ss. E

rror

0.1

0.2

0.3

0.4

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●

●●

●●

● ● ● ● ● ● ● ●●

●●

●

●●

● ● ● ●● ● ● ● ● ● ● ●

−4 −3 −2 −1 0 1 2

Size

● 50

100

250

500

750

1000



Mikio Braun

Motivation

Fast CV

Algorithm

Meta-parameters

Example Run

Experiments

Test Error

Speed Increase

Conclusion

Motivation – Main Idea (Individual Reps.)

log(σ)

Cla

ss. E

rror

0.1

0.2

0.3

0.4

0.5

0.1

0.2

0.3

0.4

0.5

0.1

0.2

0.3

0.4

0.5

1

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●

●

●●

●

●●●

●

●●

●

●

●

●●

●

4

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●

●

●

●●

●

●●●

●●●●●

●●

7

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●

●

●●●

●

●

●●●●●●

●●

●●

−4 −3 −2 −1 0 1 2

2

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●

●●

●●

●●●●

●●

●

●●

●●

5

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●

●●

●●

●●●●

●

●●

●●

●

●

●●●

8

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●

●

●●●●●●

●●

●●●

●●●

−4 −3 −2 −1 0 1 2

3

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●

●

●

●

●●●●●●●●●●●

6

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●

●

●

●●●●

●●●●●●●

●

●●

9

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●

●●●●

●●

●●●●●●●

●

●

−4 −3 −2 −1 0 1 2

Size

● 50

100

250

500

750

1000



Mikio Braun

Motivation

Fast CV

Algorithm

Meta-parameters

Example Run

Experiments

Test Error

Speed Increase

Conclusion

Motivation – Exploitation

Observations:

Ê Individual runs are noisy, but at least we can see thetendency

Ë A lot of underperforming parameter configurationsÌ We can estimate the correct parameter on a

sufficiently large subset of the data

Exploitation:

Ê Transformation of the pointwise test errors of theconfigurations into a binary top or flop scheme

Ë Dropping of significant loser configurations along theway via tests from the sequential analysis framework

Ì Early stopping of the procedure, when we have seenenough data for a stable parameter estimation



Mikio Braun

Motivation

Fast CV

Algorithm

Meta-parameters

Example Run

Experiments

Test Error

Speed Increase

Conclusion

Fast Cross-Validation Procedure – Algorithm

data points stepsconf. d1 d2 d3 · · · dn−1 dn 1 2 3 4 5 6 7 8 9 10

c1 -2.2 -1.9 -1.8 2.1 1.5 flop 0 0 0 0 1 0 0 0 0 0 (†)c2 -1.9 -2.4 -2.3 · · · 1.9 2.4 flop 0 1 0 0 0 0 0 0 0 0 (†)c3 -1.4 -0.9 -0.7 0.5 0.5 flop 0 1 1 0 0 1 0 0 0 0

......

... Ê ...ck−2 0.6 0.6 0.7 -0.8 -0.4 top → 0 1 0 1 1 1 1 0 1 1ck−1 0.1 0.5 0.7 · · · -0.9 -0.1 top 0 1 1 1 1 1 0 1 1 1

ck 0.5 0.4 0.6 -0.3 0.0 top 1 1 0 1 1 1 0 1 1 1pointwisePerformance matrix trace matrix

Ë↙ Ì ↓

0 5 10 15 20

05

1015

20

Step

Cum

ulat

ive

Sum

0 0 0 01 0 0 0 0 0

11 0

11

1 01

11

X

∆H0(π0, π1, βl, αl)

Sa(π0, π1, βl, αl)WINNER

LOSER

c1

ck

7 8 9 10c3 0 0 0 0

?=

......

ck−2 1 0 1 1ck−1 0 1 1 1

ck 0 1 1 1

∆ = N/20

modelSize = 10∆

n = N − 10∆



Mikio Braun

Motivation

Fast CV

Algorithm

Meta-parameters

Example Run

Experiments

Test Error

Speed Increase

Conclusion

Meta-parameters – Selection of Test Parameters

data points stepsconf. d1 d2 d3 · · · dn−1 dn 1 2 3 4 5 6 7 8 9 10

c1 -2.2 -1.9 -1.8 2.1 1.5 flop 0 0 0 0 1 0 0 0 0 0 (†)c2 -1.9 -2.4 -2.3 · · · 1.9 2.4 flop 0 1 0 0 0 0 0 0 0 0 (†)c3 -1.4 -0.9 -0.7 0.5 0.5 flop 0 1 1 0 0 1 0 0 0 0

......

... Ê ...ck−2 0.6 0.6 0.7 -0.8 -0.4 top → 0 1 0 1 1 1 1 0 1 1ck−1 0.1 0.5 0.7 · · · -0.9 -0.1 top 0 1 1 1 1 1 0 1 1 1

ck 0.5 0.4 0.6 -0.3 0.0 top 1 1 0 1 1 1 0 1 1 1pointwisePerformance matrix trace matrix

Ë↙ Ì ↓

0 5 10 15 20

05

1015

20

Step

Cum

ulat

ive

Sum

0 0 0 01 0 0 0 0 0

11 0

11

1 01

11

X

∆H0(π0, π1, βl, αl)

Sa(π0, π1, βl, αl)WINNER

LOSER

c1

ck

7 8 9 10c3 0 0 0 0

?=

......

ck−2 1 0 1 1ck−1 0 1 1 1

ck 0 1 1 1

∆ = N/20

modelSize = 10∆

n = N − 10∆

(π0, π1) =argmaxπ′0,π

′1

4H0(π′0, π′1, βl , αl)

s.t. Sa(π′0, π′1, βl , αl) ∈ (steps− 1, steps]



Mikio Braun

Motivation

Fast CV

Algorithm

Meta-parameters

Example Run

Experiments

Test Error

Speed Increase

Conclusion

Meta-parameters – False Negative Rate

Change Point

Fals

e N

egat

ive

Rat

e

0.0

0.2

0.4

0.6

0.8

● ● ●

● ●

● ●

● ●

●

● ● ● ● ●

● ● ● ●

5 10 15

Pi

● 0.1

0.2

0.3

0.4

0.5

0 ≤ cp

steps≤

log βl1−αl

log π1π0

log 1−βlαl

log 1−π11−π0︸︷︷︸

security zone (false negative rate of 0)

with steps ≥⌈

log1− βlαl

/ log 2⌉



Mikio Braun

Motivation

Fast CV

Algorithm

Meta-parameters

Example Run

Experiments

Test Error

Speed Increase

Conclusion

Fast Cross-Validation Procedure – Example Run

Step

Con

figur

atio

n

100

200

300

400

500

600

1 2 3 4 5 6 7

Status

flop

top

out

50

100

150

2 3 4

50

100

150

3 4 5

20

40

60

80

4 5 6

10203040506070

5 6 7



Mikio Braun

Motivation

Fast CV

Algorithm

Meta-parameters

Example Run

Experiments

Test Error

Speed Increase

Conclusion

Experimental Setup

8 classification and 7 regression data sets

For each dataset:12 for parameter estimation, 1

2 for test error estimationSVM/SVR and Kernel Ridge Regression/Kernel LogisticRegression with Gaussian kernel using 610 parameterconfigurationsParameter estimation with:

Full 10-fold cross-validationFast cross-validation procedure with 10 steps

Repeated 50 times with different splits for each dataset

Compare:

Test error difference of fast versus full cross-validationRelative speed factor, i.e. time full cross-validation

time fast cross-validation



Mikio Braun

Motivation

Fast CV

Algorithm

Meta-parameters

Example Run

Experiments

Test Error

Speed Increase

Conclusion

Experiments – Test Error

MS

E F

ast C

V −

MS

E F

ull C

V

−0.04

−0.02

0.00

0.02

0.04

0.06Classification

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●●

●●

●

●

●

●

●

●

●●

●

●

●●

●

●

●

bana

na

covt

ype

germ

an

imag

e

ringn

orm

splic

e

twon

orm

wav

efor

m

Regression

●

●●

●

●●

●● ●●

●

●

●

●

●

●

●

●

●

●

●

●

bank

32nm

bloc

ks

bum

ps

dopp

ler

heav

isin

e

kin3

2nm

pum

adyn

32nm

method

kreg

sv



Mikio Braun

Motivation

Fast CV

Algorithm

Meta-parameters

Example Run

Experiments

Test Error

Speed Increase

Conclusion

Experiments – Speed Increase

Tim

e F

ull C

V /

Tim

e Fa

st C

V

20

40

60

80

100

120

140

Classification

●

●

●

●

●●

●

●

●

●

●

●●●

bana

na

covt

ype

germ

an

imag

e

ringn

orm

splic

e

twon

orm

wav

efor

m

Regression

●

●

●

●

●

●●●

●

●

●

●

●

●

bank

32nm

bloc

ks

bum

ps

dopp

ler

heav

isin

e

kin3

2nm

pum

adyn

32nm

method

kreg

sv



Mikio Braun

Motivation

Fast CV

Algorithm

Meta-parameters

Example Run

Experiments

Test Error

Speed Increase

Conclusion

Fast Cross-Validation Procedure – Summary

Motivation: we can estimate the correct parameter on asufficiently large subset of the data

Transformation: Race of configurations evaluated onlinearly increasing subsets of the data

At each step of this race:

1 Transform the test errors on individual data points of theremaining configurations into a binary top or flop scheme

2 Drop significant loser configurations along the way usingtests from the sequential analysis framework

3 Apply distribution free testing techniques to decide,whether we have gathered enough evidence for a stableparameter estimation



Mikio Braun

Motivation

Fast CV

Algorithm

Meta-parameters

Example Run

Experiments

Test Error

Speed Increase

Conclusion

Questions? Remarks?Thanks for your attention!



Mikio Braun

Motivation

Fast CV

Algorithm

Meta-parameters

Example Run

Experiments

Test Error

Speed Increase

Conclusion

Experiments – Traces Classification

variable

valu

e

0

100

200

300

400

500

600

0

100

200

300

400

500

600

0

100

200

300

400

500

600

banana

● ●

●●●●●●●●● ●●●●●●●●●●●●

●

●●●

●

●●●

●●

●

●●●

●●

●

●

●●●

●●●●

●●

image

●

●

●

●●●●●●●

●●●

●

●●●●●

●

●

●

●●●●●●

●●●

●●●●

●

●

●

●

●

●●

●●

●●

●●

●

●●●

●

twonorm

● ●

●●●●●●●●●●

●

●●

●

●

●●●●

●

●●

●●

●

●

●

●●

●●

●●

S1 S2 S3 S4 S5 S6 S7 S8 S9S10

covtype

●

●

●

● ●

●●●

●●●●

●

●●●

●

●●●

●

●●

●

●●●

●●

●

●

●

●

●●●●●●●

●

●●

●

●

●

●

●●●●●

●

●●

●

●● ●

●

●

●●

ringnorm

●●●●●●●●●●●●

●●

●●●●●●●●

●●●

●●●

●●

● ●

waveform

●● ●●

●●●●● ●

●●

●

●

●●

●

●

●●●●●●●●

●●●●

●

●

●

●

●●●●

●●●

●●

S1 S2 S3 S4 S5 S6 S7 S8 S9S10

german

●●

●

●

●●

●

●●

●

●●

●

●

●●

●

●●

●

●●●●

●

●

●●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

splice

●●●●●●●●●●●●●●

●●●●●●●●●

●●●

●●●●●●●●

● ●●●●●●●●●●●●

●

●●

●

S1 S2 S3 S4 S5 S6 S7 S8 S9S10

method

kreg

sv



Mikio Braun

Motivation

Fast CV

Algorithm

Meta-parameters

Example Run

Experiments

Test Error

Speed Increase

Conclusion

Experiments – Traces Regression

variable

valu

e

0

100

200

300

400

500

600

0

100

200

300

400

500

600

0

100

200

300

400

500

600

bank32nm

●●●●●●●●

●●●●●●●● ● ●●●● ● ●

doppler

●● ●●

pumadyn32nm

● ●●●●●

●●●●●●●

●●●●●●●●●● ●●

●●●●●● ●●●●●●●● ●●●●● ●●

S1 S2 S3 S4 S5 S6 S7 S8 S9S10

blocks

●● ●● ●●●

heavisine

●●

●●

●●●●●●●●●●

● ●●

S1 S2 S3 S4 S5 S6 S7 S8 S9S10

bumps

●●●●●●●●●●● ●●●●●●●●●●●

kin32nm

●

●

●

●●●●●● ●●●

●●●●●●

●●●●●● ●

●

S1 S2 S3 S4 S5 S6 S7 S8 S9S10

method

kreg

sv