Fast Cross-Validation viaSequentialAnalysis
TammoKruegerDannyPanknin
Mikio Braun
Motivation
Fast CV
Algorithm
Meta-parameters
Example Run
Experiments
Test Error
Speed Increase
Conclusion
Fast Cross-Validation via Sequential Analysis
Tammo KruegerDanny Panknin
Mikio Braun
Machine Learning GroupTechnische Universitaet Berlin
16.12.2011 Big Learning Workshop
Fast Cross-Validation viaSequentialAnalysis
TammoKruegerDannyPanknin
Mikio Braun
Motivation
Fast CV
Algorithm
Meta-parameters
Example Run
Experiments
Test Error
Speed Increase
Conclusion
Motivation
Cross-validation is an indispensable tool for applied MLbut unfortunately very time consuming
Example fitted Q iteration:
Cross-Validation︷ ︸︸ ︷10× 10︸ ︷︷ ︸parameter
× 10︸︷︷︸fold
× 50︸︷︷︸max. iter.
× 10︸︷︷︸reps.
= 500, 000 reg. problems
Directly optimizing the error landscape to avoidcalculations difficult due to noise
Our approach: use increasing subsets of the training data
1 smaller subsets → less training time2 more training data → better error estimate3 relative behavior of parameter configurations converges
Fast Cross-Validation viaSequentialAnalysis
TammoKruegerDannyPanknin
Mikio Braun
Motivation
Fast CV
Algorithm
Meta-parameters
Example Run
Experiments
Test Error
Speed Increase
Conclusion
Motivation – Main Idea (Average over 500 Reps.)
log(σ)
Cla
ss. E
rror
0.1
0.2
0.3
0.4
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●
●●
●●
● ● ● ● ● ● ● ●●
●●
●
●●
● ● ● ●● ● ● ● ● ● ● ●
−4 −3 −2 −1 0 1 2
Size
● 50
100
250
500
750
1000
Fast Cross-Validation viaSequentialAnalysis
TammoKruegerDannyPanknin
Mikio Braun
Motivation
Fast CV
Algorithm
Meta-parameters
Example Run
Experiments
Test Error
Speed Increase
Conclusion
Motivation – Main Idea (Individual Reps.)
log(σ)
Cla
ss. E
rror
0.1
0.2
0.3
0.4
0.5
0.1
0.2
0.3
0.4
0.5
0.1
0.2
0.3
0.4
0.5
1
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●
●
●●
●
●●●
●
●●
●
●
●
●●
●
4
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●
●
●
●●
●
●●●
●●●●●
●●
7
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●
●
●●●
●
●
●●●●●●
●●
●●
−4 −3 −2 −1 0 1 2
2
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●
●●
●●
●●●●
●●
●
●●
●●
5
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●
●●
●●
●●●●
●
●●
●●
●
●
●●●
8
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●
●
●●●●●●
●●
●●●
●●●
−4 −3 −2 −1 0 1 2
3
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●
●
●
●
●●●●●●●●●●●
6
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●
●
●
●●●●
●●●●●●●
●
●●
9
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●
●●●●
●●
●●●●●●●
●
●
−4 −3 −2 −1 0 1 2
Size
● 50
100
250
500
750
1000
Fast Cross-Validation viaSequentialAnalysis
TammoKruegerDannyPanknin
Mikio Braun
Motivation
Fast CV
Algorithm
Meta-parameters
Example Run
Experiments
Test Error
Speed Increase
Conclusion
Motivation – Exploitation
Observations:
Ê Individual runs are noisy, but at least we can see thetendency
Ë A lot of underperforming parameter configurationsÌ We can estimate the correct parameter on a
sufficiently large subset of the data
Exploitation:
Ê Transformation of the pointwise test errors of theconfigurations into a binary top or flop scheme
Ë Dropping of significant loser configurations along theway via tests from the sequential analysis framework
Ì Early stopping of the procedure, when we have seenenough data for a stable parameter estimation
Fast Cross-Validation viaSequentialAnalysis
TammoKruegerDannyPanknin
Mikio Braun
Motivation
Fast CV
Algorithm
Meta-parameters
Example Run
Experiments
Test Error
Speed Increase
Conclusion
Fast Cross-Validation Procedure – Algorithm
data points stepsconf. d1 d2 d3 · · · dn−1 dn 1 2 3 4 5 6 7 8 9 10
c1 -2.2 -1.9 -1.8 2.1 1.5 flop 0 0 0 0 1 0 0 0 0 0 (†)c2 -1.9 -2.4 -2.3 · · · 1.9 2.4 flop 0 1 0 0 0 0 0 0 0 0 (†)c3 -1.4 -0.9 -0.7 0.5 0.5 flop 0 1 1 0 0 1 0 0 0 0
......
... Ê ...ck−2 0.6 0.6 0.7 -0.8 -0.4 top → 0 1 0 1 1 1 1 0 1 1ck−1 0.1 0.5 0.7 · · · -0.9 -0.1 top 0 1 1 1 1 1 0 1 1 1
ck 0.5 0.4 0.6 -0.3 0.0 top 1 1 0 1 1 1 0 1 1 1pointwisePerformance matrix trace matrix
Ë↙ Ì ↓
0 5 10 15 20
05
1015
20
Step
Cum
ulat
ive
Sum
0 0 0 01 0 0 0 0 0
11 0
11
1 01
11
X
∆H0(π0, π1, βl, αl)
Sa(π0, π1, βl, αl)WINNER
LOSER
c1
ck
7 8 9 10c3 0 0 0 0
?=
......
ck−2 1 0 1 1ck−1 0 1 1 1
ck 0 1 1 1
∆ = N/20
modelSize = 10∆
n = N − 10∆
Fast Cross-Validation viaSequentialAnalysis
TammoKruegerDannyPanknin
Mikio Braun
Motivation
Fast CV
Algorithm
Meta-parameters
Example Run
Experiments
Test Error
Speed Increase
Conclusion
Meta-parameters – Selection of Test Parameters
data points stepsconf. d1 d2 d3 · · · dn−1 dn 1 2 3 4 5 6 7 8 9 10
c1 -2.2 -1.9 -1.8 2.1 1.5 flop 0 0 0 0 1 0 0 0 0 0 (†)c2 -1.9 -2.4 -2.3 · · · 1.9 2.4 flop 0 1 0 0 0 0 0 0 0 0 (†)c3 -1.4 -0.9 -0.7 0.5 0.5 flop 0 1 1 0 0 1 0 0 0 0
......
... Ê ...ck−2 0.6 0.6 0.7 -0.8 -0.4 top → 0 1 0 1 1 1 1 0 1 1ck−1 0.1 0.5 0.7 · · · -0.9 -0.1 top 0 1 1 1 1 1 0 1 1 1
ck 0.5 0.4 0.6 -0.3 0.0 top 1 1 0 1 1 1 0 1 1 1pointwisePerformance matrix trace matrix
Ë↙ Ì ↓
0 5 10 15 20
05
1015
20
Step
Cum
ulat
ive
Sum
0 0 0 01 0 0 0 0 0
11 0
11
1 01
11
X
∆H0(π0, π1, βl, αl)
Sa(π0, π1, βl, αl)WINNER
LOSER
c1
ck
7 8 9 10c3 0 0 0 0
?=
......
ck−2 1 0 1 1ck−1 0 1 1 1
ck 0 1 1 1
∆ = N/20
modelSize = 10∆
n = N − 10∆
(π0, π1) =argmaxπ′0,π
′1
4H0(π′0, π′1, βl , αl)
s.t. Sa(π′0, π′1, βl , αl) ∈ (steps− 1, steps]
Fast Cross-Validation viaSequentialAnalysis
TammoKruegerDannyPanknin
Mikio Braun
Motivation
Fast CV
Algorithm
Meta-parameters
Example Run
Experiments
Test Error
Speed Increase
Conclusion
Meta-parameters – False Negative Rate
Change Point
Fals
e N
egat
ive
Rat
e
0.0
0.2
0.4
0.6
0.8
● ● ●
● ●
● ●
● ●
●
● ● ● ● ●
● ● ● ●
5 10 15
Pi
● 0.1
0.2
0.3
0.4
0.5
0 ≤ cp
steps≤
log βl1−αl
log π1π0
log 1−βlαl
log 1−π11−π0︸ ︷︷ ︸
security zone (false negative rate of 0)
with steps ≥⌈
log1− βlαl
/ log 2⌉
Fast Cross-Validation viaSequentialAnalysis
TammoKruegerDannyPanknin
Mikio Braun
Motivation
Fast CV
Algorithm
Meta-parameters
Example Run
Experiments
Test Error
Speed Increase
Conclusion
Fast Cross-Validation Procedure – Example Run
Step
Con
figur
atio
n
100
200
300
400
500
600
1 2 3 4 5 6 7
Status
flop
top
out
50
100
150
2 3 4
50
100
150
3 4 5
20
40
60
80
4 5 6
10203040506070
5 6 7
Fast Cross-Validation viaSequentialAnalysis
TammoKruegerDannyPanknin
Mikio Braun
Motivation
Fast CV
Algorithm
Meta-parameters
Example Run
Experiments
Test Error
Speed Increase
Conclusion
Experimental Setup
8 classification and 7 regression data sets
For each dataset:12 for parameter estimation, 1
2 for test error estimationSVM/SVR and Kernel Ridge Regression/Kernel LogisticRegression with Gaussian kernel using 610 parameterconfigurationsParameter estimation with:
Full 10-fold cross-validationFast cross-validation procedure with 10 steps
Repeated 50 times with different splits for each dataset
Compare:
Test error difference of fast versus full cross-validationRelative speed factor, i.e. time full cross-validation
time fast cross-validation
Fast Cross-Validation viaSequentialAnalysis
TammoKruegerDannyPanknin
Mikio Braun
Motivation
Fast CV
Algorithm
Meta-parameters
Example Run
Experiments
Test Error
Speed Increase
Conclusion
Experiments – Test Error
MS
E F
ast C
V −
MS
E F
ull C
V
−0.04
−0.02
0.00
0.02
0.04
0.06Classification
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
bana
na
covt
ype
germ
an
imag
e
ringn
orm
splic
e
twon
orm
wav
efor
m
Regression
●
●●
●
●●
●● ●●
●
●
●
●
●
●
●
●
●
●
●
●
bank
32nm
bloc
ks
bum
ps
dopp
ler
heav
isin
e
kin3
2nm
pum
adyn
32nm
method
kreg
sv
Fast Cross-Validation viaSequentialAnalysis
TammoKruegerDannyPanknin
Mikio Braun
Motivation
Fast CV
Algorithm
Meta-parameters
Example Run
Experiments
Test Error
Speed Increase
Conclusion
Experiments – Speed Increase
Tim
e F
ull C
V /
Tim
e Fa
st C
V
20
40
60
80
100
120
140
Classification
●
●
●
●
●●
●
●
●
●
●
●●●
bana
na
covt
ype
germ
an
imag
e
ringn
orm
splic
e
twon
orm
wav
efor
m
Regression
●
●
●
●
●
●●●
●
●
●
●
●
●
bank
32nm
bloc
ks
bum
ps
dopp
ler
heav
isin
e
kin3
2nm
pum
adyn
32nm
method
kreg
sv
Fast Cross-Validation viaSequentialAnalysis
TammoKruegerDannyPanknin
Mikio Braun
Motivation
Fast CV
Algorithm
Meta-parameters
Example Run
Experiments
Test Error
Speed Increase
Conclusion
Fast Cross-Validation Procedure – Summary
Motivation: we can estimate the correct parameter on asufficiently large subset of the data
Transformation: Race of configurations evaluated onlinearly increasing subsets of the data
At each step of this race:
1 Transform the test errors on individual data points of theremaining configurations into a binary top or flop scheme
2 Drop significant loser configurations along the way usingtests from the sequential analysis framework
3 Apply distribution free testing techniques to decide,whether we have gathered enough evidence for a stableparameter estimation
Fast Cross-Validation viaSequentialAnalysis
TammoKruegerDannyPanknin
Mikio Braun
Motivation
Fast CV
Algorithm
Meta-parameters
Example Run
Experiments
Test Error
Speed Increase
Conclusion
Questions? Remarks?Thanks for your attention!
Fast Cross-Validation viaSequentialAnalysis
TammoKruegerDannyPanknin
Mikio Braun
Motivation
Fast CV
Algorithm
Meta-parameters
Example Run
Experiments
Test Error
Speed Increase
Conclusion
Experiments – Traces Classification
variable
valu
e
0
100
200
300
400
500
600
0
100
200
300
400
500
600
0
100
200
300
400
500
600
banana
● ●
●●●●●●●●● ●●●●●●●●●●●●
●
●●●
●
●●●
●●
●
●●●
●●
●
●
●●●
●●●●
●●
image
●
●
●
●●●●●●●
●●●
●
●●●●●
●
●
●
●●●●●●
●●●
●●●●
●
●
●
●
●
●●
●●
●●
●●
●
●●●
●
twonorm
● ●
●●●●●●●●●●
●
●●
●
●
●●●●
●
●●
●●
●
●
●
●●
●●
●●
S1 S2 S3 S4 S5 S6 S7 S8 S9S10
covtype
●
●
●
● ●
●●●
●●●●
●
●●●
●
●●●
●
●●
●
●●●
●●
●
●
●
●
●●●●●●●
●
●●
●
●
●
●
●●●●●
●
●●
●
●● ●
●
●
●●
ringnorm
●●●●●●●●●●●●
●●
●●●●●●●●
●●●
●●●
●●
● ●
waveform
●● ●●
●●●●● ●
●●
●
●
●●
●
●
●●●●●●●●
●●●●
●
●
●
●
●●●●
●●●
●●
S1 S2 S3 S4 S5 S6 S7 S8 S9S10
german
●●
●
●
●●
●
●●
●
●●
●
●
●●
●
●●
●
●●●●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
splice
●●●●●●●●●●●●●●
●●●●●●●●●
●●●
●●●●●●●●
● ●●●●●●●●●●●●
●
●●
●
S1 S2 S3 S4 S5 S6 S7 S8 S9S10
method
kreg
sv
Fast Cross-Validation viaSequentialAnalysis
TammoKruegerDannyPanknin
Mikio Braun
Motivation
Fast CV
Algorithm
Meta-parameters
Example Run
Experiments
Test Error
Speed Increase
Conclusion
Experiments – Traces Regression
variable
valu
e
0
100
200
300
400
500
600
0
100
200
300
400
500
600
0
100
200
300
400
500
600
bank32nm
●●●●●●●●
●●●●●●●● ● ●●●● ● ●
doppler
●● ●●
pumadyn32nm
● ●●●●●
●●●●●●●
●●●●●●●●●● ●●
●●●●●● ●●●●●●●● ●●●●● ●●
S1 S2 S3 S4 S5 S6 S7 S8 S9S10
blocks
●● ●● ●●●
heavisine
●●
●●
●●●●●●●●●●
● ●●
S1 S2 S3 S4 S5 S6 S7 S8 S9S10
bumps
●●●●●●●●●●● ●●●●●●●●●●●
kin32nm
●
●
●
●●●●●● ●●●
●●●●●●
●●●●●● ●
●
S1 S2 S3 S4 S5 S6 S7 S8 S9S10
method
kreg
sv
Top Related