Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ......

48
Using Meta Learning to Initialize Bayesian Optimization Albert-Ludwigs-Universität Freiburg Matthias Feurer 1 Jost Tobias Springenberg 2 Frank Hutter 1 1 Research Group on Learning, Optimization, and Automated Algorithm Design 2 Machine Learning Lab Department of Computer Science, University of Freiburg, Germany ECAI-2014 Workshop on Meta-learning & Algorithm Selection, 19 August 2014

Transcript of Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ......

Page 1: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Using Meta Learning to InitializeBayesian Optimization

Albert-Ludwigs-Universität Freiburg

Matthias Feurer1 Jost Tobias Springenberg2 Frank Hutter11Research Group on Learning, Optimization, and Automated Algorithm Design2Machine Learning LabDepartment of Computer Science, University of Freiburg, GermanyECAI-2014 Workshop on Meta-learning & Algorithm Selection, 19 August 2014

Page 2: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Your task: Build an Iris classification system

Choose an algorithmbased on datasetcharacteristics, e.g. forthe Iris dataset this couldbe an SVMManual tuning-> fiddling withhyperparameters.Better: Use automatedmethods like PSO, GA orSMBOBest: AutoWeka

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 2 / 18

The iris pictures on this slide are from wikimedia commons and used under the following licenses: Top left: IrisVersicolor is public domain; Bottom left: Iris setosa is licensed by Radomil under CC BY-SA 3.0; Top right: IrisVirginica is licensed by C T Johansson under CC BY 3.0.

Page 3: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Your task: Build an Iris classification system

Choose an algorithmbased on datasetcharacteristics, e.g. forthe Iris dataset this couldbe an SVM

Manual tuning-> fiddling withhyperparameters.Better: Use automatedmethods like PSO, GA orSMBOBest: AutoWeka

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 2 / 18

The iris pictures on this slide are from wikimedia commons and used under the following licenses: Top left: IrisVersicolor is public domain; Bottom left: Iris setosa is licensed by Radomil under CC BY-SA 3.0; Top right: IrisVirginica is licensed by C T Johansson under CC BY 3.0.

Page 4: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Your task: Build an Iris classification system

Choose an algorithmbased on datasetcharacteristics, e.g. forthe Iris dataset this couldbe an SVMManual tuning-> fiddling withhyperparameters.

Better: Use automatedmethods like PSO, GA orSMBOBest: AutoWeka

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 2 / 18

The iris pictures on this slide are from wikimedia commons and used under the following licenses: Top left: IrisVersicolor is public domain; Bottom left: Iris setosa is licensed by Radomil under CC BY-SA 3.0; Top right: IrisVirginica is licensed by C T Johansson under CC BY 3.0.

Page 5: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Your task: Build an Iris classification system

Choose an algorithmbased on datasetcharacteristics, e.g. forthe Iris dataset this couldbe an SVMManual tuning-> fiddling withhyperparameters.Better: Use automatedmethods like PSO, GA orSMBO

Best: AutoWeka

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 2 / 18

The iris pictures on this slide are from wikimedia commons and used under the following licenses: Top left: IrisVersicolor is public domain; Bottom left: Iris setosa is licensed by Radomil under CC BY-SA 3.0; Top right: IrisVirginica is licensed by C T Johansson under CC BY 3.0.

Page 6: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Your task: Build an Iris classification system

Choose an algorithmbased on datasetcharacteristics, e.g. forthe Iris dataset this couldbe an SVMManual tuning-> fiddling withhyperparameters.Better: Use automatedmethods like PSO, GA orSMBOBest: AutoWeka

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 2 / 18

The iris pictures on this slide are from wikimedia commons and used under the following licenses: Top left: IrisVersicolor is public domain; Bottom left: Iris setosa is licensed by Radomil under CC BY-SA 3.0; Top right: IrisVirginica is licensed by C T Johansson under CC BY 3.0.

Page 7: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Adding the Iris Japonica to the dataset

Manual tuning:Use experience and startfrom the parametersfound on the Iris datasetAutomated methods-> start from scratch→ Cast use experienceinto an algorithm.

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 3 / 18

The iris pictures on this slide are from wikimedia commons and used under the following licenses: Top left: IrisVersicolor is public domain; Bottom left: Iris setosa is licensed by Radomil under CC BY-SA 3.0; Top right: IrisVirginica is licensed by C T Johansson under CC BY 3.0; Bottom right: Iris Japonica is licensed by KENPEI underCC BY-SA 3.0

Page 8: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Adding the Iris Japonica to the dataset

Manual tuning:Use experience and startfrom the parametersfound on the Iris dataset

Automated methods-> start from scratch→ Cast use experienceinto an algorithm.

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 3 / 18

The iris pictures on this slide are from wikimedia commons and used under the following licenses: Top left: IrisVersicolor is public domain; Bottom left: Iris setosa is licensed by Radomil under CC BY-SA 3.0; Top right: IrisVirginica is licensed by C T Johansson under CC BY 3.0; Bottom right: Iris Japonica is licensed by KENPEI underCC BY-SA 3.0

Page 9: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Adding the Iris Japonica to the dataset

Manual tuning:Use experience and startfrom the parametersfound on the Iris datasetAutomated methods-> start from scratch

→ Cast use experienceinto an algorithm.

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 3 / 18

The iris pictures on this slide are from wikimedia commons and used under the following licenses: Top left: IrisVersicolor is public domain; Bottom left: Iris setosa is licensed by Radomil under CC BY-SA 3.0; Top right: IrisVirginica is licensed by C T Johansson under CC BY 3.0; Bottom right: Iris Japonica is licensed by KENPEI underCC BY-SA 3.0

Page 10: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Adding the Iris Japonica to the dataset

Manual tuning:Use experience and startfrom the parametersfound on the Iris datasetAutomated methods-> start from scratch→ Cast use experienceinto an algorithm.

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 3 / 18

The iris pictures on this slide are from wikimedia commons and used under the following licenses: Top left: IrisVersicolor is public domain; Bottom left: Iris setosa is licensed by Radomil under CC BY-SA 3.0; Top right: IrisVirginica is licensed by C T Johansson under CC BY 3.0; Bottom right: Iris Japonica is licensed by KENPEI underCC BY-SA 3.0

Page 11: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Sequential Model-based Bayesian Optimization(SMBO)

ML Algorithm A

ConfigurationSpace Λ of A

Dataset D

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 4 / 18

Page 12: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Sequential Model-based Bayesian Optimization(SMBO)

Configuration Task

ML Algorithm A

ConfigurationSpace Λ of A

Dataset D

Configuration λ ∗

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 4 / 18

Page 13: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Sequential Model-based Bayesian Optimization(SMBO)

Configuration Task

ML Algorithm A

ConfigurationSpace Λ of A

Dataset D

Configuration λ ∗

Fit regressionmodel on

(λ ,Aλ (D)) pairs

Select promisingconfiguration

λ ∈ Λ

Evaluate Aλ (D)

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 4 / 18

Page 14: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Metalearning-Initialized SMBO (MI-SMBO)

Configuration Task

ML Algorithm A

ConfigurationSpace Λ of A

Dataset Dnew

Fit regressionmodel on pairs of

(λ ,Aλ (Dnew))

Select promisingconfiguration

λ ∈ Λ

EvaluateAλ (Dnew)

Configuration λ ∗

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 5 / 18

Page 15: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Metalearning-Initialized SMBO (MI-SMBO)

Configuration Task

ML Algorithm A

ConfigurationSpace Λ of A

Dataset Dnew

Fit regressionmodel on pairs of

(λ ,Aλ (Dnew))

Select promisingconfiguration

λ ∈ Λ

EvaluateAλ (Dnew)

Configuration λ ∗

Find Datasets Disimilar to Dnew

Initialize Searchwith λ ∗Di

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 5 / 18

Page 16: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Metafeatures

# training examples: 150# classes: 3# features: 4# numerical features: 4# categorical features: 0missing values? No

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 6 / 18

The iris pictures on this slide are from wikimedia commons and used under the following licenses: Top left: IrisVersicolor is public domain; Bottom left: Iris setosa is licensed by Radomil under CC BY-SA 3.0; Top right: IrisVirginica is licensed by C T Johansson under CC BY 3.0.

Page 17: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Metalearning-Initialized Bayesian Optimization

For a new dataset Dnew :Sort known datasets D1:N by distance to Dnew .For each of these datasets, extract the best knownhyperparameter configuration λ ∗Di

.Initialize SMBO with the first k hyperparameterconfigurations from the sorted list.

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 7 / 18

Page 18: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Similarity of Datasets

wavef

orm

-500

0iri

s

pend

igits

yeas

t

satim

age

balanc

e-sc

ale

ecoli

cmc

page

-block

sta

egl

ass

nurs

ery

spam

base

vehi

cle

segm

ent

car

optd

igits

prim

ary-

tum

or

mfe

at-fo

urier

mfe

at-k

arhu

nen

mfe

at-zer

nike

abalon

e

vowel

mfe

at-fa

ctor

s

derm

atolog

y

lette

r

mfe

at-m

orph

olog

ical

mfe

at-p

ixel

euca

lypt

us

arrh

ythm

ia

zoo

audi

olog

y

auto

s

soyb

ean

cylin

der-b

ands

vote

iono

sphe

re

tic-ta

c-to

e

braz

iltou

rism

hepa

titisla

bor

hear

t-sta

tlog

brea

st-w

mus

hroo

m

hear

t-c

hear

t-h

post

oper

ative-

patie

nt-d

ata

lym

ph

brea

st-c

ance

r

diab

etesliv

er-d

isord

ers

kr-v

s-kp

cred

it-a

anne

al.O

RIG

cred

it-g

sona

r

habe

rman

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 8 / 18

Page 19: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Finding the nearest datasets (1)

waveform-5000

iris

pendigits

yeast

satimage

balance-scale

ecoli

cmc

page-blocks

tae

glass

nursery

spambase

vehicle

segment

car

optdigits

primary-tumor

mfeat-fourier

mfeat-karhunen

mfeat-zernike

abalone

vowel

mfeat-factors

dermatology

letter

mfeat-morphological

mfeat-pixel

eucalyptus

arrhythmia

zoo

audiology

autos

soybean

cylinder-bands

vote

ionosphere

tic-tac-toe

braziltourism

hepatitisla

bor

heart-statlog

breast-w

mushroom

heart-c

heart-h

postoperative-patient-data

lymph

breast-cancer

diabetesliv

er-disorders

kr-vs-kp

credit-a

anneal.ORIG

credit-g

sonar

haberman

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 8 / 18

Page 20: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Finding the nearest datasets (2)

waveform-5000

iris

pendigits

yeast

satimage

balance-scale

ecoli

cmc

page-blocks

tae

glass

nursery

spambase

vehicle

segment

car

optdigits

primary-tumor

mfeat-fourier

mfeat-karhunen

mfeat-zernike

abalone

vowel

mfeat-factors

dermatology

letter

mfeat-morphological

mfeat-pixel

eucalyptus

arrhythmia

zoo

audiology

autos

soybean

cylinder-bands

vote

ionosphere

tic-tac-toe

braziltourism

hepatitisla

bor

heart-statlog

breast-w

mushroom

heart-c

heart-h

postoperative-patient-data

lymph

breast-cancer

diabetesliv

er-disorders

kr-vs-kp

credit-a

anneal.ORIG

credit-g

sonar

haberman

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 8 / 18

Page 21: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Finding the nearest datasets (3)

waveform-5000

iris

pendigits

yeast

satimage

balance-scale

ecoli

cmc

page-blocks

tae

glass

nursery

spambase

vehicle

segment

car

optdigits

primary-tumor

mfeat-fourier

mfeat-karhunen

mfeat-zernike

abalone

vowel

mfeat-factors

dermatology

letter

mfeat-morphological

mfeat-pixel

eucalyptus

arrhythmia

zoo

audiology

autos

soybean

cylinder-bands

vote

ionosphere

tic-tac-toe

braziltourism

hepatitisla

bor

heart-statlog

breast-w

mushroom

heart-c

heart-h

postoperative-patient-data

lymph

breast-cancer

diabetesliv

er-disorders

kr-vs-kp

credit-a

anneal.ORIG

credit-g

sonar

haberman

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 8 / 18

Page 22: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Finding the nearest datasets (3)

waveform-5000

iris

pendigits

yeast

satimage

balance-scale

ecoli

cmc

page-blocks

tae

glass

nursery

spambase

vehicle

segment

car

optdigits

primary-tumor

mfeat-fourier

mfeat-karhunen

mfeat-zernike

abalone

vowel

mfeat-factors

dermatology

letter

mfeat-morphological

mfeat-pixel

eucalyptus

arrhythmia

zoo

audiology

autos

soybean

cylinder-bands

vote

ionosphere

tic-tac-toe

braziltourism

hepatitisla

bor

heart-statlog

breast-w

mushroom

heart-c

heart-h

postoperative-patient-data

lymph

breast-cancer

diabetesliv

er-disorders

kr-vs-kp

credit-a

anneal.ORIG

credit-g

sonar

haberman

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 8 / 18

Page 23: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Finding the nearest datasets (4)

waveform-5000

iris

pendigits

yeast

satimage

balance-scale

ecoli

cmc

page-blocks

tae

glass

nursery

spambase

vehicle

segment

car

optdigits

primary-tumor

mfeat-fourier

mfeat-karhunen

mfeat-zernike

abalone

vowel

mfeat-factors

dermatology

letter

mfeat-morphological

mfeat-pixel

eucalyptus

arrhythmia

zoo

audiology

autos

soybean

cylinder-bands

vote

ionosphere

tic-tac-toe

braziltourism

hepatitisla

bor

heart-statlog

breast-w

mushroom

heart-c

heart-h

postoperative-patient-data

lymph

breast-cancer

diabetesliv

er-disorders

kr-vs-kp

credit-a

anneal.ORIG

credit-g

sonar

haberman

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 8 / 18

Page 24: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Distance metric (1)

Commonly used in literature, the L1 norm:

d(Dnew,Dj) = ∑i|mnew

i −mji | (1)

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 9 / 18

Page 25: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Experimental Setup

57 datasets from the OpenML repository

46 metafeatures from the literature:Split into five different subsets, including landmarking[Pfahringer et al. 2000]

Two case studiesSupport Vector Machine with MI-Spearmint [Snoek et al. 2012]AutoSklearn with MI-SMAC [Hutter et al. 2011]

Tried 5, 10, 20 and 25 initial configurationsran each instantiation 10 times on each dataset→ 26220 optimization runstherefore, precomputed a dense grid for every dataset

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 10 / 18

Page 26: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Experimental Setup

57 datasets from the OpenML repository46 metafeatures from the literature:

Split into five different subsets, including landmarking[Pfahringer et al. 2000]

Two case studiesSupport Vector Machine with MI-Spearmint [Snoek et al. 2012]AutoSklearn with MI-SMAC [Hutter et al. 2011]

Tried 5, 10, 20 and 25 initial configurationsran each instantiation 10 times on each dataset→ 26220 optimization runstherefore, precomputed a dense grid for every dataset

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 10 / 18

Page 27: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Experimental Setup

57 datasets from the OpenML repository46 metafeatures from the literature:

Split into five different subsets, including landmarking[Pfahringer et al. 2000]

Two case studiesSupport Vector Machine with MI-Spearmint [Snoek et al. 2012]AutoSklearn with MI-SMAC [Hutter et al. 2011]

Tried 5, 10, 20 and 25 initial configurationsran each instantiation 10 times on each dataset→ 26220 optimization runstherefore, precomputed a dense grid for every dataset

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 10 / 18

Page 28: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Experimental Setup

57 datasets from the OpenML repository46 metafeatures from the literature:

Split into five different subsets, including landmarking[Pfahringer et al. 2000]

Two case studiesSupport Vector Machine with MI-Spearmint [Snoek et al. 2012]AutoSklearn with MI-SMAC [Hutter et al. 2011]

Tried 5, 10, 20 and 25 initial configurations

ran each instantiation 10 times on each dataset→ 26220 optimization runstherefore, precomputed a dense grid for every dataset

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 10 / 18

Page 29: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Experimental Setup

57 datasets from the OpenML repository46 metafeatures from the literature:

Split into five different subsets, including landmarking[Pfahringer et al. 2000]

Two case studiesSupport Vector Machine with MI-Spearmint [Snoek et al. 2012]AutoSklearn with MI-SMAC [Hutter et al. 2011]

Tried 5, 10, 20 and 25 initial configurationsran each instantiation 10 times on each dataset→ 26220 optimization runs

therefore, precomputed a dense grid for every dataset

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 10 / 18

Page 30: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Experimental Setup

57 datasets from the OpenML repository46 metafeatures from the literature:

Split into five different subsets, including landmarking[Pfahringer et al. 2000]

Two case studiesSupport Vector Machine with MI-Spearmint [Snoek et al. 2012]AutoSklearn with MI-SMAC [Hutter et al. 2011]

Tried 5, 10, 20 and 25 initial configurationsran each instantiation 10 times on each dataset→ 26220 optimization runstherefore, precomputed a dense grid for every dataset

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 10 / 18

Page 31: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Combined Algorithm Selection andHyperparameter Optimization problem (CASH)

Max Features

SVM

gamma C(SVM)

Classifier

Random Forest LinearSVM

Criterion Min Samples Split loss C(LinearSVM)

[Auto-WEKA, Thornton et al. 2013]

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 11 / 18

Page 32: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Combined Algorithm Selection andHyperparameter Optimization problem (CASH)

Max Features

SVM

gamma C(SVM)

Classifier

Random Forest LinearSVM

Criterion Min Samples Split loss C(LinearSVM)

[Auto-WEKA, Thornton et al. 2013]

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 11 / 18

Page 33: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

AutoSklearn: Hyperparameters

Component Hyperparameter # Values

Main λclassifier 3Main preprocessing 2SVM log2(C) 21SVM log2(γ) 19LinearSVM log2(C) 21LinearSVM penalty 2RF min splits 5RF max features 10RF criterion 2PCA variance to keep 2

1623 hyperparameter configurations

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 12 / 18

Page 34: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

AutoSklearn: Hyperparameters

Component Hyperparameter # Values

Main λclassifier 3Main preprocessing 2SVM log2(C) 21SVM log2(γ) 19LinearSVM log2(C) 21LinearSVM penalty 2RF min splits 5RF max features 10RF criterion 2PCA variance to keep 2

1623 hyperparameter configurations

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 12 / 18

Page 35: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

AutoSklearn: Results (1)

0 10 20 30 40 50#Function evaluations

0.00

0.01

0.02

0.03

0.04

0.05

0.06

0.07

Diffe

rence

to m

in f

unct

ion v

alu

e

SMAC

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 13 / 18

Page 36: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

AutoSklearn: Results (1)

0 10 20 30 40 50#Function evaluations

0.00

0.02

0.04

0.06

0.08

0.10

Diffe

rence

to m

in f

unct

ion v

alu

e

SMACrandomTPEMI-SMAC(10,L1 ,landmarking)

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 14 / 18

Page 37: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

AutoSklearn: Results (2)

0 10 20 30 40 501.8

2.0

2.2

2.4

2.6

2.8

3.0

SMACrandom

TPE MI-SMAC(10,L1 ,landmarking)

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 15 / 18

Page 38: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

AutoSklearn: Results (3)

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0 10 20 30 40 50#Function evaluations

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0.0

MI-SMAC(10,L1 ,landmarking) vs SMAC

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 15 / 18

Page 39: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

AutoSklearn: Results (3)

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0 10 20 30 40 50#Function evaluations

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0.0

MI-SMAC(10,L1 ,landmarking) vs SMAC

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 15 / 18

Page 40: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

AutoSklearn: Results (3)

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0 10 20 30 40 50#Function evaluations

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0.0

MI-SMAC(10,L1 ,landmarking) vs SMAC MI-SMAC(10,L1 ,landmarking) vs TPE

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 15 / 18

Page 41: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

AutoSklearn: Results (3)

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0 10 20 30 40 50#Function evaluations

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0.0

MI-SMAC(10,L1 ,landmarking) vs SMAC

MI-SMAC(10,L1 ,landmarking) vs TPE

MI-SMAC(10,L1 ,landmarking) vs random

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 15 / 18

Page 42: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Open questions

Does MI-SMBO scale to larger configuration spaces?What if gridsearch is too expensive?Can the metalearning component be added directly intothe SMBO procedure?

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 16 / 18

Page 43: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

Take home messages

SMBO can be substantially improved by providing goodinitial configurations.Metalearning provides a sound framework to find theseconfigurations.MI-SMAC improves on state-of-the-art methods on a largeconfiguration space, namely AutoSklearn.

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 17 / 18

Page 44: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

The end

Thank you for your attention.

Further questions: [email protected]

This presentation was partially supported by an ECCAI TravelAward and the ECCAI sponsors.

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 18 / 18

Page 45: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

AutoSklearn: Results (5)

0 10 20 30 40 50#Function evaluations

0.00

0.02

0.04

0.06

0.08

0.10

Min

funct

ion v

alu

e

SMACrandomTPEMI-SMAC(10,L1 ,landmarking)

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 18 / 18

Page 46: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

AutoSklearn: Results (7)

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0 10 20 30 40 50

Function evaluations

−0.6

−0.5

−0.4

−0.3

−0.2

−0.1

0.0

MI-SMAC(10,L1,all) vs MI-SMAC(10,L1,landmarking)MI-SMAC(10,L1,all) vs SMAC

MI-SMAC(10,L1,all) vs TPEMI-SMAC(10,L1,all) vs random

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 18 / 18

Page 47: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

AutoSklearn: Results (8)

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0 10 20 30 40 50#Function evaluations

0.6

0.5

0.4

0.3

0.2

0.1

0.0

SMAC vs MI-SMAC(10,L1 ,landmarking)

SMAC vs TPE

SMAC vs random

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 18 / 18

Page 48: Using Meta Learning to Initialize Bayesian Optimization · !Castuse experience intoanalgorithm. ... kr-vs-kp credit-a anneal.ORIG credit-g sonar haberman MetaSel’14 Feurer,SpringenbergandHutter–MI-SMBO

AutoSklearn: Results (9)

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0 10 20 30 40 50

Function evaluations

−0.7

−0.6

−0.5

−0.4

−0.3

−0.2

−0.1

0.0

MI-SMAC(25,L1,landmarking) vs MI-SMAC(10,L1,landmarking)MI-SMAC(25,L1,landmarking) vs MI-SMAC(20,L1,landmarking)MI-SMAC(25,L1,landmarking) vs MI-SMAC(5,L1,landmarking)

MI-SMAC(25,L1,landmarking) vs SMACMI-SMAC(25,L1,landmarking) vs TPEMI-SMAC(25,L1,landmarking) vs random

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 18 / 18