
Transcript of Efficient Benchmarking of Hyperparameter Optimizers via Surrogates

Page 1: Efficient Benchmarking of Hyperparameter Optimizers via Surrogates

Empirical Results
• Almost perfect for low-dimensional benchmarks and still acceptable for higher dimensions
• Reduce benchmark overhead to <1 sec

Answer: Surrogates work well, especially those based on Random Forests.


Implementation publicly available:

Efficient Benchmarking of Hyperparameter Optimizers via Surrogates

Katharina Eggensperger and Frank Hutter, University of Freiburg, {eggenspk,fh}@cs.uni-freiburg.de
Holger H. Hoos and Kevin Leyton-Brown, University of British Columbia, {hoos, kevinlb}@cs.ubc.ca


…in 30 seconds

• Benchmarking hyperparameter optimization methods is costly, as it requires many evaluations of the benchmark problem used.
• We propose cheap but realistic surrogate benchmarks based on predictive performance models.
• Our benchmarks are publicly available and allow extensive white-box tests and fast evaluation of optimization algorithms.

Experiments
• Collect data by conducting optimization runs and random search
• Evaluate quality of regression models
• Compare optimizer performance on true vs. surrogate benchmark

Regression Model

Evaluate different regression models with respect to training/prediction time and prediction quality, and select promising models (see the sketch after this list):
• Tree-based models
• Gaussian Processes
• Support Vector Regression
• K-Nearest Neighbour
• Linear Regression
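A minimal sketch of this model-selection step, using scikit-learn regressors as stand-ins for the model families listed above (an illustration rather than the authors' code; the data below is random placeholder data, whereas in practice X holds numerically encoded configurations and y the logged performances):

```python
import time
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X, y = rng.rand(500, 5), rng.rand(500)  # placeholder <configuration, performance> data

models = {
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gaussian process": GaussianProcessRegressor(),
    "svr": SVR(),
    "k-nearest neighbours": KNeighborsRegressor(n_neighbors=5),
    "linear regression": LinearRegression(),
}

for name, model in models.items():
    start = time.time()
    preds = cross_val_predict(model, X, y, cv=5)      # out-of-sample predictions
    elapsed = time.time() - start
    rho, _ = spearmanr(y, preds)                      # rank correlation (prediction quality)
    rmse = float(np.sqrt(np.mean((y - preds) ** 2)))  # error (prediction quality)
    print(f"{name:22s} rank-corr={rho:.2f}  RMSE={rmse:.3f}  time={elapsed:.1f}s")
```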

Surrogate Benchmarks
• Train regression model
• Replace call to the true benchmark function with the surrogate prediction (sketched below)
• Evaluate surrogate benchmark and compare to optimizer performance on the real benchmark
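A minimal end-to-end sketch of these steps, under simplifying assumptions (the "true benchmark" here is a cheap stand-in function, and the code is only an illustration, not the released SurrogateBenchmarks implementation):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Stand-in for an expensive real benchmark (e.g. training a learner and
# returning its validation error); only used here to generate example data.
def true_benchmark(config):
    return float(np.sum((config - 0.3) ** 2))

# Logged <configuration, performance> pairs, e.g. from optimizer runs and random search
rng = np.random.RandomState(0)
configs = rng.rand(500, 5)
perfs = np.array([true_benchmark(c) for c in configs])

# Train the regression model on the logged data
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(configs, perfs)

# Replace the call to the true benchmark with the surrogate prediction
def surrogate_benchmark(config):
    return float(model.predict(np.asarray(config).reshape(1, -1))[0])

# A hyperparameter optimizer now calls surrogate_benchmark(config) instead of
# true_benchmark(config); each call costs milliseconds instead of minutes or hours.
print(true_benchmark(configs[0]), surrogate_benchmark(configs[0]))
```

A random forest is used in this sketch because, as the Empirical Results panel notes, surrogates based on Random Forests worked best in the experiments.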

Problems with existing benchmark functions

Realistic benchmark problems:
+ Complex & interesting
- Expensive to evaluate
- Complicated to set up (libraries, dependencies, special hardware, etc.)
Examples: Deep Neural Networks, onlineLDA, AutoWEKA

Synthetic test functions:
+ Easy to set up
+ Cheap to evaluate
- Unrealistic shape & too smooth
Examples: BBOB, Table-Lookups, Branin

Figures: Best performance found by different optimizers over time on the logistic regression benchmark. We show median and quartiles across 10 runs on the real benchmark (top) and surrogate benchmarks based on random forests (left, top) and Gaussian Processes (left, bottom).

Benefit

Our surrogate benchmarks …
• provide a 60x to 3600x speedup
• require negligible computational resources
• allow unit tests (see the example test below)
• help to analyze how optimizers work on complex benchmarks
… and are publicly available:

For more information visit: www.automl.org/benchmarks.html
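Because a single surrogate evaluation takes well under a second, such checks fit into an ordinary test suite. The hypothetical pytest-style test below is only an illustration built on placeholder data (it is not part of the released package); the same pattern extends to asserting properties of entire optimizer runs on a surrogate.

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def test_surrogate_is_fast_and_in_range():
    rng = np.random.RandomState(0)
    X, y = rng.rand(300, 5), rng.rand(300)    # placeholder <configuration, performance> data
    surrogate = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

    start = time.time()
    pred = surrogate.predict(rng.rand(1, 5))
    assert time.time() - start < 1.0          # "<1 sec" overhead per evaluation
    assert y.min() <= pred[0] <= y.max()      # forest predictions average observed performances
```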

Our contribution


Example usage: Comparing three optimizers on a single benchmark, running each optimizer 10 times with a budget of 100 configuration evaluations, costs 3 × 10 × 100 × time(f(λ)) in total (worked out below):
• True benchmark: 3000 × 1 min
• Surrogate benchmarks: 3000 × 1 sec
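The arithmetic behind this example, assuming roughly one minute per true evaluation and one second per surrogate evaluation as stated above:

```python
evaluations = 3 * 10 * 100            # optimizers x runs x evaluations = 3000
true_seconds = evaluations * 60       # 180,000 s = 50 hours on the real benchmark
surrogate_seconds = evaluations * 1   #   3,000 s = 50 minutes on the surrogate
print(true_seconds / surrogate_seconds)  # 60x, the lower end of the reported 60x-3600x range
```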

https://github.com/KEggensperger/SurrogateBenchmarks



…in 15 sec: Hyperparameter optimization is essential, and a wide range of methods exists for automating this process. However, empirically comparing these hyperparameter optimization methods can be very costly, because it requires many evaluations of the considered benchmark problem. To speed up benchmarking, we propose a new option, surrogate benchmarks, which feature the complexity of expensive benchmark functions but require negligible computational resources.

Figure: Performance of different regression models predicting configurations for the optimizer SMAC

Our Approach
1. Collect <configuration, performance> pairs from different benchmarks
2. Train a regression model
3. Replace the call to the benchmark function with the model prediction

Question: How well does this work? (A toy sketch of how this is checked follows below.)
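One way to answer this question, and what the trajectory figures below show for the real optimizers compared on this poster, is to run the same optimizer on the true benchmark and on the surrogate and compare the best-performance-over-time curves. A toy sketch with random search standing in for a real optimizer (illustration only, using the same cheap stand-in benchmark as in the earlier sketch):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def true_benchmark(config):                  # cheap stand-in for a real benchmark
    return float(np.sum((config - 0.3) ** 2))

# Fit a surrogate on logged <configuration, performance> data
rng = np.random.RandomState(0)
X = rng.rand(300, 5)
y = np.array([true_benchmark(c) for c in X])
surrogate = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

def surrogate_benchmark(config):
    return float(surrogate.predict(config.reshape(1, -1))[0])

def random_search(objective, n_evals, seed):
    """Toy optimizer: returns the best-so-far trajectory over n_evals evaluations."""
    rng = np.random.RandomState(seed)
    best, trajectory = np.inf, []
    for _ in range(n_evals):
        best = min(best, objective(rng.rand(5)))
        trajectory.append(best)
    return trajectory

# If the surrogate is faithful, the two trajectories should look alike
print(random_search(true_benchmark, 100, seed=1)[-1])
print(random_search(surrogate_benchmark, 100, seed=1)[-1])
```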

Setup

8 regression models:
• Tree-based models
• Gaussian Processes
• Support Vector Regression
• K-Nearest Neighbour
• Linear Regression

9 benchmark functions, e.g.:
• Logistic Regression
• (Deep) Neural Networks
• onlineLDA

Data

Collect data by conducting optimization runs and random search on actual benchmark functions:
• Logistic Regression
• (Deep) Neural Networks
• onlineLDA

Figures: Best performance found by different optimizers over time. We show median and quartiles across 10 runs on the real benchmark (left) and surrogate benchmarks based on random forests (middle) and Gaussian Processes (right).

Figure panels: Logistic Regression; HP-DBNet mrbi

Table: Properties of the benchmarks for which we provide surrogate benchmarks
Table: Empirical comparison of three optimizers on various real and surrogate-based benchmarks

SMAC: F. Hutter, H. Hoos, and K. Leyton-Brown, Sequential model-based optimization for general algorithm configuration, 2011

TPE: J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, Algorithms for hyper-parameter optimization, 2011

Spearmint: J. Snoek, H. Larochelle, and R. Adams, Practical Bayesian optimization of machine learning algorithms, 2012

HPOlib: K. Eggensperger, M. Feurer, F. Hutter, J. Bergstra, J. Snoek, H. Hoos, and K. Leyton-Brown, Towards an empirical foundation for assessing Bayesian optimization of hyperparameters, 2013