1/13
Learning to Warm-Start Bayesian Hyperparameter Optimization
and Task-Adaptive Ensemble of Meta-Learners for Few-Shot Classification
Jungtaek Kim ([email protected])
Machine Learning Group, Department of Computer Science and Engineering, POSTECH,
77 Cheongam-ro, Nam-gu, Pohang 37673, Gyeongsangbuk-do, Republic of Korea
September 11, 2018
2/13
Table of Contents
Learning to Warm-Start Bayesian Hyperparameter Optimization
- Motivation
- Main Architecture
- Experiments

Task-Adaptive Ensemble of Meta-Learners for Few-Shot Classification
- Motivation
- Main Architecture
- Experiments
3/13
Learning to Warm-Start Bayesian Hyperparameter Optimization
4/13
Motivation
- Bayesian hyperparameter optimization usually starts from random initial points.
- Better initializations can help to speed up Bayesian hyperparameter optimization.
- The mapping from hyperparameters to validation error can be learned from previous optimization runs.
- We attempt to transfer prior knowledge about good initializations to a new task.
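The transfer idea above can be sketched in code. The following is a minimal, hypothetical illustration (the function and variable names are not from the paper): past tasks are summarized by meta-feature vectors, and a new task is warm-started from the best hyperparameters of its nearest past tasks instead of random points.

```python
import numpy as np

# Hypothetical "nearest best" warm-start initialization: each past task
# stores a meta-feature vector and the best hyperparameters found when
# it was optimized earlier. All names here are illustrative.

def warm_start_init(new_meta_feature, past_meta_features, past_best_hparams, k=3):
    """Return the best hyperparameters of the k past tasks whose
    meta-features are closest to the new task's meta-feature."""
    dists = np.linalg.norm(past_meta_features - new_meta_feature, axis=1)
    nearest = np.argsort(dists)[:k]
    return past_best_hparams[nearest]

# Toy data: 5 past tasks, 4-dim meta-features, 2 hyperparameters each.
rng = np.random.default_rng(0)
past_mf = rng.normal(size=(5, 4))
past_best = rng.uniform(size=(5, 2))

# A new task very similar to past task 0 inherits its best hyperparameters.
init_points = warm_start_init(past_mf[0] + 0.01, past_mf, past_best, k=2)
# These points replace the random initial points of the BO loop.
```

The selected points are then evaluated first, so the surrogate model starts from informative observations rather than uniform samples.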
5/13
Main Architecture
[Figure: Siamese architecture in which all weights are shared. Each of the two branches feeds a Dataset through a Deep feature extractor followed by a Meta-feature extractor (fc layers); the resulting meta-features are compared via a Meta-feature distance.]
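A toy sketch of the shared-weight (Siamese) idea in the figure: both datasets pass through the same extractor, and a distance is computed between the resulting dataset-level meta-features. The linear map and mean pooling below are stand-ins, not the actual deep/Bi-LSTM extractors from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 4))   # shared weights used by BOTH branches

def meta_feature(dataset):
    """Map a dataset (n_samples x 8) to one dataset-level meta-feature."""
    per_sample = np.tanh(dataset @ W)  # toy stand-in for the deep feature extractor
    return per_sample.mean(axis=0)     # pool samples into a dataset representation

ds_a = rng.normal(size=(50, 8))
ds_b = rng.normal(size=(60, 8))

# Meta-feature distance between the two datasets.
dist = np.linalg.norm(meta_feature(ds_a) - meta_feature(ds_b))
```

Because the weights are shared, similar datasets map to nearby meta-features, which is what makes the nearest-task lookup for warm-starting meaningful.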
6/13
Experiments (EI)
[Figure: Minimum validation error vs. iteration (0-20) with the EI acquisition function, on (a) AwA2, (b) Caltech-101, (c) Caltech-256, (d) CIFAR-10, (e) CIFAR-100, (f) CUB200-2011, (g) MNIST, (h) VOC2012. Methods compared: Random init. (Uniform), Random init. (Latin), Random init. (Halton), Nearest best init. (ADF), Nearest best init. (Bi-LSTM).]
7/13
Experiments (UCB)
[Figure: Minimum validation error vs. iteration (0-20) with the UCB acquisition function, on (j) AwA2, (k) Caltech-101, (l) Caltech-256, (m) CIFAR-10, (n) CIFAR-100, (o) CUB200-2011, (p) MNIST, (q) VOC2012. Methods compared: Random init. (Uniform), Random init. (Latin), Random init. (Halton), Nearest best init. (ADF), Nearest best init. (Bi-LSTM).]
8/13
Task-Adaptive Ensemble of Meta-Learners for Few-Shot Classification
9/13
Motivation
10/13
Motivation
- Few-shot classification needs to generalize from training episodes and perform well on unseen test episodes.
- The domain distribution of a meta-learner for few-shot classification is usually assumed to be fixed.
- In practice, the domain distribution can vary.
- We build an ensemble of several meta-learners, each of which is trained on episodes from a single dataset.
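The ensemble idea above can be sketched as follows. This is a hedged illustration, not the paper's method: each meta-learner's vote on a new episode is weighted by how similar the episode looks to the dataset that learner was trained on; the similarity measure and the learners themselves are toy stand-ins.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def ensemble_predict(episode_feature, learner_features, learner_probs):
    """Combine per-learner class probabilities, weighted by task similarity."""
    # Negative distance -> larger weight for more similar learners.
    sims = -np.linalg.norm(learner_features - episode_feature, axis=1)
    weights = softmax(sims)
    return weights @ learner_probs  # (n_learners,) @ (n_learners, n_classes)

# Toy setup: 3 meta-learners, one 5-way classification episode.
rng = np.random.default_rng(2)
learner_feats = rng.normal(size=(3, 6))    # each learner's dataset representation
probs = rng.dirichlet(np.ones(5), size=3)  # each learner's class probabilities

combined = ensemble_predict(learner_feats[1], learner_feats, probs)
```

Because the weights adapt per episode, a learner trained on a visually similar dataset dominates the prediction, while mismatched learners are down-weighted.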
11/13
Main Architecture
12/13
Experiments
13/13
Experiments