
YellowFin and the Art of Momentum Tuning
Jian Zhang¹, Ioannis Mitliagkas²

¹Stanford University, ²MILA, University of Montreal

Adaptive Optimization
Hyperparameter tuning is a major cost of deep learning.

Momentum: a key hyperparameter of SGD and its variants.

Adaptive methods, e.g. Adam [1], do not tune momentum.

YellowFin optimizer
• Based on the robustness properties of momentum.

• Auto-tuning of momentum and learning rate in SGD.

• Closed-loop momentum control for asynchronous training.

Experiments: ResNet and LSTM
YellowFin runs with no tuning.

Adam, momentum SGD, etc. are tuned on learning-rate grids.

YellowFin can outperform the tuned state of the art on training and validation metrics.

Facebook Convolutional Seq-to-Seq model, IWSLT 2014 German-English translation
YellowFin outperforms the hand-tuned default optimizer.

Extension: Closed-loop YellowFin
Asynchronous distributed training is fast: no synchronization barrier.

However, asynchrony induces additional momentum µ̄ on top of the algorithmic momentum µ [Mitliagkas et al. 16].

Can we automatically match the total momentum µ_T to the YF-tuned target µ*?

Closed-loop momentum control

The closed-loop mechanism improves YellowFin in asynchronous training.

Momentum operation
• Momentum SGD step: x_{t+1} = x_t − α∇f(x_t) + µ(x_t − x_{t−1}).
• In a 1-D case, matrix form with ∇f(x_t) = h(x_t)(x_t − x*), where h is the curvature in quadratics:

  (x_{t+1} − x*, x_t − x*)ᵀ = A_t (x_t − x*, x_{t−1} − x*)ᵀ,   A_t = [[1 − αh(x_t) + µ, −µ], [1, 0]]

• Robust region: (1 − √µ)² / h_min ≤ α ≤ (1 + √µ)² / h_max, given h_min ≤ h(x_t) ≤ h_max. Inside it, the spectral radius is ρ(A_t) = √µ, i.e., linear convergence at rate √µ [2].
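
The robust-region claim is easy to check numerically. A minimal sketch (our own, with arbitrary µ and h): build A_t for learning rates spanning the robust interval and confirm that ρ(A_t) = √µ throughout.

# Numerical check: for any alpha with alpha*h in
# [(1 - sqrt(mu))^2, (1 + sqrt(mu))^2], the momentum operator A_t
# has spectral radius exactly sqrt(mu).
import numpy as np

mu, h = 0.9, 4.0
lo = (1 - np.sqrt(mu))**2 / h
hi = (1 + np.sqrt(mu))**2 / h
for alpha in np.linspace(lo, hi, 5):
    A = np.array([[1 - alpha * h + mu, -mu],
                  [1.0, 0.0]])
    rho = max(abs(np.linalg.eigvals(A)))
    print(f"alpha={alpha:.4f}  rho={rho:.4f}  sqrt(mu)={np.sqrt(mu):.4f}")

The learning rate moves across the whole interval while the rate √µ stays fixed, which is exactly the robustness the next panel illustrates.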

Robustness of Momentum
• Linear rate is robust to curvature variation (middle panel).
• Linear rate is robust to a range of learning rates (left panel).

YellowFin
Noisy quadratic model
• Model the stochastic setting with gradient variance C.
• Local quadratic approximation [3], 1-D case: in the robust region, the distance to the optimum is approximately

  E(x_t − x*)² ≈ µᵗ E(x_0 − x*)² + (1 − µᵗ) α²C / (1 − µ)

Greedy tuning strategy
• Solve for the learning rate and momentum in closed form (see the tuner sketch after Principle II below).
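
A Monte-Carlo sanity check of the expression above (our sketch, with arbitrary h, C, µ, x_0 chosen inside the robust region); since the expression is an approximation, expect agreement in magnitude rather than exact equality.

# Monte-Carlo check of the noisy-quadratic approximation above.
# Model: f(x) = (h/2) * (x - x*)^2 with x* = 0, additive gradient
# noise of variance C, momentum SGD run in the robust region.
import numpy as np

rng = np.random.default_rng(0)
h, C, mu, x0 = 2.0, 0.01, 0.25, 3.0
alpha = 1.0 / h                # alpha*h = 1 lies in [(1-sqrt(mu))^2, (1+sqrt(mu))^2]
T, runs = 100, 50000

x = np.full(runs, x0)
x_prev = x.copy()
for t in range(T):
    grad = h * x + np.sqrt(C) * rng.standard_normal(runs)
    x, x_prev = x - alpha * grad + mu * (x - x_prev), x

empirical = np.mean(x**2)      # E(x_T - x*)^2 with x* = 0
predicted = mu**T * x0**2 + (1 - mu**T) * alpha**2 * C / (1 - mu)
print(empirical, predicted)    # same order of magnitude; the formula is approximate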

Closed-loop momentum control: measure the effective total momentum µ_T = µ + µ̄, where µ is the algorithmic momentum and µ̄ is the asynchrony-induced momentum, then steer µ_T toward the YF target value µ* with negative feedback:

  µ ← µ + γ (µ* − µ_T)
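
To make the loop concrete, here is a toy simulation of the feedback rule (our sketch: it assumes a constant µ̄ hidden from the controller and a gain γ = 0.3, whereas the real system estimates µ_T from the training dynamics).

# Feedback control mu <- mu + gamma * (mu* - mu_T), with mu_T = mu + mu_bar.
mu_star = 0.9   # YF target total momentum
mu_bar = 0.2    # asynchrony-induced momentum (unknown to the controller)
gamma = 0.3     # feedback gain (assumed value)
mu = mu_star    # start the algorithmic momentum at the target

for step in range(20):
    mu_T = mu + mu_bar                    # effective total momentum
    mu = mu + gamma * (mu_star - mu_T)    # steer the total toward mu*

print(mu, mu + mu_bar)  # algorithmic mu -> 0.7, total -> mu* = 0.9

The algorithmic momentum backs off by exactly the amount asynchrony adds, so the total momentum lands on the YF target.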

GitHub for PyTorch: https://github.com/JianGoForIt/YellowFin_Pytorch
GitHub for TensorFlow: https://github.com/JianGoForIt/YellowFin
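
For completeness, a minimal PyTorch usage sketch; the import path and the YFOptimizer name are assumptions about the repo above, so check its README for the actual interface.

# Hypothetical usage of the PyTorch repo above. The class name
# YFOptimizer and the import path are assumptions; consult the README.
import torch
import torch.nn as nn
from yellowfin import YFOptimizer  # assumed import path

model = nn.Linear(10, 1)
opt = YFOptimizer(model.parameters())  # momentum and learning rate are auto-tuned

x, y = torch.randn(32, 10), torch.randn(32, 1)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()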

Principle I: Stay in the robust region

Principle II: Minimize the expected distance to the optimum after one step (t = 1).
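
At t = 1 the noisy-quadratic expression above reduces to µ·D² + α²C with D² = E(x_0 − x*)²; Principle I forces α ≥ (1 − √µ)²/h_min, and since the objective increases with α that bound is tight. The sketch below is our own numerical version of the greedy tuner (a grid search standing in for the poster's closed-form solution).

# Greedy one-step tuner: minimize mu*D2 + alpha^2*C subject to the
# robust region, solved numerically on a grid.
import numpy as np

def greedy_tune(D2, C, h_min, h_max, grid=10000):
    r = h_max / h_min
    mu_low = ((np.sqrt(r) - 1) / (np.sqrt(r) + 1))**2  # smallest mu with a nonempty robust interval
    mus = np.linspace(mu_low, 1.0 - 1e-6, grid)
    alphas = (1 - np.sqrt(mus))**2 / h_min             # smallest robust alpha; objective grows with alpha
    objective = mus * D2 + alphas**2 * C
    i = int(np.argmin(objective))
    return mus[i], alphas[i]

mu, alpha = greedy_tune(D2=1.0, C=0.1, h_min=1.0, h_max=100.0)
print(mu, alpha)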


1. [Kingma et al. 15]
3. [Schaul et al. 13]

[Figure panels (training curves):]
• CIFAR100 ResNet-164 (training loss vs. iterations): Adam, YellowFin, Closed-loop YellowFin.
• 2-layer PTB LSTM (validation perplexity vs. iterations): Momentum SGD, Adam, YellowFin.
• 2-layer TinyShakespeare Char-RNN (training loss vs. iterations): Momentum SGD, Adam, YellowFin.
• ResNet-110 CIFAR10 (training loss vs. iterations): Momentum SGD, Adam, YellowFin.
• 3-layer WSJ LSTM (validation F1 vs. iterations): Momentum SGD, Adam, YellowFin, Adagrad, Vanilla SGD.

IWSLT 2014 German-English (ConvSeq2Seq):

                            Val. loss   Val. BLEU@4
Default Nesterov momentum   2.86        30.75
YellowFin                   2.75        31.59

2. Not guaranteed for non-quadratics.
Asynchrony-induced momentum: [Mitliagkas et al. 16]

“Send us your bug reports!”