Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan...

38
Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random walk Dynamic HMC adaptive simulation time Adaptation of algorithm parameters mass matrix and step size adaptation during warm-up Dynamic HMC specific diagnostics Aki.Vehtari@aalto.fi – @avehtari

Transcript of Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan...

Page 1: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Dynamic Hamiltonian Monte Carlo in Stan

Hamiltonian Monte Carlouse of gradient information and dynamic simulation reducerandom walk

Dynamic HMCadaptive simulation time

Adaptation of algorithm parametersmass matrix and step size adaptation during warm-up

Dynamic HMC specific diagnostics

[email protected] – @avehtari

Page 2: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Extra material for dynamic HMC

Michael Betancourt (2018). Scalable Bayesian Inferencewith Hamiltonian Monte Carlohttps://www.youtube.com/watch?v=jUSZboSq1zgMichael Betancourt (2018). A Conceptual Introduction toHamiltonian Monte Carlo. https://arxiv.org/abs/1701.02434http://elevanth.org/blog/2017/11/28/build-a-better-markov-chain/Cole C. Monnahan, James T. Thorson, and Trevor A.Branch (2016) Faster estimation of Bayesian models inecology using Hamiltonian Monte Carlo.https://dx.doi.org/10.1111/2041-210X.12681

[email protected] – @avehtari

Page 3: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Demos

https://github.com/avehtari/BDA_R_demos/tree/master/demos_ch12

demos_ch12/demo12_1.R

http://elevanth.org/blog/2017/11/28/build-a-better-markov-chain/

[email protected] – @avehtari

Page 4: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Hamiltonian Monte CarloUses gradient information for more efficient samplingAugments parameter space with momentum variables

[email protected] – @avehtari

Page 5: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Hamiltonian Monte CarloUses gradient information for more efficient samplingAugments parameter space with momentum variables

−2

0

2

−2 0 2

theta1

thet

a2

● Samples Steps of the sampler 90% HPD

[email protected] – @avehtari

Page 6: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Hamiltonian Monte CarloUses gradient information for more efficient samplingAugments parameter space with momentum variables

−2

0

2

−2 0 2

theta1

thet

a2

● Samples Steps of the sampler 90% HPD

[email protected] – @avehtari

Page 7: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Hamiltonian Monte CarloUses gradient information for more efficient samplingAugments parameter space with momentum variables

−2

0

2

−2 0 2

theta1

thet

a2

● Samples Steps of the sampler 90% HPD

[email protected] – @avehtari

Page 8: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Hamiltonian Monte CarloUses gradient information for more efficient samplingAugments parameter space with momentum variables

● ●

−2

0

2

−2 0 2

theta1

thet

a2

● Samples Steps of the sampler 90% HPD

[email protected] – @avehtari

Page 9: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Hamiltonian Monte CarloUses gradient information for more efficient samplingAugments parameter space with momentum variables

● ●●

−2

0

2

−2 0 2

theta1

thet

a2

● Samples Steps of the sampler 90% HPD

[email protected] – @avehtari

Page 10: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Hamiltonian Monte CarloUses gradient information for more efficient samplingAugments parameter space with momentum variables

● ●●

−2

0

2

−2 0 2

theta1

thet

a2

● Samples Steps of the sampler 90% HPD

[email protected] – @avehtari

Page 11: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Hamiltonian Monte CarloUses gradient information for more efficient samplingAugments parameter space with momentum variables

● ●●

−2

0

2

−2 0 2

theta1

thet

a2

● Samples Steps of the sampler 90% HPD

[email protected] – @avehtari

Page 12: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Hamiltonian Monte CarloUses gradient information for more efficient samplingAugments parameter space with momentum variables

● ●●

−2

0

2

−2 0 2

theta1

thet

a2

● Samples Steps of the sampler 90% HPD

[email protected] – @avehtari

Page 13: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Hamiltonian Monte CarloUses gradient information for more efficient samplingAugments parameter space with momentum variables

● ●●

−2

0

2

−2 0 2

theta1

thet

a2

● Samples Steps of the sampler 90% HPD

[email protected] – @avehtari

Page 14: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Hamiltonian Monte CarloUses gradient information for more efficient samplingAugments parameter space with momentum variables

● ●●

−2

0

2

−2 0 2

theta1

thet

a2

● Samples Steps of the sampler 90% HPD

[email protected] – @avehtari

Page 15: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Hamiltonian Monte CarloUses gradient information for more efficient samplingAugments parameter space with momentum variables

−2

0

2

0 250 500 750 1000iter

valu

e

theta1 theta2

Trends

[email protected] – @avehtari

Page 16: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Hamiltonian Monte CarloUses gradient information for more efficient samplingAugments parameter space with momentum variables

0.00

0.25

0.50

0.75

1.00

0 5 10 15 20iter

valu

e

theta1 theta2

Autocorrelation function

[email protected] – @avehtari

Page 17: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Hamiltonian Monte CarloUses gradient information for more efficient samplingAugments parameter space with momentum variables

−1

0

1

0 250 500 750 1000iter

valu

e

theta1 theta2 95% interval for MCMC error 95% interval for independent MC

Cumulative averages

[email protected] – @avehtari

Page 18: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Hamiltonian Monte Carlo

Uses gradient information for more efficient samplingAugments parameter space with momentum variablesSimulation of Hamiltonian dynamics reduces random walk

http://elevanth.org/blog/2017/11/28/build-a-better-markov-chain/

[email protected] – @avehtari

Page 19: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Hamiltonian Monte Carlo

Uses gradient information for more efficient samplingAlternating dynamic simulation and sampling of the energylevel

Parameters: step size, number of steps in each chainNo U-Turn Sampling (NUTS) and dynamic HMC

adaptively selects number of steps to improve robustnessand efficiencydynamic HMC refers to dynamic trajectory lengthto keep reversibility of Markov chain, need to simulate intwo directionshttp://elevanth.org/blog/2017/11/28/build-a-better-markov-chain/

Dynamic simulation is discretizedsmall step size gives accurate simulation, but requires morelog density evaluationslarge step size reduces computation, but increasessimulation error which needs to be taken into account in theMarkov chain

[email protected] – @avehtari

Page 20: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Hamiltonian Monte Carlo

Uses gradient information for more efficient samplingAlternating dynamic simulation and sampling of the energylevelParameters: step size, number of steps in each chain

No U-Turn Sampling (NUTS) and dynamic HMCadaptively selects number of steps to improve robustnessand efficiencydynamic HMC refers to dynamic trajectory lengthto keep reversibility of Markov chain, need to simulate intwo directionshttp://elevanth.org/blog/2017/11/28/build-a-better-markov-chain/

Dynamic simulation is discretizedsmall step size gives accurate simulation, but requires morelog density evaluationslarge step size reduces computation, but increasessimulation error which needs to be taken into account in theMarkov chain

[email protected] – @avehtari

Page 21: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Hamiltonian Monte Carlo

Uses gradient information for more efficient samplingAlternating dynamic simulation and sampling of the energylevelParameters: step size, number of steps in each chainNo U-Turn Sampling (NUTS) and dynamic HMC

adaptively selects number of steps to improve robustnessand efficiencydynamic HMC refers to dynamic trajectory lengthto keep reversibility of Markov chain, need to simulate intwo directionshttp://elevanth.org/blog/2017/11/28/build-a-better-markov-chain/

Dynamic simulation is discretizedsmall step size gives accurate simulation, but requires morelog density evaluationslarge step size reduces computation, but increasessimulation error which needs to be taken into account in theMarkov chain

[email protected] – @avehtari

Page 22: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Hamiltonian Monte Carlo

Uses gradient information for more efficient samplingAlternating dynamic simulation and sampling of the energylevelParameters: step size, number of steps in each chainNo U-Turn Sampling (NUTS) and dynamic HMC

adaptively selects number of steps to improve robustnessand efficiencydynamic HMC refers to dynamic trajectory lengthto keep reversibility of Markov chain, need to simulate intwo directionshttp://elevanth.org/blog/2017/11/28/build-a-better-markov-chain/

Dynamic simulation is discretizedsmall step size gives accurate simulation, but requires morelog density evaluationslarge step size reduces computation, but increasessimulation error which needs to be taken into account in theMarkov chain

[email protected] – @avehtari

Page 23: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Adaptive dynamic HMC in Stan

Dynamic HMC using growing tree to increase simulationtrajectory until no-U-turn criterion stopping

max treedepth to keep computation in controlpick a draw along the trajectory with probabilities adjustedto take into account the error in the discretized dynamicsimulation

Mass matrix and step size adaptation in Stanmass matrix refers to having different scaling for differentparameters and optionally also rotation to reducecorrelationsmass matrix and step size adjustment and are estimatedduring initial adaptation phasestep size is adjusted to be as big as possible while keepingdiscretization error in control

After adaptation the algorithm parameters are fixed andsome further iterations included in the warmupAfter warmup store iterations for inferenceSee more details in Stan reference manual

[email protected] – @avehtari

Page 24: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Adaptive dynamic HMC in Stan

Dynamic HMC using growing tree to increase simulationtrajectory until no-U-turn criterion stopping

max treedepth to keep computation in controlpick a draw along the trajectory with probabilities adjustedto take into account the error in the discretized dynamicsimulation

Mass matrix and step size adaptation in Stanmass matrix refers to having different scaling for differentparameters and optionally also rotation to reducecorrelationsmass matrix and step size adjustment and are estimatedduring initial adaptation phasestep size is adjusted to be as big as possible while keepingdiscretization error in control

After adaptation the algorithm parameters are fixed andsome further iterations included in the warmupAfter warmup store iterations for inferenceSee more details in Stan reference manual

[email protected] – @avehtari

Page 25: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Adaptive dynamic HMC in Stan

Dynamic HMC using growing tree to increase simulationtrajectory until no-U-turn criterion stopping

max treedepth to keep computation in controlpick a draw along the trajectory with probabilities adjustedto take into account the error in the discretized dynamicsimulation

Mass matrix and step size adaptation in Stanmass matrix refers to having different scaling for differentparameters and optionally also rotation to reducecorrelationsmass matrix and step size adjustment and are estimatedduring initial adaptation phasestep size is adjusted to be as big as possible while keepingdiscretization error in control

After adaptation the algorithm parameters are fixed andsome further iterations included in the warmup

After warmup store iterations for inferenceSee more details in Stan reference manual

[email protected] – @avehtari

Page 26: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Adaptive dynamic HMC in Stan

Dynamic HMC using growing tree to increase simulationtrajectory until no-U-turn criterion stopping

max treedepth to keep computation in controlpick a draw along the trajectory with probabilities adjustedto take into account the error in the discretized dynamicsimulation

Mass matrix and step size adaptation in Stanmass matrix refers to having different scaling for differentparameters and optionally also rotation to reducecorrelationsmass matrix and step size adjustment and are estimatedduring initial adaptation phasestep size is adjusted to be as big as possible while keepingdiscretization error in control

After adaptation the algorithm parameters are fixed andsome further iterations included in the warmupAfter warmup store iterations for inference

See more details in Stan reference manual

[email protected] – @avehtari

Page 27: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Adaptive dynamic HMC in Stan

Dynamic HMC using growing tree to increase simulationtrajectory until no-U-turn criterion stopping

max treedepth to keep computation in controlpick a draw along the trajectory with probabilities adjustedto take into account the error in the discretized dynamicsimulation

Mass matrix and step size adaptation in Stanmass matrix refers to having different scaling for differentparameters and optionally also rotation to reducecorrelationsmass matrix and step size adjustment and are estimatedduring initial adaptation phasestep size is adjusted to be as big as possible while keepingdiscretization error in control

After adaptation the algorithm parameters are fixed andsome further iterations included in the warmupAfter warmup store iterations for inferenceSee more details in Stan reference manual

[email protected] – @avehtari

Page 28: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Dynamic HMC

Comparison of algorithms on highly correlated 250-dimensional Gaussian distribution

•Do 1,000,000 draws with both Random Walk Metropolis and Gibbs, thinning by 1000

•Do 1,000 draws using Stan’s NUTS algorithm (no thinning)

•Do 1,000 independent draws (we can do this for multivariate normal)

Source: Jonah [email protected] – @avehtari

Page 29: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

Max tree depth diagnostic

Dynamic HMC specific diagnosticIndicates inefficiency in sampling leading to higherautocorrelations and lower neff

Different parameterizations matter

[email protected] – @avehtari

Page 30: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

DivergencesHMC specific: Indicates that Hamiltonian dynamicsimulation has problems going to narrow places

indicates possibility of biased estimatesDifferent parameterizations matterhttp://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html

[email protected] – @avehtari

Page 31: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

DivergencesHMC specific: Indicates that Hamiltonian dynamicsimulation has problems going to narrow places

indicates possibility of biased estimatesDifferent parameterizations matterhttp://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html

[email protected] – @avehtari

Page 32: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

DivergencesHMC specific: Indicates that Hamiltonian dynamicsimulation has problems going to narrow places

indicates possibility of biased estimatesDifferent parameterizations matterhttp://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html

[email protected] – @avehtari

Page 33: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

DivergencesHMC specific: Indicates that Hamiltonian dynamicsimulation has problems going to narrow places

indicates possibility of biased estimatesDifferent parameterizations matterhttp://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html

[email protected] – @avehtari

Page 34: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

DivergencesHMC specific: Indicates that Hamiltonian dynamicsimulation has problems going to narrow places

indicates possibility of biased estimatesDifferent parameterizations matterhttp://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html

[email protected] – @avehtari

Page 35: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

DivergencesHMC specific: Indicates that Hamiltonian dynamicsimulation has problems going to narrow places

indicates possibility of biased estimatesDifferent parameterizations matterhttp://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html

[email protected] – @avehtari

Page 36: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

DivergencesHMC specific: Indicates that Hamiltonian dynamicsimulation has problems going to narrow places

indicates possibility of biased estimatesDifferent parameterizations matterhttp://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html

[email protected] – @avehtari

Page 37: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

DivergencesHMC specific: Indicates that Hamiltonian dynamicsimulation has problems going to narrow places

indicates possibility of biased estimatesDifferent parameterizations matterhttp://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html

[email protected] – @avehtari

Page 38: Dynamic Hamiltonian Monte Carlo - GitHub Pages · Dynamic Hamiltonian Monte Carlo in Stan Hamiltonian Monte Carlo use of gradient information and dynamic simulation reduce random

DivergencesHMC specific: Indicates that Hamiltonian dynamicsimulation has problems going to narrow places

indicates possibility of biased estimatesDifferent parameterizations matterhttp://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html

[email protected] – @avehtari