Handbook of Markov Chain Monte Carlo


Transcript of Handbook of Markov Chain Monte Carlo

  • Handbook of Markov Chain Monte Carlo

  • Chapman & Hall/CRC Handbooks of Modern Statistical Methods

    Series Editor

    Garrett Fitzmaurice
    Department of Biostatistics
    Harvard School of Public Health
    Boston, MA, U.S.A.

    Aims and Scope

    The objective of the series is to provide high-quality volumes covering the state-of-the-art in the theory and applications of statistical methodology. The books in the series are thoroughly edited and present comprehensive, coherent, and unified summaries of specific methodological topics from statistics. The chapters are written by the leading researchers in the field, and present a good balance of theory and application through a synthesis of the key methodological developments and examples and case studies using real data.

    The scope of the series is wide, covering topics of statistical methodology that are well developed and find application in a range of scientific disciplines. The volumes are primarily of interest to researchers and graduate students from statistics and biostatistics, but also appeal to scientists from fields where the methodology is applied to real problems, including medical research, epidemiology and public health, engineering, biological science, environmental science, and the social sciences.

    Published Titles

    Longitudinal Data Analysis
    Edited by Garrett Fitzmaurice, Marie Davidian, Geert Verbeke, and Geert Molenberghs

    Handbook of Spatial Statistics
    Edited by Alan E. Gelfand, Peter J. Diggle, Montserrat Fuentes, and Peter Guttorp

    Handbook of Markov Chain Monte Carlo
    Edited by Steve Brooks, Andrew Gelman, Galin L. Jones, and Xiao-Li Meng

  • Chapman & Hall/CRC

    Handbooks of Modern Statistical Methods

    Handbook of Markov Chain Monte Carlo

    Edited by

    Steve Brooks
    Andrew Gelman
    Galin L. Jones
    Xiao-Li Meng

    CRC Press is an imprint of the Taylor & Francis Group, an informa business

    A CHAPMAN & HALL BOOK

    CRC Press, Taylor & Francis Group, Boca Raton London New York

  • MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book's use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.

    Chapman & Hall/CRC
    Taylor & Francis Group
    6000 Broken Sound Parkway NW, Suite 300
    Boca Raton, FL 33487-2742

    © 2011 by Taylor and Francis Group, LLC
    Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an Informa business

    No claim to original U.S. Government works

    International Standard Book Number-13: 978-1-4200-7942-5 (eBook - PDF)

    This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

    Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

    For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

    Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

    Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com

    and the CRC Press Web site at http://www.crcpress.com


  • Contents

    Preface ............ xix
    Editors ............ xxi
    Contributors ............ xxiii

    Part I Foundations, Methodology, and Algorithms

    1. Introduction to Markov Chain Monte Carlo ............ 3
    Charles J. Geyer

    1.1 History ............ 3
    1.2 Markov Chains ............ 4
    1.3 Computer Programs and Markov Chains ............ 5
    1.4 Stationarity ............ 5
    1.5 Reversibility ............ 6
    1.6 Functionals ............ 6
    1.7 The Theory of Ordinary Monte Carlo ............ 6
    1.8 The Theory of MCMC ............ 8

    1.8.1 Multivariate Theory ............ 8
    1.8.2 The Autocovariance Function ............ 9

    1.9 AR(1) Example ............ 9
    1.9.1 A Digression on Toy Problems ............ 10
    1.9.2 Supporting Technical Report ............ 11
    1.9.3 The Example ............ 11

    1.10 Variance Estimation ............ 13
    1.10.1 Nonoverlapping Batch Means ............ 13
    1.10.2 Initial Sequence Methods ............ 16
    1.10.3 Initial Sequence Methods and Batch Means ............ 17

    1.11 The Practice of MCMC ............ 17
    1.11.1 Black Box MCMC ............ 18
    1.11.2 Pseudo-Convergence ............ 18
    1.11.3 One Long Run versus Many Short Runs ............ 18
    1.11.4 Burn-In ............ 19
    1.11.5 Diagnostics ............ 21

    1.12 Elementary Theory of MCMC ............ 22
    1.12.1 The Metropolis-Hastings Update ............ 22
    1.12.2 The Metropolis-Hastings Theorem ............ 23
    1.12.3 The Metropolis Update ............ 24
    1.12.4 The Gibbs Update ............ 24
    1.12.5 Variable-at-a-Time Metropolis-Hastings ............ 25
    1.12.6 Gibbs Is a Special Case of Metropolis-Hastings ............ 26
    1.12.7 Combining Updates ............ 26

    1.12.7.1 Composition ............ 26
    1.12.7.2 Palindromic Composition ............ 26

    1.12.8 State-Independent Mixing ............ 26
    1.12.9 Subsampling ............ 27
    1.12.10 Gibbs and Metropolis Revisited ............ 28



    1.13 A Metropolis Example ............ 29
    1.14 Checkpointing ............ 34
    1.15 Designing MCMC Code ............ 35
    1.16 Validating and Debugging MCMC Code ............ 36
    1.17 The Metropolis-Hastings-Green Algorithm ............ 37

    1.17.1 State-Dependent Mixing ............ 38
    1.17.2 Radon-Nikodym Derivatives ............ 39
    1.17.3 Measure-Theoretic Metropolis-Hastings ............ 40

    1.17.3.1 Metropolis-Hastings-Green Elementary Update ............ 40
    1.17.3.2 The MHG Theorem ............ 42

    1.17.4 MHG with Jacobians and Augmented State Space ............ 45
    1.17.4.1 The MHGJ Theorem ............ 46

    Acknowledgments ............ 47
    References ............ 47

    2. A Short History of MCMC: Subjective Recollections from Incomplete Data ............ 49
    Christian Robert and George Casella

    2.1 Introduction ............ 49
    2.2 Before the Revolution ............ 50

    2.2.1 The Metropolis et al. (1953) Paper ............ 50
    2.2.2 The Hastings (1970) Paper ............ 52

    2.3 Seeds of the Revolution ............ 53
    2.3.1 Besag and the Fundamental (Missing) Theorem ............ 53
    2.3.2 EM and Its Simulated Versions as Precursors ............ 53
    2.3.3 Gibbs and Beyond ............ 54

    2.4 The Revolution ............ 54
    2.4.1 Advances in MCMC Theory ............ 56
    2.4.2 Advances in MCMC Applications ............ 57

    2.5 After the Revolution ............ 58
    2.5.1 A Brief Glimpse at Particle Systems ............ 58
    2.5.2 Perfect Sampling ............ 58
    2.5.3 Reversible Jump and Variable Dimensions ............ 59
    2.5.4 Regeneration and the Central Limit Theorem ............ 59

    2.6 Conclusion ............ 60
    Acknowledgments ............ 61
    References ............ 61

    3. Reversible Jump MCMC ............ 67
    Yanan Fan and Scott A. Sisson

    3.1 Introduction ............ 67
    3.1.1 From Metropolis-Hastings to Reversible Jump ............ 67
    3.1.2 Application Areas ............ 68

    3.2 Implementation ............ 71
    3.2.1 Mapping Functions and Proposal Distributions ............ 72
    3.2.2 Marginalization and Augmentation ............ 73
    3.2.3 Centering and Order Methods ............ 74
    3.2.4 Multi-Step Proposals ............ 77
    3.2.5 Generic Samplers ............ 78

  • 3.3 Post Simulation ............ 80
    3.3.1 Label Switching ............ 80
    3.3.2 Convergence Assessment ............ 81
    3.3.3 Estimating Bayes Factors ............ 82

    3.4 Related Multi-Model Sampling Methods ............ 84
    3.4.1 Jump Diffusion ............ 84
    3.4.2 Product Space Formulations ............ 85
    3.4.3 Point Process Formulations ............ 85
    3.4.4 Multi-Model Optimization ............ 85
    3.4.5 Population MCMC ............ 86
    3.4.6 Multi-Model Sequential Monte Carlo ............ 86

    3.5 Discussion and Future Directions ............ 86
    Acknowledgments ............ 87
    References ............ 87

    4. Optimal Proposal Distributions and Adaptive MCMC ............ 93
    Jeffrey S. Rosenthal

    4.1 Introduction ............ 93
    4.1.1 The Metropolis-Hastings Algorithm ............ 93
    4.1.2 Optimal Scaling ............ 93
    4.1.3 Adaptive MCMC ............ 94
    4.1.4 Comparing Markov Chains ............ 94

    4.2 Optimal Scaling of Random-Walk Metropolis ............ 95
    4.2.1 Basic Principles ............ 95
    4.2.2 Optimal Acceptance Rate as d → ∞ ............ 96
    4.2.3 Inhomogeneous Target Distributions ............ 98
    4.2.4 Metropolis-Adjusted Langevin Algorithm ............ 99
    4.2.5 Numerical Examples ............ 99

    4.2.5.1 Off-Diagonal Covariance ............ 100
    4.2.5.2 Inhomogeneous Covariance ............ 100

    4.2.6 Frequently Asked Questions ............ 101
    4.3 Adaptive MCMC ............ 102

    4.3.1 Ergodicity of Adaptive MCMC ............ 103
    4.3.2 Adaptive Metropolis ............ 104
    4.3.3 Adaptive Metropolis-within-Gibbs ............ 105
    4.3.4 State-Dependent Proposal Scalings ............ 107
    4.3.5 Limit Theorems ............ 107
    4.3.6 Frequently Asked Questions ............ 108

    4.4 Conclusion ............ 109
    References ............ 110

    5. MCMC Using Hamiltonian Dynamics ............ 113
    Radford M. Neal

    5.1 Introduction ............ 113
    5.2 Hamiltonian Dynamics ............ 114

    5.2.1 Hamilton's Equations ............ 114
    5.2.1.1 Equations of Motion ............ 114
    5.2.1.2 Potential and Kinetic Energy ............ 115
    5.2.1.3 A One-Dimensional Example ............ 116



    5.2.2 Properties of Hamiltonian Dynamics ............ 116
    5.2.2.1 Reversibility ............ 116
    5.2.2.2 Conservation of the Hamiltonian ............ 116
    5.2.2.3 Volume Preservation ............ 117
    5.2.2.4 Symplecticness ............ 119

    5.2.3 Discretizing Hamilton's Equations: The Leapfrog Method ............ 119
    5.2.3.1 Euler's Method ............ 119
    5.2.3.2 A Modification of Euler's Method ............ 121
    5.2.3.3 The Leapfrog Method ............ 121
    5.2.3.4 Local and Global Error of Discretization Methods ............ 122

    5.3 MCMC from Hamiltonian Dynamics ............ 122
    5.3.1 Probability and the Hamiltonian: Canonical Distributions ............ 122
    5.3.2 The Hamiltonian Monte Carlo Algorithm ............ 123

    5.3.2.1 The Two Steps of the HMC Algorithm ............ 124
    5.3.2.2 Proof That HMC Leaves the Canonical Distribution Invariant ............ 126
    5.3.2.3 Ergodicity of HMC ............ 127

    5.3.3 Illustrations of HMC and Its Benefits ............ 127
    5.3.3.1 Trajectories for a Two-Dimensional Problem ............ 127
    5.3.3.2 Sampling from a Two-Dimensional Distribution ............ 128
    5.3.3.3 The Benefit of Avoiding Random Walks ............ 130
    5.3.3.4 Sampling from a 100-Dimensional Distribution ............ 130

    5.4 HMC in Practice and Theory ............ 133
    5.4.1 Effect of Linear Transformations ............ 133
    5.4.2 Tuning HMC ............ 134

    5.4.2.1 Preliminary Runs and Trace Plots ............ 134
    5.4.2.2 What Stepsize? ............ 135
    5.4.2.3 What Trajectory Length? ............ 137
    5.4.2.4 Using Multiple Stepsizes ............ 137

    5.4.3 Combining HMC with Other MCMC Updates ............ 138
    5.4.4 Scaling with Dimensionality ............ 139

    5.4.4.1 Creating Distributions of Increasing Dimensionality by Replication ............ 139

    5.4.4.2 Scaling of HMC and Random-Walk Metropolis ............ 139
    5.4.4.3 Optimal Acceptance Rates ............ 141
    5.4.4.4 Exploring the Distribution of Potential Energy ............ 142

    5.4.5 HMC for Hierarchical Models ............ 142
    5.5 Extensions of and Variations on HMC ............ 144

    5.5.1 Discretization by Splitting: Handling Constraints and Other Applications ............ 145
    5.5.1.1 Splitting the Hamiltonian ............ 145
    5.5.1.2 Splitting to Exploit Partial Analytical Solutions ............ 146
    5.5.1.3 Splitting Potential Energies with Variable Computation Costs ............ 146
    5.5.1.4 Splitting According to Data Subsets ............ 147
    5.5.1.5 Handling Constraints ............ 148

    5.5.2 Taking One Step at a Time: The Langevin Method ............ 148
    5.5.3 Partial Momentum Refreshment: Another Way to Avoid Random Walks ............ 150

  • 5.5.4 Acceptance Using Windows of States ............ 152
    5.5.5 Using Approximations to Compute the Trajectory ............ 155
    5.5.6 Short-Cut Trajectories: Adapting the Stepsize without Adaptation ............ 156
    5.5.7 Tempering during a Trajectory ............ 157

    Acknowledgment ............ 160
    References ............ 160

    6. Inference from Simulations and Monitoring Convergence ............ 163
    Andrew Gelman and Kenneth Shirley

    6.1 Quick Summary of Recommendations ............ 163
    6.2 Key Differences between Point Estimation and MCMC Inference ............ 164
    6.3 Inference for Functions of the Parameters vs. Inference for Functions of the Target Distribution ............ 166
    6.4 Inference from Noniterative Simulations ............ 167
    6.5 Burn-In ............ 168
    6.6 Monitoring Convergence: Comparing between and within Chains ............ 170
    6.7 Inference from Simulations after Approximate Convergence ............ 171
    6.8 Summary ............ 172
    Acknowledgments ............ 173
    References ............ 173

    7. Implementing MCMC: Estimating with Confidence ............ 175
    James M. Flegal and Galin L. Jones

    7.1 Introduction ............ 175
    7.2 Initial Examination of Output ............ 176
    7.3 Point Estimates of θ_π ............ 178

    7.3.1 Expectations ............ 178
    7.3.2 Quantiles ............ 181

    7.4 Interval Estimates of θ_π ............ 182
    7.4.1 Expectations ............ 182

    7.4.1.1 Overlapping Batch Means ............ 182
    7.4.1.2 Parallel Chains ............ 184

    7.4.2 Functions of Moments ............ 185
    7.4.3 Quantiles ............ 187

    7.4.3.1 Subsampling Bootstrap ............ 187
    7.4.4 Multivariate Estimation ............ 189

    7.5 Estimating Marginal Densities ............ 189
    7.6 Terminating the Simulation ............ 192
    7.7 Markov Chain Central Limit Theorems ............ 193
    7.8 Discussion ............ 194
    Acknowledgments ............ 195
    References ............ 195

    8. Perfection within Reach: Exact MCMC Sampling ............ 199
    Radu V. Craiu and Xiao-Li Meng

    8.1 Intended Readership ............ 199
    8.2 Coupling from the Past ............ 199

    8.2.1 Moving from Time-Forward to Time-Backward..................................... 199



    8.2.2 Hitting the Limit ............ 200
    8.2.3 Challenges for Routine Applications ............ 201

    8.3 Coalescence Assessment ............ 201
    8.3.1 Illustrating Monotone Coupling ............ 201
    8.3.2 Illustrating Brute-Force Coupling ............ 202
    8.3.3 General Classes of Monotone Coupling ............ 203
    8.3.4 Bounding Chains ............ 204

    8.4 Cost-Saving Strategies for Implementing Perfect Sampling ............ 206
    8.4.1 Read-Once CFTP ............ 206
    8.4.2 Fill's Algorithm ............ 208

    8.5 Coupling Methods ............ 210
    8.5.1 Splitting Technique ............ 211
    8.5.2 Coupling via a Common Proposal ............ 212
    8.5.3 Coupling via Discrete Data Augmentation ............ 213
    8.5.4 Perfect Slice Sampling ............ 215

    8.6 Swindles ............ 217
    8.6.1 Efficient Use of Exact Samples via Concatenation ............ 218
    8.6.2 Multistage Perfect Sampling ............ 219
    8.6.3 Antithetic Perfect Sampling ............ 220
    8.6.4 Integrating Exact and Approximate MCMC Algorithms ............ 221

    8.7 Where Are the Applications? ............ 223
    Acknowledgments ............ 223
    References ............ 223

    9. Spatial Point Processes ............ 227
    Mark Huber

    9.1 Introduction ............ 227
    9.2 Setup ............ 227
    9.3 Metropolis-Hastings Reversible Jump Chains ............ 230

    9.3.1 Examples ............ 232
    9.3.2 Convergence ............ 232

    9.4 Continuous-Time Spatial Birth-Death Chains ............ 233
    9.4.1 Examples ............ 235
    9.4.2 Shifting Moves with Spatial Birth and Death Chains ............ 236
    9.4.3 Convergence ............ 236

    9.5 Perfect Sampling ............ 236
    9.5.1 Acceptance/Rejection Method ............ 236
    9.5.2 Dominated Coupling from the Past ............ 238
    9.5.3 Examples ............ 242

    9.6 Monte Carlo Posterior Draws ............ 243
    9.7 Running Time Analysis ............ 245

    9.7.1 Running Time of Perfect Simulation Methods ............ 248
    Acknowledgment ............ 251
    References ............ 251

    10. The Data Augmentation Algorithm: Theory and Methodology ............ 253
    James P. Hobert

    10.1 Basic Ideas and Examples.....................................................................................253

  • 10.2 Properties of the DA Markov Chain ............ 261
    10.2.1 Basic Regularity Conditions ............ 261
    10.2.2 Basic Convergence Properties ............ 263
    10.2.3 Geometric Ergodicity ............ 264
    10.2.4 Central Limit Theorems ............ 267

    10.3 Choosing the Monte Carlo Sample Size ............ 269
    10.3.1 Classical Monte Carlo ............ 269
    10.3.2 Three Markov Chains Closely Related to X ............ 270
    10.3.3 Minorization, Regeneration and an Alternative CLT ............ 272
    10.3.4 Simulating the Split Chain ............ 275
    10.3.5 A General Method for Constructing the Minorization Condition ............ 277

    10.4 Improving the DA Algorithm ............ 279
    10.4.1 The PX-DA and Marginal Augmentation Algorithms ............ 280
    10.4.2 The Operator Associated with a Reversible Markov Chain ............ 284
    10.4.3 A Theoretical Comparison of the DA and PX-DA Algorithms ............ 286
    10.4.4 Is There a Best PX-DA Algorithm? ............ 288

    Acknowledgments ............ 291
    References ............ 291

    11. Importance Sampling, Simulated Tempering, and Umbrella Sampling ............ 295
    Charles J. Geyer

    11.1 Importance Sampling ............ 295
    11.2 Simulated Tempering ............ 297

    11.2.1 Parallel Tempering Update ............ 299
    11.2.2 Serial Tempering Update ............ 300
    11.2.3 Effectiveness of Tempering ............ 300
    11.2.4 Tuning Serial Tempering ............ 301
    11.2.5 Umbrella Sampling ............ 302

    11.3 Bayes Factors and Normalizing Constants ............ 303
    11.3.1 Theory ............ 303
    11.3.2 Practice ............ 305

    11.3.2.1 Setup ............ 305
    11.3.2.2 Trial and Error ............ 307
    11.3.2.3 Monte Carlo Approximation ............ 308

    11.3.3 Discussion ............ 309
    Acknowledgments ............ 310
    References ............ 310

    12. Likelihood-Free MCMC ............ 313
    Scott A. Sisson and Yanan Fan

    12.1 Introduction ............ 313
    12.2 Review of Likelihood-Free Theory and Methods ............ 314

    12.2.1 Likelihood-Free Basics ............ 314
    12.2.2 The Nature of the Posterior Approximation ............ 315
    12.2.3 A Simple Example ............ 316

    12.3 Likelihood-Free MCMC Samplers ............ 317
    12.3.1 Marginal Space Samplers ............ 319
    12.3.2 Error-Distribution Augmented Samplers ............ 320


  • 12.3.3 Potential Alternative MCMC Samplers ............ 321
    12.4 A Practical Guide to Likelihood-Free MCMC ............ 322

    12.4.1 An Exploratory Analysis ............ 322
    12.4.2 The Effect of ε ............ 324
    12.4.3 The Effect of the Weighting Density ............ 326
    12.4.4 The Choice of Summary Statistics ............ 327
    12.4.5 Improving Mixing ............ 329
    12.4.6 Evaluating Model Misspecification ............ 330

    12.5 Discussion ............ 331
    Acknowledgments ............ 333
    References ............ 333

    Part II Applications and Case Studies

    13. MCMC in the Analysis of Genetic Data on Related Individuals ............ 339
    Elizabeth Thompson

    13.1 Introduction ............ 339
    13.2 Pedigrees, Genetic Variants, and the Inheritance of Genome ............ 340
    13.3 Conditional Independence Structures of Genetic Data ............ 341

    13.3.1 Genotypic Structure of Pedigree Data ............ 342
    13.3.2 Inheritance Structure of Genetic Data ............ 344
    13.3.3 Identical by Descent Structure of Genetic Data ............ 347
    13.3.4 ibd-Graph Computations for Markers and Traits ............ 348

    13.4 MCMC Sampling of Latent Variables ............ 349
    13.4.1 Genotypes and Meioses ............ 349
    13.4.2 Some Block Gibbs Samplers ............ 349
    13.4.3 Gibbs Updates and Restricted Updates on Larger Blocks ............ 350

    13.5 MCMC Sampling of Inheritance Given Marker Data ............ 351
    13.5.1 Sampling Inheritance Conditional on Marker Data ............ 351
    13.5.2 Monte Carlo EM and Likelihood Ratio Estimation ............ 351
    13.5.3 Importance Sampling Reweighting ............ 353

    13.6 Using MCMC Realizations for Complex Trait Inference ............ 354
    13.6.1 Estimating a Likelihood Ratio or lod Score ............ 354
    13.6.2 Uncertainty in Inheritance and Tests for Linkage Detection ............ 356
    13.6.3 Localization of Causal Loci Using Latent p-Values ............ 357

    13.7 Summary ............ 358
    Acknowledgment ............ 359
    References ............ 359

    14. An MCMC-Based Analysis of a Multilevel Model for Functional MRI Data ............ 363
    Brian Caffo, DuBois Bowman, Lynn Eberly, and Susan Spear Bassett

    14.1 Introduction ............ 363
    14.1.1 Literature Review ............ 364
    14.1.2 Example Data ............ 365

    14.2 Data Preprocessing and First-Level Analysis ............ 367
    14.3 A Multilevel Model for Incorporating Regional Connectivity ............ 368

    14.3.1 Model ............ 368


  • 14.3.2 Simulating the Markov Chain ............ 369
    14.4 Analyzing the Chain ............ 371

    14.4.1 Activation Results ............ 371
    14.5 Connectivity Results ............ 374

    14.5.1 Intra-Regional Connectivity ............ 374
    14.5.2 Inter-Regional Connectivity ............ 375

    14.6 Discussion ............ 376
    References ............ 379

    15. Partially Collapsed Gibbs Sampling and Path-Adaptive Metropolis-Hastings in High-Energy Astrophysics ............ 383
    David A. van Dyk and Taeyoung Park

    15.1 Introduction ............ 383
    15.2 Partially Collapsed Gibbs Sampler ............ 384
    15.3 Path-Adaptive Metropolis-Hastings Sampler ............ 388
    15.4 Spectral Analysis in High-Energy Astrophysics ............ 392
    15.5 Efficient MCMC in Spectral Analysis ............ 393
    15.6 Conclusion ............ 397
    Acknowledgments ............ 397
    References ............ 397

    16. Posterior Exploration for Computationally Intensive Forward Models ............ 401
    David Higdon, C. Shane Reese, J. David Moulton, Jasper A. Vrugt, and Colin Fox

    16.1 Introduction ............ 401
    16.2 An Inverse Problem in Electrical Impedance Tomography ............ 402

    16.2.1 Posterior Exploration via Single-Site Metropolis Updates ............ 405
    16.3 Multivariate Updating Schemes ............ 408

    16.3.1 Random-Walk Metropolis ............ 408
    16.3.2 Differential Evolution and Variants ............ 409

    16.4 Augmenting with Fast, Approximate Simulators ............ 411
    16.4.1 Delayed Acceptance Metropolis ............ 413
    16.4.2 An Augmented Sampler ............ 414

    16.5 Discussion ............ 415
    Appendix: Formulation Based on a Process Convolution Prior ............ 416
    Acknowledgments ............ 417
    References ............ 417

    17. Statistical Ecology ............ 419
    Ruth King

    17.1 Introduction ............ 419
    17.2 Analysis of Ring-Recovery Data ............ 420

    17.2.1 Covariate Analysis ............ 422
    17.2.1.1 Posterior Conditional Distributions ............ 423
    17.2.1.2 Results ............ 424

    17.2.2 Mixed Effects Model ............ 425
    17.2.2.1 Obtaining Posterior Inference ............ 426
    17.2.2.2 Posterior Conditional Distributions ............ 427
    17.2.2.3 Results ............ 427


  • 17.2.3 Model Uncertainty ............ 428
    17.2.3.1 Model Specification ............ 430
    17.2.3.2 Reversible Jump Algorithm ............ 430
    17.2.3.3 Proposal Distribution ............ 431
    17.2.3.4 Results ............ 431
    17.2.3.5 Comments ............ 432

    17.3 Analysis of Count Data ............ 433
    17.3.1 State-Space Models ............ 434

    17.3.1.1 System Process ............ 434
    17.3.1.2 Observation Process ............ 434
    17.3.1.3 Model ............ 435
    17.3.1.4 Obtaining Inference ............ 435

    17.3.2 Integrated Analysis ............ 435
    17.3.2.1 MCMC Algorithm ............ 436
    17.3.2.2 Results ............ 437

    17.3.3 Model Selection ............ 439
    17.3.3.1 Results ............ 440
    17.3.3.2 Comments ............ 442

    17.4 Discussion ............ 444
    References ............ 445

    18. Gaussian Random Field Models for Spatial Data ............ 449
    Murali Haran

    18.1 Introduction ............ 449
    18.1.1 Some Motivation for Spatial Modeling ............ 449
    18.1.2 MCMC and Spatial Models: A Shared History ............ 451

    18.2 Linear Spatial Models ............ 451
    18.2.1 Linear Gaussian Process Models ............ 452

    18.2.1.1 MCMC for Linear GPs ............ 453
    18.2.2 Linear Gaussian Markov Random Field Models ............ 454

    18.2.2.1 MCMC for Linear GMRFs ............ 457
    18.2.3 Summary ............ 457

    18.3 Spatial Generalized Linear Models ............ 458
    18.3.1 The Generalized Linear Model Framework ............ 458
    18.3.2 Examples ............ 459

    18.3.2.1 Binary Data ............ 459
    18.3.2.2 Count Data ............ 460
    18.3.2.3 Zero-Inflated Data ............ 462

    18.3.3 MCMC for SGLMs ............ 463
    18.3.3.1 Langevin-Hastings MCMC ............ 463
    18.3.3.2 Approximating an SGLM by a Linear Spatial Model ............ 465

    18.3.4 Maximum Likelihood Inference for SGLMs ............ 467
    18.3.5 Summary ............ 467

    18.4 Non-Gaussian Markov Random Field Models ............ 468
    18.5 Extensions ............ 470
    18.6 Conclusion ............ 471
    Acknowledgments ............ 473
    References ............ 473


  • 19. Modeling Preference Changes via a Hidden Markov Item Response Theory Model ............ 479
    Jong Hee Park

    19.1 Introduction ............ 479
    19.2 Dynamic Ideal Point Estimation ............ 480
    19.3 Hidden Markov Item Response Theory Model ............ 481
    19.4 Preference Changes in US Supreme Court Justices ............ 487
    19.5 Conclusions ............ 490
    Acknowledgments ............ 490
    References ............ 490

    20. Parallel Bayesian MCMC Imputation for Multiple Distributed Lag Models: A Case Study in Environmental Epidemiology ............ 493
    Brian Caffo, Roger Peng, Francesca Dominici, Thomas A. Louis, and Scott Zeger

    20.1 Introduction ............ 493
    20.2 The Data Set ............ 494
    20.3 Bayesian Imputation ............ 496

    20.3.1 Single-Lag Models ............ 496
    20.3.2 Distributed Lag Models ............ 496

    20.4 Model and Notation ............ 498
    20.4.1 Prior and Hierarchical Model Specification ............ 501

    20.5 Bayesian Imputation ............ 501
    20.5.1 Sampler ............ 501
    20.5.2 A Parallel Imputation Algorithm ............ 502

    20.6 Analysis of the Medicare Data ............ 504
    20.7 Summary ............ 507
    Appendix: Full Conditionals ............ 509
    Acknowledgment ............ 510
    References ............ 510

    21. MCMC for State-Space Models ............ 513
    Paul Fearnhead

    21.1 Introduction: State-Space Models ............ 513
    21.2 Bayesian Analysis and MCMC Framework ............ 515
    21.3 Updating the State ............ 515

    21.3.1 Single-Site Updates of the State .......... 515
    21.3.2 Block Updates for the State .......... 518
    21.3.3 Other Approaches .......... 523

    21.4 Updating the Parameters .......... 523
    21.4.1 Conditional Updates of the Parameters .......... 523
    21.4.2 Reparameterization of the Model .......... 525
    21.4.3 Joint Updates of the Parameters and State .......... 526

    21.5 Discussion .......... 527
    References .......... 527


  • 22. MCMC in Educational Research .......... 531
    Roy Levy, Robert J. Mislevy, and John T. Behrens

    22.1 Introduction .......... 531
    22.2 Statistical Models in Education Research .......... 532
    22.3 Historical and Current Research Activity .......... 534

    22.3.1 Multilevel Models .......... 534
    22.3.2 Psychometric Modeling .......... 535

    22.3.2.1 Continuous Latent and Observable Variables .......... 535
    22.3.2.2 Continuous Latent Variables and Discrete Observable Variables .......... 536
    22.3.2.3 Discrete Latent Variables and Discrete Observable Variables .......... 537
    22.3.2.4 Combinations of Models .......... 538

    22.4 NAEP Example .......... 538
    22.5 Discussion: Advantages of MCMC .......... 541
    22.6 Conclusion .......... 542
    References .......... 542

    23. Applications of MCMC in Fisheries Science .......... 547
    Russell B. Millar

    23.1 Background .......... 547
    23.2 The Current Situation .......... 549

    23.2.1 Software .......... 550
    23.2.2 Perception of MCMC in Fisheries .......... 551

    23.3 ADMB .......... 551
    23.3.1 Automatic Differentiation .......... 551
    23.3.2 Metropolis-Hastings Implementation .......... 552

    23.4 Bayesian Applications to Fisheries .......... 553
    23.4.1 Capturing Uncertainty .......... 553

    23.4.1.1 State-Space Models of South Atlantic Albacore Tuna Biomass .......... 553
    23.4.1.2 Implementation .......... 555
    23.4.2 Hierarchical Modeling of Research Trawl Catchability .......... 555
    23.4.3 Hierarchical Modeling of Stock-Recruitment Relationship .......... 557
    23.5 Concluding Remarks .......... 560
    Acknowledgment .......... 561
    References .......... 561

    24. Model Comparison and Simulation for Hierarchical Models: Analyzing Rural-Urban Migration in Thailand .......... 563
    Filiz Garip and Bruce Western

    24.1 Introduction .......... 563
    24.2 Thai Migration Data .......... 564
    24.3 Regression Results .......... 568
    24.4 Posterior Predictive Checks .......... 569


    24.5 Exploring Model Implications with Simulation .......... 570
    24.6 Conclusion .......... 572
    References .......... 574

    Index............................................................................................................................................ 575


  • Preface

    Over the past 20 years or so, Markov Chain Monte Carlo (MCMC) methods have revolutionized statistical computing. They have impacted the practice of Bayesian statistics profoundly by allowing intricate models to be posited and used in an astonishing array of disciplines as diverse as fisheries science and economics. Of course, Bayesians are not the only ones to benefit from using MCMC, and there continues to be increasing use of MCMC in other statistical settings. The practical importance of MCMC has also sparked expansive and deep investigation into fundamental Markov chain theory. As the use of MCMC methods matures, we see deeper theoretical questions addressed, more complex applications undertaken, and their use spreading to new fields of study. It seemed to us that it was a good time to try to collect an overview of MCMC research and its applications.

    This book is intended to be a reference (not a text) for a broad audience and to be of use both to developers and users of MCMC methodology. There is enough introductory material in the book to help graduate students as well as researchers new to MCMC who wish to become acquainted with the basic theory, algorithms, and applications. The book should also be of particular interest to those involved in the development or application of new and advanced MCMC methods. Given the diversity of disciplines that use MCMC, it seemed prudent to have many of the chapters devoted to detailed examples and case studies of realistic scientific problems. Those wanting to see current practice in MCMC will find a wealth of material to choose from here.

    Roughly speaking, we can divide the book into two parts. The first part encompasses 12 chapters concerning MCMC foundations, methodology, and algorithms. The second part consists of 12 chapters which consider the use of MCMC in practical applications. Within the first part, the authors take such a wide variety of approaches that it seems pointless to try to classify the chapters into subgroups. For example, some chapters attempt to appeal to a broad audience by taking a tutorial approach, while other chapters, even if introductory, are either more specialized or present more advanced material. Yet others present original research. In the second part, the focus shifts to applications. Here again, we see a variety of topics, but there are two basic approaches taken by the authors of these chapters. The first is to provide an overview of an application area with the goal of identifying best MCMC practice in the area through extended examples. The second approach is to provide detailed case studies of a given problem while clearly identifying the statistical and MCMC-related issues encountered in the application.

    When we were planning this book, we quickly realized that no single source can give a truly comprehensive overview of cutting-edge MCMC research and applications: there is just too much of it, and its development is moving too fast. Instead, the editorial goal was to obtain contributions of high quality that may stand the test of time. To this end, all of the contributions (including those written by members of the editorial panel) were submitted to a rigorous peer review process and many underwent several revisions. Some contributions, even after revisions, were deemed unacceptable for publication here, and we certainly welcome constructive feedback on the chapters that did survive our editorial process. We thank all the authors for their efforts and patience in this process, and we ask for understanding from those whose contributions are not included in this book. We believe the breadth and depth of the contributions to this book, including some diverse opinions expressed, imply a continuously bright and dynamic future for MCMC research. We hope



    this book inspires further work, theoretical, methodological, and applied, in this exciting and rich area.

    Finally, no project of this magnitude could be completed with a satisfactory outcome without many individuals' help. We especially want to thank Robert Calver of Chapman & Hall/CRC for his encouragement, guidance, and particularly his patience during the entire process of editing this book. We also offer our heartfelt thanks to the numerous referees for their insightful and rigorous reviews, often multiple times. Of course, the ultimate appreciation for all individuals involved in this project comes from your satisfaction with the book, or at least a part of it. So we thank you for reading it.

    MATLAB is a registered trademark of The MathWorks, Inc. For product information, please contact:

    The MathWorks, Inc.
    3 Apple Hill Drive
    Natick, MA 01760-2098 USA
    Tel: 508-647-7000
    Fax: 508-647-7001
    E-mail: [email protected]
    Web: www.mathworks.com

    Steve Brooks Andrew Gelman

    Galin L. Jones Xiao-Li Meng


  • Editors

    Steve Brooks is company director of ATASS, a statistical consultancy business based in the United Kingdom. He was formerly professor of Statistics at Cambridge University and received the Royal Statistical Society Guy Medal in Bronze in 2005 and the Philip Leverhulme Prize in 2004. Like his co-editors, he has served on numerous professional committees both in the United Kingdom and elsewhere, as well as sitting on numerous editorial boards. He is co-author of Bayesian Analysis for Population Ecology (Chapman & Hall/CRC, 2009) and co-founder of the National Centre for Statistical Ecology. His research interests include the development and application of computational statistical methodology across a broad range of application areas.

    Andrew Gelman is a professor of statistics and political science and director of the Applied Statistics Center at Columbia University. He has received the Outstanding Statistical Application award from the American Statistical Association, the award for best article published in the American Political Science Review, and the Committee of Presidents of Statistical Societies award for outstanding contributions by a person under the age of 40. His books include Bayesian Data Analysis (with John Carlin, Hal Stern, and Don Rubin), Teaching Statistics: A Bag of Tricks (with Deb Nolan), Data Analysis Using Regression and Multilevel/Hierarchical Models (with Jennifer Hill), and, most recently, Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do (with David Park, Boris Shor, Joe Bafumi, and Jeronimo Cortina).

    Andrew has done research on a wide range of topics, including: why it is rational to vote; why campaign polls are so variable when elections are so predictable; why redistricting is good for democracy; reversals of death sentences; police stops in New York City; the statistical challenges of estimating small effects; the probability that your vote will be decisive; seats and votes in Congress; social network structure; arsenic in Bangladesh; radon in your basement; toxicology; medical imaging; and methods in surveys, experimental design, statistical inference, computation, and graphics.

    Galin L. Jones is an associate professor in the School of Statistics at the University of Minnesota. He has served on many professional committees and is currently serving on the editorial board for the Journal of Computational and Graphical Statistics. His research interests include Markov chain Monte Carlo, Markov chains in decision theory, and applications of statistical methodology in agricultural, biological, and environmental settings.

    Xiao-Li Meng is the Whipple V. N. Jones professor of statistics and chair of the Department of Statistics at Harvard University; previously he taught at the University of Chicago (1991-2001). He was the recipient of the 1997-1998 University of Chicago Faculty Award for Excellence in Graduate Teaching, the 2001 Committee of Presidents of Statistical Societies Award, the 2003 Distinguished Achievement Award and the 2008 Distinguished Service Award from the International Chinese Statistical Association, and was the 2010 Medallion Lecturer of the Institute of Mathematical Statistics (IMS). He has served on numerous professional committees, including chairing the 2004 Joint Statistical Meetings and the Committee on Meetings of the American Statistical Association (ASA) from 2004 until 2010. He is an elected fellow of the ASA and the IMS. He has also served on editorial boards for The Annals of Statistics, Bayesian Analysis, Bernoulli, Biometrika, Journal of the American Statistical



    Association, as well as serving as coeditor of Statistica Sinica. Currently, he is the statistics editor for the IMS Monograph and Textbook Series. He is also a coeditor of Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives (Gelman and Meng, 2004, Wiley) and Strength in Numbers: The Rising of Academic Statistics Departments in the U.S. (Agresti and Meng, 2012, Springer). His research interests include inference foundations and philosophies, models of all flavors, deterministic and stochastic algorithms, signal extraction in physical, social and medical sciences, and occasionally elegant mathematical statistics.

  • Contributors

    Susan Spear Bassett
    Department of Psychiatry and Behavioral Sciences
    Johns Hopkins Hospital
    Baltimore, Maryland

    John T. Behrens
    Cisco Systems, Inc.
    Mishawaka, Indiana

    DuBois Bowman
    Center for Biomedical Imaging Statistics (CBIS)
    Emory University
    Atlanta, Georgia

    Steve Brooks
    ATASS Ltd
    Exeter, United Kingdom

    Brian Caffo
    Department of Biostatistics
    Johns Hopkins University
    Baltimore, Maryland

    George Casella
    Department of Statistics
    University of Florida
    Gainesville, Florida

    Radu V. Craiu
    Department of Statistics
    University of Toronto
    Toronto, Ontario, Canada

    Francesca Dominici
    Department of Biostatistics
    Johns Hopkins University
    Baltimore, Maryland

    Lynn Eberly
    Division of Biostatistics
    University of Minnesota
    Minneapolis, Minnesota

    Yanan Fan
    School of Mathematics and Statistics
    University of New South Wales
    Sydney, Australia

    Paul Fearnhead
    Department of Mathematics and Statistics
    Lancaster University
    Lancaster, United Kingdom

    James M. Flegal
    Department of Statistics
    University of California
    Riverside, California

    Colin Fox
    Department of Physics
    University of Otago
    Dunedin, New Zealand

    Filiz Garip
    Department of Sociology
    Harvard University
    Cambridge, Massachusetts

    Andrew Gelman
    Department of Statistics and Department of Political Science
    Columbia University
    New York, New York

    Charles J. Geyer
    School of Statistics
    University of Minnesota
    Minneapolis, Minnesota

    Murali Haran
    Center for Ecology and Environmental Statistics
    Pennsylvania State University
    University Park, Pennsylvania


    David Higdon
    Los Alamos National Laboratory
    Los Alamos, New Mexico

    James P. Hobert
    Department of Statistics
    University of Florida
    Gainesville, Florida

    Mark Huber
    Department of Mathematical Sciences
    Claremont McKenna College
    Claremont, California

    Galin L. Jones
    School of Statistics
    University of Minnesota
    Minneapolis, Minnesota

    Ruth King
    School of Mathematics and Statistics
    University of St. Andrews
    St. Andrews, United Kingdom

    Roy Levy
    School of Social and Family Dynamics
    Arizona State University
    Tempe, Arizona

    Thomas A. Louis
    Department of Biostatistics
    Johns Hopkins University
    Baltimore, Maryland

    Xiao-Li Meng
    Department of Statistics
    Harvard University
    Cambridge, Massachusetts

    Russell B. Millar
    Department of Statistics
    University of Auckland
    Auckland, New Zealand

    Robert J. Mislevy
    Department of Measurement, Statistics and Evaluation
    University of Maryland
    Severna Park, Maryland

    J. David Moulton
    Los Alamos National Laboratory
    Los Alamos, New Mexico

    Radford M. Neal
    Department of Statistics
    University of Toronto
    Toronto, Ontario, Canada

    Jong Hee Park
    Department of Political Science
    University of Chicago
    Chicago, Illinois

    Taeyoung Park
    Department of Applied Statistics
    Yonsei University
    Seoul, South Korea

    Roger Peng
    Department of Biostatistics
    Johns Hopkins University
    Baltimore, Maryland

    C. Shane Reese
    Department of Statistics
    Brigham Young University
    Provo, Utah

    Christian Robert
    CEREMADE
    University Paris-Dauphine
    Paris, France

    Jeffrey S. Rosenthal
    Department of Statistics
    University of Toronto
    Toronto, Ontario, Canada

    Kenneth Shirley
    The Earth Institute
    Columbia University
    New York, New York

    Scott A. Sisson
    School of Mathematics and Statistics
    University of New South Wales
    Sydney, Australia


    Elizabeth Thompson
    Department of Statistics
    University of Washington
    Seattle, Washington

    David A. van Dyk
    Department of Statistics
    University of California
    Irvine, California

    Jasper A. Vrugt
    Center for Non-Linear Studies
    Irvine, California

    Bruce Western
    Department of Sociology
    Harvard University
    Cambridge, Massachusetts

    Scott Zeger
    Department of Biostatistics
    Johns Hopkins University
    Baltimore, Maryland

  • Part I

    Foundations, Methodology, and Algorithms

  • 1

    Introduction to Markov Chain Monte Carlo

    Charles J. Geyer

    1.1 History

    Despite a few notable uses of simulation of random processes in the pre-computer era (Hammersley and Handscomb, 1964, Section 1.2; Stigler, 2002, Chapter 7), practical widespread use of simulation had to await the invention of computers. Almost as soon as computers were invented, they were used for simulation (Hammersley and Handscomb, 1964, Section 1.2). The name "Monte Carlo" started as cuteness, since gambling was then (around 1950) illegal in most places and the casino at Monte Carlo was the most famous in the world, but it soon became a colorless technical term for simulation of random processes.

    Markov chain Monte Carlo (MCMC) was invented soon after ordinary Monte Carlo at Los Alamos, one of the few places where computers were available at the time. Metropolis et al. (1953)* simulated a liquid in equilibrium with its gas phase. The obvious way to find out about the thermodynamic equilibrium is to simulate the dynamics of the system, and let it run until it reaches equilibrium. The tour de force was their realization that they did not need to simulate the exact dynamics; they only needed to simulate some Markov chain having the same equilibrium distribution. Simulations following the scheme of Metropolis et al. (1953) are said to use the Metropolis algorithm. As computers became more widely available, the Metropolis algorithm was widely used by chemists and physicists, but it did not become widely known among statisticians until after 1990. Hastings (1970) generalized the Metropolis algorithm, and simulations following his scheme are said to use the Metropolis-Hastings algorithm. A special case of the Metropolis-Hastings algorithm was introduced by Geman and Geman (1984), apparently without knowledge of earlier work. Simulations following their scheme are said to use the Gibbs sampler. Much of Geman and Geman (1984) discusses optimization to find the posterior mode rather than simulation, and it took some time for it to be understood in the spatial statistics community that the Gibbs sampler simulated the posterior distribution, thus enabling full Bayesian inference of all kinds. A methodology that was later seen to be very similar to the Gibbs sampler was introduced by Tanner and Wong (1987), again apparently without knowledge of earlier work. To this day, some refer to the Gibbs sampler as "data augmentation" following these authors. Gelfand and Smith (1990) made the wider Bayesian community aware of the Gibbs sampler, which up to that time had been known only in the spatial statistics community. Then it took off; as of this writing, a search for Gelfand and Smith (1990) on Google Scholar yields 4003 links to other works. It was rapidly realized that most Bayesian inference could

    * The fifth author was Edward Teller, the "father of the hydrogen bomb."



    be done by MCMC, whereas very little could be done without MCMC. It took a while for researchers to properly understand the theory of MCMC (Geyer, 1992; Tierney, 1994) and that all of the aforementioned work was a special case of the notion of MCMC. Green (1995) generalized the Metropolis-Hastings algorithm as much as it can be generalized. Although this terminology is not widely used, we say that simulations following his scheme use the Metropolis-Hastings-Green algorithm. MCMC is not used only for Bayesian inference. Likelihood inference in cases where the likelihood cannot be calculated explicitly due to missing data or complex dependence can also use MCMC (Geyer, 1994, 1999; Geyer and Thompson, 1992, 1995, and references cited therein).

    1.2 Markov Chains

    A sequence $X_1, X_2, \ldots$ of random elements of some set is a Markov chain if the conditional distribution of $X_{n+1}$ given $X_1, \ldots, X_n$ depends on $X_n$ only. The set in which the $X_i$ take values is called the state space of the Markov chain.

    A Markov chain has stationary transition probabilities if the conditional distribution of $X_{n+1}$ given $X_n$ does not depend on $n$. This is the main kind of Markov chain of interest in MCMC. Some kinds of adaptive MCMC (Chapter 4, this volume) have nonstationary transition probabilities. In this chapter we always assume stationary transition probabilities.

    The joint distribution of a Markov chain is determined by

    The marginal distribution of $X_1$, called the initial distribution

    The conditional distribution of $X_{n+1}$ given $X_n$, called the transition probability distribution (because of the assumption of stationary transition probabilities, this does not depend on $n$)

    People introduced to Markov chains through a typical course on stochastic processes have usually only seen examples where the state space is finite or countable. If the state space is finite, written $\{x_1, \ldots, x_n\}$, then the initial distribution can be associated with a vector $\lambda = (\lambda_1, \ldots, \lambda_n)$ defined by

    $$\Pr(X_1 = x_i) = \lambda_i, \qquad i = 1, \ldots, n,$$

    and the transition probabilities can be associated with a matrix $P$ having elements $p_{ij}$ defined by

    $$\Pr(X_{n+1} = x_j \mid X_n = x_i) = p_{ij}, \qquad i = 1, \ldots, n \text{ and } j = 1, \ldots, n.$$

    When the state space is countably infinite, we can think of an infinite vector and matrix. But most Markov chains of interest in MCMC have uncountable state space, and then we cannot think of the initial distribution as a vector or the transition probability distribution as a matrix. We must think of them as an unconditional probability distribution and a conditional probability distribution.
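    For concreteness, here is a minimal sketch of simulating such a finite-state chain from $\lambda$ and $P$ (in Python with NumPy; the two-state chain and all of its numbers are hypothetical choices for illustration, not from the text):

        import numpy as np

        rng = np.random.default_rng(42)

        # Hypothetical two-state chain: lam is the initial distribution lambda,
        # and P[i, j] = Pr(X_{n+1} = x_j | X_n = x_i) is the transition matrix.
        lam = np.array([0.5, 0.5])
        P = np.array([[0.9, 0.1],
                      [0.2, 0.8]])

        def simulate_chain(lam, P, n_steps, rng):
            """Draw a path X_1, ..., X_n of the finite-state Markov chain."""
            path = np.empty(n_steps, dtype=int)
            path[0] = rng.choice(len(lam), p=lam)  # X_1 ~ initial distribution
            for t in range(1, n_steps):
                # X_{t+1} given X_t = x_i is drawn from row i of P
                path[t] = rng.choice(P.shape[1], p=P[path[t - 1]])
            return path

        path = simulate_chain(lam, P, 10_000, rng)
        print(np.bincount(path) / len(path))  # empirical state frequencies

    The empirical state frequencies settle near the equilibrium distribution of $P$, anticipating the discussion of stationarity below.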


    1.3 Computer Programs and Markov Chains

    Suppose you have a computer program

    Initialize x
    repeat {
        Generate pseudorandom change to x
        Output x
    }

    If $x$ is the entire state of the computer program exclusive of random number generator seeds (which we ignore, pretending pseudorandom is random), this is MCMC. It is important that $x$ be the entire state of the program. Otherwise the resulting stochastic process need not be Markov.

    There is not much structure here. Most simulations can be fit into this format. Thus most simulations can be thought of as MCMC if the entire state of the computer program is considered the state of the Markov chain. Hence, MCMC is a very general simulation methodology.
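    As an illustration, here is a minimal runnable instance of the template above (in Python; the Gaussian AR(1)-style update is an arbitrary choice made for this sketch, not anything prescribed by the text). Because $x$ is the entire state of the program, the sequence of outputs is a Markov chain:

        import random

        random.seed(1)

        x = 0.0  # Initialize x: the entire state of the program
        for _ in range(5):
            x = 0.5 * x + random.gauss(0.0, 1.0)  # Generate pseudorandom change to x
            print(x)                              # Output x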

    1.4 Stationarity

    A sequence $X_1, X_2, \ldots$ of random elements of some set is called a stochastic process (Markov chains are a special case). A stochastic process is stationary if for every positive integer $k$ the distribution of the $k$-tuple

    $$(X_{n+1}, \ldots, X_{n+k})$$

    does not depend on $n$. A Markov chain is stationary if it is a stationary stochastic process. In a Markov chain, the conditional distribution of $(X_{n+2}, \ldots, X_{n+k})$ given $X_{n+1}$ does not depend on $n$. It follows that a Markov chain is stationary if and only if the marginal distribution of $X_n$ does not depend on $n$.

    An initial distribution is said to be stationary or invariant or equilibrium for some transition probability distribution if the Markov chain specified by this initial distribution and transition probability distribution is stationary. We also indicate this by saying that the transition probability distribution preserves the initial distribution.

    Stationarity implies stationary transition probabilities, but not vice versa. Consider an initial distribution concentrated at one point. The Markov chain can be stationary if and only if all iterates are concentrated at the same point, that is, $X_1 = X_2 = \cdots$, so the chain goes nowhere and does nothing. Conversely, any transition probability distribution can be combined with any initial distribution, including those concentrated at one point. Such a chain is usually not stationary (even though the transition probabilities are stationary).

    Having an equilibrium distribution is an important property of a Markov chain transition probability. In Section 1.8 below, we shall see that MCMC samples the equilibrium distribution, whether the chain is stationary or not. Not all Markov chains have equilibrium distributions, but all Markov chains used in MCMC do. The Metropolis-Hastings-Green (MHG) algorithm (Sections 1.12.2, 1.17.3.2, and 1.17.4.1 below) constructs transition probability mechanisms that preserve a specified equilibrium distribution.


    1.5 Reversibility

    A transition probability distribution is reversible with respect to an initial distribution if, for the Markov chain $X_1, X_2, \ldots$ they specify, the distribution of pairs $(X_n, X_{n+1})$ is exchangeable.

    A Markov chain is reversible if its transition probability is reversible with respect to its initial distribution. Reversibility implies stationarity, but not vice versa. A reversible Markov chain has the same laws running forward or backward in time, that is, for any $i$ and $k$ the distributions of $(X_{i+1}, \ldots, X_{i+k})$ and $(X_{i+k}, \ldots, X_{i+1})$ are the same. Hence the name.

    Reversibility plays two roles in Markov chain theory. All known methods for constructing transition probability mechanisms that preserve a specified equilibrium distribution in non-toy problems are special cases of the MHG algorithm, and all of the elementary updates constructed by the MHG algorithm are reversible (which accounts for its other name, the "reversible jump" algorithm). Combining elementary updates by composition (Section 1.12.7 below) may produce a combined update mechanism that is not reversible, but this does not diminish the key role played by reversibility in constructing transition probability mechanisms for MCMC. The other role of reversibility is to simplify the Markov chain central limit theorem (CLT) and asymptotic variance estimation. In the presence of reversibility the Markov chain CLT (Kipnis and Varadhan, 1986; Roberts and Rosenthal, 1997) is much sharper and the conditions are much simpler than without reversibility. Some methods of asymptotic variance estimation (Section 1.10.2 below) only work for reversible Markov chains but are much simpler and more reliable than analogous methods for nonreversible chains.
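    For a finite state space, exchangeability of the pairs $(X_n, X_{n+1})$ amounts to the detailed balance condition $\lambda_i p_{ij} = \lambda_j p_{ji}$ for all $i$ and $j$. Here is a minimal numerical check of reversibility (in Python with NumPy; the two-state chain is a hypothetical example constructed to satisfy detailed balance):

        import numpy as np

        # Reversibility of P with respect to lam is, in the finite case, detailed
        # balance: lam[i] * P[i, j] == lam[j] * P[j, i] for all i, j; equivalently,
        # the joint distribution of (X_n, X_{n+1}) is a symmetric matrix.
        lam = np.array([2 / 3, 1 / 3])
        P = np.array([[0.9, 0.1],
                      [0.2, 0.8]])

        joint = lam[:, None] * P  # joint[i, j] = Pr(X_n = x_i, X_{n+1} = x_j)
        print(np.allclose(joint, joint.T))  # True: this chain is reversible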

    1.6 Functionals

    If $X_1, X_2, \ldots$ is a stochastic process and $g$ is a real-valued function on its state space, then the stochastic process $g(X_1), g(X_2), \ldots$ having state space $\mathbb{R}$ is said to be a functional of $X_1, X_2, \ldots$.

    If $X_1, X_2, \ldots$ is a Markov chain, then a functional $g(X_1), g(X_2), \ldots$ is usually not a Markov chain. The conditional distribution of $X_{n+1}$ given $X_1, \ldots, X_n$ depends only on $X_n$, but this does not, in general, imply that the conditional distribution of $g(X_{n+1})$ given $g(X_1), \ldots, g(X_n)$ depends only on $g(X_n)$. Nevertheless, functionals of Markov chains have important properties not shared by other stochastic processes.

    1.7 The Theory of Ordinary Monte Carlo

    Ordinary Monte Carlo (OMC), also called "independent and identically distributed (i.i.d.) Monte Carlo" or "good old-fashioned Monte Carlo," is the special case of MCMC in which $X_1, X_2, \ldots$ are independent and identically distributed, in which case the Markov chain is stationary and reversible.

    Suppose you wish to calculate an expectation

    $$\mu = E\{g(X)\}, \tag{1.1}$$


    where $g$ is a real-valued function on the state space, but you cannot do it by exact methods (integration or summation using pencil and paper, a computer algebra system, or exact numerical methods). Suppose you can simulate $X_1, X_2, \ldots$ i.i.d. having the same distribution as $X$. Define

    $$\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} g(X_i). \tag{1.2}$$

    If we introduce the notation $Y_i = g(X_i)$, then the $Y_i$ are i.i.d. with mean $\mu$ and variance

    $$\sigma^2 = \operatorname{var}\{g(X)\}, \tag{1.3}$$

    $\hat{\mu}_n$ is the sample mean of the $Y_i$, and the CLT says that

    $$\hat{\mu}_n \approx \operatorname{Normal}\!\left(\mu, \frac{\sigma^2}{n}\right). \tag{1.4}$$

    The variance in the CLT can be estimated by

    $$\hat{\sigma}_n^2 = \frac{1}{n} \sum_{i=1}^{n} \left( g(X_i) - \hat{\mu}_n \right)^2, \tag{1.5}$$

    which is the empirical variance of the $Y_i$. Using the terminology of Section 1.6, we can also say that $\hat{\mu}_n$ is the sample mean of the functional $g(X_1), g(X_2), \ldots$ of $X_1, X_2, \ldots$.

    The theory of OMC is just elementary statistics. For example, $\hat{\mu}_n \pm 1.96 \, \hat{\sigma}_n / \sqrt{n}$ is an asymptotic 95% confidence interval for $\mu$. Note that OMC obeys what an elementary statistics text (Freedman et al., 2007) calls the square root law: statistical accuracy is inversely proportional to the square root of the sample size. Consequently, the accuracy of Monte Carlo methods is limited. Each additional significant figure, a tenfold increase in accuracy, requires a hundredfold increase in the sample size.
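    A minimal sketch of the OMC recipe (in Python with NumPy; the test function $g(x) = x^2$ with $X \sim \operatorname{Normal}(0, 1)$, for which $\mu = 1$ exactly, is an arbitrary choice for illustration):

        import numpy as np

        rng = np.random.default_rng(0)

        # Estimate mu = E{g(X)} for g(x) = x**2 with X ~ Normal(0, 1), so mu = 1.
        n = 10_000
        y = rng.standard_normal(n) ** 2  # Y_i = g(X_i), i.i.d.

        mu_hat = y.mean()                # Monte Carlo approximation (Equation 1.2)
        sigma_hat = y.std()              # divides by n, matching Equation 1.5
        half_width = 1.96 * sigma_hat / np.sqrt(n)

        print(f"{mu_hat:.4f} +/- {half_width:.4f}")  # asymptotic 95% CI for mu

    Quadrupling $n$ halves the interval's half-width, which is the square root law in action.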

    The only tricky issue is that the randomness involved is the pseudorandomness of computer simulation, rather than randomness of real-world phenomena. Thus it is a good idea to use terminology that emphasizes the difference. We call Equation 1.2 the Monte Carlo approximation or Mo