Tutorial on Probabilistic Programming with PyMC3 · 4/4/2017  · Probabilistic Programming (PP)...



MAKE Health T01 Holzinger Group hci-kdd.org

185.A83 Machine Learning for Health Informatics 2017S, VU, 2.0 h, 3.0 ECTS

Tutorial 02 - 04.04.2017

Tutorial on Probabilistic Programming with PyMC3

[email protected]
http://hci-kdd.org/machine-learning-for-health-informatics-course

Schedule

▪ 01. Introduction to Probabilistic Programming
▪ 02. PyMC3
▪ 03. Linear regression – the Bayesian way
▪ 04. Generalized linear models with PyMC3

01. Introduction and Overview

▪ Probabilistic Programming (PP)
  ▪ allows automatic Bayesian inference
  ▪ on complex, user-defined probabilistic models
  ▪ typically using Markov chain Monte Carlo (MCMC) sampling
▪ PyMC3
  ▪ a PP framework
  ▪ compiles probabilistic programs on the fly to C (via Theano)
  ▪ allows model specification in Python code

Salvatier J, Wiecki TV, Fonnesbeck C. (2016) Probabilistic programming in Python using PyMC3. PeerJ Computer Science 2:e55 https://doi.org/10.7717/peerj-cs.55

Properties of Probabilistic Programs

▪ PP is NOT
  ▪ software that behaves probabilistically
  ▪ a general-purpose programming language
▪ PP IS
  ▪ a toolset for statistical / Bayesian modeling
  ▪ a framework to describe probabilistic models
  ▪ a tool to perform (automatic) inference
  ▪ closely related to graphical models and Bayesian networks
  ▪ an extension to a host language (e.g. PyMC3 for Python)

"does in 50 lines of code what used to take thousands"

Kulkarni TD, Kohli P, Tenenbaum JB, Mansinghka V. (2015) Picture: A probabilistic programming language for scene perception. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4390–4399.

Probabilistic Programs

▪ Machine learning algorithms / models are often a black box; PP is an "open box"
▪ Simple approach:
  1. Define and build the model
  2. Run automatic inference
  3. Interpret the results
  (far fewer hand-derived equations)
▪ "Inference": estimating latent (unobserved) variables from observations, e.g. using MCMC
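The three steps can be sketched in plain Python for a coin-toss model. This is a minimal illustration under stated assumptions and is not PyMC3 code. The toss data are hypothetical. A simple grid approximation over θ stands in for the automatic inference (such as MCMC) that a PP framework would run for you.

```python
import math

# Step 1: define and build the model.
# Hypothetical data: 10 coin tosses (1 = heads, 0 = tails).
data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

def prior(theta):
    # Uniform prior: every heads probability theta in [0, 1] is equally plausible
    return 1.0

def likelihood(theta, data):
    # Bernoulli likelihood: theta for each head, (1 - theta) for each tail
    return math.prod(theta if x == 1 else 1.0 - theta for x in data)

# Step 2: inference. A grid approximation stands in for the automatic
# inference (e.g. MCMC) that a PP framework would perform.
grid = [i / 1000 for i in range(1001)]
unnormalized = [likelihood(t, data) * prior(t) for t in grid]
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]  # normalized to sum to 1

# Step 3: interpretation of the results.
posterior_mean = sum(t * p for t, p in zip(grid, posterior))
map_index = max(range(len(grid)), key=lambda i: posterior[i])
print(f"posterior mean of theta: {posterior_mean:.3f}")
print(f"most probable theta (MAP): {grid[map_index]:.3f}")
```

Replacing the grid with a sampler changes only step 2. Steps 1 and 3 stay the same, and this separation of model from inference is what a PP framework automates.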

Markov chain Monte Carlo (MCMC)

▪ Markov chain
  ▪ a stochastic process
  ▪ "memoryless" (Markov property): the conditional probability distribution of future states depends only on the present state
▪ Sampling from probability distributions
  ▪ the state of the chain (after convergence) is a sample from the target distribution
  ▪ approximation quality improves with the number of steps
▪ A class of algorithms / methods
  ▪ numerical approximation of complex integrals (e.g. posterior expectations)
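A plain Markov chain (not yet MCMC) shows two ideas from this slide: the Markov property, and approximation quality improving with the number of steps. The two-state chain and its transition probabilities below are hypothetical. The sketch counts how often the chain visits one state and compares that fraction with the chain's stationary distribution, which is 5/6 for these probabilities.

```python
import random

# Hypothetical two-state Markov chain. Each row is the conditional
# distribution of the next state given only the current state.
TRANSITION = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

def step(state, rng):
    # Markov property: the next state depends on the present state only
    p_sunny = TRANSITION[state]["sunny"]
    return "sunny" if rng.random() < p_sunny else "rainy"

def fraction_sunny(n_steps, start="rainy", seed=0):
    # Run the chain and return the fraction of steps spent in "sunny"
    rng = random.Random(seed)
    state, sunny_steps = start, 0
    for _ in range(n_steps):
        state = step(state, rng)
        if state == "sunny":
            sunny_steps += 1
    return sunny_steps / n_steps

# Stationary distribution: pi_sunny * 0.1 = pi_rainy * 0.5, so pi_sunny = 5/6
for n in (10, 1_000, 100_000):
    print(f"{n:>7} steps: fraction sunny = {fraction_sunny(n):.3f} (target {5 / 6:.3f})")
```

With few steps the fraction depends heavily on the starting state and on chance. As the number of steps grows, it settles near 5/6. MCMC methods construct a chain whose stationary distribution is the distribution one wants to sample from, such as a posterior.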

Markov chain Monte Carlo (MCMC)

(animated illustration)

Markov chain Monte Carlo (MCMC)

▪ Metropolis-Hastings: random-walk proposals
▪ Gibbs sampling: popular, no tuning needed, but complex (requires the full conditional distributions)
▪ Samplers available in PyMC3
  ▪ No-U-Turn Sampler (NUTS)
  ▪ Hamiltonian Monte Carlo (HMC)
  ▪ Metropolis
  ▪ Slice
  ▪ BinaryMetropolis
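A random-walk Metropolis sampler fits in a few lines. It is Metropolis-Hastings with a symmetric Gaussian proposal, so the Hastings correction cancels. This is a teaching sketch and not PyMC3's implementation. The target is a hypothetical coin-toss posterior (uniform prior, 7 heads and 3 tails). The step size, number of samples, burn-in length and seed are arbitrary choices. Averaging the samples approximates the posterior-mean integral, whose exact value here is 8/12 ≈ 0.667.

```python
import math
import random

def log_target(theta, heads=7, tails=3):
    # Unnormalized log posterior of a coin's heads probability theta:
    # uniform prior times Bernoulli likelihood (hypothetical 7 heads, 3 tails)
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero density outside (0, 1)
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

def metropolis(log_p, n_samples, start=0.5, step_size=0.1, seed=42):
    # Random-walk Metropolis: propose theta' = theta + Gaussian noise and
    # accept with probability min(1, p(theta') / p(theta))
    rng = random.Random(seed)
    current, current_lp = start, log_p(start)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_p(proposal)
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)  # a rejected proposal repeats the current state
    return samples

samples = metropolis(log_target, 50_000)
kept = samples[5_000:]  # discard burn-in samples drawn before the chain settled
estimate = sum(kept) / len(kept)
print(f"Monte Carlo estimate of the posterior mean: {estimate:.3f}")
```

The step size trades acceptance rate against how far each move travels. Samplers such as NUTS, which PyMC3 assigns to continuous variables by default, use gradient information and adapt their step size during a tuning phase instead of relying on a hand-picked value.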

Prior & posterior distributions

▪ Quantity of interest: θ (theta)
▪ Prior = probability distribution
  ▪ uncertainty before observation: p(θ)
  ▪ belief in the absence of data
▪ Posterior = probability distribution
  ▪ uncertainty after observing data X: p(θ | X)
▪ Likelihood: p(X | θ)

Posterior ∝ Likelihood × Prior

$$p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)}$$

▪ Calculating the posterior from the observations and the prior = updating beliefs

Coin toss example
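For the coin-toss example, a Beta prior makes the update analytic. A Beta(a, b) prior combined with h heads and t tails gives a Beta(a + h, b + t) posterior. The sketch below assumes hypothetical tosses and a uniform Beta(1, 1) prior. It applies this update one toss at a time, with each posterior becoming the prior for the next toss. It then checks the result against Bayes' rule by approximating the evidence p(X) numerically.

```python
from math import gamma

def beta_pdf(theta, a, b):
    # Density of the Beta(a, b) distribution at theta
    return gamma(a + b) / (gamma(a) * gamma(b)) * theta ** (a - 1) * (1 - theta) ** (b - 1)

def update(a, b, toss):
    # Conjugate update for one Bernoulli observation (1 = heads, 0 = tails)
    return a + toss, b + (1 - toss)

# Hypothetical data; start from the uniform prior Beta(1, 1)
tosses = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
a, b = 1, 1
for toss in tosses:
    a, b = update(a, b, toss)  # the current posterior is the prior for the next toss
print(f"posterior: Beta({a}, {b}), posterior mean = {a / (a + b):.3f}")

# Check against Bayes' rule p(theta|X) = p(X|theta) p(theta) / p(X)
heads = sum(tosses)
tails = len(tosses) - heads

def likelihood(theta):
    # Probability of the observed tosses given heads probability theta
    return theta ** heads * (1 - theta) ** tails

# Evidence p(X): integral of likelihood * prior over theta (prior density is 1),
# approximated with the midpoint rule
n_grid = 10_000
evidence = sum(likelihood((i + 0.5) / n_grid) for i in range(n_grid)) / n_grid

theta = 0.6
print(f"conjugate formula: {beta_pdf(theta, a, b):.4f}")
print(f"Bayes' rule:       {likelihood(theta) * 1.0 / evidence:.4f}")
```

The two printed densities should agree closely. The conjugate shortcut exists only for specific prior-likelihood pairs, which is why general PP frameworks rely on sampling methods such as MCMC.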

PyMC3

▪ Python 3 package / framework
▪ Probabilistic machine learning
  ▪ specification and fitting of Bayesian models
  ▪ inference by MCMC and variational fitting algorithms
▪ Performance enhancements: cross-compilation to C (via the Python numerical computation package "Theano")
▪ Accessible, natural syntax
▪ Various capabilities: GPU computing, sampling backends, object-oriented and extensible design

Live presentation

▪ PyMC3 syntax introduction
▪ Linear regression – the Bayesian way
▪ Generalized linear models with PyMC3


Thank you!

Sample Questions

▪ What is probabilistic programming? What types of problems can be solved with it?
▪ What are the typical steps in a probabilistic program?
▪ What is inference?
▪ What is PyMC3?
▪ What is the posterior distribution?

Appendix

▪ Main sources
  ▪ Salvatier J, Wiecki TV, Fonnesbeck C. (2016) Probabilistic programming in Python using PyMC3. PeerJ Computer Science 2:e55 https://doi.org/10.7717/peerj-cs.55
  ▪ Davidson-Pilon C. (2015) Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference. Addison-Wesley Professional.
  ▪ PyMC3's documentation: http://pymc-devs.github.io/pymc3/index.html