Transcript of Probabilistic programming
Probabilistic Programming: a Broad Overview
Eli Sennesh
Bayesian Statistics in One Graph
Bayesian Role-Playing Games
● With enough paper, we could just write out every possible play based on every possible dice-roll and the fixed player levels.
● How many ways could we have gotten from the start of the game to the board we see? That's a likelihood.
● Which dice rolls were more common in games that gave us the real play? That's a posterior probability.
● Bayesianism: unknown levels are just more dice.
– And are the actual dice weighted?
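The "are the dice weighted?" question can be answered by exhaustive enumeration, exactly as the slide suggests. A minimal sketch in Python (the two hypotheses, the prior, and the weighting are illustrative assumptions, not from the talk):

```python
from fractions import Fraction

# Two hypotheses about the die; a 50/50 prior is an illustrative assumption.
prior = {'fair': Fraction(1, 2), 'weighted': Fraction(1, 2)}

def likelihood(hypothesis, roll):
    # P(roll | hypothesis) for a single six-sided roll.
    if hypothesis == 'fair':
        return Fraction(1, 6)
    # Assumed weighting: 6 comes up half the time, other faces 1/10 each.
    return Fraction(1, 2) if roll == 6 else Fraction(1, 10)

def posterior(rolls):
    # Bayes' rule by writing out every hypothesis, as with "enough paper".
    unnorm = {}
    for h, p in prior.items():
        for r in rolls:
            p = p * likelihood(h, r)   # accumulate the likelihood of the rolls
        unnorm[h] = p
    total = sum(unnorm.values())       # normalizing constant
    return {h: p / total for h, p in unnorm.items()}

post = posterior([6, 6, 3, 6])         # three sixes in four rolls
```

Seeing mostly sixes shifts nearly all posterior mass onto the weighted-die hypothesis.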
Bayesian Reasoning is Hard!
● P(H | E) = P(E | H)*P(H) / ∫ P(E | H)*P(H) dH
● Bayesian reasoning: how to update beliefs in response to evidence.
● But the integral in the denominator often cannot be solved analytically.
● Approaches: conjugate priors and posteriors (for which we don't need to evaluate it), and various numerical methods.
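One of the numerical methods mentioned above is to discretize the hypothesis space, turning the intractable integral into a finite sum. A sketch for inferring a coin's bias (the grid, prior, and data are illustrative assumptions):

```python
# Candidate biases 0.01 .. 0.99; the sum over GRID stands in for the integral.
GRID = [i / 100 for i in range(1, 100)]

def posterior_on_grid(heads, tails):
    prior = 1 / len(GRID)          # uniform prior over the grid points
    # Unnormalized posterior: P(E | H) * P(H) at each grid point H.
    unnorm = [prior * (h ** heads) * ((1 - h) ** tails) for h in GRID]
    evidence = sum(unnorm)         # numerical stand-in for the denominator
    return [w / evidence for w in unnorm]

post = posterior_on_grid(heads=8, tails=2)
mean = sum(h * p for h, p in zip(GRID, post))   # posterior mean of the bias
```

With a uniform prior and 8 heads out of 10 flips, the posterior mean lands near (8+1)/(10+2) = 0.75, the exact Beta-posterior answer.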
The Trouble with Bayesian Modeling (1)
Calculating probabilities takes time exponential in the number of relevant variables!
The Trouble with Bayesian Modeling (2)
The Trouble with Bayesian Modeling (3)
● Exact Bayesian inference methods only support possible-worlds with a finite number of things in them.
● Every combination of model and inference method currently has to be combined manually into non-reusable code.
● Probabilistic modeling as we think we know it is inexpressive and slow, ranging in complexity from NP-complete to DEXPTIME.
Programs have generative structure
● Generative model = Rules + Randomness + Random Parameters
– A program for generating random outcomes, given unknown parameters.
● Probabilistic Program = Probability Distribution
● The Stochastic Lambda Calculus: a Turing-complete language with random choices.
– In jargon: the 'probability monad'.
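A generative model really is just an ordinary program with random choices in it. A toy sketch in Python (the game, its rules, and the names are made up for illustration):

```python
import random

def play_game():
    # Random parameter: an unknown player level, drawn once per possible world.
    skill = random.choice(['novice', 'expert'])
    # Randomness: a dice roll; Rules: experts get a +2 bonus and 5+ wins.
    roll = random.randint(1, 6) + (2 if skill == 'expert' else 0)
    outcome = 'win' if roll >= 5 else 'lose'
    return skill, outcome

# Running the program = sampling one (skill, outcome) pair from the
# joint distribution the program denotes.
worlds = [play_game() for _ in range(1000)]
```

Each call to `play_game` samples one possible world; the function as a whole denotes the distribution over all of them.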
What sorts of queries?
● Sampling: 'What are some possibilities?'
● Expectation: 'What should we expect to see?'
● Support: 'What could happen?'
● Probability mass/density: 'What's the chance this happens?'
● Conditional query: 'Given this, what about that? What about when X happens more than Y?'
– 'Given that I saw 18 black ravens, what proportion of all ravens are black?'
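Each of these query types can be shown on one small, exactly enumerable model. A sketch using the sum of two fair dice (the model choice is illustrative):

```python
import random
from fractions import Fraction
from itertools import product

# The distribution under study: the sum of two fair six-sided dice.
def roll_two():
    return random.randint(1, 6) + random.randint(1, 6)

# Sampling: 'what are some possibilities?'
some_possibilities = [roll_two() for _ in range(5)]

# Enumerate every possible world (all 36 dice combinations) exactly.
outcomes = [a + b for a, b in product(range(1, 7), repeat=2)]

# Support: 'what could happen?'
support = sorted(set(outcomes))                          # 2 .. 12

# Probability mass: 'what's the chance this happens?'
pmf = {s: Fraction(outcomes.count(s), 36) for s in support}

# Expectation: 'what should we expect to see?'
expectation = sum(s * p for s, p in pmf.items())

# Conditional query: 'given the sum is even, what is P(sum = 8)?'
even = [s for s in outcomes if s % 2 == 0]
p_8_given_even = Fraction(even.count(8), len(even))
```

For a model this small, every query is answerable by exact enumeration; the point of probabilistic programming is to keep asking the same queries when enumeration is impossible.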
What sorts of inference?
● Mostly sampling-based approximate inference, but there are some exact algorithms.
● By sampling, we can reason about models where each possible world might contain a different number of actual entities.
● Approximate inference only has a small cost of its own after we generate many samples from the distribution.
– And advanced inference algorithms can re-use parts of computations to sample even faster.
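The simplest sampling-based inference algorithm is rejection sampling: run the generative program over and over, and keep only the runs that agree with the evidence. A sketch (the model, its discrete bias grid, and the observed count are illustrative assumptions):

```python
import random

def run_model():
    # Random parameter: an unknown coin bias from a small illustrative set.
    bias = random.choice([0.1, 0.3, 0.5, 0.7, 0.9])
    # Simulate 10 flips under that bias.
    heads = sum(random.random() < bias for _ in range(10))
    return bias, heads

# Rejection sampling: condition on having observed 8 heads in 10 flips.
accepted = []
while len(accepted) < 500:
    bias, heads = run_model()
    if heads == 8:                 # reject runs inconsistent with the evidence
        accepted.append(bias)

# The accepted samples approximate the posterior over the bias.
posterior_mean = sum(accepted) / len(accepted)
```

The per-sample cost is just one run of the program plus a comparison; the expense is in how many runs get rejected, which is what smarter inference algorithms try to avoid.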
What can probabilistic programs express?
● In a 'code as data' language like Church (based on Scheme), programs can include eval and learn arbitrary code.
● Or they can do a query about a query: inference about inference.
● Generative structure generalizes logical and physical structure: 'The λ-calculus does not focus on the sequence of time, but rather on which events influence which other events.'
The Elegance of Probabilistic Programming
● Programs = Distributions.
– Running a program = sampling an outcome from the distribution.
– Monadic semantics: every function maps input values to output distributions.
● Queries are just functions; language runtime performs inference.
● '[H]allucinate possible worlds', and reason about them.
Probabilistic Programs Can Explain Themselves
Slide taken from Olivier Grisel's original talk
How can we use probabilistic programming?
● 'We would like to build computing machines that can interpret and learn from their sensory experience, act effectively in real time, and - ultimately - design and program themselves.'
– Professor Vikash Mansinghka, 'Natively Probabilistic Computation'
● Applications in: computer vision, cognitive science, machine learning, natural language processing, and artificial intelligence.
How can I use probabilistic programming?
● Write a program simulating whatever you want to reason about. Where you don't know which specific choice to make, choose randomly.
● Present the language runtime's inference engine with real-world data compatible with your model.
– And go get some coffee.
● The inference engine will learn which random choices generate data like yours.
Computer Vision in 20 Lines of Code (1)
Computer Vision in 20 Lines of Code (2)
● The prior model 'knows' how to render pictures, but chooses what to render completely at random. Inference then 'learns' which choices most likely drew the real image.
● 'Our probabilistic graphics program did not originally support rotation, which was needed for the AOL CAPTCHAs; adding it required only 1 additional line of probabilistic code.'
– http://probcomp.csail.mit.edu/gpgp/
● Homework takes more than 20 lines of code!
Which bar should we meet at?
Social reasoning and cooperation in 11 LOC.
Modeling where to meet for drinks
(define (sample-location)
  (if (flip .55) 'popular-bar 'unpopular-bar))

(define (alice depth)
  (query
    (define alice-location (sample-location))
    alice-location
    (equal? alice-location (bob (- depth 1)))))

(define (bob depth)
  (query
    (define bob-location (sample-location))
    bob-location
    (or (= depth 0) (equal? bob-location (alice depth)))))
The correct answer is actually Libirah!
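The nested `query` above can be read as rejection sampling all the way down: each agent keeps re-choosing a bar until the choice matches a simulation of the other agent. A direct Python port of the Church code (using explicit rejection loops in place of `query`; the 0.55 prior is from the slide):

```python
import random
from collections import Counter

def sample_location():
    # Prior preference from the slide: 55% for the popular bar.
    return 'popular-bar' if random.random() < 0.55 else 'unpopular-bar'

def alice(depth):
    # Rejection query: re-sample until Alice's choice matches where a
    # simulated Bob (reasoning one level shallower) ends up.
    while True:
        loc = sample_location()
        if loc == bob(depth - 1):
            return loc

def bob(depth):
    # At depth 0, Bob just follows his prior; otherwise he simulates Alice.
    while True:
        loc = sample_location()
        if depth == 0 or loc == alice(depth):
            return loc

# As the recursion deepens, both agents coordinate on the popular bar.
counts = Counter(alice(3) for _ in range(1000))
```

Each level of nesting sharpens the agreement: a weak 55% prior preference becomes a strong posterior tendency to meet at the same bar.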
Generating Scenes Under Constraints
● Sample fresh coffee shops, conditioned on the constraints of good design.
● Scene generation is an open-world task: not just where to put furniture but how much to generate is random.
● Exact inference can't even handle these models!
What more needs doing?
● Sampling-based approximation helps, but we need better inference algorithms. No real-time reasoning yet.
● Techniques from compilers research help.
– One recent paper got a 600x speed-up by caching the nonrandom parts of each computation and applying Just-In-Time compilation.
● Halting Problem, Rice's Theorem: we can't prove one inference algorithm supreme in all circumstances.
● Reusable data: export what we've learned and pass it on to others.
● Ways to treat intractability probabilistically.
Conclusions
● Rules = Programs, Uncertain knowledge = Random choice.
● Rules + Uncertainty = Programs + Randomness = Probabilistic Programming.
● Probabilistic programs can express any complex model, but we need to improve our inference algorithms to make reasoning tractable.
● Fields of application: cognitive science, procedural content generation, machine learning, computer vision, Bayesian statistics, artificial intelligence.
Bibliography (1)
● Probabilistic Models of Cognition
● http://probabilistic-programming.org/research/, http://forestdb.org/
● Vikash Mansinghka. Natively Probabilistic Computation. PhD thesis, Massachusetts Institute of Technology, 2009.
● Noah D. Goodman, Vikash K. Mansinghka, Daniel M. Roy, Keith Bonawitz, and Joshua B. Tenenbaum. Church: a language for generative models. In Proc. of Uncertainty in Artificial Intelligence, 2008.
● Fritz Obermeyer. Automated equational reasoning in nondeterministic lambda-calculi modulo theories H*. PhD thesis, Carnegie-Mellon University, 2009.
Bibliography (2)
● Vikash Mansinghka, Tejas Kulkarni, Yura Perov, Josh Tenenbaum. Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs. Neural Information Processing Systems 2013.
● Andrew D. Gordon, Thomas A. Henzinger, Aditya V. Nori, and Sriram K. Rajamani. Probabilistic programming. In International Conference on Software Engineering (ICSE, FOSE track), 2014.
● Andreas Stuhlmüller and Noah D. Goodman. Reasoning about Reasoning by Nested Conditioning: Modeling Theory of Mind with Probabilistic Programs. In Cognitive Systems Research, 2013.
● Yi-Ting Yeh, Lingfeng Yang, Matthew Watson, Noah D. Goodman, and Pat Hanrahan. 2012. Synthesizing open worlds with constraints using locally annealed reversible jump MCMC. ACM Trans. Graph. 31, 4, Article 56 (July 2012).