Trustworthy, Useful Languages for Probabilistic Modeling and...
Transcript of Trustworthy, Useful Languages for Probabilistic Modeling and...
Trustworthy, Useful Languages forTrustworthy, Useful Languages forTrustworthy, Useful Languages for
Probabilistic Modeling and InferenceProbabilistic Modeling and InferenceProbabilistic Modeling and Inference
Neil Toronto
Dissertation Defense
Brigham Young University
2014/06/11
Master’s Research: Super-ResolutionMaster’s Research: Super-ResolutionMaster’s Research: Super-Resolution
Toronto et al. Super-Resolution via Recapture and Bayesian EffectModeling. CVPR 2009 1111111111111111
Master’s Research: Super-ResolutionMaster’s Research: Super-ResolutionMaster’s Research: Super-Resolution
• Model and query: Half a page of beautiful math
2222222222222222
Master’s Research: Super-ResolutionMaster’s Research: Super-ResolutionMaster’s Research: Super-Resolution
• Query implementation: 600 lines of Python
2222222222222222
Main Results: Super-ResolutionMain Results: Super-ResolutionMain Results: Super-Resolution
• Competitor and BEI on 4x super-resolution:
Resolution Synthesis
3333333333333333
Main Results: Super-ResolutionMain Results: Super-ResolutionMain Results: Super-Resolution
• Competitor and BEI on 4x super-resolution:
Resolution Synthesis Bayesian Edge Inference
3333333333333333
Main Results: Super-ResolutionMain Results: Super-ResolutionMain Results: Super-Resolution
• Competitor and BEI on 4x super-resolution:
Resolution Synthesis Bayesian Edge Inference
• Beat state-of-the-art on “objective” measures
3333333333333333
Main Results: Super-ResolutionMain Results: Super-ResolutionMain Results: Super-Resolution
• Competitor and BEI on 4x super-resolution:
Resolution Synthesis Bayesian Edge Inference
• Beat state-of-the-art on “objective” measures
• Was capable of other reconstruction tasks with few changes 3333333333333333
Only Mostly SatisfyingOnly Mostly SatisfyingOnly Mostly Satisfying
Problem 1: Still not sure the program is right
4444444444444444
Only Mostly SatisfyingOnly Mostly SatisfyingOnly Mostly Satisfying
Problem 1: Still not sure the program is right
Problem 2: smooth edges instead of discontinuous
4444444444444444
Only Mostly SatisfyingOnly Mostly SatisfyingOnly Mostly Satisfying
Problem 1: Still not sure the program is right
Problem 2: smooth edges instead of discontinuous
“To approximate blurring with a spatially varying point-spreadfunction (PSF), we assign each facet a Gaussian PSF andconvolve each analytically before combining outputs.”
4444444444444444
Only Mostly SatisfyingOnly Mostly SatisfyingOnly Mostly Satisfying
Problem 1: Still not sure the program is right
Problem 2: smooth edges instead of discontinuous
“To approximate blurring with a spatially varying point-spreadfunction (PSF), we assign each facet a Gaussian PSF andconvolve each analytically before combining outputs.”
i.e. “We can’t model it correctly so here’s a hack.” 4444444444444444
Solution Idea: Probabilistic LanguageSolution Idea: Probabilistic LanguageSolution Idea: Probabilistic Language
5555555555555555
Solution Idea: Probabilistic LanguageSolution Idea: Probabilistic LanguageSolution Idea: Probabilistic Language
• Also somehow let me model correctly 5555555555555555
Prior WorkPrior WorkPrior Work
Defined by animplementation
Defined by a semantics (i.e. mathematically)
6666666666666666
Prior WorkPrior WorkPrior Work
Defined by animplementation
Defined by a semantics (i.e. mathematically)
Designed for Bayesian practice
6666666666666666
Prior WorkPrior WorkPrior Work
Defined by animplementation
Defined by a semantics (i.e. mathematically)
Designed for Bayesian practice
Mimic human translation
6666666666666666
Prior WorkPrior WorkPrior Work
Defined by animplementation
Defined by a semantics (i.e. mathematically)
Designed for Bayesian practice
Mimic human translation
Can’t tell error from feature
6666666666666666
Prior WorkPrior WorkPrior Work
Defined by animplementation
Defined by a semantics (i.e. mathematically)
Designed for Bayesian practice
Mimic human translation
Can’t tell error from feature
Limited: usually no recursion orloops; conditions
6666666666666666
Prior WorkPrior WorkPrior Work
Defined by animplementation
Defined by a semantics (i.e. mathematically)
Designed for Bayesian practice Designed for functionalprogrammers or FP theorists
Mimic human translation
Can’t tell error from feature
Limited: usually no recursion orloops; conditions
6666666666666666
Prior WorkPrior WorkPrior Work
Defined by animplementation
Defined by a semantics (i.e. mathematically)
Designed for Bayesian practice Designed for functionalprogrammers or FP theorists
Mimic human translation May not be implemented
Can’t tell error from feature
Limited: usually no recursion orloops; conditions
6666666666666666
Prior WorkPrior WorkPrior Work
Defined by animplementation
Defined by a semantics (i.e. mathematically)
Designed for Bayesian practice Designed for functionalprogrammers or FP theorists
Mimic human translation May not be implemented
Can’t tell error from feature Behavior is well-defined
Limited: usually no recursion orloops; conditions
6666666666666666
Prior WorkPrior WorkPrior Work
Defined by animplementation
Defined by a semantics (i.e. mathematically)
Designed for Bayesian practice Designed for functionalprogrammers or FP theorists
Mimic human translation May not be implemented
Can’t tell error from feature Behavior is well-defined
Limited: usually no recursion orloops; conditions
Limited: usually finitedistributions, no conditioning
6666666666666666
Prior WorkPrior WorkPrior Work
Defined by animplementation
Defined by a semantics (i.e. mathematically)
Designed for Bayesian practice Designed for functionalprogrammers or FP theorists
Mimic human translation May not be implemented
Can’t tell error from feature Behavior is well-defined
Limited: usually no recursion orloops; conditions
Limited: usually finitedistributions, no conditioning
Best of all worlds: define language using functional programmingtheory, make it for Bayesians, and remove limitations 6666666666666666
Thesis StatementThesis StatementThesis Statement
Functional programming theory and measure-theoreticprobability provide a solid foundation
for trustworthy, useful languages for constructiveprobabilistic modeling and inference.
7777777777777777
Thesis StatementThesis StatementThesis Statement
Functional programming theory and measure-theoreticprobability provide a solid foundation
for trustworthy, useful languages for constructiveprobabilistic modeling and inference.
• Useful: let you think abstractly and handle details for you
7777777777777777
Thesis StatementThesis StatementThesis Statement
Functional programming theory and measure-theoreticprobability provide a solid foundation
for trustworthy, useful languages for constructiveprobabilistic modeling and inference.
• Useful: let you think abstractly and handle details for you
• Trustworthy: defined mathematically
7777777777777777
Thesis StatementThesis StatementThesis Statement
Functional programming theory and measure-theoreticprobability provide a solid foundation
for trustworthy, useful languages for constructiveprobabilistic modeling and inference.
• Useful: let you think abstractly and handle details for you
• Trustworthy: defined mathematically
• Functional programming theory has the tools to defineprogramming languages mathematically
7777777777777777
Thesis StatementThesis StatementThesis Statement
Functional programming theory and measure-theoreticprobability provide a solid foundation
for trustworthy, useful languages for constructiveprobabilistic modeling and inference.
• Useful: let you think abstractly and handle details for you
• Trustworthy: defined mathematically
• Functional programming theory has the tools to defineprogramming languages mathematically
• Measure-theoretic probability is the most complete account ofprobability; should allow shedding common limitations 7777777777777777
Simple Example ProcessSimple Example ProcessSimple Example Process
• Example process: Normal-Normal
8888888888888888
Simple Example ProcessSimple Example ProcessSimple Example Process
• Example process: Normal-Normal
• Intuition: Sample , then sample using
8888888888888888
Simple Example ProcessSimple Example ProcessSimple Example Process
• Example process: Normal-Normal
• Intuition: Sample , then sample using
• Density model :
8888888888888888
Simple Example ProcessSimple Example ProcessSimple Example Process
• Example process: Normal-Normal
• Intuition: Sample , then sample using
• Compute query by integrating:
8888888888888888
Conditional QueriesConditional QueriesConditional Queries
• Compute query using Bayes’ law:
9999999999999999
Conditional QueriesConditional QueriesConditional Queries
• Compute query using Bayes’ law:
9999999999999999
Conditional QueriesConditional QueriesConditional Queries
• Compute query using Bayes’ law:
9999999999999999
Conditional QueriesConditional QueriesConditional Queries
• Compute query using Bayes’ law:
9999999999999999
Conditional QueriesConditional QueriesConditional Queries
• Compute query using Bayes’ law:
9999999999999999
Conditional QueriesConditional QueriesConditional Queries
• Compute query using Bayes’ law:
9999999999999999
So What Can’t Densities Model?So What Can’t Densities Model?So What Can’t Densities Model?
10101010101010101010101010101010
So What Can’t Densities Model?So What Can’t Densities Model?So What Can’t Densities Model?
• Tons of useful things that are easy to write down
10101010101010101010101010101010
So What Can’t Densities Model?So What Can’t Densities Model?So What Can’t Densities Model?
• Tons of useful things that are easy to write down
Distributions given non-axial, zero-probability conditions
10101010101010101010101010101010
So What Can’t Densities Model?So What Can’t Densities Model?So What Can’t Densities Model?
• Tons of useful things that are easy to write down
Distributions given non-axial, zero-probability conditions
Discontinuous change of variable (e.g. a thermometer)
10101010101010101010101010101010
So What Can’t Densities Model?So What Can’t Densities Model?So What Can’t Densities Model?
• Tons of useful things that are easy to write down
Distributions given non-axial, zero-probability conditions
Discontinuous change of variable (e.g. a thermometer)
Distributions of variable-dimension random variables
10101010101010101010101010101010
So What Can’t Densities Model?So What Can’t Densities Model?So What Can’t Densities Model?
• Tons of useful things that are easy to write down
Distributions given non-axial, zero-probability conditions
Discontinuous change of variable (e.g. a thermometer)
Distributions of variable-dimension random variables
Nontrivial distributions on infinite products
10101010101010101010101010101010
So What Can’t Densities Model?So What Can’t Densities Model?So What Can’t Densities Model?
• Tons of useful things that are easy to write down
Distributions given non-axial, zero-probability conditions
Discontinuous change of variable (e.g. a thermometer)
Distributions of variable-dimension random variables
Nontrivial distributions on infinite products
• Tricks to get around limitations aren’t general enough 10101010101010101010101010101010
Measure-Theoretic ProbabilityMeasure-Theoretic ProbabilityMeasure-Theoretic Probability
• Main ideas:
Don’t assign probability-like quantities to values, assignprobabilities to sets — the probability query is king
11111111111111111111111111111111
Measure-Theoretic ProbabilityMeasure-Theoretic ProbabilityMeasure-Theoretic Probability
• Main ideas:
Don’t assign probability-like quantities to values, assignprobabilities to sets — the probability query is king
Confine assumed randomness to one place by makingrandom variables deterministic functions that observe arandom source
11111111111111111111111111111111
Measure-Theoretic ProbabilityMeasure-Theoretic ProbabilityMeasure-Theoretic Probability
• Main ideas:
Don’t assign probability-like quantities to values, assignprobabilities to sets — the probability query is king
Confine assumed randomness to one place by makingrandom variables deterministic functions that observe arandom source
• Measure-theoretic model of example process:
11111111111111111111111111111111
Measure-Theoretic ProbabilityMeasure-Theoretic ProbabilityMeasure-Theoretic Probability
• Main ideas:
Don’t assign probability-like quantities to values, assignprobabilities to sets — the probability query is king
Confine assumed randomness to one place by makingrandom variables deterministic functions that observe arandom source
• Measure-theoretic model of example process:
11111111111111111111111111111111
Measure-Theoretic QueriesMeasure-Theoretic QueriesMeasure-Theoretic Queries
• Specific query:
12121212121212121212121212121212
Measure-Theoretic QueriesMeasure-Theoretic QueriesMeasure-Theoretic Queries
• Specific query:
• Generalized:
12121212121212121212121212121212
Measure-Theoretic QueriesMeasure-Theoretic QueriesMeasure-Theoretic Queries
• Specific query:
• Generalized:
• Conditional query: if then
12121212121212121212121212121212
Measure-Theoretic QueriesMeasure-Theoretic QueriesMeasure-Theoretic Queries
• Specific query:
• Generalized:
• Conditional query: if then
Can we avoid densities when ?
12121212121212121212121212121212
Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)
13131313131313131313131313131313
Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)
13131313131313131313131313131313
Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)
13131313131313131313131313131313
Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)
13131313131313131313131313131313
Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)
13131313131313131313131313131313
Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)
13131313131313131313131313131313
Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)Zero-Probability Conditions (Axial)
13131313131313131313131313131313
Zero-Probability Conditions (Circular)Zero-Probability Conditions (Circular)Zero-Probability Conditions (Circular)
14141414141414141414141414141414
Zero-Probability Conditions (Circular)Zero-Probability Conditions (Circular)Zero-Probability Conditions (Circular)
14141414141414141414141414141414
Zero-Probability Conditions (Circular)Zero-Probability Conditions (Circular)Zero-Probability Conditions (Circular)
14141414141414141414141414141414
Zero-Probability Conditions (Circular)Zero-Probability Conditions (Circular)Zero-Probability Conditions (Circular)
14141414141414141414141414141414
Zero-Probability Conditions (Circular)Zero-Probability Conditions (Circular)Zero-Probability Conditions (Circular)
14141414141414141414141414141414
Contribution: Don’t Integrate, Compute Backwards (1)Contribution: Don’t Integrate, Compute Backwards (1)Contribution: Don’t Integrate, Compute Backwards (1)
• Integration is hard!
15151515151515151515151515151515
Contribution: Don’t Integrate, Compute Backwards (1)Contribution: Don’t Integrate, Compute Backwards (1)Contribution: Don’t Integrate, Compute Backwards (1)
• Integration is hard!
• But random variables and are an abstraction boundaryhiding and , so we can choose convenient ones
15151515151515151515151515151515
Contribution: Don’t Integrate, Compute Backwards (1)Contribution: Don’t Integrate, Compute Backwards (1)Contribution: Don’t Integrate, Compute Backwards (1)
• Integration is hard!
• But random variables and are an abstraction boundaryhiding and , so we can choose convenient ones
A uniform random source model:
15151515151515151515151515151515
Contribution: Don’t Integrate, Compute Backwards (1)Contribution: Don’t Integrate, Compute Backwards (1)Contribution: Don’t Integrate, Compute Backwards (1)
• Integration is hard!
• But random variables and are an abstraction boundaryhiding and , so we can choose convenient ones
A uniform random source model:
where is the Normal CDF
15151515151515151515151515151515
Contribution: Don’t Integrate, Compute Backwards (1)Contribution: Don’t Integrate, Compute Backwards (1)Contribution: Don’t Integrate, Compute Backwards (1)
• Integration is hard!
• But random variables and are an abstraction boundaryhiding and , so we can choose convenient ones
A uniform random source model:
where is the Normal CDF
• Stretches instead of integrates 15151515151515151515151515151515
Contribution: Don’t Integrate, Compute Backwards (2)Contribution: Don’t Integrate, Compute Backwards (2)Contribution: Don’t Integrate, Compute Backwards (2)
• Generalized query:
16161616161616161616161616161616
Contribution: Don’t Integrate, Compute Backwards (2)Contribution: Don’t Integrate, Compute Backwards (2)Contribution: Don’t Integrate, Compute Backwards (2)
• Generalized query:
i.e. output distributions are defined by preimages
16161616161616161616161616161616
Contribution: Don’t Integrate, Compute Backwards (2)Contribution: Don’t Integrate, Compute Backwards (2)Contribution: Don’t Integrate, Compute Backwards (2)
• Generalized query:
i.e. output distributions are defined by preimages
• For a uniform random source model,
Compute probabilities by computing preimage areas
16161616161616161616161616161616
Contribution: Don’t Integrate, Compute Backwards (2)Contribution: Don’t Integrate, Compute Backwards (2)Contribution: Don’t Integrate, Compute Backwards (2)
• Generalized query:
i.e. output distributions are defined by preimages
• For a uniform random source model,
Compute probabilities by computing preimage areas
Compute conditional probabilities as quotients of preimageareas
16161616161616161616161616161616
Contribution: Don’t Integrate, Compute Backwards (2)Contribution: Don’t Integrate, Compute Backwards (2)Contribution: Don’t Integrate, Compute Backwards (2)
• Generalized query:
i.e. output distributions are defined by preimages
• For a uniform random source model,
Compute probabilities by computing preimage areas
Compute conditional probabilities as quotients of preimageareas
• Is this really more feasible than integrating?
16161616161616161616161616161616
Queries Using PreimagesQueries Using PreimagesQueries Using Preimages
17171717171717171717171717171717
Queries Using PreimagesQueries Using PreimagesQueries Using Preimages
Uniform Random Source Original Model
17171717171717171717171717171717
Queries Using PreimagesQueries Using PreimagesQueries Using Preimages
Uniform Random Source Original Model
17171717171717171717171717171717
Queries Using PreimagesQueries Using PreimagesQueries Using Preimages
Uniform Random Source Original Model
17171717171717171717171717171717
Queries Using PreimagesQueries Using PreimagesQueries Using Preimages
Uniform Random Source Original Model
17171717171717171717171717171717
Crazy Idea is Feasible If...Crazy Idea is Feasible If...Crazy Idea is Feasible If...
• Seems like we need:
Standard interpretation of programs as pure functions from arandom source
18181818181818181818181818181818
Crazy Idea is Feasible If...Crazy Idea is Feasible If...Crazy Idea is Feasible If...
• Seems like we need:
Standard interpretation of programs as pure functions from arandom source
Efficient way to compute preimage sets
18181818181818181818181818181818
Crazy Idea is Feasible If...Crazy Idea is Feasible If...Crazy Idea is Feasible If...
• Seems like we need:
Standard interpretation of programs as pure functions from arandom source
Efficient way to compute preimage sets
Efficient representation of arbitrary sets
18181818181818181818181818181818
Crazy Idea is Feasible If...Crazy Idea is Feasible If...Crazy Idea is Feasible If...
• Seems like we need:
Standard interpretation of programs as pure functions from arandom source
Efficient way to compute preimage sets
Efficient representation of arbitrary sets
Efficient way to compute areas of preimage sets
18181818181818181818181818181818
Crazy Idea is Feasible If...Crazy Idea is Feasible If...Crazy Idea is Feasible If...
• Seems like we need:
Standard interpretation of programs as pure functions from arandom source
Efficient way to compute preimage sets
Efficient representation of arbitrary sets
Efficient way to compute areas of preimage sets
Proof of correctness w.r.t. standard interpretation
18181818181818181818181818181818
Crazy Idea is Feasible If...Crazy Idea is Feasible If...Crazy Idea is Feasible If...
• Seems like we need:
Standard interpretation of programs as pure functions from arandom source
Efficient way to compute preimage sets
Efficient representation of arbitrary sets
Efficient way to compute areas of preimage sets
Proof of correctness w.r.t. standard interpretation
• Completely infeasible! But...
18181818181818181818181818181818
What About Approximating?What About Approximating?What About Approximating?
Conservative approximation with rectangles:
19191919191919191919191919191919
What About Approximating?What About Approximating?What About Approximating?
Conservative approximation with rectangles:
19191919191919191919191919191919
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
20202020202020202020202020202020
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
20202020202020202020202020202020
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
20202020202020202020202020202020
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
20202020202020202020202020202020
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
20202020202020202020202020202020
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
20202020202020202020202020202020
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
20202020202020202020202020202020
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
20202020202020202020202020202020
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
20202020202020202020202020202020
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
20202020202020202020202020202020
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
20202020202020202020202020202020
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
20202020202020202020202020202020
What About Approximating?What About Approximating?What About Approximating?
Sampling: exponential to quadratic (e.g. days to minutes)
21212121212121212121212121212121
What About Approximating?What About Approximating?What About Approximating?
Sampling: exponential to quadratic (e.g. days to minutes)
21212121212121212121212121212121
Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...
• Standard interpretation of programs as pure functions from arandom source
• Efficient way to compute preimage sets
• Efficient representation of arbitrary sets
• Efficient way to compute volumes of preimage sets
• Proof of correctness w.r.t. standard interpretation
22222222222222222222222222222222
Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...
• Standard interpretation of programs as pure functions from arandom source
• Efficient way to compute approximate preimage subsets
• Efficient representation of arbitrary sets
• Efficient way to compute volumes of preimage sets
• Proof of correctness w.r.t. standard interpretation
22222222222222222222222222222222
Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...
• Standard interpretation of programs as pure functions from arandom source
• Efficient way to compute approximate preimage subsets
• Efficient representation of approximating sets
• Efficient way to compute volumes of preimage sets
• Proof of correctness w.r.t. standard interpretation
22222222222222222222222222222222
Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...
• Standard interpretation of programs as pure functions from arandom source
• Efficient way to compute approximate preimage subsets
• Efficient representation of approximating sets
• Efficient way to sample uniformly in preimage sets
• Proof of correctness w.r.t. standard interpretation
22222222222222222222222222222222
Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...
• Standard interpretation of programs as pure functions from arandom source
• Efficient way to compute approximate preimage subsets
• Efficient representation of approximating sets
• Efficient way to sample uniformly in preimage sets
Efficient domain partition sampling
• Proof of correctness w.r.t. standard interpretation
22222222222222222222222222222222
Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...Crazy Idea is Actually Feasible If...
• Standard interpretation of programs as pure functions from arandom source
• Efficient way to compute approximate preimage subsets
• Efficient representation of approximating sets
• Efficient way to sample uniformly in preimage sets
Efficient domain partition sampling
Efficient way to determine whether a domain sample isactually in the preimage (just use standard interpretation)
• Proof of correctness w.r.t. standard interpretation
22222222222222222222222222222222
Standard InterpretationStandard InterpretationStandard Interpretation
• Grammar:
23232323232323232323232323232323
Standard InterpretationStandard InterpretationStandard Interpretation
• Grammar:
• Semantic function
23232323232323232323232323232323
Standard InterpretationStandard InterpretationStandard Interpretation
• Grammar:
• Semantic function
• Math has no general recursion, so (i.e. interpretation ofprogram ) is a λ-calculus term
23232323232323232323232323232323
Standard InterpretationStandard InterpretationStandard Interpretation
• Grammar:
• Semantic function
• Math has no general recursion, so (i.e. interpretation ofprogram ) is a λ-calculus term
• Easy implementation in any language with lambdas
23232323232323232323232323232323
Compositional SemanticsCompositional SemanticsCompositional Semantics
• Compositional: every term’s meaning depends only on itsimmediate subterms’ meanings
24242424242424242424242424242424
Compositional SemanticsCompositional SemanticsCompositional Semantics
• Compositional: every term’s meaning depends only on itsimmediate subterms’ meanings
• Advantage: proofs about all programs by structural induction
24242424242424242424242424242424
Compositional SemanticsCompositional SemanticsCompositional Semantics
• Compositional: every term’s meaning depends only on itsimmediate subterms’ meanings
• Advantage: proofs about all programs by structural induction
• Example: meaning of
24242424242424242424242424242424
Compositional SemanticsCompositional SemanticsCompositional Semantics
• Compositional: every term’s meaning depends only on itsimmediate subterms’ meanings
• Advantage: proofs about all programs by structural induction
• Example: meaning of
24242424242424242424242424242424
Compositional SemanticsCompositional SemanticsCompositional Semantics
• Compositional: every term’s meaning depends only on itsimmediate subterms’ meanings
• Advantage: proofs about all programs by structural induction
• Example: meaning of
• Nonexample:
24242424242424242424242424242424
Compositional SemanticsCompositional SemanticsCompositional Semantics
• Compositional: every term’s meaning depends only on itsimmediate subterms’ meanings
• Advantage: proofs about all programs by structural induction
• Example: meaning of
• Nonexample:
24242424242424242424242424242424
Compositional SemanticsCompositional SemanticsCompositional Semantics
• Compositional: every term’s meaning depends only on itsimmediate subterms’ meanings
• Advantage: proofs about all programs by structural induction
• Example: meaning of
• Nonexample:
• Can preimages be computed compositionally? 24242424242424242424242424242424
Pair PreimagesPair PreimagesPair Preimages
25252525252525252525252525252525
Pair PreimagesPair PreimagesPair Preimages
25252525252525252525252525252525
Pair PreimagesPair PreimagesPair Preimages
:
25252525252525252525252525252525
Pair PreimagesPair PreimagesPair Preimages
and :
25252525252525252525252525252525
Pair PreimagesPair PreimagesPair Preimages
:
25252525252525252525252525252525
Nonstandard Interpretation: Computing PreimagesNonstandard Interpretation: Computing PreimagesNonstandard Interpretation: Computing Preimages
• Preimage computation:
26262626262626262626262626262626
Nonstandard Interpretation: Computing PreimagesNonstandard Interpretation: Computing PreimagesNonstandard Interpretation: Computing Preimages
• Preimage computation:
26262626262626262626262626262626
Nonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under Pairing
• Pairing types:
27272727272727272727272727272727
Nonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under Pairing
• Pairing types:
Theorem (correctness under pairing). If
computes preimages under
27272727272727272727272727272727
Nonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under Pairing
• Pairing types:
Theorem (correctness under pairing). If
computes preimages under
computes preimages under
27272727272727272727272727272727
Nonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under Pairing
• Pairing types:
Theorem (correctness under pairing). If
computes preimages under
computes preimages under
then computes preimages under .
27272727272727272727272727272727
Nonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under Pairing
• Pairing types:
Theorem (correctness under pairing). If
computes preimages under
computes preimages under
then computes preimages under .
Proof sketch. Preimages distribute over cartesian products.
27272727272727272727272727272727
Nonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under PairingNonstandard Interpretation: Preimages Under Pairing
• Pairing types:
Theorem (correctness under pairing). If
computes preimages under
computes preimages under
then computes preimages under .
Proof sketch. Preimages distribute over cartesian products.
• Similar theorems for every kind of term 27272727272727272727272727272727
Nonstandard Interpretation: CorrectnessNonstandard Interpretation: CorrectnessNonstandard Interpretation: Correctness
Theorem. For all programs , computes preimages under.
Proof. By structural induction on program terms.
28282828282828282828282828282828
Wait a MinuteWait a MinuteWait a Minute
• Q. Don’t the interpretations of do uncountable things?
29292929292929292929292929292929
Wait a MinuteWait a MinuteWait a Minute
• Q. Don’t the interpretations of do uncountable things?
A. Yes. Yes, they do.
29292929292929292929292929292929
Wait a MinuteWait a MinuteWait a Minute
• Q. Don’t the interpretations of do uncountable things?
A. Yes. Yes, they do.
• Q. Where do I get a computer that runs them?
29292929292929292929292929292929
Wait a MinuteWait a MinuteWait a Minute
• Q. Don’t the interpretations of do uncountable things?
A. Yes. Yes, they do.
• Q. Where do I get a computer that runs them?
A. Nowhere, but we’ll approximate them soon.
29292929292929292929292929292929
Wait a MinuteWait a MinuteWait a Minute
• Q. Don’t the interpretations of do uncountable things?
A. Yes. Yes, they do.
• Q. Where do I get a computer that runs them?
A. Nowhere, but we’ll approximate them soon.
• Q. Why interpret programs as uncomputable functions, then?
29292929292929292929292929292929
Wait a MinuteWait a MinuteWait a Minute
• Q. Don’t the interpretations of do uncountable things?
A. Yes. Yes, they do.
• Q. Where do I get a computer that runs them?
A. Nowhere, but we’ll approximate them soon.
• Q. Why interpret programs as uncomputable functions, then?
A. So we know exactly what to approximate.
29292929292929292929292929292929
Wait a MinuteWait a MinuteWait a Minute
• Q. Don’t the interpretations of do uncountable things?
A. Yes. Yes, they do.
• Q. Where do I get a computer that runs them?
A. Nowhere, but we’ll approximate them soon.
• Q. Why interpret programs as uncomputable functions, then?
A. So we know exactly what to approximate.
• Q. Where did you get a λ-calculus that could operate on arbitrary,possibly infinite sets, anyway?
A. Well...
29292929292929292929292929292929
Lambda-ZFCLambda-ZFCLambda-ZFC
λ calculus
30303030303030303030303030303030
Lambda-ZFCLambda-ZFCLambda-ZFC
λ calculus
+
Infinite sets and operations
30303030303030303030303030303030
Lambda-ZFCLambda-ZFCLambda-ZFC
λ calculus
+
Infinite sets and operations
=
λZFC
30303030303030303030303030303030
Lambda-ZFCLambda-ZFCLambda-ZFC
λ calculus
+
Infinite sets and operations
=
λZFC
• Contemporary math, but with lambdas and general recursion; orfunctional programming, but with infinite sets
30303030303030303030303030303030
Lambda-ZFCLambda-ZFCLambda-ZFC
λ calculus
+
Infinite sets and operations
=
λZFC
• Contemporary math, but with lambdas and general recursion; orfunctional programming, but with infinite sets
• Can express uncountably infinite operations, can’t solve its ownhalting problem
30303030303030303030303030303030
Lambda-ZFCLambda-ZFCLambda-ZFC
λ calculus
+
Infinite sets and operations
=
λZFC
• Contemporary math, but with lambdas and general recursion; orfunctional programming, but with infinite sets
• Can express uncountably infinite operations, can’t solve its ownhalting problem
• Can use contemporary mathematical theorems directly
30303030303030303030303030303030
Rectangular ApproximationRectangular ApproximationRectangular Approximation
• A rectangle is
An interval or union of intervals
for rectangles and
31313131313131313131313131313131
Rectangular ApproximationRectangular ApproximationRectangular Approximation
• A rectangle is
An interval or union of intervals
for rectangles and
• Easy representation; easy intersection and join (union-like)operation, empty test, other operations
31313131313131313131313131313131
Rectangular ApproximationRectangular ApproximationRectangular Approximation
• A rectangle is
An interval or union of intervals
for rectangles and
• Easy representation; easy intersection and join (union-like)operation, empty test, other operations
• Recall:
31313131313131313131313131313131
Rectangular ApproximationRectangular ApproximationRectangular Approximation
• A rectangle is
An interval or union of intervals
for rectangles and
• Easy representation; easy intersection and join (union-like)operation, empty test, other operations
• Recall:
• Define:
31313131313131313131313131313131
Rectangular ApproximationRectangular ApproximationRectangular Approximation
• A rectangle is
An interval or union of intervals
for rectangles and
• Easy representation; easy intersection and join (union-like)operation, empty test, other operations
• Recall:
• Define:
• Derive 31313131313131313131313131313131
In Theory...In Theory...In Theory...
Theorem (sound). computes overapproximations of thepreimages computed by .
• Consequence: Sampling within preimages doesn’t leave anythingout
32323232323232323232323232323232
In Theory...In Theory...In Theory...
Theorem (sound). computes overapproximations of thepreimages computed by .
• Consequence: Sampling within preimages doesn’t leave anythingout
Theorem (monotone). is monotone.
• Consequence: Partitioning the domain never increasesapproximate preimages
32323232323232323232323232323232
In Theory...In Theory...In Theory...
Theorem (sound). computes overapproximations of thepreimages computed by .
• Consequence: Sampling within preimages doesn’t leave anythingout
Theorem (monotone). is monotone.
• Consequence: Partitioning the domain never increasesapproximate preimages
Theorem (decreasing). never returns preimages larger thanthe given subdomain.
• Consequence: Refining preimage partitions never explodes32323232323232323232323232323232
In Practice...In Practice...In Practice...
Theorems prove this always works:
33333333333333333333333333333333
In Practice...In Practice...In Practice...
Theorems prove this always works:
33333333333333333333333333333333
In Practice...In Practice...In Practice...
Theorems prove this always works:
33333333333333333333333333333333
In Practice...In Practice...In Practice...
Theorems prove this always works:
33333333333333333333333333333333
In Practice...In Practice...In Practice...
Theorems prove this always works:
33333333333333333333333333333333
In Practice...In Practice...In Practice...
Theorems prove this always works:
33333333333333333333333333333333
In Practice...In Practice...In Practice...
Theorems prove this always works:
33333333333333333333333333333333
In Practice...In Practice...In Practice...
Theorems prove this always works:
33333333333333333333333333333333
In Practice...In Practice...In Practice...
Theorems prove this always works:
33333333333333333333333333333333
Importance SamplingImportance SamplingImportance Sampling
• Alternative to arbitrarily low-rate rejection sampling:
34343434343434343434343434343434
Importance SamplingImportance SamplingImportance Sampling
• Alternative to arbitrarily low-rate rejection sampling:
First, refine using preimage computation:
34343434343434343434343434343434
Importance SamplingImportance SamplingImportance Sampling
• Alternative to arbitrarily low-rate rejection sampling:
Second, randomly choose from arbitrarily fine partition:
34343434343434343434343434343434
Importance SamplingImportance SamplingImportance Sampling
• Alternative to arbitrarily low-rate rejection sampling:
Third, refine again:
34343434343434343434343434343434
Importance SamplingImportance SamplingImportance Sampling
• Alternative to arbitrarily low-rate rejection sampling:
Fourth, sample uniformly:
34343434343434343434343434343434
Importance SamplingImportance SamplingImportance Sampling
• Alternative to arbitrarily low-rate rejection sampling:
Do process “in the limit”; i.e. choose :
34343434343434343434343434343434
What About Recursion?What About Recursion?What About Recursion?
• General recursion, programs that halt with probability 1; e.g.
(define/drbayes (geometric p) (if (bernoulli p)
0(+ 1 (geometric p))))
35353535353535353535353535353535
What About Recursion?What About Recursion?What About Recursion?
• General recursion, programs that halt with probability 1; e.g.
(define/drbayes (geometric p) (if (bernoulli p)
0(+ 1 (geometric p))))
• Consider programs as being fully inlined (thus infinite):
(if (bernoulli p)0(+ 1 (if (bernoulli p)
0(+ 1 (if (bernoulli p)
0(+ 1 ...))))))
35353535353535353535353535353535
What About Recursion?What About Recursion?What About Recursion?
• General recursion, programs that halt with probability 1; e.g.
(define/drbayes (geometric p) (if (bernoulli p)
0(+ 1 (geometric p))))
• Consider programs as being fully inlined (thus infinite):
(if (bernoulli p)0(+ 1 (if (bernoulli p)
0(+ 1 (if (bernoulli p)
0(+ 1 ...))))))
• Random domain needs to be big enough and the right shape35353535353535353535353535353535
Program Domain ValuesProgram Domain ValuesProgram Domain Values
• Values are infinite binary trees:
36363636363636363636363636363636
Program Domain ValuesProgram Domain ValuesProgram Domain Values
• Values are infinite binary trees:
• Every expression in a program is assigned a node
36363636363636363636363636363636
Program Domain ValuesProgram Domain ValuesProgram Domain Values
• Values are infinite binary trees:
• Every expression in a program is assigned a node
• Implemented using lazy trees of random values
36363636363636363636363636363636
Program Domain ValuesProgram Domain ValuesProgram Domain Values
• Values are infinite binary trees:
• Every expression in a program is assigned a node
• Implemented using lazy trees of random values
• No probability density for domain, but there is a measure 36363636363636363636363636363636
Demo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular Condition
• Normal-Normal process:
37373737373737373737373737373737
Demo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular Condition
• Normal-Normal process:
• Objective: Find the distribution of
37373737373737373737373737373737
Demo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular Condition
• Normal-Normal process:
• Objective: Find the distribution of
• Implementation:
(define/drbayes e (let* ([x (normal 0 1)]
[y (normal x 1)]) (list x y (sqrt (+ (sqr x) (sqr y))))))
37373737373737373737373737373737
Demo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular Condition
• Normal-Normal process:
• Objective: Find the distribution of
• Implementation:
(define/drbayes e (let* ([x (normal 0 1)]
[y (normal x 1)]) (list x y (sqrt (+ (sqr x) (sqr y))))))
• Goal: Sample in the preimage of
(set-list reals reals (interval (- 1 ε) (+ 1 ε)))
37373737373737373737373737373737
Demo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular Condition
For ε = 0.01:
Preimage rectangles
38383838383838383838383838383838
Demo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular Condition
For ε = 0.01:
Preimage samples
38383838383838383838383838383838
Demo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular Condition
For ε = 0.01:
Preimage samples Output samples
38383838383838383838383838383838
Demo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular ConditionDemo: Normal-Normal With Circular Condition
For ε = 0.01:
Preimage samples Output samples
• Works fine with much smaller ε38383838383838383838383838383838
Demo: ThermometerDemo: ThermometerDemo: Thermometer
• Normal-Normal thermometer process:
39393939393939393939393939393939
Demo: ThermometerDemo: ThermometerDemo: Thermometer
• Normal-Normal thermometer process:
• Objective: Find the distribution of
39393939393939393939393939393939
Demo: ThermometerDemo: ThermometerDemo: Thermometer
• Normal-Normal thermometer process:
• Objective: Find the distribution of
• Implementation:
(define/drbayes e (let* ([x (normal 90 10)]
[y (normal x 1)]) (list x (if (> y 100) 100 y))))
39393939393939393939393939393939
Demo: ThermometerDemo: ThermometerDemo: Thermometer
• Normal-Normal thermometer process:
• Objective: Find the distribution of
• Implementation:
(define/drbayes e (let* ([x (normal 90 10)]
[y (normal x 1)]) (list x (if (> y 100) 100 y))))
• Goal: Sample in the preimage of
(set-list reals (interval 100 100)) 39393939393939393939393939393939
Demo: ThermometerDemo: ThermometerDemo: Thermometer
Preimage rectangles
40404040404040404040404040404040
Demo: ThermometerDemo: ThermometerDemo: Thermometer
Preimage samples
40404040404040404040404040404040
Demo: ThermometerDemo: ThermometerDemo: Thermometer
Preimage samples Density of
40404040404040404040404040404040
Demo: ThermometerDemo: ThermometerDemo: Thermometer
Preimage samples Density of
Calculated from samples: mean 105.1, stddev 4.6
40404040404040404040404040404040
Demo: Stochastic Ray TracingDemo: Stochastic Ray TracingDemo: Stochastic Ray Tracing
• Idea: Model light transmission and reflection, condition on pathsthat pass through aperture
41414141414141414141414141414141
Demo: Stochastic Ray TracingDemo: Stochastic Ray TracingDemo: Stochastic Ray Tracing
• Idea: Model light transmission and reflection, condition on pathsthat pass through aperture
41414141414141414141414141414141
Demo: Stochastic Ray TracingDemo: Stochastic Ray TracingDemo: Stochastic Ray Tracing
• Part of the implementation (totals ~50 lines):(define/drbayes (ray-plane-intersect p0 v n d) (let ([denom (- (vec-dot v n))])
(if (positive? denom)(let ([t (/ (+ d (vec-dot p0 n)) denom)]) (if (positive? t)
(collision t (vec+ p0 (vec-scale v t)) n)#f))
#f)))
42424242424242424242424242424242
Demo: Stochastic Ray TracingDemo: Stochastic Ray TracingDemo: Stochastic Ray Tracing
• Part of the implementation (totals ~50 lines):(define/drbayes (ray-plane-intersect p0 v n d) (let ([denom (- (vec-dot v n))])
(if (positive? denom)(let ([t (/ (+ d (vec-dot p0 n)) denom)]) (if (positive? t)
(collision t (vec+ p0 (vec-scale v t)) n)#f))
#f)))
• Constrained light path outputs:
Paths Through Aperture Projected and Accumulated
42424242424242424242424242424242
Other Inference TasksOther Inference TasksOther Inference Tasks
• Typical
Hierarchical models
Bayesian regression
Model selection
43434343434343434343434343434343
Other Inference TasksOther Inference TasksOther Inference Tasks
• Typical
Hierarchical models
Bayesian regression
Model selection
• Atypical
Programs that halt with probability < 1, or never halt
Probabilistic program verification (sample in preimage of errorcondition)
43434343434343434343434343434343
Thesis StatementThesis StatementThesis Statement
Functional programming theory and measure-theoreticprobability provide a solid foundation
for trustworthy, useful languages for constructiveprobabilistic modeling and inference.
44444444444444444444444444444444
Thesis StatementThesis StatementThesis Statement
Functional programming theory and measure-theoreticprobability provide a solid foundation
for trustworthy, useful languages for constructiveprobabilistic modeling and inference.
True.
44444444444444444444444444444444
Thesis StatementThesis StatementThesis Statement
Functional programming theory and measure-theoreticprobability provide a solid foundation
for trustworthy, useful languages for constructiveprobabilistic modeling and inference.
True.
• Was it falsifiable?
44444444444444444444444444444444
MeasurabilityMeasurabilityMeasurability
• Only measurable sets can have probabilities
45454545454545454545454545454545
MeasurabilityMeasurabilityMeasurability
• Only measurable sets can have probabilities
• Computing preimages under must preserve measurability—wesay itself is measurable
45454545454545454545454545454545
MeasurabilityMeasurabilityMeasurability
• Only measurable sets can have probabilities
• Computing preimages under must preserve measurability—wesay itself is measurable
Theorem (measurability). For all programs , is measurable,regardless of errors or nontermination, if language primitives aremeasurable.
45454545454545454545454545454545
MeasurabilityMeasurabilityMeasurability
• Only measurable sets can have probabilities
• Computing preimages under must preserve measurability—wesay itself is measurable
Theorem (measurability). For all programs , is measurable,regardless of errors or nontermination, if language primitives aremeasurable.
• Primitives include uncomputable operations like limits
45454545454545454545454545454545
MeasurabilityMeasurabilityMeasurability
• Only measurable sets can have probabilities
• Computing preimages under must preserve measurability—wesay itself is measurable
Theorem (measurability). For all programs , is measurable,regardless of errors or nontermination, if language primitives aremeasurable.
• Primitives include uncomputable operations like limits
• Applies to all probabilistic programming languages45454545454545454545454545454545
What I DidWhat I DidWhat I Did
45454545454545454545454545454545
What I DidWhat I DidWhat I Did
The core calculus for this:
45454545454545454545454545454545
Future WorkFuture WorkFuture Work
• Expressiveness
Lambdas and macros
Exceptions, parameters (or continuations and marks)
46464646464646464646464646464646
Future WorkFuture WorkFuture Work
• Expressiveness
Lambdas and macros
Exceptions, parameters (or continuations and marks)
• Optimization
Direct implementation is in depth; cut to
Incremental computation
Adaptive sampling algorithms
Static analysis
46464646464646464646464646464646
Future WorkFuture WorkFuture Work
• Expressiveness
Lambdas and macros
Exceptions, parameters (or continuations and marks)
• Optimization
Direct implementation is in depth; cut to
Incremental computation
Adaptive sampling algorithms
Static analysis
• Branching out: investigate preimage computation connection withtype systems and predicate transformer semantics 46464646464646464646464646464646