Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… ·...

66
Introduction to Bayesian Concepts and Methods Cornelius L. Pieterse [email protected] National Centre of Excellence in Mass Spectrometry Imaging National Physical Laboratory Teddington, UK Max Planck Institute for the Structure and Dynamics of Matter Center for Free-Electron Laser Science Hamburg, Germany Chris Engelbrecht Summer School January 2018

Transcript of Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… ·...

Page 1: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Introduction to Bayesian Concepts and Methods

Cornelius L. [email protected]

National Centre of Excellence in Mass Spectrometry ImagingNational Physical Laboratory

Teddington, UK

Max Planck Institute for the Structure and Dynamics of MatterCenter for Free-Electron Laser Science

Hamburg, Germany

Chris Engelbrecht Summer SchoolJanuary 2018

Page 2: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Introduction

Course overview

This course will provide an introduction to the Bayesian foundationalconcepts and methods with the primary emphasis to discuss the

fundamental conceptual mindset shift:

probability as logic rather than a relative frequency.

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 1 / 65

Page 3: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Introduction

Outline of the course

Introduction

Background and expectationsBayesian approach and corollaries

Parameter Estimation

Fair coin exampleEffect of different priorsBest estimates and error barsGaussian noise and averages

Parameter Estimation II

Presence of backgroundMarginalization of backgroundPractical numerical considerations

More relevant applications will be presented by the other speakers

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 2 / 65

Page 4: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Introduction

Outline of the course

The course almost exclusivelyfollows Sivia and Skilling.

A paperback copy is availablefor reference/consultation.

Data Analysis: A BayesianTutorial (second edition) byD. S. Sivia and J. Skilling,Oxford University Press (2006).

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 3 / 65

Page 5: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Introduction

Statistics education

Undergraduate (BSc Physics and Applied Mathematics @ SU):courses such as Probability Theory and Statistics 114

Postgraduate (MEng in Electronic Engineering @ SU):statistics was non-existent and rarely considered

Postgraduate (PhD in Physics @ MPSD):statistics tend to complicate things...

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 4 / 65

Page 6: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Introduction

Why do we care?

NPL has launched the 3DOrbiSIMS: anew molecular imaging technology withthe highest reported simultaneousspatial and mass resolutions.

To the pharmaceutical industry one ofthe major challenges is measurement ofthe intracellular drug concentration.

This instrument can help identify wheredrugs go at the cellular level, answeringlong-standing questions about whetherdrug concentrations are sufficiently highin the right places to have an effect.

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 5 / 65

Page 7: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Introduction

There are three kinds of lies: lies, damned lies and statistics.(Mark Twain, 1924)

The basics

A surprisingly large amount of scientists feel uneasy about statistics

Bayes and Laplace provided us with the more logical approach

This is an introductory tutorial to the Bayesian approach

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 6 / 65

Page 8: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Introduction

Deductive Logic versus Plausible Reasoning

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 7 / 65

Page 9: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Introduction

Notation

X ≡ Logical statement

I ≡ Background information

| ≡ Vertical bar read as given

Example

Assign plausibility: P(X |I ) where X ≡ It will rain tomorrow

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 8 / 65

Page 10: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Introduction

Rules of Reasoning

Logical NOT:Not X = X

Logical AND:X and Y = X ,Y

Logical Sum Rule:P(X |I ) + P(X |I ) = 1

Logical Product Rule:P(X ,Y |I ) = P(X |Y , I )× P(Y |I )

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 9 / 65

Page 11: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Introduction

Assigning Probabilities

Important: An absolute probability does not exist!

We have made the probabilities P(X |I ) conditional on I

Where I denotes the background information and assumptions

The failure to explicitly include all of the relevant background informationand assumptions is often the cause of disagreement about data analysis

Example

I1 ≡ The newspaper predicted thunderstorms

I2 ≡ There are dark clouds in the sky

I3 ≡ It is already raining heavily

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 10 / 65

Page 12: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Introduction

Corollary I (Bayes theorem)

Replace X = H ≡ Hypothesis and Y = D ≡ Data

Substitute the above and apply the product rule:

P(H,D|I ) = P(H|D, I )︸ ︷︷ ︸Posterior

P(D|I )︸ ︷︷ ︸Evidence

= P(D|H, I )︸ ︷︷ ︸Likelihood

P(H|I )︸ ︷︷ ︸Prior

= P(D,H|I )

P(H|D, I ) =P(D|H, I )P(H|I )

P(D|I )

∝ P(D|H, I )P(H|I )

Bayes theorem

Often normalisation is not required (hence omitting the evidence)

Note that the evidence does not explicitly depend on the hypothesis

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 11 / 65

Page 13: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Introduction

Corollary II (Marginalization)

Can we integrate over a logical statement, as Y is a proposition?

P(X |I ) =

∫P(X ,Y |I ) dY = P(X ,Y |I ) + P(X ,Y |I )

These two terms can be expanded with the product rule:

P(X ,Y |I ) = P(Y ,X |I ) = P(Y |X , I )P(X |I )P(X ,Y |I ) = P(Y ,X |I ) = P(Y |X , I )P(X |I )

Summing the above two equations:

P(X ,Y |I ) + P(X ,Y |I ) =

P(Y |X , I ) + P(Y |X , I )︸ ︷︷ ︸Unity from Sum Rule

P(X |I )

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 12 / 65

Page 14: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Introduction

Corollary III (Marginalization)

Consider a set of propositions:

{Yk} = Y1,Y2,Y3, · · · ,YM

Decompose over the entire discrete set:

P(X |I ) =M∑k=1

P(X ,Yk |I )

A generalization which requires that the set is normalized:M∑k=1

P(Yk |X , I ) = 1

Therefore: the set is required to be mutually exclusive and exhaustive

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 13 / 65

Page 15: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Introduction

Corollary III (Marginalization)

Arbitrarily large number of propositions → continuum limit

A mutually exclusive and exhaustive set of propositions should have

intervals chosen to have a common/contiguous borderthe intervals cover a big enough range of values

Normalization condition within the continuum limit:∫P(Y |X , I ) dY = 1

Therefore: here Y represents the parameter of interest

Marginalization allows nuisance parameters to be tortured

Unwanted background signals are golden nuisance parameters

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 14 / 65

Page 16: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Introduction

Corollary III (Marginalization)

Laplace considered the mass M of Saturn, given orbital data D:∫P(M|D, I ) dM = 1

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 15 / 65

Page 17: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Introduction

Bayes, Laplace and some orthodox statistics

Bernoulli, Bayes and Laplace pondered a lack of certainty

They interpreted probability as a degree-of-belief or plausibility

Unfortunately, other scholars considered this too vague and subjective

They preferred the relative frequency at which the event occurred

However, this approach assumes infinitely many repeated trials

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 16 / 65

Page 18: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Introduction

Bayes, Laplace and some orthodox statistics

The frequentist definition has a limited range of validity/use

Example: mass of Saturn is constant and thus not a random variable

Therefore: no frequency distribution (cannot use probability theory)

2XKCD. Frequentists versus Bayesians.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 17 / 65

Page 19: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Coin example

How can we determine if this coin is fair?

Consider several hypotheses for the bias-weighting H

H = 0 represents a coin which produces a tail on every flipH = 1 represents a coin which produces a head on every flip

As an example, we proposed the hypotheses:

0.00 ≤ H1 < 0.250.25 ≤ H2 < 0.490.49 ≤ H3 < 0.510.51 ≤ H4 < 0.750.75 ≤ H5 < 1.00

A fair coin would therefore be represented by hypothesis H3

Fairness of the coin is summarized by P(H|D, I ) dH

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 18 / 65

Page 20: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Coin example

We can use Bayes Theorem to determine the posterior:

P(H|D, I )︸ ︷︷ ︸Posterior

∝ P(D|H, I )︸ ︷︷ ︸Likelihood

P(H|I )︸ ︷︷ ︸Prior

Notice that we omitted the constant evidence

Often the evidence does not depend on H

If required, we can determine it using marginalization:∫ 1

0P(H|D, I ) dH = 1

Therefore: we need to determine the likelihood and prior

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 19 / 65

Page 21: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Coin example (prior)

The prior represents what we know given only the information I

As a start, we are ignorant about the nature of this coin:

P(H|I ) =

{1 for 0 ≤ H ≤ 1,0 for otherwise.

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 20 / 65

Page 22: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Coin example (likelihood)

Our ignorance is modified by the data through the likelihood

A measure of the chance that we would have obtained the data thatwe actually observed, if the value of the bias-weighting was known.

Probability of obtaining the data R heads in N flips is given:

P(D|H, I ) ∝ HR(1− H)N−R

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 21 / 65

Page 23: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Coin example (posterior)

Helpful to see how the posterior evolves as we obtain more data

Left-hand corner figure indicates the result of the previous flipRight-hand corner figure shows the total number of trails

Probability of obtaining the data R heads in N flips is given:

P(D|H, I ) ∝ HR(1− H)N−R

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 22 / 65

Page 24: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Coin example (posterior)

Helpful to see how the posterior evolves as we obtain more data

Left-hand corner figure indicates the result of the previous flipRight-hand corner figure shows the total number of trails

Probability of obtaining the data R heads in N flips is given:

P(D|H, I ) ∝ HR(1− H)N−R

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 23 / 65

Page 25: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Coin example (posterior)

The width of the posterior becomes narrower with more data

The position of the posterior maximum eventually settles

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 24 / 65

Page 26: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Coin example (different prior)

The uniform prior was chosen for its simplicity.

In other words: it represented our initial ignorance.

For illustrative purposes, consider two additional priors:

Our subconscious believe that most coins are fair (H = 0.5)Reliable gossip that the coin is heavily biased (H = 0 and H = 1)

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 25 / 65

Page 27: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Coin example (different prior)

Few data: resulting posteriors are quite different in detail

Ample data: all become sharp and converge to the same answer

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 26 / 65

Page 28: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Coin example (different prior)

Few data: resulting posteriors are quite different in detail

Ample data: all become sharp and converge to the same answer

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 27 / 65

Page 29: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Sequential versus one-step analysis

Consider a set of data {Dk} = {D1,D2,D3, · · · ,DN} for N coin flips.

Previously, we considered this to be an one-step process:

P(H|D2,D1, I ) ∝ P(D2,D1|H, I )P(H|I ) for N = 2

However, we can also consider a sequential process:

P(H|D1, I ) ∝ P(D1|H, I )P(H|I )P(H|D2,D1, I ) ∝ P(D2|H,D1, I )P(H|D1, I )

We, therefore, used the posterior as the subsequent prior.

This only works if we have logical independence:

P(D2|H, I ) = P(D2|H,D1, I ).

Be careful not to use the resulting posterior on the same data set.

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 28 / 65

Page 30: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Reliabilities: best estimates, error bars and confidence intervals

Posterior encodes our inference about the value of a parameter X ,given the data Y and the relevant background information I

Convenient to summarize the posterior with two numbers:

best estimate (maximum)measure of its reliability (width)

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 29 / 65

Page 31: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Reliabilities: best estimates, error bars and confidence intervals

The best estimate X0 is therefore easy to determine:

dP

dX

∣∣∣X0

= 0 andd2P

dX 2

∣∣∣X0

≤ 0.

The posterior is peaky, so we consider the logarithm L and expand it:

L = L(X0) +1

2

d2L

dX 2

∣∣∣X0

(X − X0)2 + · · ·

Posterior is approximated by the Gaussian distribution (not shown)

Quadratic term is the dominant factor determining the width:

σ =

(− d2L

dX 2

∣∣∣X0

)−1/2

Our inference is conveyed very concisely: X = X0 ± σ

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 30 / 65

Page 32: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Reliabilities: best estimates, error bars and confidence intervals

The probability that the true value of X lies with ±σ of X = X0 canbe determined by integration:

P(X0 − σ ≤ X < X0 + σ|D, I ) =

∫ X0+σ

X0−σP(X ,D|I ) dX ≈ 0.67

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 31 / 65

Page 33: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Reliabilities: best estimates, error bars and confidence intervals

The probability that the true value of X lies with ±σ of X = X0 canbe determined by integration:∫ X0+σ

X0−σP(X |D, I ) dX ≈ 0.67

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 32 / 65

Page 34: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Coin example (reliabilities)

Let us take a look again at the coin flipping example, for which wefound the probability of obtaining the data R heads in N flips:

P(D|H, I ) ∝ HR(1− H)N−R

To reduce the gradient, we take the natural logarithm:

L = constant + R log (H) + (N − R) log (1− H)

With this result we can subsequently determine the derivatives:

dL

dH=

R

H− (N − R)

(1− H)and

d2L

dH2= − R

H2− (N − R)

(1− H)2

Finally, we equate the first derivative to zero:

dL

dH

∣∣∣H0

=R

H0− (N − R)

(1− H0)= 0

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 33 / 65

Page 35: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Coin example (reliabilities)

The best estimate and associated error-bar is then determined as:

H0 =R

Nand σ =

√H0(1− H0)

N

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 34 / 65

Page 36: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Asymmetric posteriors

The error-bar is not appropriate for asymmetric posteriors

More reliable to determine a confidence interval (solve X1 and X2):∫ X2

X1

P(X |D, I ) dX ≈ 0.95

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 35 / 65

Page 37: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Asymmetric posteriors

The best estimate becomes vague for asymmetric posteriors

More representative to determine the expected value:

〈X 〉 =

∫X P(X |D, I ) dX

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 36 / 65

Page 38: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Multimodal posteriors

The most honest approach is just to display the posterior itself

Otherwise: provide several best estimates and error bars

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 37 / 65

Page 39: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Gaussian noise and averages

As another example, let us estimate the mean of a Gaussian process

The Gaussian distribution is often used as a model to describe thenoise associated with experimental data (for now, we just believe)

As such, the probability of the kth trail having a value xk is

P(xk |µ, σ) =1

σ√

2πexp

[−(xk − µ)2

2σ2

]

Question: given a set of data {xk}, what is µ assuming σ is given?

We (obviously) use Bayes Theorem to determine the posterior:

P(µ|{xk}, σ, I ) ∝ P({xk}|µ, σ, I )P(µ|σ, I )

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 38 / 65

Page 40: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Gaussian noise and averages

Assuming that the data is independent, we have for the likelihood:

P({xk}|µ, σ, I ) =N∏

k=1

P(xk |µ, σ, I )

The above result follows from the repeated use of the product rule:

P(X ,Y |I ) = P(X |I )P(Y |I )

This step is allowed since themeasurements are independent:

P(X |Y , I ) = P(X |I )

Note: this is often not the case!

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 39 / 65

Page 41: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Gaussian noise and averages

Knowledge of the Gaussian width tells us nothing about its position

As before, we are ignorant about the nature of this process:

P(µ|σ, I ) = P(µ|I ) =

{A for µmin ≤ µ ≤ µmax ,0 for otherwise.

The normalization constant A is determined by the range

We obtain the following for the posterior logarithm:

L = log [P(µ|{xk}, σ, I )] = constant−N∑

k=1

[(xk − µ)2

2σ2

]The constant includes all terms not involving µ

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 40 / 65

Page 42: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Gaussian noise and averages

As before, with this result we determine the derivatives:

dL

∣∣∣µ0

=N∑

k=1

[xk − µσ2

]= 0

Since σ is independent of k , we can rearrange the sum as follows:

N∑k=1

xk =N∑

k=1

µ0 = Nµ0

The best estimate is therefore just the arithmetic mean:

µ0 =1

N

N∑k=1

xk

Note: the best estimate is independent of the measurement error

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 41 / 65

Page 43: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Gaussian noise and averages

Next, using the previous result we determine the second derivative:

d2L

dµ2

∣∣∣µ0

= −N∑

k=1

1

σ2= − N

σ2

Obviously, the error-bar depends on the measurement error.

Our inference is therefore conveyed veryconcisely (should be a familiar result):

µ = µ0 ±σ√N

Note: the reliability is proportional to the number of measurements

The above is not an approximation: it is the exact solution

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 42 / 65

Page 44: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Data with different sized error-bars

Sometimes the magnitude of the error-bars are not the same for eachdatum. We then obtain the following the posterior logarithm:

L = log [P(µ|{xk}, σk , I )] = constant−N∑

k=1

[(xk − µk)2

2σ2

]

The best estimate is now slightly more complicated than before:

µ0 =N∑

k=1

wkxk

/ N∑k=1

wk where wk =1

σ2k

Our inference is however still conveyed very concisely:

µ = µ0 ±

(N∑

k=1

wk

)−1/2

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 43 / 65

Page 45: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Lighthouse problem

A lighthouse is somewhere off a piece of straight coastline at aposition α along the shore and a distance β out at sea.

It emits a series of flashes at random intervals (therefore also randomazimuths), which are intercepted on the coast by photodetectors

If N flashes were noted at positions {xk}, where is the lighthouse?

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 44 / 65

Page 46: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Lighthouse problem

After trigonometry and change of variables, the likelihood is found:

P(xk |α, β, I ) =β

π[β2 + (xk − α)2]

Cauchy distribution: frequently encountered in physics.

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 45 / 65

Page 47: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Lighthouse problem

Fix the offshore distance β (limit problem to one parameter).

We are ignorant about the location along the shoreline:

P(α|β, I ) = P(α|I ) =

{A for αmin ≤ α ≤ αmax ,0 for otherwise.

The detectors do not influence one another (thus independence):

P({xk}|α, β, I ) =N∏

k=1

P(xk |α, β, I )

We obtain the following for the posterior logarithm:

L = constant−N∑

k=1

log[β2 + (xk − α)2

]C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 46 / 65

Page 48: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Lighthouse problem

Best estimate of the position α0 is given by the posterior maximum:

dL

∣∣∣α0

= 2N∑

k=1

xk − α0

β2 + (xk − α0)2= 0

An analytical solution will confound us (use numerical brute force)

Visual representation of our inference about the lighthouse

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 47 / 65

Page 49: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Lighthouse problem

Visual representation of our inference about the lighthouse

Evolution of the posterior as the number of flashes detected increases

The position of the flashes are marked by the open circlesThe number of data analyzed are shown in the corner

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 48 / 65

Page 50: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Lighthouse problem

Visual representation of our inference about the lighthouse

Evolution of the posterior as the number of flashes detected increases

The position of the flashes are marked by the open circlesThe number of data analyzed are shown in the corner

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 49 / 65

Page 51: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation

Central limit theorem

Arithmetic mean of the data sounds like a good idea...

Averages based on the central limit theorem (not always valid)

For example, the mean is not appropriate for the Cauchy distribution

Moral of the story: the sample mean is not always a useful number.When in doubt, just show the complete posterior distribution.

3Scott Adams. dilbert.com/strip/2008-05-08.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 50 / 65

Page 52: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation II

Presence of background

Previously we estimated only a single parameter

Let us now consider multiple and nuisance parameters

Example: often our signal is obscured by background noise:

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 51 / 65

Page 53: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation II

Presence of background

Simplest case: consider a flat background of unknown magnitude B

Signal has an amplitude A with a peak of known shape and position

Example: Number of photons detected at a given wavelength

Counts are proportional to the sum of the signal and background

Peak is assumed to be Gaussian, located at x0 and having width w

Dk = n0

[Ae−(xk−x0)2/2w2

+ B]

The constant no is related to the experimental conditions

The trail/datum Dk is therefore not an integer (an approximation)

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 52 / 65

Page 54: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation II

Presence of background

Poisson distribution is usually called upon for counting experiments:

P(N|D) =DNe−D

N!

The expectation value can be shown to be 〈N〉 = D

Example: these fluctuations are due to shot noise

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 53 / 65

Page 55: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation II

Presence of background

We, therefore, have an assignment for the likelihood:

P(Nk |A,B, I ) =DNkk e−Dk

Nk !

Note: the background information I contains a lot of knowledge.

Assume that channels do not influence each other (independence):

P({Nk}|A,B, I ) =M∏k=1

P(Nk |A,B, I )

Posterior contains our inference about the signal and background:

P(A,B|{Nk}, I ) ∝ P({Nk}|A,B, I )P(A,B|, I )

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 54 / 65

Page 56: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation II

Presence of background

Amplitude of neither the signal nor the background can be negative:

P(A,B|, I ) =

{constant for A ≥ 0 and B ≥ 0,

0 for otherwise.

Note: this is a very powerful and not the most ignorant prior.

We obtain the following for the posterior logarithm:

L = constant +M∑k=1

[Nk log (Dk)− Dk ]

Therefore: maximize L for both the signal and the background

We will need to perform a numerical estimation (very easy example)

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 55 / 65

Page 57: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation II

Presence of background

Data was simulated with a Poisson random number generator:

P(N|D) =DNe−D

N!

Signal is located at the origin with contours shown for A and B

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 56 / 65

Page 58: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation II

Presence of background

Data was simulated with a Poisson random number generator:

P(N|D) =DNe−D

N!

Signal is located at the origin with contours shown for A and B

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 57 / 65

Page 59: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation II

Marginal distributions

The posterior contains all of the information regarding A and B.

We are often only interested in the signal information A:

P(A|{Nk}, I ) =

∫ ∞0

P(A,B|{Nk}, I ) dB

However, we might be interested in the background information B:

P(B|{Nk}, I ) =

∫ ∞0

P(A,B|{Nk}, I ) dA

Important: marginal and conditional probabilities are not equal!

P(A|{Nk}, I ) 6= P(A|{Nk},B, I )

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 58 / 65

Page 60: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation II

Marginal distributions

Comparison of the marginal and conditional probabilities are shown

Note the narrowing for the conditional (dotted curve) scenario

Calibration not required for large signal-to-noise ratios

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 59 / 65

Page 61: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation II

Marginal distributions

Comparison of the marginal and conditional probabilities are shown

Note the narrowing for the conditional (dotted curve) scenario

Calibration not required for large signal-to-noise ratios

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 60 / 65

Page 62: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation II

Data binning

If we bin the data, we should technically do the integral:

Dk =

∫ xk+∆/2

xk−∆/2n0

[Ae−(x−x0)2/2w2

+ B]

dx

Although the data appears noisier, the effect of binning is cosmetic.

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 61 / 65

Page 63: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation II

Reliabilities: best estimates, correlations and error-bars

Prefer to summarize the posterior: best estimates and reliabilities.

We have seen now that the posterior can contain several parameters.

For quantities of interest {Xj} with a posterior P, the best estimateof their values are given by a solution to the system of equations:

∂P

∂Xi

∣∣∣X0j

= 0

Note: we need to avoid minima and saddle-points.

We will not consider the reliability here (similar as before).

As an example, consider the specific case of two variables X and Y :

∂L

∂X

∣∣∣X0,Y0

= 0 and∂L

∂Y

∣∣∣X0,Y0

= 0

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 62 / 65

Page 64: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation II

Reliabilities: best estimates, correlations and error-bars

Although we can determine reliability expressions like σx and σy , theyoften do not provide the completely picture regarding the error bar

The variance is formally defined as the expectation value of thesquare of the deviations from the mean:

Var(X ) =⟨(X − µ)2

⟩=

∫(X − µ)2 P(X |D, I ) dX

For the one-dimensional Gaussian distribution we find:⟨(X − µ)2

⟩= σ2

We can therefore also extend this to more than one dimension:

σ2x =

⟨(X − X0)2

⟩=

∫(X − X0)2 P(X ,Y |D, I ) dXdY

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 63 / 65

Page 65: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation II

Reliabilities: best estimates, correlations and error-bars

Variance be further extended to consider simultaneous variations:

σ2xy = 〈(X − X0)(Y − Y0)〉

Called the covariance and measures correlation of the parameters:(σ2x σ2

xy

σ2xy σ2

y

)= −

(A CC B

)−1

1Sivia and Skilling. Data Analysis: A Bayesian Tutorial, 2006.C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 64 / 65

Page 66: Introduction to Bayesian Concepts and Methodsbayes/18/lectures/PieterseBayesianNotes2… · Introduction There are three kinds of lies: lies, damned lies and statistics. (Mark Twain,

Parameter Estimation II

Numerical interlude

Constructing the posterior is easy, finding that maximum often not.

Brute force and ignorance is a very limited approach.

C.L. Pieterse (National Physical Laboratory) January 2018 · Chris Engelbrecht Summer School 65 / 65