Introduction to Bayesian Statistics for non-mathematicians
Transcript of Introduction to Bayesian Statistics for non-mathematicians
Introduction to Bayesian Statistics for non-mathematicians
By: Dr. J. Andres Christen (Centro de Investigacion en Matematicas, CIMAT. Part of the CONACYT network of centers).
Prerequisites: Elements of calculus and probability (basic).
Length: 5 hours (two sessions of 2:30 hours each).
e-mail: [email protected]
JA Christen (CIMAT) Intro to Bayesian Stats. January 2008 1 / 43
Texts:
1 Lee, P. (1994), Bayesian Statistics: An Introduction, London: Edward Arnold.
2 Berger, J. O. (1985), Statistical Decision Theory: Foundations, Concepts and Methods, Second Edition, Springer-Verlag.
3 Bernardo, J. M. and Smith, A. F. M. (1994), Bayesian Theory, Wiley: Chichester, UK.
4 DeGroot, M. H. (1970), Optimal Statistical Decisions, McGraw-Hill: NY.
Texts:
Colin Howson and Peter Urbach (2006), Scientific Reasoning: The Bayesian Approach (3rd Ed.), Open Court.
“Two English philosophers provocatively argue the case for Bayesian logic, with a minimum of complex math. They claim that Bayesian thinking is identical to the scientific method and give fascinating examples of how to analyze beliefs, such as Macbeth’s doubting of the witches’ prophecy, the discovery of Neptune on the strength of faith in Newton’s laws but zero evidence, and why people get hooked on Dianetics.” – Discover.
“For the first time, we have a book that combines philosophical wisdom, mathematical skill, and statistical appreciation, to produce a coherent system.” – Dennis V. Lindley, University College, London (ret.).
Conditional Probability
A cornerstone of Bayesian statistics is its (alternative) definition of probability, a definition sufficiently wide to cover many interesting cases. Let’s start with some examples:
1 What is the probability that if I toss a coin it lands on “heads”?
2 What is the probability that your lecturer has more than the equivalent of 50 pesos in his pocket?
3 What is the probability that it rains tomorrow?
4 What is the probability that it rained yesterday in Washington?
5 What is the probability that our Galaxy has more than 10^9 stars?
Moreover...
A piece of maize with several kernels, found in a clay pot believed to belong to the last days of the Mexica empire, is radiocarbon dated.
What is the age of the pot?
All probabilities are conditional (on the person or agent speaking, assumptions made, data used, etc.).
Probability statements go beyond favorable/possible calculations.
In Bayesian statistics, all uncertainties about unknowns are measured with a probability distribution.
Informal Bayesian definition of Probability
Probability is an opinion held by an agent, which may be turned into a bet under suitable circumstances.
If you say the probability of an event E is p, then you would accept a bet of at most a = (1 − p)/p to 1 on E being true.
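As a quick numerical sketch of this betting interpretation (the numbers are hypothetical, chosen only for illustration):

```python
def fair_odds(p):
    """Odds a = (1 - p) / p at which a bet on an event of probability p is fair.

    You stake 1 unit; if E occurs you win a = (1 - p) / p, so the
    expected gain p * a - (1 - p) * 1 is exactly zero.
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return (1 - p) / p

# If you assert P(E) = 0.25, you would accept odds of at most 3 to 1:
a = fair_odds(0.25)
print(a)                    # 3.0
print(0.25 * a - 0.75)      # expected gain is 0.0: the bet is fair
```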
Preferences among events
Bayesian statistics, unlike other paradigms for inference, is based on a theory, that is, a set of axioms that yields a general procedure for making inferences. We briefly present the theory given in DeGroot (1970, ch. 6). We begin with a quote from DeGroot (1970, p. 70):
...suitable probabilities can often be assigned objectively and quickly because of wide agreement on the appropriateness of a specific distribution for a certain type of problem... On the other hand, there are some situations for which it would be very difficult to find even two people who would agree on the appropriateness of any specific distribution.
Preferences among events
We have a total event Ω and a set of events 𝒜 ((Ω, 𝒜) is a measurable space), and we write
A ≺ B, A ≻ B, A ∼ B
to mean that A is less (more, equally) likely than B. Also
A ≼ B
means that A is no more likely than B.
Axioms
A set of axioms is given for the preference relation ≼, for a rational agent:
A complete ordering axiom:
Axiom
For any two events A, B ∈ 𝒜, we have exactly one of the three following preference relations: A ≺ B, A ≻ B, A ∼ B.
A transitivity axiom similar to this (a more general version is needed, though):
Axiom
If A, B, C ∈ 𝒜 are three events with A ≼ B and B ≼ C, then A ≼ C.
A non-triviality axiom:
Axiom
For any event A ∈ 𝒜, ∅ ≼ A. Moreover, ∅ ≺ Ω.
And a continuity axiom, a technicality needed to work with continuous distributions, like the Gaussian:
Axiom
If A_1 ⊃ A_2 ⊃ · · · is a decreasing sequence of events in 𝒜 and B ∈ 𝒜 is another event such that A_i ≽ B for all i, then ∩_{i=1}^∞ A_i ≽ B.
The auxiliary experiment
One further axiom is needed. This axiom more or less says that some “standard” events are added to our set of events, and our relevant events are then compared with these standard events.
Suppose, for example, that we spin a roulette wheel and all events regarding its final position are compared with our “relevant” events.
Bayesian Inference
Uncertainty is quantified with a probability measure.
Bayes’ Theorem: modify our probability measure with evidence.
All probability is conditional (on assumptions made, the agent speaking, etc.):
P(· | H), with H = particular context, agent speaking, etc.
Now, let B ∈ 𝒜 be an observable event. What is the probability of A ∈ 𝒜 given that we have observed B?
We are talking about the event A | H, B, and we may calculate its probability by means of
P(A | H, B) = P(A ∩ B | H) / P(B | H),
or
P(A | H, B) = P(B | H, A) P(A | H) / P(B | H).
Let’s look closer at
P(A | H, B) = P(B | H, A) P(A | H) / P(B | H).
P(A | H) we call the a priori probability, or “prior”, for A.
P(A | H, B) we call the a posteriori or posterior probability for A, given that we have observed B.
P(B | H, A) is our model... How would the observables look if we knew A? How would the data B look if we knew what we don’t know, A (unknown parameters, for example)?
P(B | H) is a normalization constant; P(· | H, B) is a modified measure, so we may say that
P(· | H, B) ∝ P(B | H, ·) P(· | H).
Commonly, conditioning on H is only done implicitly.
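The proportionality form can be checked with a small discrete computation; the hypotheses and numbers below are hypothetical, chosen only to show the mechanics of normalizing P(B | H, A) P(A | H):

```python
# Hypothetical discrete example: two disjoint, exhaustive hypotheses A1, A2
# with prior P(A | H), and an observed event B with model P(B | H, A).
# Posterior: P(A | H, B) ∝ P(B | H, A) P(A | H).
prior = {"A1": 0.3, "A2": 0.7}
likelihood = {"A1": 0.9, "A2": 0.2}  # P(B | H, A)

unnormalized = {a: likelihood[a] * prior[a] for a in prior}
p_b = sum(unnormalized.values())      # P(B | H), the normalization constant
posterior = {a: u / p_b for a, u in unnormalized.items()}

print(posterior["A1"])  # 0.27 / 0.41 ≈ 0.659
```

Note that p_b plays no role in comparing hypotheses; it only rescales the unnormalized values so the posterior sums to one.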
Posterior Distribution
Suppose random variables with X_i = 0, 1, that is X_i | p ∼ Be(p), independent, and that uncertainty about p ∈ [0, 1] is quantified with f(p), with p ∼ Beta(α, β) a priori. We write X = (X_1, X_2, . . . , X_n) and
P(p ≤ p_0 | X) = P(X | p ≤ p_0) P(p ≤ p_0) / P(X).
But
P(X | p ≤ p_0) P(p ≤ p_0) = P(X, p ≤ p_0) = ∫_0^{p_0} f(X, p) dp.
Now f(X, p) = f(X | p) f(p), and then
P(p ≤ p_0 | X) ∝ ∫_0^{p_0} f(X | p) f(p) dp.
The left-hand side of the above expression is the posterior cdf of p, and thus by definition its posterior density is
f(p | X) ∝ f(X | p) f(p).
Moreover,
f(X | p) = ∏_{i=1}^n f(X_i | p) = p^{∑_{i=1}^n X_i} (1 − p)^{n − ∑_{i=1}^n X_i}
and
f(p) = B(α, β)^{−1} p^{α−1} (1 − p)^{β−1},
and then
f(p | X) ∝ p^{(α + ∑_{i=1}^n X_i) − 1} (1 − p)^{(β + n − ∑_{i=1}^n X_i) − 1}.
Therefore
p | X ∼ Beta(α + ∑_{i=1}^n X_i, β + n − ∑_{i=1}^n X_i).
We present some priors and posteriors (Beta) for p.
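The conjugate update above is easy to verify numerically; the following sketch (data and hyperparameters are made up for illustration) just applies the formula:

```python
def beta_bernoulli_posterior(alpha, beta, xs):
    """Posterior Beta parameters for p given Bernoulli data xs in {0, 1}:
    p | X ~ Beta(alpha + sum(xs), beta + n - sum(xs))."""
    s, n = sum(xs), len(xs)
    return alpha + s, beta + n - s

# Hypothetical: a Beta(2, 2) prior and 10 observations with 7 successes.
xs = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
a_post, b_post = beta_bernoulli_posterior(2, 2, xs)
print(a_post, b_post)              # 9 5
print(a_post / (a_post + b_post))  # posterior mean of p: 9/14 ≈ 0.643
```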
Example 2
We have a couple that has had 5 pregnancies, and all 5 have been male. What is the probability that their next pregnancy results in a female?
Example 2
1 Are pregnancies independent with respect to the resulting gender?
2 Are there only two possible outcomes?
Then the Bernoulli inference model explained above is valid and should be used. Check the possibilities in R.
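Under the Bernoulli model, the posterior predictive probability of a female on the next pregnancy is the posterior mean of p. The slides suggest checking this in R; here is an equivalent sketch, assuming (purely for illustration, not from the slides) a uniform Beta(1, 1) prior:

```python
# Let p be the probability of a female. The 5 observed males give
# sum(X_i) = 0 females out of n = 5. With a Beta(alpha, beta) prior,
# p | X ~ Beta(alpha + 0, beta + 5), whose mean is the predictive
# probability of a female on the next pregnancy.
alpha, beta = 1, 1           # uniform prior on p (an assumption, not the slides')
females, n = 0, 5
a_post, b_post = alpha + females, beta + n - females
p_next_female = a_post / (a_post + b_post)
print(p_next_female)         # 1/7 ≈ 0.143
```

A prior concentrated near 0.5 (say Beta(50, 50)) would keep the answer close to 0.5, which is exactly why the next slide asks what prior you would use.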
Example 2
But the question is... what prior would you use?
Example: Normal sampling
In this case Xi ∼ N(θ, σ²), i = 1, 2, …, n (independent) with σ known and θ ∼ N(θ0, σ0²) a priori:

f(θ | X) ∝ exp( −[ (θ − θ0)²/(2σ0²) + ∑_{i=1}^n (xi − θ)²/(2σ²) ] ).
We see that the posterior is of the form exp h(θ), where h(·) is a quadratic function of θ. Then θ | X has a Normal distribution. Completing the square we obtain

f(θ | X) ∝ exp( −(θ − θp)²/(2σp²) + C ),

where σp² = 1/(σ0⁻² + nσ⁻²), θp = σp²(θ0/σ0² + nm/σ²), m = (1/n)∑_{i=1}^n xi, and C does not depend on θ. Then

θ | X ∼ N(θp, σp²).
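The closed-form update above is two lines of arithmetic. A minimal sketch (the data x and the values of σ, θ0, σ0 below are made up for illustration):

```python
import math

def normal_posterior(x, sigma, theta0, sigma0):
    """Posterior N(theta_p, sigma_p^2) for theta, given data x_i ~ N(theta, sigma^2)
    with sigma known and prior theta ~ N(theta0, sigma0^2)."""
    n = len(x)
    m = sum(x) / n                                   # sample mean
    var_p = 1.0 / (sigma0 ** -2 + n * sigma ** -2)   # sigma_p^2
    theta_p = var_p * (theta0 / sigma0 ** 2 + n * m / sigma ** 2)
    return theta_p, var_p

# hypothetical data: prior mean 8.0, sample mean ~10.05
x = [9.8, 10.4, 10.1, 9.9]
theta_p, var_p = normal_posterior(x, sigma=0.5, theta0=8.0, sigma0=2.0)
print(theta_p, math.sqrt(var_p))  # posterior mean pulled toward the sample mean
```

The posterior mean is a precision-weighted compromise between the prior mean θ0 and the sample mean m; with tight data (small σ) and a vague prior (large σ0) it sits close to m.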
Point and interval estimation
The main objective of any Bayesian analysis is finding the posterior distribution of interest. A secondary (although very important) issue is making proper outlines of this posterior distribution. For example, if we have

f(θ1, θ2 | X)

(a bivariate distribution), what would you do if only θ1 is of interest?
We need the posterior of θ1, and this may be obtained by marginalization, that is

f(θ1 | X) = ∫ f(θ1, θ2 | X) dθ2.

This is the so-called marginal posterior density of θ1.

Assuming we have the posterior f(θ | X), we only need to report it somehow: how would you report the distributions in Figure 2?
Figure: How would you report the following posterior distributions? Four panels, each plotting parameter (Parámetro) against density (Densidad): (a) a symmetric unimodal density on roughly (−4, 4); (b) a right-skewed density on (0, 30); (c) a nearly flat density on (0, 1); (d) prior and posterior probabilities for the number of species (25–70).
The concept of (point, interval, or other) “estimation” in Bayesian statistics is only understood as outlines of the relevant posterior distribution (of course, there are good and bad outlines). Therefore, for example, point estimation may be understood as making an outline of a complete probability distribution with a single point, as absurd as this may be.

We could use the expected value of the posterior distribution. Or we could use the maximum of the posterior distribution, the so-called MAP (maximum a posteriori).
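For a Beta(a, b) posterior both point outlines have closed forms — mean a/(a + b) and mode (a − 1)/(a + b − 2). A small sketch, using the Beta(16, 6) posterior that appears in the hypothesis-testing example later in these slides:

```python
def beta_mean(a, b):
    """Expected value of a Beta(a, b) distribution."""
    return a / (a + b)

def beta_map(a, b):
    """Mode (MAP under a flat reference) of Beta(a, b); defined for a > 1, b > 1."""
    return (a - 1) / (a + b - 2)

# Beta(16, 6): posterior after 15 successes, 5 failures and a uniform prior
print(beta_mean(16, 6))  # ~0.727
print(beta_map(16, 6))   # 0.75
```

The two outlines disagree slightly because the posterior is skewed; for a symmetric posterior (e.g. the Normal case above) mean and MAP coincide.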
Calculus
At the end of the day, we will need
1 f(X | θ1, θ2, …, θn), a model.

2 f(θ1, θ2, …, θn), a prior distribution for the parameters.

3 The normalization constant

f(X) = ∫∫···∫ f(X | θ1, θ2, …, θn) f(θ1, θ2, …, θn) dθ1 dθ2 ··· dθn.

4 To obtain the posterior

f(θ1, θ2, …, θn | X) = f(X | θ1, θ2, …, θn) f(θ1, θ2, …, θn) / f(X).

5 And outlines of these posteriors, like marginal distributions, e.g.

f(θ1 | X) = ∫∫···∫ f(θ1, θ2, …, θn | X) dθ2 dθ3 ··· dθn.
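Steps 3 and 5 can be carried out numerically on a grid when no closed form exists. A minimal sketch for a hypothetical two-parameter toy posterior (the model, prior, and data below are invented purely for illustration):

```python
import math

def unnorm_post(theta1, theta2, data):
    """Unnormalized posterior: N(theta1, 1) likelihood for the data,
    independent N(0, 1) priors on theta1 and theta2 (toy example)."""
    loglik = -0.5 * sum((x - theta1) ** 2 for x in data)
    logprior = -0.5 * (theta1 ** 2 + theta2 ** 2)
    return math.exp(loglik + logprior)

data = [0.8, 1.2, 1.0]
dt = 0.05
grid = [i * dt - 5.0 for i in range(201)]  # grid on [-5, 5] for each parameter

# Step 3: normalization constant f(X) via a double Riemann sum
fX = sum(unnorm_post(t1, t2, data) for t1 in grid for t2 in grid) * dt * dt

# Step 5: marginal posterior of theta1, integrating theta2 out numerically
marg = {t1: sum(unnorm_post(t1, t2, data) for t2 in grid) * dt / fX for t1 in grid}

print(sum(marg.values()) * dt)  # the marginal integrates to ~1
```

Grid sums like this only scale to a handful of parameters; with many parameters the integrals in steps 3 and 5 are exactly why MCMC methods are needed.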
Hypothesis testing as an inference problem

If θ ∈ Θ, our interest is in the hypotheses

H1 : θ ∈ Θ1, H2 : θ ∈ Θ2,

where Θ1 and Θ2 form a partition of Θ, that is, Θ1 ∩ Θ2 = ∅ and Θ1 ∪ Θ2 = Θ. In Bayesian terms, given a model f(X | θ), a prior f(θ) and observations X = (X1, X2, …, Xn), what could it mean to “test” the above hypotheses?

Remember: uncertainty is quantified with a probability measure.
Let f(θ) be a prior for θ. We calculate

P(Hi | X) = ∫_{Θi} f(θ | X) dθ

and “prefer”, or say the “data support”, H1 if P(H1 | X) > P(H2 | X) (and equivalently for H2). Moreover, we could have more than two hypotheses,

Hi : θ ∈ Θi,

and we would only require the corresponding posterior probability for each of them.
Example: Hypothesis testing

We have an experimental treatment for a condition, which is used in 20 patients with similar cohort characteristics, of which 15 have recovered from the condition (success). The standard treatment has a probability of success of 50%. The following hypothesis is stated: the experimental treatment is superior to the standard treatment.

The hypotheses can be translated as

H1 : θ > 0.5, H2 : θ ≤ 0.5,

where θ is the probability of success of the experimental treatment. Not much is known about the experimental treatment and a uniform (flat; Beta(1, 1)) prior is used. The corresponding posterior is Beta(16, 6), see figure.
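P(H2 | X) is simply the Beta(16, 6) posterior mass below 0.5. A self-contained sketch, using the exact binomial identity for Beta CDFs with integer parameters (so no external statistics library is assumed):

```python
from fractions import Fraction
from math import comb

def beta_cdf_int(x, a, b):
    """Exact CDF of Beta(a, b) at a rational x, for integer a, b >= 1, via
    the identity P(Beta(a, b) <= x) = P(Binomial(a + b - 1, x) >= a)."""
    n = a + b - 1
    return sum(comb(n, k) * x ** k * (1 - x) ** (n - k) for k in range(a, n + 1))

# P(H2 | X) = P(theta <= 1/2) under the Beta(16, 6) posterior
p_h2 = beta_cdf_int(Fraction(1, 2), 16, 6)
print(float(p_h2))  # ~0.0133, matching the 0.01330185 quoted on a later slide
```

Working in exact rational arithmetic here is a design choice: it makes the answer 27896/2²¹ exactly, with no quadrature error to worry about.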
We have that a priori P(H2) = 0.5 and a posteriori P(H2 | X) = 0.01330185.

H1 : θ > 0.5, H2 : θ ≤ 0.5

What may we (or you) conclude?
Radiocarbon Calibration, one det.
We have that y ∼ N(µ(θ), σ). Considering the errors in the calibration curve, the model should be y ∼ N(µ(θ), √(σ(θ)² + σ²)).

Therefore the likelihood is

f(Data | θ) = f(y | θ)

and the posterior is f(θ | y) ∝ f(θ) f(y | θ), that is,

f(θ | y) = K f(θ) (1/√(σ(θ)² + σ²)) exp( −(y − µ(θ))²/(2(σ(θ)² + σ²)) ),

where K is a normalizing constant.
Solution to the Mexica pot problem
1 All radiocarbon dated corn kernels are associated to the same calendar date θ.

2 It is assumed that the pot was made “around” the same time as the corn was harvested.

3 Prior information on θ is provided by f(θ).

We have a series of radiocarbon determinations y1, y2, …, ym with their standard errors σ1, σ2, …, σm, corresponding to m corn kernels.

From point 1 above we have that yj ∼ N(µ(θ), σj), and we also assume that these are independent, conditional on θ (and the standard errors). Considering the errors in the calibration curve, the model should be

yj ∼ N(µ(θ), √(σ(θ)² + σj²)).
Solution to the Mexica pot problem
Therefore the likelihood is

f(Data | θ) = f(y1, y2, …, ym | θ) = ∏_{j=1}^m f(yj | θ)

and the posterior is f(θ | y1, …, ym) ∝ f(θ) ∏_{j=1}^m f(yj | θ), or

f(θ | y1, …, ym) = K f(θ) ∏_{j=1}^m (1/√(σ(θ)² + σj²)) exp( −(yj − µ(θ))²/(2(σ(θ)² + σj²)) ),

where K is a normalizing constant.
Solution to the Mexica pot problem
Four radiocarbon dates are taken from 4 of the maize kernels. The obtained dates are:

Sample  Date (BP)  Std. error
sim1    340        20
sim2    370        20
sim3    355        20
sim4    360        20

The posterior distribution is calculated as above; see next slide, Figure (a).

However, knowledge of basic Mexican history tells us that the Mexica empire fell to the Conquistador Hernán Cortés in 1521 AD. Including such prior information we obtain the next slide, Figure (b).
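The posterior above can be evaluated on a grid. A sketch with the four determinations, a deliberately fake linear calibration curve µ(θ) = θ with σ(θ) = 5 (the real curve, e.g. IntCal, is non-linear and is not reproduced here), and the two priors of Figures (a) and (b):

```python
import math

# Radiocarbon determinations (BP) and standard errors for the four kernels
ys     = [340, 370, 355, 360]
sigmas = [20, 20, 20, 20]

# Hypothetical calibration curve, purely for illustration
def mu(theta): return float(theta)
def sigma_curve(theta): return 5.0

def log_lik(theta):
    """Log-likelihood: independent yj ~ N(mu(theta), sqrt(sigma(theta)^2 + sigma_j^2))."""
    total = 0.0
    for y, s in zip(ys, sigmas):
        v = sigma_curve(theta) ** 2 + s ** 2
        total += -0.5 * math.log(v) - (y - mu(theta)) ** 2 / (2 * v)
    return total

thetas = list(range(250, 551))  # grid in cal BP, matching the figure's axis

def posterior(prior):
    w = [math.exp(log_lik(t)) * prior(t) for t in thetas]
    z = sum(w)
    return [wi / z for wi in w]

post_flat  = posterior(lambda t: 1.0)                      # Figure (a): flat prior
post_trunc = posterior(lambda t: 1.0 if t >= 429 else 0.0) # Figure (b): theta >= 429 BP

print(thetas[post_flat.index(max(post_flat))])  # mode near the mean of the dates
```

With the flat prior the mode sits near the average determination; the truncated prior pushes all mass to θ ≥ 429 BP, piling it up at the boundary, which is the qualitative effect seen in Figure (b).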
Figure: Posterior distribution for the age of the maize kernels: (a) no prior (constant), (b) a priori distribution indicating θ ≥ 429 BP (= 1521 AD). Both panels plot probability against calendar age (Cal. BP), over the range 550–250 Cal. BP.