Page 1:

Bayesian Statistics

Lecture 8

Likelihood Methods in Forest Ecology

October 9th – 20th, 2006

Page 2:

“Real knowledge is to know the extent of one’s ignorance”

-Confucius

Page 3:

How do we measure our knowledge (ignorance)?

• Scientific point of view: Knowledge is acceptable if it explains a body of natural phenomena (Scientific model).

• Statistical point of view: Knowledge is uncertain but we can use it if we can measure its uncertainty. The question is how to measure uncertainty and make use of available knowledge.

Page 4:

Limitations of likelihoodist & frequentist approaches

• Parsimony is often an insufficient criterion for inference, particularly if our objective is forecasting.

• Model selection uncertainty is the big elephant in the room.

• Since parameters do not have probability distributions, error propagation in models cannot be interpreted in a probabilistic manner.

• Cannot deal with multiple sources of error and complex error structures in an efficient way.

• New data require new analyses.

Page 5:

Standard statistics revisited: Complex Variance Structures

Page 6:

Inference

Addresses three basic questions:

1. What do I believe now that I have these data? [Credibility or confidence]

2. What should I do now that I have these data? [Decision]

3. How should I interpret these data as evidence of one hypothesis vs. other competing hypotheses? [Evidence]

Page 7:

Body of knowledge

Scientific Model / Scientific Hypothesis

Statistical Model

DATA

Statistical Hypothesis

Page 8:

An example

Body of knowledge = fruit production in trees

Scientific explanation = physiology, life history

Scientific hypothesis: y_i = DBH^b

Statistical model = Poisson dist.

DATA

Statistical hypothesis: b = value; Pred(y)
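This structure can be made concrete with a short fit. A minimal sketch in R, using simulated data (the sample size, diameter range, and b = 0.4 are invented for illustration, not taken from the lecture):

    # Simulate fruit counts whose Poisson mean scales as DBH^b
    set.seed(1)
    dbh <- runif(100, 5, 50)              # hypothetical tree diameters (cm)
    b_true <- 0.4
    y <- rpois(100, lambda = dbh^b_true)  # simulated fruit counts
    # Poisson GLM with log link: log E[y] = b * log(DBH), i.e. E[y] = DBH^b
    fit <- glm(y ~ log(dbh) - 1, family = poisson)
    coef(fit)                             # estimate of b, near 0.4

The "- 1" drops the intercept so the fitted model matches the hypothesis y_i = DBH^b exactly.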

Page 9:

The Frequentist Take

b = 0.4

Belief: only with reference to an infinite series of trials.
Decision: accept or reject that b = 0.
Evidence: none.

Body of knowledge = fruit production in trees

Scientific explanation = physiology

Scientific hypothesis: log y_i = b log(DBH)

Statistical model = Poisson dist.

DATA

Statistical hypothesis: b ≠ 0

Page 10:

The Likelihoodist Take

b = 0.4

Belief: none; only relevant to the data at hand.
Decision: only with reference to alternate models.
Evidence: likelihood ratio test or AIC.

Body of knowledge = fruit production in trees

Scientific explanation = physiology

Scientific hypothesis: log y_i = b log(DBH)

Statistical model = Poisson dist.

DATA

Statistical hypothesis: b ≠ 0

Page 11:

The Bayesian Take

b = 0.4

Belief: credible intervals.
Decision: parameter in or out of a distribution.
Evidence: none.

Body of knowledge = fruit production in trees

Scientific explanation = physiology

Scientific hypothesis: log y_i = b log(DBH)

Statistical model = Poisson dist.

DATA

Statistical hypothesis: b ≠ 0

Page 12:

Parallels and differences in Bayesian & Frequentist statistics

• Bayesian and frequentist approaches use the data to derive a parameter estimate and a measure of uncertainty around the parameter that can be interpreted using probability.

• In Bayesian inference, parameters are treated as random variables that have a distribution.

• If we know their distribution, we can assess the probability that they will take on a particular value (posterior ratios or credible intervals).

Page 13:

Evidence vs Probability

“As a matter of principle, the infrequency with which, in particular circumstances, decisive evidence is obtained, should not be confused with the force, or cogency, of such evidence.”

-Fisher, 1959

Page 14:

Frequentist vs Bayesian

Frequentist:

• Prob = objective relative frequencies

• Parameters are fixed, unknown constants, so we cannot write e.g. P(θ = 0.5|D)

• Estimators should be good when averaged across many trials

Bayesian:

• Prob = degrees of belief (uncertainty)

• Can write P(anything|D)

• Estimators should be good for the available data

Source: “All of Statistics”, Larry Wasserman

Page 15:

Frequentism

• Probability is only defined as a long-term average in an infinite sequence of trials (that typically never happen!).

• The p-value is the probability of an outcome at least that extreme, given a specified null hypothesis.

• Null hypotheses are often strawmen set up to be rejected.

• Improperly used, p-values are poor tools for statistical inference.

• We are interested in parameter estimation rather than p-values per se.

Page 16:

Frequentist statistics violates the likelihood principle

“The use of p-values implies that a hypothesis that may be true can be rejected because it has not predicted observable results that have not actually occurred.” Jeffreys, 1961

Page 17:

Some rules of probability

Prob(A∩B) = Prob(B|A) Prob(A)

Prob(B|A) = Prob(A∩B) / Prob(A)

Prob(A|B) = Prob(A∩B) / Prob(B)

Prob(A∩B) = Prob(A) × Prob(B)   (assuming independence)

Prob(A∪B) = Prob(A) + Prob(B) − Prob(A∩B)

[Venn diagram: overlapping events A and B]

Page 18:

Bayes Theorem

Prob(B|A) = Prob(A|B) Prob(B) / Prob(A)

which follows from the rules above:

Prob(A∩B) = Prob(B|A) Prob(A)

Prob(B|A) = Prob(A∩B) / Prob(A)

Prob(A|B) = Prob(A∩B) / Prob(B)

Page 19:

Bayes Theorem

Prob(Hyp|Data) = Prob(Data|Hyp) Prob(Hyp) / Prob(Data)

Prob(θ|Data) = Prob(Data|θ) Prob(θ) / Prob(Data)

Prob(θ|Data) = Lik(Data|θ) Prob(θ) / Prob(Data)

Page 20:

Bayes Theorem

Prob(θ|Data) = Lik(Data|θ) Prob(θ) / Prob(Data)

? (the question mark on the slide points at the denominator, Prob(Data))

Page 21:

For a set of mutually exclusive hypotheses…

Prob(θ_i|Data) = Lik(Data|θ_i) Prob(θ_i) / Prob(Data)

               = Lik(Data|θ_i) Prob(θ_i) / Σ_{i=1..N} Lik(Data|θ_i) Prob(θ_i)

where Prob(Data) = Σ_{i=1..N} Prob(Data|θ_i) Prob(θ_i)
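For a small discrete set of hypotheses, the normalization is just a sum. A minimal sketch in R (the prior and likelihood values are made up purely to show the arithmetic):

    prior <- c(0.5, 0.3, 0.2)                 # Prob(theta_i)
    lik   <- c(0.10, 0.40, 0.25)              # Lik(Data | theta_i)
    post  <- lik * prior / sum(lik * prior)   # Prob(theta_i | Data)
    round(post, 3)                            # 0.227 0.545 0.227
    sum(post)                                 # 1, as required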

Page 22:

[Figure: from Bolker]

Page 23:

An example from medical testing

What is prob(ill | test+)?

Prob(ill | test+) = Prob(test+ | ill) Prob(ill) / Prob(test+)

Prob(ill) = 10^-6
Prob(test+ | ill) = 1
Prob(test+ | healthy) = 10^-3


Page 25:

[Table: test+ outcomes split between ill and not ill]

Prob(test+) = prob(ill) Prob(test+ | ill) + prob(healthy) Prob(test+ | healthy)

            = 10^-6 × 1 + (1 − 10^-6) × 10^-3 ≈ 10^-3

so prob(ill | test+) = 10^-6 / 10^-3 ≈ 10^-3.
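The same arithmetic in R, using the slide's numbers (the variable names are mine):

    p_ill         <- 1e-6   # prevalence: Prob(ill)
    p_pos_ill     <- 1      # Prob(test+ | ill)
    p_pos_healthy <- 1e-3   # Prob(test+ | healthy)
    p_pos <- p_pos_ill * p_ill + p_pos_healthy * (1 - p_ill)
    p_pos_ill * p_ill / p_pos   # Prob(ill | test+) ~ 0.001

Even with a perfectly sensitive test, a positive result leaves the probability of illness at only about 10^-3, because false positives vastly outnumber true positives at this prevalence.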

Page 26:

Bayes Theorem

Prob(θ|Data) = Lik(Data|θ) Prob(θ) / Prob(Data)

Prob(Data) is rarely known, and the function is hard to integrate; hence MCMC methods.

Page 27:

Joint and marginal distributions: probability that 2 pigeon species (S & R) occupy an island

Events     S              Sc             Marginal
R          2              9              Pr{R} = 11/32
Rc         18             3              Pr{Rc} = 21/32
Marginal   Pr{S} = 20/32  Pr{Sc} = 12/32  N = 32

(Diamond 1975)

Event       Prob       Estimate
R given S   Prob{R|S}  2/20
S given R   Prob{S|R}  2/11

Pr{R|S} = Pr{R,S} / Pr{S} = Pr{S|R} Pr{R} / Pr{S}

Page 28:

The same, for continuous variables:

Pr{y|x} = Pr{y,x} / Pr{x} = Pr{x|y} Pr{y} / Pr{x}

where Pr{x} = ∫ p(x|y) p(y) dy

Page 29:

Conjugacy

• In Bayesian probability theory, a conjugate prior is a family of prior probability distributions which has the property that the posterior probability distribution also belongs to that family.

• A conjugate prior is an algebraic convenience: otherwise a difficult numerical integration may be necessary.
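The classic instance is the beta-binomial pair: a Beta(a, b) prior on a binomial probability gives a Beta(a + k, b + n − k) posterior after observing k successes in n trials, with no integration required. A minimal sketch in R (the prior parameters and data are invented for illustration):

    a <- 2; b <- 2      # Beta(a, b) prior
    n <- 20; k <- 14    # data: k successes in n trials
    curve(dbeta(x, a, b), 0, 1, lty = 2, ylab = "density")  # prior
    curve(dbeta(x, a + k, b + n - k), add = TRUE)           # posterior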

Page 30:

Jointly distributed random variables

Prob(A,B,C) = Prob(A,B|C) Prob(C) = Prob(A|B,C) Prob(B|C) Prob(C)

Prob(parameters | data) ∝ p(data | process, data parameters)
                          × p(process | process parameters)
                          × p(all parameters)

We have to normalize this to turn it into a probability.

Page 31:

Hierarchical Bayes

• Ecological models tend to be high-dimensional and include many sources of stochasticity.

• These sources of “noise” often don't comply with the assumptions of traditional statistics:
  – independence (spatial or temporal)
  – balance among groups
  – distributional assumptions

• HB can deal with these problems by partitioning a complex problem into a series of univariate distributions that we can solve, typically using sophisticated computational methods.

Page 32:

Hierarchical Bayes

Page 33:

[Figure: Clark et al. 2004]

Page 34:

Hierarchical Bayes

• Marginal distribution of a parameter averaged over all other parameters and hyperparameters (writing θ for the parameter of interest and γ for the rest):

p(θ, γ | Data) ∝ p(Data | θ, γ) p(θ | γ) p(γ)

p(θ | Data) = ∫ p(θ, γ | Data) dγ

Page 35:

Hierarchical Bayes: Advantages

1. Complex models can be constructed from simple, conditional relationships. We don’t need an integrated specification of the problem, only the conditional components. We are drawing boxes and arrows.

2. We relax the traditional requirement for independent data. Conditional independence is enough. We typically take up the relationships that cause correlation at a lower process stage. We can accommodate multiple data types within a single analysis, even treating model output as ‘data’.

3. Sampling-based approaches (MCMC) can do the integration for us (the integration we avoided in advantage 1).

Page 36:

Why Hierarchical Bayes?

A useful approach for understanding ecological processes because:

– It incorporates uncertainty using a probabilistic framework.

– Model parameters are random variables; the output is a probability distribution (the posterior distribution).

– Complex models are partitioned into a hierarchical structure.

– It performs well for high-dimensional models (i.e., many parameters) with little data.

Page 37:

Bayes’ Rule

p(θ|y) = p(θ) × p(y|θ) / p(y)

where p(θ|y) is the posterior distribution, p(θ) the prior distribution, p(y|θ) the likelihood, and p(y) the normalizing density.

p(y) = ∫ p(θ) p(y|θ) dθ   (the marginal distribution of y, or prior predictive distribution)

θ is the set of model parameters; y is the observed data.

The posterior distribution is affected by the data only through the likelihood function. If the prior distribution is non-informative, then the data dominate the outcome.
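For a single parameter, the role of the normalizing density can be made concrete by brute force on a grid: multiply prior by likelihood pointwise and divide by the sum. A minimal sketch in R (the binomial likelihood and flat prior are my choices for illustration):

    theta <- seq(0, 1, length.out = 1001)       # grid over the parameter
    prior <- rep(1, length(theta))              # flat (non-informative) prior
    lik   <- dbinom(14, size = 20, prob = theta)
    post  <- prior * lik / sum(prior * lik)     # sum() stands in for p(y)
    plot(theta, post, type = "l")               # posterior distribution

With a flat prior the posterior is just the normalized likelihood, which is the “data dominate the outcome” point above.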

Page 38:

How do we do this? Baby steps: Rejection sampling

• Suppose we have a target distribution p(x) that we want to sample from.

[Figure: target distribution g(x) plotted against x]

Page 39:

• Bound the target distribution with a function f(x) so that Cf(x) ≥ p(x).

• Calculate the ratio a = p(x) / Cf(x).

[Figure: target distribution p(x) under the bounding function Cf(x), here a Cauchy density 1/(pi × (1 + (x − 1)^2))]

Page 40:

• With probability a, accept this value of X as a random draw from p(x). With probability 1 − a, reject this value of X and repeat the procedure. To do this, draw a random variate z from the uniform density; if z < a, accept X.

[Figure: target distribution p(x) under the proposal distribution Cf(x)]

Page 41:

• Build an empirical distribution of accepted draws, which approximates the target distribution.

[Figure: density histogram of the accepted draws (“out”), with the theoretical distribution and a smoothed empirical distribution overlaid]
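A minimal rejection sampler in R, under assumed distributions (target = normal centered at 1; proposal = the Cauchy density from the figure; C is chosen so that Cf(x) ≥ p(x) everywhere, since the maximum of p(x)/f(x) for these two densities is about 1.52):

    set.seed(1)
    target <- function(x) dnorm(x, mean = 1)        # p(x): what we want
    f      <- function(x) dcauchy(x, location = 1)  # f(x): easy to sample
    C <- 1.6                                        # ensures C * f(x) >= p(x)
    out <- numeric(0)
    while (length(out) < 5000) {
      x <- rcauchy(1, location = 1)                 # draw from the proposal
      a <- target(x) / (C * f(x))                   # acceptance probability
      if (runif(1) < a) out <- c(out, x)            # accept with probability a
    }
    hist(out, freq = FALSE, breaks = 50)            # empirical distribution
    curve(target(x), add = TRUE)                    # matches the target

The heavy-tailed Cauchy proposal is deliberate: the bounding function must dominate the target everywhere, including in the tails.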

Page 42:

MCMC Methods

• Markov process: a random process whose next step depends only on the previous realization (lag of 1).

• The joint probability distribution p(θ|y), which is the posterior distribution of the parameters, is generally impossible to integrate in closed form.

• So… use a simulation approach based on conditional probability.

• The goal is to sample from this joint distribution of all parameters in the model, given the data (the target distribution), in order to estimate the parameters, but…

• …we don't know what the target distribution looks like, so we have to make a proposal distribution.

Page 43:

Monte Carlo principle

• Given a very large set X and a distribution p(x) over it,

• we draw an i.i.d. set of N samples x^(1), …, x^(N),

• and approximate p(x) with the empirical distribution of the samples:

p_N(x) = (1/N) Σ_{i=1..N} δ(x − x^(i)),  and p_N(x) → p(x) as N → ∞
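A one-line illustration in R (a normal with mean 1 stands in for p(x); the choices are mine):

    x <- rnorm(1e5, mean = 1)   # N i.i.d. samples from p(x)
    mean(x)                     # approximates E[X] = 1
    mean(x > 2)                 # approximates Pr(X > 2) = 1 - pnorm(2, 1), about 0.159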

Page 44:

Markov Chain Monte Carlo (MCMC)

• Recall again the set X and the distribution p(x) we wish to sample from.

• Suppose that it is hard to sample p(x) directly, but that it is possible to “walk around” in X using only local state transitions.

• Insight: we can use a “random walk” to help us draw random samples from p(x).

Page 45:

MCMC Methods

• Metropolis-Hastings algorithms are a way to construct a Markov chain such that its equilibrium (or stationary) distribution is the target distribution.

  – The proposal is some kind of bounding distribution that completely contains the target distribution.

  – Acceptance-rejection methods are used to decide whether a proposed value is accepted or rejected as being drawn from the target distribution.

  – Jumping rules determine when and how the chain moves on to new proposal values.

Page 46:

MCMC

• The basic rule is that the ratio of successful jump probabilities is proportional to the ratio of posterior probabilities:

[P(jump A→B) × P(accept B|A)] / [P(jump B→A) × P(accept A|B)] = Post(B) / Post(A)

• This means that over the long term we stay in areas of high probability, and the long-term occupancy of the chain matches the posterior distribution of interest.
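A minimal random-walk Metropolis sketch in R (the target posterior is assumed standard normal for illustration; the proposal is symmetric, so the jump probabilities cancel and only the posterior ratio remains):

    set.seed(1)
    n_iter <- 10000
    chain <- numeric(n_iter)                       # chain starts at 0
    for (t in 2:n_iter) {
      prop  <- chain[t - 1] + rnorm(1)             # local, symmetric proposal
      ratio <- dnorm(prop) / dnorm(chain[t - 1])   # Post(B) / Post(A)
      chain[t] <- if (runif(1) < ratio) prop else chain[t - 1]
    }
    hist(chain[-(1:1000)], freq = FALSE)           # discard burn-in, then inspect

Note that runif(1) < ratio accepts automatically whenever ratio > 1, which is exactly the min(1, ratio) acceptance rule.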

Page 47:

MCMC Methods

• Eventually, through many proposals that are updated iteratively (based on jumping rules), the Markov chain will converge to the target distribution, at which time it has reached equilibrium (or stationarity).

• This is achieved after the so-called “burn-in” (“the chain converged”).

• Simulations (proposals) made prior to reaching stationarity (i.e., during burn-in) are not used in estimating the target.

• Burning questions: when have you achieved stationarity, and how do you know? (There are some diagnostics, but no objective answer, because the target distribution is not known.)

Page 48:

More burning questions

• How can you pick a proposal distribution when you don't know what the target distribution is? (This is what M-H figured out!)

  – The series of proposals depends on a ratio involving the target distribution, which itself cancels out in the ratio.

  – So you don't need to know the target distribution in order to make a set of proposals that will eventually converge to the target.

  – This is (vaguely) analogous in K-L information theory to not having to “know the truth” in order to estimate the difference between two models in their distance from the truth (the truth drops out in the comparison).

Page 49:

Posterior Distributions

• Assuming the chain converged, you obtain for each parameter an estimate of its marginal distribution:

p(θ1 | θ2, θ3, …, θn, y)

that is, the distribution of θ1, averaged over the distributions of all other parameters in the model, and given the data.

• This marginal distribution is the posterior distribution that represents the probability distribution of this parameter, given the data and the other parameters in the model.

• These posterior distributions of the parameters are the basis of inference.
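In practice, this averaging requires no extra work with MCMC output: keeping only the sampled values of θ1 and ignoring the columns for the other parameters yields draws from the marginal posterior. A toy sketch in R (fake “MCMC output” drawn directly from normals, just to show the operation):

    draws <- cbind(theta1 = rnorm(5000), theta2 = rnorm(5000, mean = 2))
    hist(draws[, "theta1"], freq = FALSE)   # marginal posterior of theta1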

Page 50:

Assessing Convergence

[Figure: trace plots of three chains (Chain 1, Chain 2, Chain 3) that have not converged]

• Run multiple chains (chains are independent).

• Use many iterations (>2000).

• Treat the first half as burn-in.

• “Thin” the chain (take every xth value, depending on the autocorrelation).

• Compare the traces of the chains.

Page 51:

Assessing Convergence

• Estimate R̂, the square root of the ratio of the pooled within- and between-chain variance estimate to the within-chain variance.

• At stationarity, and as n → ∞, R̂ → 1.

• Above R̂ ≈ 1.1, the chains may not have converged, and a greater number of simulations is recommended.
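A minimal version of this diagnostic in R (the simple form of the Gelman-Rubin statistic, without the degrees-of-freedom correction used in some implementations):

    rhat <- function(chains) {               # chains: n iterations x m chains
      n <- nrow(chains)
      W <- mean(apply(chains, 2, var))       # mean within-chain variance
      B <- n * var(colMeans(chains))         # between-chain variance
      V <- (n - 1) / n * W + B / n           # pooled variance estimate
      sqrt(V / W)                            # R-hat
    }
    # e.g. rhat(cbind(chain1, chain2, chain3)) should approach 1 at stationarity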