Bayesian estimation: Why and How to Run Your First Bayesian Model

Rens van de Schoot, rensvandeschoot.com

Classical null hypothesis testing

Wainer: "One Cheer for Null-Hypothesis Significance Testing" (1999; Psych. Meth., 4, 212-213)

… however …

NHT vs. Bayes

Pr (Data | H0) ≠ Pr (Hi | Data)

Bayes Theorem

$\Pr(H_i \mid \text{Data}) = \dfrac{\Pr(\text{Data} \mid H_i)\,\Pr(H_i)}{\Pr(\text{Data})}$

Posterior ∝ prior × data (likelihood)

Posterior probability is proportional to the product of the prior probability and the likelihood
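As a minimal sketch of this proportionality, the snippet below applies Bayes' theorem to a discrete set of hypotheses. The function name `posterior` and all numbers are hypothetical and chosen only for illustration; they do not come from the slides.

```python
def posterior(priors, likelihoods):
    """Return Pr(H_i | data) for each hypothesis via Bayes' theorem."""
    # Unnormalised posterior: prior times likelihood, hypothesis by hypothesis.
    unnormalised = [p * l for p, l in zip(priors, likelihoods)]
    # Pr(data) is the sum over all hypotheses; dividing by it normalises.
    evidence = sum(unnormalised)
    return [u / evidence for u in unnormalised]


# Hypothetical numbers: two hypotheses that are equally likely a priori.
priors = [0.5, 0.5]
likelihoods = [0.02, 0.08]  # Pr(data | H_0), Pr(data | H_1)
post = posterior(priors, likelihoods)
print(post)
```

With equal priors the posterior simply follows the likelihoods; changing `priors` shows how prior knowledge shifts the result.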

Bayes theorem: prior, data and posterior


Intelligence (IQ)

[Figure: prior knowledge about IQ; x-axis: IQ, from -∞ to ∞]

Intelligence interval   Cognitive designation
40 - 54                 Severely challenged (<1%)
55 - 69                 Challenged (2.3% of test takers)
70 - 84                 Below average
85 - 114                Average (68% of test takers)
115 - 129               Above average
130 - 144               Gifted (2.3% of test takers)
145 - 159               Genius (less than 1% of test takers)
160 - 175               Extraordinary genius

[Figures: a sequence of candidate priors ("Prior Knowledge") for IQ; x-axis: IQ, from 40 to 180, and from -∞ to ∞ for the last one]

[Figures: posterior for IQ, followed by prior, data and posterior shown together; x-axis: IQ]

How to obtain posterior?

In complex models, the posterior is often intractable (impossible to compute exactly)

Solution: approximate the posterior by simulation
– Simulate many draws from the posterior distribution
– Compute the mode, median, mean, 95% interval, et cetera from the simulated draws
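The summary step can be sketched as follows. This is only an illustration: the draws come from an assumed Normal(102, 3.4) distribution standing in for real sampler output, and the histogram-based mode is a crude estimate rather than a recommended method.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Stand-in for MCMC output: 10,000 draws from a posterior that is assumed,
# purely for illustration, to be Normal(mean=102, sd=3.4).
draws = rng.normal(loc=102.0, scale=3.4, size=10_000)

post_mean = draws.mean()
post_median = np.median(draws)
# 95% central credibility interval: the 2.5th and 97.5th percentiles.
lower, upper = np.percentile(draws, [2.5, 97.5])

# Crude mode estimate: midpoint of the fullest histogram bin.
counts, edges = np.histogram(draws, bins=50)
k = counts.argmax()
post_mode = 0.5 * (edges[k] + edges[k + 1])

print(f"mean={post_mean:.2f} median={post_median:.2f} mode={post_mode:.2f}")
print(f"95% interval: {lower:.2f} to {upper:.2f}")
```

Every summary is computed from the draws alone, which is why the same code works whether or not the posterior has a closed form.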

ANOVA example

4 unknown parameters μj (j = 1, ..., 4) and one common but unknown σ².

Statistical model:

Y = μ1*D1 + μ2*D2 + μ3*D3 + μ4*D4 + E

with E ~ N(0, σ²)

The Gibbs sampler

Specify prior: Pr(μ1, μ2, μ3, μ4, σ²)

Prior (μj) ~ Nor(μ0, var0), for example Prior (μj) ~ Nor(0, 10000)

Prior (σ²) ~ IG(0.001, 0.001), where IG(shape, scale) is the Inverse Gamma distribution

The Gibbs sampler

Combining the prior with the likelihood gives the posterior:

Post ( μ1, μ2, μ3, μ4, σ² | data )

…this is a 5-dimensional distribution…

The Gibbs sampler

Iterative evaluation via conditional distributions:

Post ( μ1 | μ2, μ3, μ4, σ², data ) ∝ Prior (μ1) × Data (μ1)

Post ( σ² | μ1, μ2, μ3, μ4, data ) ∝ Prior (σ²) × Data (σ²)

1. Assign starting values
2. Sample μ1 from its conditional distribution
3.-5. Sample μ2, μ3 and μ4 from their conditional distributions
6. Sample σ² from its conditional distribution
7. Go to step 2 until enough iterations
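The steps above can be sketched in Python for the four-group ANOVA model. This is a minimal sketch under stated assumptions, not the implementation used by any particular package: it relies on the standard conjugate conditionals (normal for each μj, inverse gamma for σ²), the function name `gibbs_anova` is hypothetical, and the data are simulated with made-up group means.

```python
import numpy as np


def gibbs_anova(y, group, n_iter=2000, mu0=0.0, var0=10_000.0,
                shape0=0.001, scale0=0.001, seed=0):
    """Gibbs sampler for y_i = mu[group_i] + e_i, with e_i ~ N(0, sigma2)."""
    rng = np.random.default_rng(seed)
    n_groups = int(group.max()) + 1
    mu = np.zeros(n_groups)  # step 1: starting values
    sigma2 = 1.0
    draws = np.empty((n_iter, n_groups + 1))
    for t in range(n_iter):
        # Steps 2-5: each mu_j given sigma2 and the data (normal conditional).
        for j in range(n_groups):
            yj = y[group == j]
            precision = 1.0 / var0 + yj.size / sigma2
            mean = (mu0 / var0 + yj.sum() / sigma2) / precision
            mu[j] = rng.normal(mean, np.sqrt(1.0 / precision))
        # Step 6: sigma2 given all mu_j and the data (inverse gamma conditional),
        # drawn as the reciprocal of a gamma variate.
        resid = y - mu[group]
        shape = shape0 + y.size / 2.0
        scale = scale0 + 0.5 * np.sum(resid ** 2)
        sigma2 = 1.0 / rng.gamma(shape, 1.0 / scale)
        draws[t, :n_groups] = mu
        draws[t, n_groups] = sigma2
    return draws


# Simulated data with hypothetical group means 4, 3, 5, 6 and sigma2 = 1.
rng = np.random.default_rng(42)
group = np.repeat(np.arange(4), 50)
y = np.array([4.0, 3.0, 5.0, 6.0])[group] + rng.normal(0.0, 1.0, group.size)

draws = gibbs_anova(y, group)
kept = draws[1000:]        # discard the first half as burn-in
print(kept.mean(axis=0))   # posterior means of mu1..mu4 and sigma2
```

Each row of `draws` corresponds to one iteration, with one column per parameter.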

The Gibbs sampler

Iteration   μ1     μ2     μ3     μ4     σ²
1           3.00   5.00   8.00   3.00   10
2           3.75   4.25   7.00   4.30   8
3           3.65   4.11   6.78   5.55   5
.           .      .      .      .      .
15          4.45   3.19   5.08   6.55   1.1
.           .      .      .      .      .
199         4.59   3.75   5.21   6.36   1.2
200         4.36   3.45   4.65   6.99   1.3

The Gibbs sampler

[Figure: trace plot of the sampled values, and the posterior distribution obtained from them]

Burn In

The Gibbs sampler must run t 'burn-in' iterations before we reach the target distribution f(Z)
– How many iterations are needed to converge on the target distribution?

Diagnostics
– Examine graph of burn in
– Try different starting values
– Run several chains in parallel

Convergence

Conclusion about convergence

Burn-in: Mplus deletes first half of chain

Run multiple chains (Mplus default 2)

Decrease BCONVERGENCE: default .05, but better use .01

ALWAYS do a graphical evaluation of each and every parameter
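A numerical check across chains can complement the graphical evaluation. The sketch below computes the classic Gelman-Rubin potential scale reduction for one parameter after discarding the first half of each chain. It is an illustration only: the function name is hypothetical, the chains are simulated, and the exact computation Mplus performs for its convergence criterion is not reproduced here.

```python
import numpy as np


def potential_scale_reduction(chains):
    """Gelman-Rubin statistic for one parameter; chains has shape (m, n)."""
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    kept = chains[:, n // 2:]  # drop the first half of each chain as burn-in
    n_kept = kept.shape[1]
    within = kept.var(axis=1, ddof=1).mean()           # W: mean within-chain variance
    between = n_kept * kept.mean(axis=1).var(ddof=1)   # B: between-chain variance
    pooled = (n_kept - 1) / n_kept * within + between / n_kept
    return np.sqrt(pooled / within)


rng = np.random.default_rng(7)
# Four well-mixed chains drawn from the same distribution...
good = rng.normal(100.0, 15.0, size=(4, 1000))
# ...and four chains stuck around different values.
bad = good + np.array([[0.0], [20.0], [40.0], [60.0]])

psr_good = potential_scale_reduction(good)
psr_bad = potential_scale_reduction(bad)
print(psr_good, psr_bad)
```

Values close to 1 suggest the chains agree; values well above 1 indicate that the chains have not reached the same distribution.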

Summing up

Probability: degree of belief

Prior: what is known before observing the data

Posterior: what is known after observing the data

Informative prior: tool to include subjective knowledge

Non-informative prior: tries to express absence of prior knowledge; posterior mainly determined by data

MCMC methods: simulation (sampling) techniques to obtain the posterior distribution and all posterior summary measures

Convergence: important to check

Data are generated with N = 20, Mean = 102, SD = 15

Prior type        Prior variance used      Posterior mean IQ score   95% C.I./C.C.I.
ML                                         102.00                    94.42 – 109.57
Prior 1 A                                  101.99                    94.35 – 109.62
Prior 2a M or A   large variance, SD=100   101.99                    94.40 – 109.42
Prior 2b M or A   medium variance, SD=10   101.99                    94.89 – 109.07
Prior 2c M or A   small variance, SD=1     102.00                    100.12 – 103.87
Prior 3 A                                  102.03                    94.22 – 109.71
Prior 4 W         medium variance, SD=10   102.00                    97.76 – 106.80
Prior 5 W         small variance, SD=1     102.00                    100.20 – 103.90
Prior 6a W        large variance, SD=100   99.37                     92.47 – 106.10
Prior 6b W        medium variance, SD=10   86.56                     80.17 – 92.47
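The pattern in this table (smaller prior variance, stronger pull toward the prior) can be illustrated with the closed-form update for a normal mean. This is a simplified sketch: it assumes the residual SD is known to be 15, uses freshly simulated data, and places the prior mean at a hypothetical value of 90, so it will not reproduce the numbers in the table.

```python
import numpy as np


def normal_posterior(y, sigma, prior_mean, prior_sd):
    """Posterior of a normal mean with known sigma and a normal prior."""
    data_precision = y.size / sigma ** 2
    prior_precision = 1.0 / prior_sd ** 2
    post_precision = data_precision + prior_precision
    # Posterior mean: precision-weighted average of prior mean and sample mean.
    post_mean = (prior_precision * prior_mean
                 + data_precision * y.mean()) / post_precision
    return post_mean, np.sqrt(1.0 / post_precision)


rng = np.random.default_rng(3)
y = rng.normal(102.0, 15.0, size=20)  # N = 20, mean 102, SD 15

results = {}
for prior_sd in (100.0, 10.0, 1.0):
    # Hypothetical prior mean of 90, deliberately away from the data.
    results[prior_sd] = normal_posterior(y, sigma=15.0,
                                         prior_mean=90.0, prior_sd=prior_sd)
    mean, sd = results[prior_sd]
    print(f"prior SD={prior_sd:>5}: posterior mean={mean:.2f}, "
          f"95% interval={mean - 1.96 * sd:.2f} to {mean + 1.96 * sd:.2f}")
```

With a prior SD of 100 the posterior mean stays near the sample mean; with a prior SD of 1 it moves close to the prior mean and the interval becomes much narrower.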

Uncertainty in Classical Statistics

Uncertainty = sampling distribution
– Estimate the population parameter by a sample estimate
– Imagine drawing an infinity of samples
– Distribution of the estimate over samples

Problem is that we have only one sample
– Estimate the parameter and its sampling distribution
– Estimate 95% confidence interval

Inference in Classical Statistics

What does a 95% confidence interval actually mean?
– Over an infinity of samples, 95% of these intervals contain the true population value
– But we have only one sample
– We never know if our present estimate and confidence interval is one of those 95% or not

Inference in Classical Statistics

What does 95% confidence interval NOT mean?

We have a 95% probability that the true population value is within the limits of our confidence interval

We only have an aggregate assurance that in the long run 95% of our confidence intervals contain the true population value

Uncertainty in Bayesian Statistics

Uncertainty = probability distribution for the population parameter

In classical statistics the population parameter has one single true value

In Bayesian statistics we imagine a distribution of possible values of population parameter

Inference in Bayesian Statistics

What does a 95% central credibility interval mean?

We have a 95% probability that the population value is within the limits of our credibility interval

What have we learned so far?

Results are compromise of prior & data

However:
-> non/low-informative priors
-> informative priors
-> misspecification of the prior
-> convergence

Results are easier to communicate (e.g., CCI compared to confidence interval)

Software

WinBUGS/OpenBUGS: Bayesian inference Using Gibbs Sampling. Very general, user must set up model

R packages: LearnBayes, R2WinBUGS, MCMCpack

MLwiN: special implementation for multilevel regression

AMOS: special implementation for SEM

Mplus: very general (SEM + ML + many other models)

MPLUS - ML

DATA: FILE IS data.dat;
VARIABLE: NAMES ARE IQ;
ANALYSIS: ESTIMATOR IS ML;
MODEL: [IQ];

MPLUS – BAYES: default settings

ESTIMATOR IS BAYES;

MODEL: [IQ];

Prior for IQ: prior mean = 0, prior variance/precision = 10^10

MPLUS – BAYES: change prior

ESTIMATOR IS BAYES;
MODEL: [IQ] (p1);
MODEL PRIOR: p1 ~ N(a,b);

a = prior mean
b = prior precision

ESTIMATOR IS BAYES;
MODEL: [IQ] (p1);
MODEL PRIOR: p1 ~ N(100,10);
PLOT: type is plot2;

DATA: FILE IS data.dat;
VARIABLE: NAMES ARE IQ;
ANALYSIS:
  ESTIMATOR IS BAYES;
  CHAINS = 4;
  BITERATIONS = (1000);
  BCONVERGENCE = .01;
MODEL: [IQ] (p1);
MODEL PRIOR:
  p1 ~ N(100,10);
PLOT: type is plot2;
OUTPUT: stand sampstat TECH4 TECH8;

Bayesian updating

Dynamic interactionism: adolescents are believed to develop through a dynamic and reciprocal transaction between personality and the environment

In 1998, Asendorpf and Wilpers stated that "empirical evidence on the relative strength of personality effects on relationships and vice versa is surprisingly limited"

Back in 1998, there had been very few longitudinal studies about personality development. Personality was not often used as an outcome variable because it was seen as stable

These authors investigated for the first time personality and relationships over time in a sample of young students (n = 132) after their transition to university. The main conclusion of their analyses was that personality influenced change in social relationships, but not vice versa.

Bayesian updating

In 2001, Neyer and Asendorpf replicated the personality–relationship model, but now using a large representative sample of young adults

Based on the previous results, Neyer and Asendorpf "[…] hypothesized that personality effects would have a clear superiority over relationships effects"

In line with Asendorpf and Wilpers, they concluded that "Path analyses showed that once initial correlations were controlled, personality traits predicted change in various aspects of social relationships, whereas effects of antecedent relationships on personality were rare and restricted to very specific relationships with one's pre-school children"

Bayesian updating

[Figure: cross-lagged panel model with T1 Extraversion, T2 Extraversion, T1 Friends and T2 Friends; one cross-lagged path is hypothesized to be > 0, the other is hypothesized to be 0]

Bayesian updating

In 2003, Asendorpf and van Aken continued working on studies into personality–relationship transaction. The authors stated that

"The aim of the present study was to apply the methodology used by Asendorpf and Wilpers (1998) and Neyer and Asendorpf (2001) to the study of personality–relationship transaction over adolescence, to try to replicate key findings of these earlier studies, particularly the dominance of […] traits over relationship quality"

Asendorpf and van Aken confirmed previous findings:

"The stronger effect was an extraversion effect on perceived support from peers. This result replicates, once more, similar findings in adulthood." (p.653)

Bayesian updating

In 2010, Sturaro, Denissen, van Aken, and Asendorpf, once again, investigated the personality–relationship transaction model

Sturaro et al. found some contradictory results compared to the previously described studies

"[The Five-Factor theory] predicts significant paths from personality to change in social relationship quality, whereas it does not predict social relationship quality to have an impact on personality change. Contrary to our expectation, however, personality did not predict changes in relationship quality"

Bayesian updating

In conclusion, the four papers described above clearly illustrate how theory building works in daily practice.

Asendorpf and Wilpers (1998) started with testing theoretical ideas on the association between personality and social relationships, tracing back to McCrae and Costa (1996). Although their results were replicated by Neyer and Asendorpf (2001) and by Asendorpf and van Aken (2003), Sturaro, Denissen, van Aken, and Asendorpf (2010) were not able to do so. This latter finding led to re-formulations of the original theoretical ideas.

Bayesian updating

Why not update the results instead of testing the null hypothesis over and over again?

Let's use Bayesian updating and impose subjective priors

In the first scenario we only focus on those data sets with similar age groups.

Therefore we first re-analyze the data of Neyer and Asendorpf (2001) without using prior knowledge. Thereafter, we re-analyze the data of Sturaro et al. (2010) using prior information based on the data of Neyer and Asendorpf; both data sets contain young adults between 17-30 years of age.
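The idea of carrying one study's posterior forward as the next study's prior can be sketched with a normal approximation on summary statistics. This is an assumption-laden illustration, not the procedure behind the models reported here: the helper `update` and the two (estimate, standard error) pairs are hypothetical, and a full analysis would refit the model to the raw data with the chosen priors.

```python
def update(prior_mean, prior_sd, estimate, se):
    """Combine a normal prior with a normal likelihood summary (estimate, se)."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / se ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + data_precision * estimate) / post_precision
    return post_mean, post_precision ** -0.5


# Hypothetical path coefficients (estimate, standard error) from two studies.
study_a = (0.30, 0.05)
study_b = (0.15, 0.10)

# Start from a vague prior, update with study A, then reuse that posterior
# as the prior when analysing study B.
mean, sd = update(0.0, 10.0, *study_a)
mean, sd = update(mean, sd, *study_b)
print(round(mean, 3), round(sd, 3))
```

The updated estimate lies between the two study estimates, closer to the more precise one, and its SD is smaller than either standard error.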

Bayesian updating

Why not update the results instead of testing the null hypothesis over and over again?

Let's use Bayesian updating and impose subjective priors

In the second scenario we assume the relation between personality and social relationships is independent of age and we re-analyze the data of Sturaro et al. using prior information taken from Neyer and Asendorpf and from Asendorpf and van Aken.

In this second scenario we make a strong assumption, namely that the cross lagged effects for young adolescents are equal to the cross lagged effects of young adults.

This assumption implies similar developmental trajectories across age groups and indicates a full replication study.

Bayesian updating

[Figure: cross-lagged panel model with T1 Extraversion, T2 Extraversion, T1 Friends and T2 Friends]

Scenario 1

Model 1: Neyer & Asendorpf data without prior knowledge
Model 2: Sturaro et al. data
Model 3: Sturaro et al. data with priors based on Model 1

     Model 1                           Model 2                           Model 3
     Estimate (SD)    95% PPI          Estimate (SD)    95% PPI          Estimate (SD)    95% PPI
β1   0.605 (0.037)    0.532 - 0.676    0.291 (0.063)    0.169 - 0.424    0.337 (0.058)    0.228 - 0.449
β2   0.293 (0.047)    0.199 - 0.386    0.157 (0.103)    -0.042 - 0.364   0.287 (0.082)    0.130 - 0.448
β3   0.131 (0.046)    0.043 - 0.222    0.029 (0.079)    -0.132 - 0.180   0.106 (0.072)    -0.038 - 0.247
β4   -0.026 (0.039)   -0.100 -         0.303 (0.081)    0.144 - 0.462    0.249 (0.067)    0.111 - 0.375

Scenario 2

Model 4: Asendorpf & van Aken data
Model 5: with priors based on Model
Model 6: Sturaro et al. data

     Model 4                           Model 5                           Model 6
     Estimate (SD)    95% PPI          Estimate (SD)    95% PPI          Estimate (SD)    95% PPI
β1   0.512 (0.069)    0.376 - 0.649    0.537 (0.059)    0.424 - 0.654    0.313 (0.059)    0.199 - 0.427
β2   0.115 (0.083)    -0.049 - 0.277   0.140 (0.071)    0.005 - 0.283    0.246 (0.087)    0.079 - 0.420
β3   0.217 (0.106)    0.006 - 0.426    0.212 (0.079)    0.057 - 0.361    0.100 (0.076)    -0.052 - 0.248
β4   0.072 (0.055)    -0.036 - 0.179   0.073 (0.051)    -0.030 - 0.171   0.259 (0.070)    0.116 - 0.393

Final results Sturaro et al.

     Scenario 1, Model 3:              Scenario 2, Model 6:              Model 2:
     Sturaro et al. data               Sturaro et al. data               Sturaro et al. data
β1   0.337 (0.058)    0.228 - 0.449    0.313 (0.059)    0.199 - 0.427    0.291 (0.063)    0.169 - 0.424
β2   0.287 (0.082)    0.130 - 0.448    0.246 (0.087)    0.079 - 0.420    0.157 (0.103)    -0.042 - 0.364
β3   0.106 (0.072)    -0.038 - 0.247   0.100 (0.076)    -0.052 - 0.248   0.029 (0.079)    -0.132 - 0.180
β4   0.249 (0.067)    0.111 - 0.375    0.259 (0.070)    0.116 - 0.393    0.303 (0.081)    0.144 - 0.462

Conclusions

The updating procedure of both scenarios leads us to conclude that using subjective priors narrows the 95% posterior intervals.

=> More certainty about the relations

However…

Conclusions

Using subjective priors never changed the real issue, namely that Sturaro et al. found opposite effects to Neyer and Asendorpf.

The results supported the robustness of a conclusion that effects occurring between ages 17 and 23 are different from those occurring between ages 18-30, i.e., the clearly higher age in the Neyer and Asendorpf data.

Overall Conclusions

Excellent tool to include prior knowledge if available

Estimates (including intervals) always lie in the sample space if prior is chosen wisely

Results are easier to communicate

Better small-sample performance, large-sample theory not needed

Analyses can be made less computationally demanding

BUT: Bayes doesn’t solve misspecification of the model
