Bayesian generalized linear models and an appropriate default prior

Andrew Gelman, Aleks Jakulin, Maria Grazia Pittau, and Yu-Sung Su

Columbia University

14 August 2008

Outline: Logistic regression; Weakly informative priors; Conclusions

Logistic regression

Section outline: Classical logistic regression; The problem of separation; Bayesian solution

[Figure: the curve y = logit^-1(x), with x from −6 to 6 and logit^-1(x) from 0.0 to 1.0; annotation: slope = 1/4.]
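The slope annotation on this curve can be checked directly. A short derivation, using only the definition of the inverse logit (the symbol p is introduced here for convenience and does not appear on the slide):

```latex
\[
\operatorname{logit}^{-1}(x) = \frac{1}{1+e^{-x}}, \qquad
\frac{d}{dx}\operatorname{logit}^{-1}(x)
  = \frac{e^{-x}}{(1+e^{-x})^{2}}
  = p\,(1-p), \quad p = \operatorname{logit}^{-1}(x).
\]
\[
\text{At } x = 0:\quad p = \tfrac{1}{2}
\;\Longrightarrow\;
\text{slope} = \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{4}.
\]
```

Since p(1 − p) is never larger than 1/4, this is also the steepest point of the curve.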


A clean example

[Figure: y (0.0 to 1.0) plotted against x (−10 to 20), with the fitted curve estimated Pr(y=1) = logit^-1(−1.40 + 0.33 x); annotation: slope = 0.33/4.]
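The "slope = 0.33/4" annotation is the divide-by-4 reading of a logistic coefficient: the fitted curve is steepest where the linear predictor is zero, and its slope there is the coefficient divided by 4. A minimal Python sketch that checks this numerically for the coefficients shown on the slide (the function names and the finite-difference step are illustrative choices, not part of the talk):

```python
import math

def invlogit(z):
    """Inverse logit: maps the linear predictor z to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Coefficients read off the slide: Pr(y=1) = logit^-1(-1.40 + 0.33 x)
a, b = -1.40, 0.33

def prob(x):
    """Fitted probability at predictor value x."""
    return invlogit(a + b * x)

# The curve crosses Pr = 0.5 where the linear predictor is zero.
x_mid = -a / b

# Central finite difference as a numerical check of the slope at the midpoint.
h = 1e-5
slope_mid = (prob(x_mid + h) - prob(x_mid - h)) / (2 * h)

print(f"midpoint x = {x_mid:.2f}")
print(f"numerical slope at midpoint = {slope_mid:.4f}")
print(f"b / 4 = {b / 4:.4f}")
```

Read this way, a one-unit difference in x corresponds to a difference in Pr(y=1) of at most about 0.33/4.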


The problem of separation

[Figure: y (0.0 to 1.0) plotted against x (−6 to 6); annotation: slope = infinity?]
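Why the slope runs off to infinity can be seen by evaluating the likelihood directly. The sketch below is in Python and uses a small made-up data set in which y = 1 exactly when x > 0; the data and the no-intercept model are assumptions chosen to keep the illustration short. Every increase in the slope improves the fit, so no finite maximum likelihood estimate exists.

```python
import math

def log_lik(b, xs, ys):
    """Log-likelihood of the model Pr(y=1) = logit^-1(b * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        # log Pr(y_i) = -log(1 + exp(-z)), with z = b*x for y=1 and -b*x for y=0
        z = b * x if y == 1 else -b * x
        total += -math.log1p(math.exp(-z))
    return total

# Hypothetical, perfectly separated data: y = 1 exactly when x > 0.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]

slopes = [0.5, 1.0, 5.0, 10.0, 50.0]
values = [log_lik(b, xs, ys) for b in slopes]

for b, v in zip(slopes, values):
    print(f"slope = {b:5.1f}   log-likelihood = {v:.6g}")
```

The log-likelihood climbs toward its upper bound of 0 without ever reaching it, which is the behavior a prior distribution on the slope is meant to tame.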


Separation is no joke!

glm (vote ~ female + black + income, family=binomial(link="logit"))

              1960                  1968
              coef.est  coef.se     coef.est  coef.se
(Intercept)      -0.14     0.23         0.47     0.24
female            0.24     0.14        -0.01     0.15
black            -1.03     0.36        -3.64     0.59
income            0.03     0.06        -0.03     0.07

              1964                  1972
              coef.est  coef.se     coef.est  coef.se
(Intercept)      -1.15     0.22         0.67     0.18
female           -0.09     0.14        -0.25     0.12
black           -16.83   420.40        -2.63     0.27
income            0.19     0.06         0.09     0.05
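One way an estimate like the 1964 coefficient for black (−16.83, standard error 420.40) can arise is an empty cell: if nobody in a subgroup has y = 1, the maximum likelihood log odds ratio is minus infinity, and the fitting routine simply stops at some large finite value. A minimal Python sketch for the simplest case of a single binary predictor, using made-up counts (these are not the survey data behind the table):

```python
import math

def log_odds_ratio(y1_g1, y0_g1, y1_g0, y0_g0):
    """ML coefficient for a single binary predictor in a logistic regression:
    log odds of y=1 in group 1 minus log odds of y=1 in group 0.
    Returns +/-inf when an empty cell makes the estimate diverge."""
    def log_odds(y1, y0):
        if y1 == 0:
            return -math.inf
        if y0 == 0:
            return math.inf
        return math.log(y1 / y0)
    return log_odds(y1_g1, y0_g1) - log_odds(y1_g0, y0_g0)

# Hypothetical counts, in the order (y=1, y=0) for group 1, then group 0.
finite_case = log_odds_ratio(5, 45, 300, 250)
empty_cell = log_odds_ratio(0, 50, 300, 250)

print("no empty cell:", finite_case)
print("empty cell:   ", empty_cell)
```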


bayesglm()

- Bayesian logistic regression
- In the arm (Applied Regression and Multilevel modeling) package
- Replaces glm(), estimates are more numerically and computationally stable
- Student-t prior distributions for regression coefs
- Use EM-like algorithm
- We went inside glm.fit to augment the iteratively weighted least squares step
- Default choices for tuning parameters (we’ll get back to this!)
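Why an EM-like algorithm pairs naturally with Student-t priors can be seen from a standard identity: a t distribution is a normal distribution whose variance is itself random. The notation below (μ_j, s_j, ν_j, σ_j) is introduced for this note and is an assumption, not taken from the slides.

```latex
\[
\beta_j \mid \sigma_j^2 \sim \mathrm{N}(\mu_j,\ \sigma_j^2), \qquad
\sigma_j^2 \sim \text{Inv-}\chi^2(\nu_j,\ s_j^2)
\quad\Longrightarrow\quad
\beta_j \sim t_{\nu_j}(\mu_j,\ s_j^2).
\]
```

Conditional on the variances σ_j², each prior is normal, so it can enter a weighted least squares step like one extra observation per coefficient. Treating the σ_j² as missing data and alternating between updating them and refitting is what gives the procedure its EM flavor. The Cauchy distribution is the special case ν_j = 1.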


Regularization in action!


What else is out there?

- glm (maximum likelihood): fails under separation, gives noisy answers for sparse data
- Augment with prior “successes” and “failures”: doesn’t work well for multiple predictors
- brlr (Jeffreys-like prior distribution): computationally unstable
- brglm (improvement on brlr): doesn’t do enough smoothing
- BBR (Laplace prior distribution): OK, not quite as good as bayesglm
- Non-Bayesian machine learning algorithms: understate uncertainty in predictions


Weakly informative priors

Section outline: Prior information; Who’s the real conservative?; Evaluation using a corpus of datasets; Other generalized linear models

Information in prior distributions

- Informative prior dist
  - A full generative model for the data
- Noninformative prior dist
  - Let the data speak
  - Goal: valid inference for any θ
- Weakly informative prior dist
  - Purposely include less information than we actually have
  - Goal: regularization, stabilization


Weakly informative priors for logistic regression coefficients

- Separation in logistic regression
- Some prior info: logistic regression coefs are almost always between −5 and 5:
  - 5 on the logit scale takes you from 0.01 to 0.50 or from 0.50 to 0.99
  - Smoking and lung cancer
- Independent Cauchy prior dists with center 0 and scale 2.5
- Rescale each predictor to have mean 0 and sd 1/2
- Fast implementation using EM; easy adaptation of glm
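The numbers on this slide can be checked with a few lines of Python. The sketch below evaluates what a change of 5 on the logit scale does to a probability, how much prior mass a Cauchy distribution with center 0 and scale 2.5 puts between −5 and 5, and how a predictor can be rescaled to mean 0 and sd 1/2. The helper names and the example predictor values are hypothetical, chosen only for illustration.

```python
import math
import statistics

def invlogit(z):
    """Inverse logit: maps a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def cauchy_cdf(x, center=0.0, scale=2.5):
    """CDF of a Cauchy distribution; defaults follow the slide (center 0, scale 2.5)."""
    return 0.5 + math.atan((x - center) / scale) / math.pi

def rescale(values, target_sd=0.5):
    """Shift a predictor to mean 0 and scale it to the target sd (1/2 by default)."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) * target_sd / s for v in values]

# A change of 5 on the logit scale, starting from the middle of the curve.
print(f"invlogit(-5) = {invlogit(-5):.4f}")
print(f"invlogit(0)  = {invlogit(0):.4f}")
print(f"invlogit(5)  = {invlogit(5):.4f}")

# Prior mass that Cauchy(0, 2.5) places on coefficients between -5 and 5.
mass = cauchy_cdf(5) - cauchy_cdf(-5)
print(f"Pr(-5 < beta < 5) = {mass:.3f}")

# Hypothetical predictor values, rescaled to mean 0 and sd 1/2.
x = [18, 25, 31, 40, 52, 67]
z = rescale(x)
print(f"mean = {statistics.mean(z):.3f}, sd = {statistics.stdev(z):.3f}")
```

The heavy tails are the point of the Cauchy choice: roughly 30% of the prior mass lies outside ±5, so large coefficients are discouraged but not ruled out.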


Prior distributions

[Figure: prior distributions plotted against θ, for θ from −10 to 10.]


Another example

Dose     #deaths/#animals
−0.86    0/5
−0.30    1/5
−0.05    3/5
 0.73    5/5

- Slope of a logistic regression of Pr(death) on dose:
    - Maximum likelihood est is 7.8 ± 4.9
    - With weakly-informative prior: Bayes est is 4.4 ± 1.9
- Which is truly conservative?
- The sociology of shrinkage
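To make the comparison concrete, here is a minimal sketch that fits the four-dose table above twice: once by maximum likelihood and once at the posterior mode under independent Cauchy priors. Several things are assumptions of this note rather than statements from the slides: the function name `fit_logistic`, the Cauchy scale of 10 for the intercept (the slides only mention scale 2.5), the use of dose as given without rescaling, and the simple EM-style update that swaps each Cauchy prior for a normal whose precision is refreshed from the current coefficient. It is not bayesglm, and it reports no standard errors, so the Bayes estimate need not reproduce 4.4 ± 1.9.

```python
import math

def invlogit(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(dose, deaths, animals, prior_scales=None, outer_iters=500):
    """Fit Pr(death) = invlogit(a + b * dose) to grouped binomial data.

    prior_scales=None gives the maximum likelihood estimate.
    prior_scales=(s_a, s_b) gives a posterior mode under independent
    Cauchy(0, s_a) and Cauchy(0, s_b) priors on intercept and slope.
    """
    a, b = 0.0, 0.0
    for _ in range(outer_iters if prior_scales else 1):
        a_old, b_old = a, b
        if prior_scales:
            # EM-style step: treat each Cauchy prior as a normal whose
            # precision depends on the current value of the coefficient.
            prec_a = 2.0 / (a * a + prior_scales[0] ** 2)
            prec_b = 2.0 / (b * b + prior_scales[1] ** 2)
        else:
            prec_a = prec_b = 0.0
        for _ in range(100):  # Newton's method on the (penalized) log-likelihood
            grad_a, grad_b = -prec_a * a, -prec_b * b
            h_aa, h_ab, h_bb = prec_a, 0.0, prec_b
            for x, y, n in zip(dose, deaths, animals):
                p = invlogit(a + b * x)
                resid, weight = y - n * p, n * p * (1.0 - p)
                grad_a += resid
                grad_b += resid * x
                h_aa += weight
                h_ab += weight * x
                h_bb += weight * x * x
            # Solve the 2x2 Newton system in closed form.
            det = h_aa * h_bb - h_ab * h_ab
            step_a = (h_bb * grad_a - h_ab * grad_b) / det
            step_b = (h_aa * grad_b - h_ab * grad_a) / det
            a, b = a + step_a, b + step_b
            if abs(step_a) + abs(step_b) < 1e-10:
                break
        if abs(a - a_old) + abs(b - b_old) < 1e-9:
            break
    return a, b

dose = [-0.86, -0.30, -0.05, 0.73]
deaths = [0, 1, 3, 5]
animals = [5, 5, 5, 5]

a_ml, b_ml = fit_logistic(dose, deaths, animals)
a_map, b_map = fit_logistic(dose, deaths, animals, prior_scales=(10.0, 2.5))
print(f"maximum likelihood slope: {b_ml:.1f}")
print(f"posterior-mode slope:     {b_map:.1f}")
```

On this table the maximum likelihood slope should land close to the slide's 7.8, and the prior pulls the posterior-mode slope toward zero. Each pass of the inner loop is an ordinary penalized logistic fit, which is the sense in which an EM scheme can be a small adaptation of a glm routine.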


Maximum likelihood and Bayesian estimates

[Figure: probability of death (0 to 1) versus dose (0 to 20), with fitted curves from glm and bayesglm.]


Conservatism of Bayesian inference

- Problems with maximum likelihood when data show separation:
    - Coefficient estimate of −∞
    - Estimated predictive probability of 0 for new cases
- Is this conservative?
- Not if evaluated by log score or predictive log-likelihood
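The point about the log score can be illustrated with a small sketch. The outcomes and the two forecast values below are hypothetical, chosen only to show what happens when a forecast of exactly 0 meets a case where the event occurs; `log_score` and `mean_log_score` are names invented for this note.

```python
import math

def log_score(p, y):
    """Log of the probability that the forecast Pr(y=1) = p gave to outcome y."""
    prob_of_outcome = p if y == 1 else 1.0 - p
    return math.log(prob_of_outcome) if prob_of_outcome > 0.0 else -math.inf

def mean_log_score(p, outcomes):
    """Average log score of a constant forecast p over a list of 0/1 outcomes."""
    return sum(log_score(p, y) for y in outcomes) / len(outcomes)

# Hypothetical new cases: the event occurs once in twenty.
new_cases = [1] + [0] * 19

separated_fit = mean_log_score(0.0, new_cases)  # forecast of exactly 0
shrunken_fit = mean_log_score(0.05, new_cases)  # forecast pulled away from 0

print("forecast 0.00:", separated_fit)
print("forecast 0.05:", round(shrunken_fit, 3))
```

A single occurrence of the event is enough to send the score of the forecast of 0 to minus infinity, while the shrunken forecast keeps a finite score. Under this criterion the extreme forecast is the risky choice.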


Which one is conservative?

[Figure: probability of death (0 to 1) versus dose (0 to 20), with fitted curves from glm and bayesglm.]


Prior as population distribution

- Consider many possible datasets
- The “true prior” is the distribution of β’s across these datasets
- Fit one dataset at a time
- A “weakly informative prior” has less information (wider variance) than the true prior
- Open question: How to formalize the tradeoffs from using different priors?
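The slides leave the tradeoff as an open question. One way to get a feel for it is a toy normal model, which is entirely an assumption of this note and not the authors' formalization: β varies across datasets as N(0, true_sd²), each dataset yields a noisy estimate b ~ N(β, noise_sd²), and the analyst uses a N(0, prior_sd²) prior. The function name and the particular standard deviations are hypothetical.

```python
import math

def expected_sq_error(prior_sd, true_sd, noise_sd):
    """Mean squared error of the posterior mean, averaged over datasets.

    With a N(0, prior_sd^2) prior the posterior mean is c * b, where
    c = prior_sd^2 / (prior_sd^2 + noise_sd^2). Averaging (c * b - beta)^2
    over beta ~ N(0, true_sd^2) and b ~ N(beta, noise_sd^2) gives
    (1 - c)^2 * true_sd^2 + c^2 * noise_sd^2.
    """
    if math.isinf(prior_sd):
        c = 1.0  # flat prior: no shrinkage
    else:
        c = prior_sd ** 2 / (prior_sd ** 2 + noise_sd ** 2)
    return (1.0 - c) ** 2 * true_sd ** 2 + c ** 2 * noise_sd ** 2

true_sd, noise_sd = 1.0, 2.0
risk_true_prior = expected_sq_error(1.0, true_sd, noise_sd)       # prior matches population
risk_weak_prior = expected_sq_error(2.5, true_sd, noise_sd)       # wider than population
risk_flat_prior = expected_sq_error(math.inf, true_sd, noise_sd)  # no prior information

print(risk_true_prior, round(risk_weak_prior, 3), risk_flat_prior)
```

In this toy setting the wider prior gives up some accuracy relative to the true prior but still does better than no prior at all. Whether the same ordering holds for logistic regression coefficients is an empirical matter.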


Evaluation using a corpus of datasets

- Compare classical glm to Bayesian estimates using various prior distributions
- Evaluate using 5-fold cross-validation and average predictive error
- The optimal prior distribution for β’s is (approx) Cauchy (0, 1)
- Our Cauchy (0, 2.5) prior distribution is weakly informative!
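The evaluation procedure (5-fold cross-validation scored by average predictive loss) can be sketched as follows. This does not reproduce the corpus study: the data are made up, the folds are assigned by index rather than at random, and the "model" is a stand-in that predicts a single smoothed event rate, with the pseudo-count `pseudo` playing the role of prior strength (`pseudo = 0` corresponds to maximum likelihood). All function names are hypothetical.

```python
import math

def k_fold_indices(n, k=5):
    """Assign observation i to fold i % k; return (train, test) index lists per fold."""
    return [
        ([i for i in range(n) if i % k != f], [i for i in range(n) if i % k == f])
        for f in range(k)
    ]

def fit_rate(y_train, pseudo):
    """Stand-in model: Pr(y=1) = (events + pseudo) / (n + 2 * pseudo)."""
    return (sum(y_train) + pseudo) / (len(y_train) + 2.0 * pseudo)

def neg_log_lik(p, y):
    """Negative log predictive probability of a single 0/1 outcome."""
    prob_of_outcome = p if y == 1 else 1.0 - p
    return -math.log(prob_of_outcome) if prob_of_outcome > 0.0 else math.inf

def cv_loss(y, pseudo, k=5):
    """Average held-out negative log-likelihood over k folds."""
    total = 0.0
    for train, test in k_fold_indices(len(y), k):
        p = fit_rate([y[i] for i in train], pseudo)
        total += sum(neg_log_lik(p, y[i]) for i in test)
    return total / len(y)

# Hypothetical data: one event among twenty cases.
y = [0] * 19 + [1]
for pseudo in (0.0, 0.5, 1.0, 2.0):
    print(f"pseudo-count {pseudo}: held-out loss {cv_loss(y, pseudo)}")
```

With `pseudo = 0` the fold that holds out the only event is fitted on data with no events, predicts probability 0, and receives an infinite loss. Any positive pseudo-count keeps the loss finite. In practice the folds would be shuffled and the loss averaged over many datasets.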


Expected predictive loss, avg over a corpus of datasets

[Figure: −log test likelihood (about 0.29 to 0.33) versus scale of prior (0 to 5). Curves are labeled BBR(l), BBR(g), and df = 0.5, 1.0, 2.0, 4.0, 8.0; a separate label reads "GLM (1.79)".]


Priors for other regression models

- Probit
- Ordered logit/probit
- Poisson
- Linear regression with normal errors


Other examples of weakly informative priors

- Variance parameters
- Covariance matrices
- Population variation in a physiological model
- Mixture models
- Intentional underpooling in hierarchical models


Conclusions

- “Noninformative priors” are actually weakly informative
- “Weakly informative” is a more general and useful concept
- Regularization
    - Better inferences
    - Stability of computation (bayesglm)
- Why use weakly informative priors rather than informative priors?
    - Conformity with statistical culture (“conservatism”)
    - Labor-saving device
    - Robustness


Page 87: Bayesian generalized linear models and an appropriate ... · Bayesian generalized linear models and an ... Su Bayesian generalized linear models and an appropriate default prior.

Logistic regressionWeakly informative priors

Conclusions

ConclusionsExtra stuff

Other examples of weakly informative priors

- Variance parameters
- Covariance matrices
- Population variation in a physiological model
- Mixture models
- Intentional underpooling in hierarchical models

Weakly informative priors for variance parameter

- Basic hierarchical model
- Traditional inverse-gamma(0.001, 0.001) prior can be highly informative (in a bad way)!
- Noninformative uniform prior works better
- But if the number of groups is small (J = 2, 3, even 5), a weakly informative prior helps by shutting down huge values of τ

Priors for variance parameter: J = 8 groups

[Figure: three panels for the 8-schools example, each showing the posterior on σα (horizontal axis from 0 to 30): (1) given a uniform prior on σα; (2) given an inv-gamma(1, 1) prior on σα²; (3) given an inv-gamma(.001, .001) prior on σα².]

Priors for variance parameter: J = 3 groups

[Figure: two panels for a 3-schools example, each showing the posterior on σα (horizontal axis from 0 to 200): (1) given a uniform prior on σα; (2) given a half-Cauchy(25) prior on σα.]
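The comparison in the two panels can be approximated on a grid. The sketch below is an illustration, not the authors' code. It assumes the first three of the widely used eight-schools estimates as data (the slides do not list the data), a flat prior on the group mean, and a grid truncated at 1000. All function names are hypothetical.

```python
import math

# Assumed data: first three of the widely used eight-schools estimates
# (effect estimates y_j and their standard errors s_j).
Y3 = [28.0, 8.0, -3.0]
S3 = [15.0, 10.0, 16.0]

def log_marginal_lik(sigma_alpha, y, s):
    """log p(y | sigma_alpha), up to a constant, with the group mean
    integrated out under a flat prior (an assumption of this sketch)."""
    v = [sj ** 2 + sigma_alpha ** 2 for sj in s]
    prec_sum = sum(1.0 / vj for vj in v)
    mu_hat = sum(yj / vj for yj, vj in zip(y, v)) / prec_sum
    quad = sum((yj - mu_hat) ** 2 / vj for yj, vj in zip(y, v))
    return (-0.5 * math.log(prec_sum)
            - 0.5 * sum(math.log(vj) for vj in v)
            - 0.5 * quad)

def log_uniform(sigma_alpha):
    # Flat prior on sigma_alpha (constant on the log scale).
    return 0.0

def log_half_cauchy(sigma_alpha, scale=25.0):
    # Half-Cauchy prior with scale 25, up to a constant.
    return -math.log1p((sigma_alpha / scale) ** 2)

def grid_posterior(log_prior, grid, y=Y3, s=S3):
    """Normalized posterior masses on an evenly spaced grid."""
    logp = [log_marginal_lik(t, y, s) + log_prior(t) for t in grid]
    top = max(logp)
    w = [math.exp(lp - top) for lp in logp]
    total = sum(w)
    return [wi / total for wi in w]

def tail_mass(post, grid, cutoff):
    return sum(p for p, t in zip(post, grid) if t > cutoff)

GRID = [0.1 * i for i in range(10001)]  # 0 to 1000 in steps of 0.1
post_uniform = grid_posterior(log_uniform, GRID)
post_half_cauchy = grid_posterior(log_half_cauchy, GRID)
tail_uniform = tail_mass(post_uniform, GRID, 100.0)
tail_half_cauchy = tail_mass(post_half_cauchy, GRID, 100.0)

print(f"Pr(sigma_alpha > 100 | y), uniform prior:         {tail_uniform:.3f}")
print(f"Pr(sigma_alpha > 100 | y), half-Cauchy(25) prior: {tail_half_cauchy:.3f}")
```

Because the half-Cauchy density decreases in σα while the uniform prior is flat, the half-Cauchy posterior necessarily puts less mass in the far right tail, which is the "shutting down huge values" effect described for small J.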

Weakly informative priors for covariance matrices

- Inverse-Wishart has problems
- Correlations can be between −1 and 1
- Set up models so prior expectation of correlations is 0
- Goal: to be weakly informative about correlations and variances
- Scaled inverse-Wishart model uses redundant parameterization
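One common way to write a scaled inverse-Wishart prior for a K × K covariance matrix Σ is sketched below. The symbols ξ, Q, and K are assumptions introduced here for illustration and do not appear on the slide. The redundancy is that ξ and Q are not separately identified; only their combination Σ is.

```latex
\begin{aligned}
\Sigma &= \operatorname{Diag}(\xi)\, Q \,\operatorname{Diag}(\xi),
  \qquad Q \sim \text{Inv-Wishart}_{K+1}(I), \\
\sigma_k &= |\xi_k| \sqrt{Q_{kk}},
  \qquad \rho_{kl} = \frac{Q_{kl}}{\sqrt{Q_{kk}\, Q_{ll}}}.
\end{aligned}
```

With K + 1 degrees of freedom and an identity scale matrix, each correlation ρ_kl has a marginal uniform distribution on (−1, 1), so its prior expectation is 0, while the scale parameters ξ_k can be given their own weakly informative priors.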

Weakly informative priors for population variation in a physiological model

- Pharmacokinetic parameters such as the “Michaelis-Menten coefficient”
- Wide uncertainty: prior guess for θ is 15 with a factor of 100 of uncertainty, log θ ∼ N(log(15), log(10)²)
- Population model: data on several people j, log θ_j ∼ N(log(15), log(10)²) ????
- Hierarchical prior distribution:
  - log θ_j ∼ N(µ, σ²), σ ≈ log(2)
  - µ ∼ N(log(15), log(10)²)
- Weakly informative
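A short simulation can show why the independent population model earns its question marks. This is a sketch under stated assumptions: σ is fixed at exactly log(2), and the helper names are hypothetical. It compares how far apart two people's parameters are, on the log scale, under each prior.

```python
import math
import random
import statistics

random.seed(0)

N_DRAWS = 20000
PRIOR_MEAN = math.log(15.0)   # prior guess for theta is 15
WIDE_SD = math.log(10.0)      # wide uncertainty about the overall level
POP_SD = math.log(2.0)        # assumed between-person sd, sigma = log(2)

def log_ratio_independent():
    """log(theta_1 / theta_2) when each person gets the wide prior independently."""
    a = random.gauss(PRIOR_MEAN, WIDE_SD)
    b = random.gauss(PRIOR_MEAN, WIDE_SD)
    return a - b

def log_ratio_hierarchical():
    """log(theta_1 / theta_2) when people share a population mean mu."""
    mu = random.gauss(PRIOR_MEAN, WIDE_SD)
    a = random.gauss(mu, POP_SD)
    b = random.gauss(mu, POP_SD)
    return a - b

draws_independent = [log_ratio_independent() for _ in range(N_DRAWS)]
draws_hierarchical = [log_ratio_hierarchical() for _ in range(N_DRAWS)]

sd_independent = statistics.pstdev(draws_independent)
sd_hierarchical = statistics.pstdev(draws_hierarchical)

print(f"sd of log(theta_1/theta_2), independent prior:  {sd_independent:.2f}")
print(f"sd of log(theta_1/theta_2), hierarchical prior: {sd_hierarchical:.2f}")
```

Under the independent prior, two people routinely differ by orders of magnitude, which is physiologically implausible. Under the hierarchical prior, the overall level stays highly uncertain while individuals typically stay within a small factor of each other.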

Weakly informative priors for mixture models

- Well-known problem of fitting the mixture model likelihood
- The maximum likelihood fits are weird, with a single point taking half the mixture
- Bayes with flat prior is just as bad
- These solutions don’t “look” like mixtures
- There must be additional prior information, or, to put it another way, regularization
- Simple constraints, for example, a prior distribution on the variance ratio
- Weakly informative
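The degenerate fit can be seen directly in a two-component normal mixture. In the sketch below, the data values, the fixed second component, and the function names are all hypothetical. One component is centered exactly on a single data point with mixing weight one half, and its standard deviation is shrunk toward zero.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_loglik(data, weight, mean1, sd1, mean2, sd2):
    """Log-likelihood of a two-component normal mixture."""
    return sum(
        math.log(weight * normal_pdf(x, mean1, sd1)
                 + (1.0 - weight) * normal_pdf(x, mean2, sd2))
        for x in data
    )

# Hypothetical data, for illustration only.
DATA = [-1.3, -0.4, 0.1, 0.6, 0.9, 1.8, 2.5]
MEAN2 = sum(DATA) / len(DATA)   # broad component covering all the data
SD2 = 1.5

# Component 1 sits on the first data point and takes half the mixture.
SHRINKING_SDS = [1e-1, 1e-2, 1e-4, 1e-8]
logliks = [
    mixture_loglik(DATA, 0.5, DATA[0], sd1, MEAN2, SD2)
    for sd1 in SHRINKING_SDS
]

for sd1, ll in zip(SHRINKING_SDS, logliks):
    print(f"sd1 = {sd1:g}: log-likelihood = {ll:.2f}")
```

The log-likelihood keeps growing as the first component collapses onto one observation, so there is no sensible maximum. A prior that bounds the ratio of the two variances, as the slide suggests, removes this spike.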

Intentional underpooling in hierarchical models

- Basic hierarchical model:
  - Data y_j on parameters θ_j
  - Group-level model θ_j ∼ N(µ, τ²)
  - No-pooling estimate θ̂_j = y_j
- Bayesian partial-pooling estimate E(θ_j | y)
- Weak Bayes estimate: same as Bayes, but replacing τ with 2τ
- An example of the “incompatible Gibbs” algorithm
- Why would we do this??
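The effect of replacing τ with 2τ is easiest to see when µ and τ are treated as known, which is a simplification of the full Bayes estimate E(θ_j | y). The sketch below makes that assumption and also assumes normal data with known standard errors σ_j. The numbers and function names are hypothetical.

```python
def partial_pooling(y_j, sigma_j, mu, tau):
    """Posterior mean of theta_j given y_j, with mu and tau treated as known
    (an assumption of this sketch) and y_j ~ N(theta_j, sigma_j^2)."""
    weight_on_data = tau ** 2 / (tau ** 2 + sigma_j ** 2)
    return mu + weight_on_data * (y_j - mu)

# Hypothetical inputs, for illustration only.
Y = [28.0, 8.0, -3.0]        # no-pooling estimates y_j
SIGMA = [15.0, 10.0, 16.0]   # assumed standard errors sigma_j
MU = 8.0                     # assumed group-level mean
TAU = 5.0                    # assumed group-level sd

bayes = [partial_pooling(y, s, MU, TAU) for y, s in zip(Y, SIGMA)]
weak_bayes = [partial_pooling(y, s, MU, 2.0 * TAU) for y, s in zip(Y, SIGMA)]

for y, b, w in zip(Y, bayes, weak_bayes):
    print(f"no pooling {y:6.2f} | Bayes {b:6.2f} | weak Bayes {w:6.2f}")
```

Doubling τ raises the weight on each group's own data, so the weak Bayes estimate always lies between the Bayes estimate and the no-pooling estimate: it pools, but deliberately less than the model says it should.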
