Calibrated Bayes: an attractive framework for official statistics in the 21st century Roderick J. Little


  • Calibrated Bayes: an attractive framework for official statistics in the 21st century

    Roderick J. Little

  • Overview

    • Design-based versus model-based survey inference

    • Current orthodoxy: design-model compromise

    – Strengths and drawbacks

    • An alternative: Calibrated Bayes

    • Two US Census Bureau applications

    – Disclaimer: views are mine, not US Census Bureau

    NTTS 2015: Calibrated Bayes 2


  • Survey estimation

    • Design-based inference: population values are fixed, inference is based on probability distribution of sample selection. Obviously this assumes that we have a probability sample (or “quasi-randomization”, where we pretend that we have one)

    • Model-based inference: survey variables are assumed to come from a statistical model: probability sampling is not the basis for inference, but useful for making the sample selection ignorable. (see e.g. Gelman et al., 2003; Little 2004)

  • Design vs model-based survey inference

    • Two main variants of model-based inference:

    – Superpopulation models: Frequentist inference based on repeated samples from a “superpopulation” model

    – Bayes: add prior distribution for parameters; inference about finite population quantities or parameters based on posterior distribution

    • A fascinating part of the more general debate about frequentist versus Bayesian inference in statistics at large:

    – Design-based inference is inherently frequentist

    – Purest form of model-based inference is Bayes

  • Design-based inference

    $Y = (Y_1, \ldots, Y_N)$ = population values (fixed); $Z$ = design variables

    $Q = Q(Y, Z)$ = finite population quantity

    $I = (I_1, \ldots, I_N)$ = sample inclusion indicators (random), where $I_i = 1$ if unit $i$ is included in the sample and $I_i = 0$ otherwise

    $Y_{\mathrm{inc}}$ = part of $Y$ included in the survey

    $\hat{q} = \hat{q}(Y_{\mathrm{inc}}, I, Z)$ = sample estimate of $Q$

    $\hat{V} = \hat{V}(Y_{\mathrm{inc}}, I, Z)$ = sample estimate of $V$, the variance of $\hat{q}$

    $\left(\hat{q} - 1.96\sqrt{\hat{V}},\ \hat{q} + 1.96\sqrt{\hat{V}}\right)$ = 95% confidence interval for $Q$
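As a minimal sketch of the design-based interval, assume simple random sampling without replacement, so that the estimate is the sample mean and its estimated variance is (1 - n/N) s²/n. The function name and the toy data are illustrative assumptions, not part of the talk; other designs need their own variance estimators.

```python
import math
import statistics


def srs_mean_ci(sample, population_size, z=1.96):
    """Design-based estimate, variance estimate and CI for a population mean.

    Assumes simple random sampling without replacement (an assumption made
    for this sketch; real survey designs need design-specific variances).
    """
    n = len(sample)
    q_hat = statistics.mean(sample)
    # Variance estimate with the finite population correction (1 - n/N).
    v_hat = (1 - n / population_size) * statistics.variance(sample) / n
    half_width = z * math.sqrt(v_hat)
    return q_hat, v_hat, (q_hat - half_width, q_hat + half_width)


# Toy data: five sampled values from a hypothetical population of 100 units.
q_hat, v_hat, ci = srs_mean_ci([1, 2, 3, 4, 5], population_size=100)
print(q_hat, v_hat, ci)
```

The only randomness this interval accounts for is the sample selection; the population values are treated as fixed.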

  • Choice of $\hat{q}$

    Seek good design-based properties:

    – Design unbiasedness: $E(\hat{q} \mid Y) = Q$ (too strong)

    – Or weaker: design consistency: $\hat{q} \to Q$ as sample size gets large

    There are many choices of design-consistent estimates ...

    It is natural to seek an estimate that is design-efficient. However, this kind of optimality is not possible without a model (Horvitz and Thompson 1952, Godambe 1955)

    Many survey estimates are motivated by implicit models:

    – Regression model → regression estimator

    – Ratio model → ratio estimator, etc.

  • Limitations of design-based approach

    • Inference is based on probability sampling, but true probability samples are harder and harder to come by:

    – Noncontact, nonresponse is increasing

    – Face-to-face interviews increasingly expensive

    – High proportion of available information is now not based on probability samples (e.g. internet, administrative data)

    • Theory is basically asymptotic -- limited tools for small samples, e.g. small area estimation


  • Design-based methods live in the land of asymptotia

    [Cartoon: the “Asymptotia Highlands” and the “Murky sub-asymptotial forests”, with the question “How many more to reach the promised land of asymptotia?”]

  • Model-based approaches

    • In model-based, or model-dependent, approaches, models are the basis for the entire inference: estimator, standard error, interval estimation

    • Two variants:

    – Superpopulation modeling

    – Bayesian (full probability) modeling

    • Common theme is to “infer” or “predict” about non-sampled portion of the population, conditional on the sample and model

    • Superpopulation is super, but Bayes is better … for small samples


  • Bayes inference for surveys

    Model: $p(Y \mid Z)$ = prior distribution for $Y$

    Data: $Y_{\mathrm{inc}}$ = sampled values of $Y$; $Z$ = design variables

    Inferences about $Q = Q(Y, Z)$ are based on the posterior predictive distribution $p(Q(Y, Z) \mid Y_{\mathrm{inc}}, Z)$

    In particular:

    – One estimate is the posterior mean: $\hat{q} = E(Q \mid Y_{\mathrm{inc}}, Z)$

    – Standard error is the posterior sd: $\sqrt{\mathrm{Var}(Q \mid Y_{\mathrm{inc}}, Z)}$

    – 95% posterior probability interval plays the role of a confidence interval (with a simpler interpretation)

  • Parametric models

    Usually prior distribution is specified via parametric models:

    $p(Y \mid Z) = \int p(Y \mid Z, \theta)\, p(\theta \mid Z)\, d\theta$

    $p(Y \mid Z, \theta)$ = parametric model, as in superpopulation approach

    $p(\theta \mid Z)$ = prior distribution for $\theta$

    Inference about $\theta$ is then obtained from its posterior distribution, computed via Bayes’ Theorem:

    $p(\theta \mid Y_{\mathrm{inc}}, Z) \propto p(\theta \mid Z)\, L(\theta \mid Y_{\mathrm{inc}}, Z)$

    $L(\theta \mid Y_{\mathrm{inc}}, Z)$ = likelihood function

    That is: Posterior ∝ Prior × Likelihood

  • Example. Spline model on weights

    [Diagram labels: Z, Y, Z; Sample, Population]

    HT estimator: $\bar{y}_{\mathrm{HT}} = \frac{1}{N} \sum_{i=1}^{n} y_i / \pi_i$; $\pi_i$ = selection prob

    A modeling alternative to the HT estimator is to create predictions from a more robust model relating $Y$ to $Z$:

    $\bar{y}_{\mathrm{mod}} = \frac{1}{N}\left(\sum_{i=1}^{n} y_i + \sum_{i=n+1}^{N} \hat{y}_i\right)$, $\hat{y}_i$ = predictions from model, e.g.:

    – $y_i \sim \mathrm{Nor}(\beta \pi_i, \sigma^2 \pi_i^2)$; leads to $\bar{y}_{\mathrm{HT}}$

    – $y_i \sim \mathrm{Nor}(S(\pi_i), \sigma^2 \pi_i^k)$; $S(\pi_i)$ = penalized spline of $Y$ on $Z$

    Simulations in Zheng and Little (2005) suggest better RMSE, confidence coverage for spline model compared with design-based approaches
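The link between the Horvitz-Thompson (HT) estimator and a prediction estimator can be sketched as follows. This is an illustration under stated assumptions, not the talk's spline model: outcomes are assumed normal with mean proportional to the inclusion probability and standard deviation proportional to it, the slope is fitted by weighted least squares, and the population, sample and outcome values are made up.

```python
def ht_mean(y, pi, N):
    """Horvitz-Thompson estimate of the population mean."""
    return sum(yi / p for yi, p in zip(y, pi)) / N


def model_mean(y, pi_sample, pi_nonsample, N):
    """Prediction estimate under y_i ~ Nor(beta * pi_i, sigma^2 * pi_i^2)."""
    n = len(y)
    # Weighted least squares through the origin with weights 1 / pi_i^2
    # reduces to the average of y_i / pi_i.
    beta_hat = sum(yi / p for yi, p in zip(y, pi_sample)) / n
    # Predict the outcome for every non-sampled unit and add the observed ones.
    predicted = sum(beta_hat * p for p in pi_nonsample)
    return (sum(y) + predicted) / N


# Hypothetical population of N = 8 units with size measures; n = 2 are sampled.
sizes = [1, 2, 3, 4, 5, 6, 7, 8]
n, N = 2, len(sizes)
pi_all = [n * s / sum(sizes) for s in sizes]   # inclusion probabilities sum to n
sampled = [2, 6]                               # indices of the sampled units
y = [3.1, 6.8]                                 # made-up outcomes for those units
pi_s = [pi_all[i] for i in sampled]
pi_r = [p for i, p in enumerate(pi_all) if i not in sampled]

print(ht_mean(y, pi_s, N), model_mean(y, pi_s, pi_r, N))
```

In this setup the two estimates differ only by the sum of the sample residuals divided by N, which is why the simple proportional model reproduces the HT estimator closely. A more flexible mean function changes the predictions for the non-sampled units.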

  • The model-based perspective: pros

    • Flexible, unified approach for all survey problems

    – Models for nonresponse, response and matching errors, small area models, combining data sources

    • Bayesian approach is not asymptotic, provides better small-sample inferences

    • Probability sampling is justified as making sampling mechanism ignorable, improving robustness


  • Models bring survey inference closer to the statistical mainstream

    [Cartoon: the “B/F Gorilla” says “Follow my (frequentist) statistical standards”; the reply is “Why? I am an economist, I build models!”]

  • The model-based perspective: cons

    • Explicit dependence on the choice of model, which has subjective elements (but assumptions are explicit, not buried in a formula)

    • Bad models provide bad answers – justifiable concerns about the effect of model misspecification

    • Models are needed for all survey variables – need to understand the data, and potential for more complex computations


  • Overview

    • Design-based versus model-based survey inference

    • Current orthodoxy: design-model compromise

    – Strengths and drawbacks

    • An alternative: Calibrated Bayes

    • Two US Census Bureau applications

    – Disclaimer: views are mine, not US Census Bureau


  • The current “status quo” -- design-model compromise

    • Design-based for large samples, descriptive statistics

    – But may be model assisted, e.g. regression calibration (GREG):

      $\hat{T}_{\mathrm{GREG}} = \sum_{i=1}^{N} \hat{y}_i + \sum_{i=1}^{N} I_i (y_i - \hat{y}_i) / \pi_i$, $\hat{y}_i$ = model prediction

    – model estimates adjusted to protect against misspecification (e.g. Särndal, Swensson and Wretman 1992)

    • Model-based for small area estimation, nonresponse, time series, …

    • Attempts to capitalize on best features of both paradigms … but … at the expense of “inferential schizophrenia” (Little 2012)?
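A minimal sketch of the generalized regression (GREG) total: the sum of model predictions over the whole population, corrected by the inverse-probability-weighted sample residuals. The function name, argument layout and numbers are illustrative assumptions; the predictions are taken as given rather than fitted here.

```python
def greg_total(y_sample, pi_sample, yhat_sample, yhat_population):
    """Generalized regression (GREG) estimate of the population total.

    Sum of model predictions over the whole population, plus the
    inverse-probability-weighted sum of sample residuals.
    """
    prediction_total = sum(yhat_population)
    residual_correction = sum(
        (y - yhat) / pi
        for y, yhat, pi in zip(y_sample, yhat_sample, pi_sample)
    )
    return prediction_total + residual_correction


# Hypothetical population of 4 units; units 0 and 2 are sampled with pi = 0.5.
yhat_population = [10.0, 12.0, 9.0, 11.0]
total = greg_total(
    y_sample=[11.0, 8.0],
    pi_sample=[0.5, 0.5],
    yhat_sample=[10.0, 9.0],
    yhat_population=yhat_population,
)
print(total)
```

If the predictions are all zero, the formula reduces to the Horvitz-Thompson total, which is the sense in which the residual term protects against a misspecified model.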

  • Example: when is an area “small”?

    [Figure: an “n-o-meter” scale of sample size n, with design-based inference on one side of a threshold n0 and model-based inference on the other; n0 = “Point of inferential schizophrenia”]

    How do I choose n0?

    If n0 = 35, should my entire statistical philosophy and inference be different when n=34 and n=36?

    – n=36, CI: wider, since based on direct estimate

    – n=34, CI: narrower, since based on model

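  • Multilevel (hierarchical Bayes) models

    Bayesian multilevel model estimates borrow strength increasingly from model as n decreases:

    $\tilde{\mu}_a = w_a \bar{y}_a + (1 - w_a) \hat{\mu}_a$, with $\bar{y}_a$ the direct estimate and $\hat{\mu}_a$ the model estimate for area $a$

    [Figure: “n-o-meter” showing the weight $w_a$ against sample size n; $w_a = 0$ gives the model estimate, $w_a = 1$ gives the direct estimate]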
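How the weight on the direct estimate grows smoothly with sample size can be sketched with the textbook normal-normal multilevel model. The model choice and the variance values are assumptions for illustration, not taken from the talk: the within-area variance and between-area variance are treated as known.

```python
def shrinkage_weight(n, sigma2, tau2):
    """Weight on the direct estimate in a normal-normal multilevel model.

    sigma2: within-area variance of a single observation (assumed known).
    tau2:   between-area variance of the true area means (assumed known).
    """
    return tau2 / (tau2 + sigma2 / n)


def composite_estimate(direct, model, n, sigma2, tau2):
    """Weighted combination of the direct and model estimates for one area."""
    w = shrinkage_weight(n, sigma2, tau2)
    return w * direct + (1 - w) * model


# The weight moves continuously from near 0 to near 1 as n grows.
for n in (2, 10, 50, 500):
    print(n, round(shrinkage_weight(n, sigma2=4.0, tau2=0.25), 3))
```

There is no threshold sample size at which the method switches: the same model gives mostly the model estimate for tiny samples and mostly the direct estimate for large ones.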

  • Overview

    • Design-based versus model-based survey inference

    • Current orthodoxy: design-model compromise

    – Strengths and drawbacks

    • An alternative: Calibrated Bayes

    • Two US Census Bureau applications

    – Disclaimer: views are mine, not US Census Bureau


  • An alternative paradigm: Calibrated Bayes

    • Frequentists should be Bayesian

    – Bayes is optimal under a correctly specified model

    • Bayesians should be frequentist

    – We never know the model (and all models are wrong)

    – Inferences should be robust to misspecification, have good repeated sampling characteristics

    • Calibrated Bayes (Box 1980, Rubin 1984, Little 2006, 2012, 2013)

    – Inference based on a Bayesian model

    – Model chosen to yield inferences that are well-calibrated in a frequentist sense

    – Aim for posterior probability intervals that have (approximately) nominal frequentist coverage


  • Bayes/frequentist compromises

    “I believe that … sampling theory is needed for exploration and ultimate criticism of the entertained model in the light of the current data, while Bayes’ theory is needed for estimation of parameters conditional on adequacy of the model.”

    George Box (1980)

  • Calibrated Bayes

    “The applied statistician should be Bayesian in principle and calibrated to the real world in practice – appropriate frequency calculations help to define such a tie.”

    “… frequency calculations are useful for making Bayesian statements scientific, … in the sense of capable of being shown wrong by empirical test; here the technique is the calibration of Bayesian probabilities to the frequencies of actual events.”

    Rubin (1984)

  • Calibrated Bayes models for surveys should incorporate sample design features

    • The “Calibrated” part of Calibrated Bayes requires robust models with good repeated sampling properties:

    • Generally weak priors that are dominated by the likelihood (“objective Bayes”)

    • Models that incorporate sampling design features:

    – Capture design weights and stratifying variables as covariates in the prediction model (e.g. Gelman 2007)

    – Clustering via hierarchical random effects models


  • Overview

    • Design-based versus model-based survey inference

    • Current orthodoxy: design-model compromise

    – Strengths and drawbacks

    • An alternative: Calibrated Bayes

    • Two US Census Bureau applications

    – Disclaimer: views are mine, not US Census Bureau


  • Applications

    • Voting Rights Act special tabulation

    • The American Community Survey (ACS) and the “standard error error”


  • Voting Rights Act Special Tabulation

    • Section 203 Language Provisions of the Voting Rights Act

    • Determines counties and townships required to provide language assistance at the polls

    • Determinations are based in part on the following “more than 5%” provision:

    … More than 5 percent of voting age citizens of political district are members of a single language minority and are Limited English Proficient (LEP).


  • Voting Rights Act Tabulations

    • Previously used direct estimates from Long Form Decennial Census Data

    • Used ACS 2005-2009 and 2010 Census data to produce estimates by fall 2011

    • Direct estimates for some districts are based on small ACS sample and hence have unacceptably high variance

    • E.g. let P be proportion of voting age citizens in political district who are members of a single language minority and are Limited English Proficient

    • Suppose ACS was a simple random sample, a direct estimate of P is the sample proportion m/n

    – District A with n=105, m=5, m/n < 0.05

    – District B with n=105, m=6, m/n > 0.05

    – Direct ACS estimation is more complex, but same idea applies
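The District A and District B example can be checked numerically. The sketch below assumes simple random sampling, as the slide does, and uses the usual binomial standard error; the function name is illustrative.

```python
import math


def direct_estimate(m, n):
    """Sample proportion and its binomial standard error under SRS."""
    p_hat = m / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se


for name, m in (("District A", 5), ("District B", 6)):
    p_hat, se = direct_estimate(m, n=105)
    print(f"{name}: p_hat={p_hat:.4f}, se={se:.4f}, above 5%: {p_hat > 0.05}")
```

One additional respondent flips the determination, although the gap between the two estimates is smaller than the standard error of either one.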

  • Voting Rights Tabulations

    • Overview of approach to the “more than 5%” provision:

    • Build a district level regression model to predict P based on variables in the ACS

    • Classify districts into classes with similar predicted P based on the model [predictive mean stratification]

    • Within classes, apply a Beta-Binomial model that pulls the direct ACS estimate of P towards the average P for districts in that class

    • Compare Beta-Binomial model estimate with 5% for this aspect of the determination

    • Rationale: increased precision of Beta-Binomial estimates in small samples increases the probability of getting the determination right, particularly in small districts

    • See Joyce et al. (2014)
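The steps above can be sketched in a few lines. This is a simplified illustration, not the methodology of Joyce et al. (2014): the predicted P values are taken as given instead of fitted by regression, classes are equal-sized bins, and the prior strength `prior_n` is an arbitrary assumed value. The districts are hypothetical.

```python
def classify(districts, n_classes):
    """Group districts into classes of similar predicted P (equal-sized bins)."""
    ranked = sorted(districts, key=lambda d: d["predicted_p"])
    size = -(-len(ranked) // n_classes)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


def shrink(m, n, class_mean, prior_n):
    """Posterior mean of P under a Beta prior with mean class_mean.

    The prior is Beta(prior_n * class_mean, prior_n * (1 - class_mean));
    prior_n acts like a prior sample size and is an assumed tuning value here.
    """
    return (m + prior_n * class_mean) / (n + prior_n)


districts = [  # hypothetical districts: LEP count m out of sample size n
    {"name": "A", "m": 5, "n": 105, "predicted_p": 0.030},
    {"name": "B", "m": 6, "n": 105, "predicted_p": 0.080},
    {"name": "C", "m": 40, "n": 2000, "predicted_p": 0.025},
    {"name": "D", "m": 180, "n": 2000, "predicted_p": 0.090},
]

results = {}
for group in classify(districts, n_classes=2):
    # Pooled proportion for the class serves as the prior mean.
    class_mean = sum(d["m"] for d in group) / sum(d["n"] for d in group)
    for d in group:
        estimate = shrink(d["m"], d["n"], class_mean, prior_n=200)
        results[d["name"]] = (estimate, estimate > 0.05)
print(results)
```

Small districts are pulled strongly towards their class average, while large districts stay close to their direct estimates.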

  • Bayes forces a loss function

    • Small p and n, posterior distribution is skewed to right

    [Figure: right-skewed posterior density with the mode, median and mean marked]

    • What’s the right point estimate: median, mode, mean? Bayes forces a choice …

    • Design-based, superpopulation model approaches fail to address the issue

    – Maximum likelihood is equivalent to mode with flat prior, which does not correspond to a sensible loss function
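To see how far apart the three point estimates can be, the sketch below assumes a uniform prior and hypothetical data with 1 success in 20 trials, which gives a Beta posterior with integer parameters. The median is found by bisection on the Beta distribution function, computed through its standard identity with a binomial tail probability.

```python
import math


def beta_cdf_int(p, a, b):
    """CDF of Beta(a, b) for integer a, b via the binomial identity
    P(Beta(a, b) <= p) = P(Binomial(a + b - 1, p) >= a)."""
    n = a + b - 1
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, n + 1))


def beta_median_int(a, b, tol=1e-10):
    """Median by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x, n = 1, 20                      # hypothetical small-sample data
a, b = x + 1, n - x + 1           # uniform prior gives Beta(x + 1, n - x + 1)
mode = (a - 1) / (a + b - 2)      # equals x / n, the maximum likelihood estimate
median = beta_median_int(a, b)
mean = a / (a + b)
print(mode, median, mean)
```

The ordering mode < median < mean reflects the right skew; which one to report depends on the loss function (absolute error for the median, squared error for the mean).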

  • American Community Survey

    • US Census Bureau is making available thousands of ACS tables, with millions of cells

    • A high fraction of these estimates are based on very little data, and hence are very noisy

    – Many people want information, not data, so ACS should produce information products, as well as data products

    – When noise swamps the signal, the information content is buried

    – Data products are highly constrained by confidentiality requirements, leading to incompleteness


  • The Statistical Problem

    • The ACS philosophy is essentially to produce “direct” (“design-based”) estimates, together with margins of error

    • This works fine with large samples, but most of the ACS estimates are based on small samples

    – The estimates are often too noisy to be useful

    – The confidence intervals derived from the estimates and margins of error are known to be of poor quality, violating statistical standards

    • Intervals include proportions outside the range (0,1)

    • Intervals do not have nominal coverage


  • The “standard error” error

    • ACS reports estimates and margins of error that yield asymptotic 90% confidence intervals

    • But in small samples, the implied confidence intervals do not have the stated coverage; so

    • Seek to replace estimates and margins of error by posterior means and 5% to 95% credibility intervals that have approximately the nominal coverage

    • A non-Bayesian can interpret the posterior means as estimates, and the 90% credibility intervals as 90% confidence intervals.
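The coverage problem can be illustrated with an exact calculation. The sketch assumes a plain binomial sample (not the ACS design) and the usual Wald interval with z = 1.645; the values n = 20 and p = 0.05 are chosen only for illustration.

```python
import math


def wald_interval(x, n, z=1.645):
    """Asymptotic (Wald) 90% interval for a proportion."""
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def wald_coverage(p, n, z=1.645):
    """Exact probability that the Wald interval contains the true p,
    assuming x ~ Binomial(n, p)."""
    covered = 0.0
    for x in range(n + 1):
        lo, hi = wald_interval(x, n, z)
        if lo <= p <= hi:
            covered += math.comb(n, x) * p**x * (1 - p)**(n - x)
    return covered


print(wald_coverage(p=0.05, n=20))    # far below the nominal 0.90
print(wald_interval(2, 20))           # lower limit falls below 0
```

Both defects named on the previous slide show up here: the interval can extend outside (0, 1), and its actual coverage falls well short of the stated level.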


  • Binary outcome: Schmertmann example

    Margins of error exceed the estimates

  • Data for example

    $Y$ = outcome (e.g. poverty)

    $x$ = covariates (e.g. categorized age = a, gender = g, stratum = h)

    In county $c$:

    $n_{aghc}$ = sample count with age = a, gender = g, stratum = h

    $x_{aghc}$ = sample count in poverty with age = a, gender = g, stratum = h

    $\hat{p}_{aghc} = x_{aghc} / n_{aghc}$ = sample proportion

  • Fully Bayesian model

    $x_{aghc} \mid p_{aghc} \sim \mathrm{Bin}(n_{aghc}, p_{aghc})$

    $p_{aghc} \sim \mathrm{Beta}(\alpha_{agh}, \beta_{agh}) \equiv \mathrm{Beta}^*(\mu_{agh}, \tau)$

    [Assumption: …]

    $p_{aghc} \mid x_{aghc} \sim \mathrm{Beta}\left(x_{aghc} + \tau\mu_{agh},\ n_{aghc} - x_{aghc} + \tau(1 - \mu_{agh})\right)$

    Key is how to determine prior parameters $\alpha_{agh}, \beta_{agh}$ (or $\mu_{agh}, \tau$)

    (a) Empirical Bayes: estimate prior parameters, then treat as if known. Simple beta intervals, but understates uncertainty

    (b) Full Bayes: Incorporate uncertainty of prior parameter estimates. More work, but better reflects uncertainty; consider approximations, since full Bayes seems computationally complex

  • Pragmatic “pseudo-Bayes” approach

    Tom Louis suggested this simple “Bayes-like” approach:

    A. Compute design-based estimate of proportion and standard error using existing methods

    B. Pretend data are binomial with number of successes x* and sample size n* that lead to the estimates in A.

    C. Compute Beta posterior distribution with noninformative prior (e.g. uniform or Jeffreys)

    D. Compute 90% posterior credibility interval based on this Beta posterior (reflects asymmetry, always between 0 and 1)

    Simple to implement and easily beats standard Wald-type confidence intervals in simulations (Franco, Little, Louis and Slud 2015, in preparation)
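Steps A to D can be sketched as below. Two points are assumptions rather than details given in the talk: step B is read as matching the estimate and its standard error to a binomial (so that x*/n* equals the estimate and the binomial variance equals the squared standard error), and the Beta quantiles are computed by a simple numerical integration to keep the example self-contained. The inputs must satisfy 0 < p_hat < 1 and se > 0.

```python
import math


def beta_quantile(q, a, b, grid=20000):
    """Quantile of Beta(a, b) from a midpoint-rule CDF on a fine grid.

    Adequate for a >= 1 and b >= 1, where the density is bounded.
    """
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    step = 1.0 / grid
    cumulative = 0.0
    for k in range(grid):
        p = (k + 0.5) * step
        density = math.exp(
            log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p)
        )
        cumulative += density * step
        if cumulative >= q:
            return p
    return 1.0


def pseudo_bayes_interval(p_hat, se, level=0.90):
    # Step B: binomial "effective" data matching the design-based estimate,
    # i.e. x*/n* = p_hat and p_hat * (1 - p_hat) / n* = se**2.
    n_star = p_hat * (1 - p_hat) / se**2
    x_star = n_star * p_hat
    # Step C: uniform prior gives a Beta(x* + 1, n* - x* + 1) posterior.
    a, b = x_star + 1, n_star - x_star + 1
    # Step D: equal-tailed posterior credibility interval.
    tail = (1 - level) / 2
    return beta_quantile(tail, a, b), beta_quantile(1 - tail, a, b)


# Step A is assumed done elsewhere: a design-based estimate and standard error.
low, high = pseudo_bayes_interval(p_hat=0.05, se=0.04)
print(low, high)
```

For the same inputs the Wald interval 0.05 ± 1.645 × 0.04 has a negative lower limit, whereas the Beta interval stays inside (0, 1) and is asymmetric around the estimate.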


  • Barriers to Calibrated Bayes

    • It’s a major paradigm shift

    • It’s too much work/computation

    – but this concern is alleviated by gains in computing power and advances in Bayesian computational methods

    • More explicit dependence on the choice of model -- concerns with model misspecification

    – “Design-based is model-free and hence robust…model-based requires models, which are inherently subjective”

    • But models are essential for today’s data, and

    • a judicious Calibrated Bayes model is robust and incorporates key design features – and would bring official statistics back in the statistical mainstream


  • References 1

    Box, G.E.P. (1980). Sampling and Bayes inference in scientific modeling and robustness (with discussion). JRSSA, 143, 383-430.

    Joyce, P.M., Malec, D., Little, R.J., Gilary, A., Navarro, A. and Asiala, M.E. (2014). Statistical Modeling Methodology for the Voting Rights Act Section 203 Language Assistance Determinations. JASA, 109, 36-47.

    Gelman, A. (2007). Struggles with survey weighting and regression modeling. Statist. Sci., 22, 2, 153-164 (with discussion and rejoinder).

    Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. (2003), Bayesian Data Analysis, 2nd. edition. New York: CRC Press.

    Godambe, V.P. (1955). A unified theory of sampling from finite populations. JRSSB, 17, 269-278.

    Horvitz, D.G. & Thompson, D.J. (1952). A generalization of sampling without replacement from a finite universe. JASA, 47, 663-685.

    Little, R.J.A. (2004). To Model or Not to Model? Competing Modes of Inference for Finite Population Sampling. JASA, 99, 546-556.

  • References 2

    Little, R.J.A. (2006). Calibrated Bayes: A Bayes/frequentist roadmap. Am. Statist., 60, 3, 213-223

    _____ (2012). Calibrated Bayes: an alternative inferential paradigm for official statistics (with discussion and rejoinder). JOS, 28, 3, 309-372.

    _____ (2013). Survey Sampling: Past Controversies, Current Orthodoxies, and Future Paradigms. In Past, Present and Future of Statistical Science, COPSS 50th Anniversary Volume, X. Lin, D. L. Banks, C. Genest, G. Molenberghs, D.W. Scott, and J.-L. Wang, eds. CRC Press.

    Rubin, D.B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Annals Statist., 12, 1151-1172.

    Särndal, C.-E., Swensson, B. & Wretman, J.H. (1992), Model Assisted Survey Sampling, Springer Verlag: New York.

    Zheng, H. & Little, R.J. (2005). Inference for the population total from probability-proportional-to-size samples based on predictions from a penalized spline nonparametric model. JOS, 21, 1-20.
