Motivation Basic g-prior Results with BMA Markov Chain Monte Carlo Model Composition: MC3
Bayesian Model Averaging and Model Search
Econ 690
Purdue University
February 19, 2010
Justin L. Tobias Bayesian Model Averaging
Outline
1 Motivation
2 Basic g-prior Results with BMA
3 Markov Chain Monte Carlo Model Composition: MC3
Motivation for Model Averaging
In frequentist econometrics, the dominant paradigm is to select a particular model.
The process used in formulating the model is not typically made transparent, and the resulting inference does not typically take into account the uncertainty associated with the selection of the model itself.
In other words, a variety of pretest-type procedures are employed to select a model and, conditioned on that selection, statistics are reported.
As an alternative to model selection, many Bayesians prefer to average across models rather than select a single model.
To fix ideas, consider a binary choice example: we could use a probit, a logit, a regression with Student-t errors, a mixture of Normals, or other methods such as the log-log link or skew Normal models.
Each of these specifications will generate a prediction regarding some parameter of interest.
Instead of selecting one of these, might it be more sensible in general to obtain a prediction from each model and then combine these predictions in some “optimal” way?
Procedures for combining model-specific estimates within the Bayesian framework are often termed exercises in Bayesian Model Averaging.
The motivation for Bayesian model averaging (BMA) follows immediately from the laws of probability.
That is, if we let Mr, for r = 1, …, R, denote the R different models under consideration and φ be a vector of parameters which has a common interpretation in all models, then the rules of probability suggest:
Alternatively, if g(φ) is a function of φ, the rules of conditional expectation imply that:
In words, the logic of Bayesian inference says that:
It can also be shown that averaging over all possible models in this fashion produces better predictive ability (under a logarithmic scoring rule) than the selection of any particular model [e.g., Raftery, Madigan and Hoeting (1997)].
OK, so what’s the big deal? We already talked about calculating expectations or marginal posteriors within a given model, as well as calculating marginal likelihoods for a range of possible models.
So, is there really anything more that is required to implement a BMA procedure?
It is important to recognize that, in many empirical examples, the number of models under consideration is simply too large for evaluation of, e.g., a marginal likelihood for every model.
For instance, in the regression model, one has K potential explanatory variables, leading to R = 2^K possible models!
For example, with K = 20, we have R = 1,048,576 possible specifications!
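The combinatorial explosion is easy to verify with a minimal sketch (the small K = 3 here is purely illustrative): each model corresponds to one inclusion vector over the K candidate covariates.

```python
from itertools import product

# Each candidate model is an inclusion vector gamma over K covariates,
# so the model space has 2**K elements (K = 3 is a toy choice).
K = 3
models = list(product([0, 1], repeat=K))
n_models = len(models)   # 2**3 = 8 models

# With K = 20 the count is already over a million.
large_count = 2 ** 20
```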
In this lecture, we discuss some possible computational strategies for carrying out this model comparison exercise.
To this end, we first introduce benchmark-type priors that are often used in these analyses and thereby discuss the g-prior of Zellner (1986).
We then discuss the technique of Markov Chain Monte Carlo Model Composition as a potentially useful device for model exploration when the number of possible models, R, is large.
Basic g-prior Results
Consider the regression model:
(here, we parameterize in terms of the error precision h = σ^{-2} rather than the variance σ^2) under the priors
In addition, select
with
and
This prior is termed the g-prior by Zellner (1986). Fernandez, Ley and Steel (2001) consider the use of this prior for BMA purposes and suggest the use of gr = 1/n, so that the information in the prior is roughly the same as the informational content of one observation.
Note, in the above, that r denotes a particular model, and βr denotes a particular configuration of possible β coefficients.
Along these lines, Xr denotes the stacked covariate matrix for model r. For example, if model r includes only β1 and β2, then Xr = [x1 x2].
Fernandez et al also recommend standardizing all explanatory variables to have mean zero, so that the intercept α has a common interpretation across models (as the unconditional mean of y).
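The mapping from an inclusion vector to the stacked covariate matrix Xr can be sketched as follows; the data here are synthetic placeholders, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 50, 4
X = rng.normal(size=(n, K))        # hypothetical full covariate matrix

# Demean every column, as Fernandez et al recommend, so the intercept
# alpha has a common interpretation (the unconditional mean of y).
X = X - X.mean(axis=0)

# gamma = 1 marks an included covariate; X_r stacks the included columns.
gamma = np.array([1, 0, 1, 0], dtype=bool)
X_r = X[:, gamma]
```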
With this construction, it follows that the marginal posterior distribution of βr is multivariate Student-t with mean
with
and covariance matrix:
where, in the above quantities, we define
\[
s_r^2 = \frac{\frac{1}{g_r+1}\, y' P_{X_r} y \;+\; \frac{g_r}{g_r+1}\,\left(y - \bar{y}\,\iota_N\right)'\left(y - \bar{y}\,\iota_N\right)}{\nu},
\]
where
\[
P_{X_r} = I_N - X_r \left(X_r' X_r\right)^{-1} X_r'
\]
and ν = n.
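A numerical sketch of this s_r^2 computation, using synthetic placeholder data (the dimensions and coefficients below are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
X_r = rng.normal(size=(n, 2))
X_r = X_r - X_r.mean(axis=0)       # demeaned covariates
y = X_r @ np.array([1.0, -2.0]) + rng.normal(size=n)

g_r = 1.0 / n                      # Fernandez-Ley-Steel choice g_r = 1/n
iota = np.ones(n)
ybar = y.mean()

# P_{X_r} = I_N - X_r (X_r' X_r)^{-1} X_r'
P = np.eye(n) - X_r @ np.linalg.solve(X_r.T @ X_r, X_r.T)

nu = n                             # nu = n, as in the slides
s2_r = ((1.0 / (g_r + 1.0)) * (y @ P @ y)
        + (g_r / (g_r + 1.0)) * ((y - ybar * iota) @ (y - ybar * iota))) / nu
```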
Similarly, with a bit of algebra, and following steps similar to those given in our earlier lecture on marginal likelihoods and hypothesis testing in the linear model, we can calculate:
Posterior model probabilities can then be calculated using:
This constant can, when required, be obtained by summing over all the (unnormalized) marginal likelihoods from each candidate model.
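The normalization step is simple in code; the log marginal likelihood values below are hypothetical. Working on the log scale (log-sum-exp) avoids underflow when marginal likelihoods are tiny.

```python
import math

# Hypothetical unnormalized log marginal likelihoods for three models.
log_ml = [-10.2, -9.1, -12.7]

# Subtract the max before exponentiating so the normalization is
# numerically stable even for very negative log values.
m = max(log_ml)
weights = [math.exp(v - m) for v in log_ml]
total = sum(weights)
post_prob = [w / total for w in weights]   # posterior model probabilities
```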
The foregoing suggests how one could implement a BMA procedure in practice:
For the linear model, employ the Fernandez et al (2001) priors.
Consider every possible model, as defined by the exclusion/inclusion of each covariate.
For each model and a given object of interest, calculate its posterior mean or marginal posterior distribution using properties of the multivariate Student-t distribution.
Calculate the (unnormalized) marginal likelihood using the formula given above.
Normalize the marginal likelihoods into posterior model probabilities (under equal prior odds) and calculate model-averaged posterior means or marginal posteriors.
We now illustrate how BMA (in conjunction with a method we describe later) can be used in practice using a real data set.
The data set used is taken from Fernandez, Ley and Steel (2001a) and is available from the Journal of Applied Econometrics data archive (www.econ.queensu.ca/jae).
This data set covers N = 72 countries and contains K = 41 potential explanatory variables.
The dependent variable is average per capita GDP growth for the period 1960-1992.
In the following table, “BMA Post. Prob.” can be interpreted as the probability that the corresponding explanatory variable should be included.
That is, it is the sum of all the posterior model probabilities for those models that actually include the given covariate.
The other two columns of the table contain posterior means and standard deviations for each regression coefficient, averaged across models. Remember that models where a particular explanatory variable is excluded are interpreted as implying a zero value for its coefficient. Hence, the averages involve some terms where E[g(φ)|y, M(s)] is calculated and others where the value of zero is used.
Bayesian Model Averaging Results

Explanatory Variable                BMA Post. Prob.   Posterior Mean   Posterior St. Dev.
Primary School Enrolment            0.207             0.004            0.010
Life Expectancy                     0.935             0.001            3.4 × 10^-4
GDP Level in 1960                   0.998             -0.016           0.003
Fraction GDP in Mining              0.460             0.019            0.023
Degree of Capitalism                0.452             0.001            0.001
No. Years Open Economy              0.515             0.007            0.008
% of Pop. Speaking English          0.068             -4.3 × 10^-4     0.002
% of Pop. Speaking Foreign Lang.    0.067             2.9 × 10^-4      0.001
Exchange Rate Distortions           0.081             -4.0 × 10^-6     1.7 × 10^-5
Equipment Investment                0.927             0.161            0.068
Non-equipment Investment            0.427             0.024            0.032
St. Dev. of Black Market Premium    0.049             -6.3 × 10^-7     3.9 × 10^-6
Outward Orientation                 0.039             -7.1 × 10^-5     5.9 × 10^-4
Black Market Premium                0.181             -0.001           0.003
Area                                0.031             -5.0 × 10^-9     1.1 × 10^-7
Latin America                       0.207             -0.002           0.004
Sub-Saharan Africa                  0.736             -0.011           0.008
Higher Education Enrolment          0.043             -0.001           0.010
Public Education Share              0.032             0.001            0.025
Revolutions and Coups               0.030             -3.7 × 10^-6     0.001
War                                 0.076             -2.8 × 10^-4     0.001
Bayesian Model Averaging Results (continued)

Explanatory Variable                BMA Post. Prob.   Posterior Mean   Posterior St. Dev.
Political Rights                    0.094             -1.5 × 10^-4     0.001
Civil Liberties                     0.127             -2.9 × 10^-4     0.001
Latitude                            0.041             9.1 × 10^-7      3.1 × 10^-5
Age                                 0.083             -3.9 × 10^-6     1.6 × 10^-5
British Colony                      0.037             -6.6 × 10^-5     0.001
Fraction Buddhist                   0.201             0.003            0.006
Fraction Catholic                   0.126             -2.9 × 10^-4     0.003
Fraction Confucian                  0.989             0.056            0.014
Ethnolinguistic Fractionalization   0.056             3.2 × 10^-4      0.002
French Colony                       0.050             2.0 × 10^-4      0.001
Fraction Hindu                      0.120             -0.003           0.011
Fraction Jewish                     0.035             -2.3 × 10^-4     0.003
Fraction Muslim                     0.651             0.009            0.008
Primary Exports                     0.098             -9.6 × 10^-4     0.004
Fraction Protestant                 0.451             -0.006           0.007
Rule of Law                         0.489             0.007            0.008
Spanish Colony                      0.057             2.2 × 10^-4      1.5 × 10^-3
Population Growth                   0.036             0.005            0.046
Ratio Workers to Population         0.046             -3.0 × 10^-4     0.002
Size of Labor Force                 0.072             6.7 × 10^-9      3.7 × 10^-8
As discussed earlier, however, BMA has its limitations.
In particular, when K is large, calculation of something like a posterior mean and posterior model probability for every model becomes nearly infeasible.
Indeed, in the above table, we did not calculate the posterior probability for each model, since there are more than 40 covariates.
Instead, we employed an alternative procedure that may be able to more quickly determine those models receiving high posterior probability.
Markov Chain Monte Carlo Model Composition: MC3
(Should this be (MC)^3 instead of MC^3?) Consider the following algorithm, as described by Madigan and York (1995):
Let the model space be denoted as {Mr} for r = 1, …, R. This can be expressed in terms of a K × 1 vector γ = (γ1, …, γK)′ where all elements are either 0 or 1.
A value γj = 1 indicates that the jth explanatory variable enters the model (else γj = 0).
There are 2^K possible configurations of γ, and the space of this parameter is equivalent to the model space.
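Because γ is a vector of K bits, each model can be identified with an integer in [0, 2^K); a small sketch of that correspondence, which is handy for tallying visit counts when sampling over the model space:

```python
# Identify gamma with the integer whose binary digits are its entries.
def encode(gamma):
    return sum(bit << j for j, bit in enumerate(gamma))

def decode(code, K):
    return [(code >> j) & 1 for j in range(K)]
```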
Consider the following M-H type routine for sampling over the model space:
Suppose we are currently “at” M(s−1) in our sampler, with associated current set of parameter values γ(s−1).
Now, consider all re-configurations of M(s−1) that include both the model itself, as well as all other potential models that differ from the current model in one component ONLY.
That is, we define the set of all possible models that we can “move to” as the current model together with all models that add or delete a single explanatory variable from the current model under consideration.
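The proposal neighborhood just described (the current model plus every one-flip model) can be generated directly:

```python
# Candidate moves: the current model itself, plus every model obtained by
# flipping exactly one inclusion indicator (add or drop one covariate).
def neighbors(gamma):
    nbhd = [list(gamma)]
    for j in range(len(gamma)):
        flipped = list(gamma)
        flipped[j] = 1 - flipped[j]
        nbhd.append(flipped)
    return nbhd

nbhd = neighbors([1, 0, 1])   # K + 1 = 4 candidate models
```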
With the sampling defined in this way, we can apply the standard M-H formula governing the probability of movement from M(s−1) to M∗ in the above set:
(Why is this the correct acceptance probability?) Also note that p(y|γ(s−1)) and p(y|γ∗) are our marginal likelihoods, which can be calculated analytically with our g-prior.
In the common case where equal prior weight is allocated to each model, p(γ∗) = p(γ(s−1)), and these terms cancel out of the above ratio.
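One MC3 step can then be sketched as below: propose uniformly from the neighborhood and accept with probability min(1, p(y|γ∗)/p(y|γ(s−1))), since equal prior odds and the symmetric neighborhood make the other terms cancel. The log_ml below is a toy stand-in peaked at an assumed "true" model, purely to exercise the step; it is not the g-prior marginal likelihood.

```python
import math
import random

random.seed(0)
TARGET = (1, 0, 1)                      # assumed best model (toy choice)

def log_ml(gamma):
    # Stand-in log marginal likelihood: penalize disagreement with TARGET.
    return -5.0 * sum(a != b for a, b in zip(gamma, TARGET))

def neighbors(gamma):
    # Current model plus all one-flip models.
    nbhd = [tuple(gamma)]
    for j in range(len(gamma)):
        g = list(gamma)
        g[j] = 1 - g[j]
        nbhd.append(tuple(g))
    return nbhd

def mc3_step(gamma):
    cand = random.choice(neighbors(gamma))
    # Accept with probability min(1, exp(log-ratio of marginal likelihoods)).
    if random.random() < math.exp(min(0.0, log_ml(cand) - log_ml(gamma))):
        return cand
    return gamma

# Run the chain from an arbitrary start and tally visits.
gamma = (0, 0, 0)
visits = {}
for _ in range(500):
    gamma = mc3_step(gamma)
    visits[gamma] = visits.get(gamma, 0) + 1
```

With such a sharply peaked stand-in, the chain should spend the bulk of its iterations at the target model.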
Given the simulated output from this procedure, a variety of quantities of interest can be calculated, including:
The models most supported by the data, determined as those models that are visited most frequently when applying our simulator.
The posterior probability of a particular model, determined as the fraction of times that model is visited in the sampler.
Bayes factors comparing models M1 and M2, determined as the ratio of the number of times our sampler visits M1 relative to M2.
Model-averaged posterior distributions of a function of interest, g(φ), where:
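These visit-frequency calculations are straightforward; the sequence of draws below is hypothetical MC3 output for a three-covariate problem:

```python
from collections import Counter

# Hypothetical MC3 output: the sequence of models visited by the sampler.
draws = [(1, 0, 1)] * 70 + [(1, 1, 1)] * 20 + [(1, 0, 0)] * 10
S = len(draws)
counts = Counter(draws)

# Posterior model probability: the visit frequency of each model.
post_prob = {model: c / S for model, c in counts.items()}

# Bayes factor for one model versus another: the ratio of visit counts.
bf = counts[(1, 0, 1)] / counts[(1, 1, 1)]

# Inclusion probability of covariate j: share of draws with gamma_j = 1.
incl = [sum(m[j] for m in draws) / S for j in range(3)]
```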
In the foregoing example, we did not calculate the marginal likelihoods for each model, but instead employed the MC3 method.
Specifically, we obtained 1,100,000 draws and discarded the first 100,000 as burn-in replications.
The column “BMA Post. Prob.” was calculated as the proportion of models drawn by the MC3 algorithm containing the corresponding explanatory variable.
To illustrate the use of the MC3 method, we consider the following generated data experiment.
We first generate 10 potential explanatory variables as follows:
\[
x_i \stackrel{iid}{\sim} N\left(0_{10},\; 0.7\, I_{10} + 0.3\, \iota_{10}\iota_{10}'\right).
\]
We then generate y using
\[
y_i = 3 + 4.5 x_{2i} - 6.2 x_{4i} + x_{6i} - 5.4 x_{9i} + \varepsilon_i,
\]
where \(\varepsilon_i \stackrel{iid}{\sim} N(0, 1)\).
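This design is easy to reproduce; the sample size is not stated in the slides, so n = 200 below is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                              # assumed sample size

# Covariance .7 I_10 + .3 iota iota': equicorrelated standardized covariates.
cov = 0.7 * np.eye(10) + 0.3 * np.ones((10, 10))
X = rng.multivariate_normal(np.zeros(10), cov, size=n)

# y_i = 3 + 4.5 x_{2i} - 6.2 x_{4i} + x_{6i} - 5.4 x_{9i} + eps_i
# (columns are 0-indexed, so x_2 is X[:, 1], and so on)
eps = rng.normal(size=n)
y = 3 + 4.5 * X[:, 1] - 6.2 * X[:, 3] + X[:, 5] - 5.4 * X[:, 8] + eps
```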
In our MC3 method, we allow all variables (including an intercept) to be excluded/included in our model.
We start the sampler away from the “true” model of the data generating process by choosing:
γ = [0 1 1 1 1 0 0 1 1 1 1]′
The sampler is run for 10,000 iterations, discarding the first 1,000 as the burn-in period.
In terms of the “true” in/out classification of each variable, we could write
γtrue = [1 0 1 0 1 0 1 0 0 1 0]′
The posterior mean associated with γ is found to be:
E(γ|y) = [1.00 .074 1.00 .056 1.00 .051 1.00 .03 .04 1.00 .05].
So, when a variable truly belongs in the model, we clearly get it right.
However, at a particular iteration, we may choose to keep an “inappropriate” variable in the model. Then, we typically must wait for our next chance to exclude that variable from the specification.
Further Reading
Fernandez, C., E. Ley and M. Steel (2001). “Benchmark Priors for Bayesian Model Averaging,” Journal of Econometrics 100(2), 381-427.

Madigan, D. and J. York (1995). “Bayesian Graphical Models for Discrete Data,” International Statistical Review 63, 215-232.

Raftery, A.E. (1996). “Approximate Bayes Factors and Accounting for Model Uncertainty in Generalized Linear Models,” Biometrika 83, 251-266.

Raftery, A.E., D. Madigan and J.A. Hoeting (1997). “Bayesian Model Averaging for Linear Regression Models,” Journal of the American Statistical Association 92, 179-191.

Zellner, A. (1986). “On Assessing Prior Distributions and Bayesian Regression Analysis with g-Prior Distributions,” in Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, North-Holland.