Parametric Empirical Bayes Inference: Theory and Applications: Comment

Author(s): Tom Leonard
Source: Journal of the American Statistical Association, Vol. 78, No. 381 (Mar., 1983), pp. 59-60
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2287102



{θ_1, . . . , θ_k}.) In both approaches one must choose r. It happens that, in Morris's method, the unbiased estimate of risk (= sum of expected squared errors of prediction) is essentially equivalent to the usual cross-validation assessment of {z_i'β̂_LS}, so that choosing r to minimize the unbiased estimate of risk has the optimality described by Shibata (1981) when k and r = o(k) are both large. In the second approach, where one must also select q, the same choice of r is probably appropriate. Incidentally, I suspect that simple quartic regression will give better answers for the Cobb data than Morris's estimates.
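The selection rule just described (choose r to minimize an unbiased estimate of risk) can be sketched on synthetic data. The quartic truth, the noise level, and the Mallows-type penalty below are illustrative assumptions, not Morris's calculation, and the Cobb data are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the regression problem; the true curve is
# quartic, so degrees r >= 4 should be favored by the risk estimate.
n = 60
x = np.linspace(-1, 1, n)
y_true = 1 + 0.5 * x - 2 * x**2 + 0.3 * x**3 + 1.5 * x**4
sigma = 0.1
y = y_true + rng.normal(0, sigma, n)

def fit_poly(r):
    """Least-squares polynomial fit of degree r; returns fitted values."""
    X = np.vander(x, r + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

def unbiased_risk(r):
    """Mallows-type unbiased risk estimate RSS + 2*sigma^2*(r+1),
    assuming sigma is known (here we cheat and use the true value)."""
    rss = np.sum((y - fit_poly(r)) ** 2)
    return rss + 2 * sigma**2 * (r + 1)

risks = {r: unbiased_risk(r) for r in range(1, 9)}
best = min(risks, key=risks.get)
print("degree chosen by unbiased risk:", best)
```

With a quartic truth, the quadratic fit carries a large bias term in its risk estimate, so the rule rejects it in favor of degree four or slightly higher.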

Morris's attempt to obtain confidence intervals based on PEB estimates is interesting. The strict Bayesians will object because frequentist concepts and calculations are too prominent. This dislike has a lot to do with the perceived misapplication of probability in the Neyman-Wald theory of confidence intervals. I sympathize with the Bayesians on this. However, I would contend that appropriate frequentist calculations can be made. The key idea is to condition on the ancillary indicator of that recognizable subset to which the sample belongs. To illustrate, consider the elementary problem

Y_i = θ_i ± 1,    θ_i = μ ± 1,

where +1 and -1 are equally likely in both cases and μ is known. Then, for convenience taking T_i = (Y_i + μ)/2, and noting that Y_i - μ is ancillary with respect to θ_i, we find the relevant probabilities are

Pr(T_i - θ_i = d | Y_i - μ = a, μ)

    = 1      (a = ±2, d = 0),
    = 1/2    (a = 0, d = ±1).

These agree with the Bayes posterior distribution. The problem is complicated when μ is unknown and estimated by Ȳ, in which case Y_i - Ȳ is only approximately ancillary. One would hope that a more sophisticated application of these ideas could be used in Morris's confidence interval problem, for example using studentized quantities (Y_i - Ȳ)/S^(1/2) as approximate ancillaries in the simple problem without covariates.
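A small simulation (an illustrative sketch, not part of the original comment) confirms the conditional probabilities in the two-point example, taking the known location parameter to be 0:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-point example: theta_i = mu +/- 1 and Y_i = theta_i +/- 1, each
# sign equally likely, with mu known (taken as 0 here).
mu = 0.0
n = 200_000
theta = mu + rng.choice([-1.0, 1.0], size=n)
y = theta + rng.choice([-1.0, 1.0], size=n)

t_est = (y + mu) / 2.0   # the estimator T_i = (Y_i + mu)/2
d = t_est - theta        # its error
a = y - mu               # the ancillary statistic

# On the recognizable subset a = +/-2, the estimator is exactly right.
p_exact = np.mean(d[np.abs(a) == 2] == 0)

# On the subset a = 0, each error d = +/-1 occurs with probability 1/2.
p_plus = np.mean(d[a == 0] == 1)

print(p_exact)  # 1.0
print(round(p_plus, 2))  # close to 0.5
```

Conditioning on the recognizable subset thus reproduces the Bayes posterior probabilities exactly, as the text asserts.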

Incidentally, I object to equation (3.6) in the article because I do not find it useful to attach the probability .9, say, to an interval that covers with probability 1. The interval is only useful if we can state what the probability is, with reasonable accuracy. It is partly for this reason that I find intervals such as those proposed by Hwang and Casella (1982) to be rather unsatisfactory for practical purposes.

Finally, I should like to mention the interesting problem of distinction between pre-chosen and data-chosen contrasts among parameters, a topic briefly studied in Section 8 of Efron (1982). Efron suggests, rightly, that the usual PEB estimates should not be used for any contrast where there is a priori evidence of its being large. This corresponds to Morris's initial subtraction of the large regression effect z_i'β in Section 1. It is, however, not completely clear to me why the usual PEB estimates are directly relevant to data-chosen contrasts. Perhaps Morris could comment on this point.

REFERENCES

EFRON, B. (1982), "Maximum Likelihood and Decision Theory," Annals of Statistics, 10, 340-356.

EFRON, B., and MORRIS, C. (1975), "Data Analysis Using Stein's Estimator and its Generalizations," Journal of the American Statistical Association, 70, 311-319.

GHOSH, M., HWANG, J.T., and TSUI, K. (1983), "Construction of Improved Estimators in Multiparameter Estimation for Exponential Families," Annals of Statistics, 11 (to appear).

GINI, C. (1911), "Considerazioni sulla probabilità a posteriori e applicazioni al rapporto dei sessi nelle nascite umane," reprinted in Metron, 15, 133-172.

HUDSON, H.M. (1981), "Adaptive Estimation for Simultaneous Estimation of Poisson Means," unpublished report.

HWANG, J.T., and CASELLA, G. (1982), "Minimax Confidence Sets for the Mean of a Multivariate Normal Distribution," Annals of Statistics, 10, 868-881.

SHIBATA, R. (1981), "An Optimal Selection of Regression Variables," Biometrika, 68, 45-54.

Comment

TOM LEONARD*

I have enjoyed reading Morris's paper, and agree with his type of conceptual thinking, relating theory and practice in just the way it should.

I don't see any substantial controversy between the parametric empirical Bayes and hierarchical Bayes approaches to Bayes-Stein estimation. These alternatives

* Tom Leonard is Associate Professor, Department of Statistics and Mathematics Research Center, University of Wisconsin, Madison, WI 53706.

simply provide different techniques for estimating the hyperparameters from the data. In practice, any hyperprior for the hyperparameters should be largely uninformative, since it is difficult to specify prior information at the lowest level of the hierarchy. Hierarchical Bayes then gives precise estimators that relate to specific loss functions. Empirical Bayes is usually easier to compute and, if k is at least 10, will provide close approximations to the hierarchical Bayes estimators. Empirical Bayes is particularly useful in more complicated situations involving,



say, several multiple regressions, or several multinomials, where the hyperparameters are a common mean vector and common covariance matrix. It is then often much easier to compute the marginal maximum likelihood estimate of all the hyperparameters numerically. Whenever the first stage prior belongs to the exponential family, the EM algorithm yields straightforward computational procedures (see, e.g., Leonard 1983a and Laird 1975).
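The EM idea can be sketched in the simplest normal-normal case: y_i ~ N(θ_i, σ²) with σ² known and θ_i ~ N(μ, τ²). This is a deliberately reduced setting, not the contingency-table or regression problems of Laird (1975) or Leonard (1983a), but the E-step/M-step pattern for marginal maximum likelihood is the same:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate k group means and one observation per group (sigma^2 = 1).
k, s2 = 40, 1.0
mu_true, tau2_true = 5.0, 4.0
theta = rng.normal(mu_true, np.sqrt(tau2_true), k)
y = rng.normal(theta, np.sqrt(s2))

# EM for the marginal maximum likelihood estimates of (mu, tau^2).
mu, tau2 = 0.0, 1.0
for _ in range(200):
    # E-step: posterior moments of each theta_i given current (mu, tau2).
    w = tau2 / (tau2 + s2)            # shrinkage weight
    post_mean = mu + w * (y - mu)
    post_var = w * s2
    # M-step: maximize the expected complete-data log-likelihood.
    mu = post_mean.mean()
    tau2 = np.mean((post_mean - mu) ** 2) + post_var

print(round(mu, 2), round(tau2, 2))
```

At convergence this recovers the closed-form marginal MLEs (μ̂ = ȳ and τ̂² = sample variance of y minus σ²), which is a useful check that the iteration is doing what it claims.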

The real controversy arises in going from parametric empirical Bayes to nonparametric empirical Bayes. Suppose that several normal means are taken to constitute a random sample from a common mixing distribution G. Then it is possible to estimate G, without further assumptions, by maximum likelihood, via the marginal distribution of the observations given G. In Leonard (1983b) I investigate the effect of this nonparametric estimate of the mixing distribution upon the posterior estimates of the first stage parameters. The numerical estimates can be strikingly different. For the Efron-Morris baseball example, I obtained similar shrinkages towards a common value. However, in other examples the observations become clustered into several groups, with shrinkages towards the common values within each group. Furthermore, extreme observations are either viewed as outliers, and not smoothed at all, or viewed as inliers and smoothed towards the nearest cluster. In short, when a technique is used that takes account of the scatter of the data, the straightforward shrinkages of the Bayes-Stein estimators towards a common value can be shown to be overspecialized. Perhaps some compromise between parametric and nonparametric procedures would be beneficial in practice; a nonparametric Bayesian procedure with a prior distribution on function space for the mixing distribution would produce this (see Leonard 1978).
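The clustering behavior described above can be illustrated with a standard grid approximation to the nonparametric maximum likelihood estimate of G, fitted by EM. This is only an illustrative sketch (Leonard 1983b uses its own computational route; the two-cluster data, grid, and iteration count here are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# y_i ~ N(theta_i, 1) with theta_i drawn from an unknown mixing
# distribution G; here the truth has two clusters, at -3 and +3.
theta = np.concatenate([np.full(10, -3.0), np.full(10, 3.0)])
y = rng.normal(theta, 1.0)

# Approximate the NPMLE of G by a discrete distribution on a fixed grid,
# with the weights updated by EM.
grid = np.linspace(-6, 6, 121)
w = np.full(grid.size, 1.0 / grid.size)
phi = np.exp(-0.5 * (y[:, None] - grid[None, :]) ** 2)  # N(theta, 1) kernels

for _ in range(500):
    # E-step: posterior probability that observation i sits at grid point j.
    post = phi * w
    post /= post.sum(axis=1, keepdims=True)
    # M-step: reweight the support points.
    w = post.mean(axis=0)

# Posterior means of the theta_i under the fitted G: shrinkage toward the
# nearest cluster rather than toward one grand mean.
post = phi * w
post /= post.sum(axis=1, keepdims=True)
theta_hat = post @ grid
print(np.round(theta_hat[:3], 2), np.round(theta_hat[-3:], 2))
```

Unlike a Bayes-Stein fit, which would pull every estimate toward the overall mean near zero, the nonparametric fit leaves the two groups shrunk toward their own cluster centers.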

I like Morris's philosophy of using hierarchical models to express uncertainty about hypothesised regression models. I think that Bayes-Stein estimation is very generally applicable to modeling situations. Consider, for example, multinomial frequencies y_1, . . . , y_s with corresponding cell probabilities θ_1, . . . , θ_s and sample size n, where θ_1, . . . , θ_s possess a Dirichlet prior distribution with parameters αξ_1, . . . , αξ_s, where 0 < α < ∞, and the prior means ξ_1, . . . , ξ_s sum to unity.

The prior parameters ξ_1, . . . , ξ_s may then be taken to represent a hypothesized model for θ_1, . . . , θ_s of the form

ξ_j = u_j(γ)    (j = 1, . . . , s)    (1)

where u_1(·), . . . , u_s(·) are specified functions, and γ is a q × 1 vector of unknown prior parameters.

Two possible hypothesized models are

ξ_j = exp{x_j'γ} / Σ_{g=1}^{s} exp{x_g'γ}    (2)

representing a hypothesized log-linear model with design vectors x_1, . . . , x_s, and

ξ_j = F(η_{j+1}, γ) - F(η_j, γ)    (j = 1, . . . , s)    (3)

where F(z, γ) is the cumulative distribution function of a hypothesized distribution for the raw observations, corresponding to a grouped histogram with cell boundaries η_1, . . . , η_{s+1}.

In either case, γ may be estimated by the vector γ̂ minimizing the chi-squared statistic

X²(γ) = n Σ_{j=1}^{s} {p_j - ξ_j(γ)}² / ξ_j(γ)    (4)

where p_j = y_j/n.

Generalizing a result described by Leonard (1977), the marginal distribution of X²(γ̂) is now approximately a multiple of chi-squared,

τ X²(γ̂) ~ χ²_{s-q-1},    (5)

where

τ = (1 + α)/(n + α).    (6)

When α = ∞ and τ = 1, the null hypothesis is true, whereas when α = 0 and τ = n⁻¹ the null hypothesis is refuted. A general judgment on α, and hence the null hypothesis, may be made by reference to the approximate marginal likelihood

l(α | y) ∝ τ^{(s-q-1)/2} exp{-½ τ X²(γ̂)}    (n⁻¹ ≤ τ ≤ 1)    (7)

and, at this stage, either an empirical or hierarchical Bayes procedure, or indeed a classical likelihood ratio test, may be employed. More generally, these procedures enable us to consider complicated hypothesized models by focusing attention on either a single hyperparameter, or a set of hyperparameters that is less complicated than the null hypothesis.
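The chain from the log-linear model (2), through the chi-squared criterion (4), to the marginal likelihood (7) can be exercised end to end on simulated counts. This is a sketch under stated assumptions: a scalar γ (q = 1), a crude grid search in place of a proper minimizer, and data generated with the null model exactly true, so the likelihood should favor large α:

```python
import numpy as np

rng = np.random.default_rng(4)

s, n = 6, 500
x = np.arange(s, dtype=float)   # scalar "design vectors" x_1, ..., x_s
gamma_true = 0.3

def xi(gamma):
    """Log-linear prior means, equation (2)."""
    e = np.exp(x * gamma)
    return e / e.sum()

# Null model exactly true: cell probabilities follow (2).
y = rng.multinomial(n, xi(gamma_true))
p = y / n

def chisq(gamma):
    """Chi-squared criterion, equation (4)."""
    xg = xi(gamma)
    return n * np.sum((p - xg) ** 2 / xg)

# Estimate gamma by grid search over a plausible range.
grid = np.linspace(-1.0, 1.5, 2501)
gamma_hat = grid[np.argmin([chisq(g) for g in grid])]
x2 = chisq(gamma_hat)

# Marginal likelihood (7) over candidate alpha values, with tau from (6).
q = 1
alphas = np.array([1.0, 10.0, 100.0, 1000.0, 1e6])
tau = (1.0 + alphas) / (n + alphas)
loglik = 0.5 * (s - q - 1) * np.log(tau) - 0.5 * tau * x2

print("gamma_hat:", round(gamma_hat, 2))
print("preferred alpha:", alphas[np.argmax(loglik)])
```

Since X²(γ̂) is then roughly a χ²_{s-q-1} variate, the log-likelihood 0.5(s-q-1) log τ - 0.5 τ X² is typically maximized near τ = 1, i.e. at large α, in agreement with the interpretation of (6).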

REFERENCES

LAIRD, N.M. (1975), "Empirical Bayes Methods for Two-Way Contingency Tables," Ph.D. Dissertation, Harvard University.

LEONARD, T. (1977), "A Bayesian Approach to Some Multinomial Estimation and Pre-Testing Problems," Journal of the American Statistical Association, 72, 677-691.

(1978), "Density Estimation, Stochastic Processes, and Prior Information" (with Discussion), Journal of the Royal Statistical Society, Ser. B, 40, 113-146.

(1983a), "Application of the EM Algorithm to the Estimation of Bayesian Hyperparameters," in Bayesian Studies in Econometrics and Statistics (in honor of Bruno de Finetti), ed. P. Goel, to appear.

(1983b), "Some Data-analytic Modifications to Bayes-Stein Estimation," Annals of the Institute of Statistical Mathematics, to appear.
