A Bayesian Approach to Some Multinomial Estimation and Pretesting Problems

A Bayesian Approach to Some Multinomial Estimation and Pretesting Problems Author(s): Tom Leonard Source: Journal of the American Statistical Association, Vol. 72, No. 360 (Dec., 1977), pp. 869- 874 Published by: American Statistical Association Stable URL: http://www.jstor.org/stable/2286478 . Accessed: 14/06/2014 14:27 Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at . http://www.jstor.org/page/info/about/policies/terms.jsp . JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. . American Statistical Association is collaborating with JSTOR to digitize, preserve and extend access to Journal of the American Statistical Association. http://www.jstor.org This content downloaded from 195.34.79.223 on Sat, 14 Jun 2014 14:27:01 PM All use subject to JSTOR Terms and Conditions


A Bayesian Approach to Some Multinomial Estimation and Pretesting Problems

TOM LEONARD*

New Bayesian estimates are proposed for multinomial probabilities when the prior distribution is a mixture of Dirichlet distributions. They are based upon a distributional approximation for the $\chi^2$ statistic and may be contrasted with the frequentist approximations recommended by other authors. An alternative to the $\chi^2$ goodness-of-fit test is also suggested for use when that test is employed as a preliminary test of significance. The new procedure possesses remarkably different properties, and it is used to reanalyze some data from Mendel's pea breeding experiment. The results suggest that the $\chi^2$ test may have no particular relevance in preliminary testing situations.

KEY WORDS: Bayesian estimation; Multinomial probabilities; Dirichlet distributions; Chi-squared statistic; Preliminary test of significance.

1. SMOOTHING MULTINOMIAL PROPORTIONS

Consider observed frequencies $x_1, \dots, x_s$ possessing a multinomial distribution with respective cell probabilities $\theta_1, \dots, \theta_s$ summing to one, and sample size $n = \sum_j x_j$. Good (1965; 1967) pioneered the important idea of estimating the $\theta_j$ by smoothing the raw proportions $p_j = x_j/n$. He uses all the proportions $p_1, \dots, p_s$ when estimating each individual $\theta_j$. Similar procedures have been recommended by several other authors, e.g., Fienberg and Holland (1973), Sutherland, Fienberg, and Holland (1974), and Stone (1974).

We denote the prior means of $\theta_1, \dots, \theta_s$ by $\xi_1, \dots, \xi_s$, respectively, and suppose that the prior distribution may be described in the following two stages:

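Stage I: Given $\alpha$, the parameters $\theta_1, \dots, \theta_s$ possess a Dirichlet distribution with respective parameters $\alpha\xi_1, \dots, \alpha\xi_s$ and joint density

$$\pi(\theta \mid \alpha) = \frac{\Gamma(\alpha)}{\prod_j \Gamma(\xi_j\alpha)} \prod_j \theta_j^{\alpha\xi_j - 1} \qquad \Big(0 < \alpha < \infty;\ 0 < \theta_j, \xi_j < 1 \text{ for } j = 1, \dots, s;\ \textstyle\sum_j \theta_j = \sum_j \xi_j = 1\Big). \qquad (1.1)$$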

Stage II: The parameter $\alpha$ possesses density $\pi(\alpha)$ for $0 < \alpha < \infty$. The parameter $\alpha$ measures the (first-stage) degree of belief in the prior estimates $\xi_j$ of the $\theta_j$, and it is sometimes referred to as the flattening constant. The particular choice of its second-stage density $\pi(\alpha)$ will be discussed in the next section.

* Tom Leonard is Lecturer, Department of Statistics, University of Warwick, Coventry CV4 7AL, England. The author wishes to acknowledge the help and advice of J.K. Ord, who in particular made valuable contributions to Sections 5 and 6. Thanks are also due to a reviewer for many stimulating comments and to an Associate Editor for indicating some useful references.

The prior specification just described is open to some criticism on the grounds that it does not yield a flexible prior covariance structure for the $\theta_j$. An alternative formulation by Leonard (1973) employs a multivariate normal prior distribution for multivariate logits. We show here, however, that with the type of prior assumption made by Good, the more restrictive covariance structure at least leads to some fairly tractable approximations to the Bayes estimates.

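Under this prior mixture of Dirichlet distributions, the exact posterior means of the $\theta_j$ are well known (e.g., Fienberg and Holland 1973, p. 685) to be given by

$$E(\theta_j \mid x) = (1 - \lambda)p_j + \lambda\xi_j \qquad (j = 1, \dots, s), \qquad (1.2)$$

where

$$\lambda = E\{\alpha/(n + \alpha) \mid x\} = \int_0^\infty \{\alpha/(n + \alpha)\}\, \pi(\alpha \mid x)\, d\alpha, \qquad (1.3)$$

with

$$\pi(\alpha \mid x) \propto \pi(\alpha) \prod_{j=1}^{s} \prod_{k=0}^{np_j - 1} (\xi_j\alpha + k) \prod_{k=0}^{n-1} (\alpha + k)^{-1} \qquad (0 < \alpha < \infty). \qquad (1.4)$$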

Whenever $np_j = 0$, the corresponding product in (1.4) should be set equal to one. The estimates in (1.2) are Bayes with respect to quadratic loss. Because the loss function and prior distribution are continuous, these estimates are also admissible. They therefore possess a desirable frequentist property as well as a straightforward Bayesian justification.
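The exact posterior mean in (1.2)-(1.4) reduces to a one-dimensional integral, which can be evaluated numerically. The sketch below is illustrative only, and it rests on stated assumptions. It takes the second-stage prior to be $\pi(\tau) \propto \tau^{-1}$ on $\tau = (1+\alpha)/(n+\alpha)$, the choice made in Section 2. It integrates over $\tau$ with a plain midpoint rule, and it writes the products in (1.4) with log-gamma functions. The names `log_marginal` and `exact_lambda` are hypothetical.

```python
import math

def log_marginal(alpha, x, xi):
    """Log of the alpha-dependent factor in (1.4), without pi(alpha).

    prod_j prod_{k<x_j} (xi_j*alpha + k) / prod_{k<n} (alpha + k), via log-gamma.
    """
    n = sum(x)
    total = math.lgamma(alpha) - math.lgamma(n + alpha)
    for xj, xij in zip(x, xi):
        if xj > 0:  # an empty cell contributes a product equal to one
            total += math.lgamma(xj + xij * alpha) - math.lgamma(xij * alpha)
    return total

def exact_lambda(x, xi, grid=20000):
    """lambda = E{alpha/(n+alpha) | x} of (1.3), by midpoint quadrature in tau.

    ASSUMPTION: second-stage prior pi(tau) proportional to 1/tau on (1/n, 1),
    where tau = (1 + alpha)/(n + alpha); other priors would change the answer.
    """
    n = sum(x)
    lo = 1.0 / n
    h = (1.0 - lo) / grid
    log_w, values = [], []
    for i in range(grid):
        tau = lo + (i + 0.5) * h
        alpha = (n * tau - 1.0) / (1.0 - tau)        # invert tau = (1+alpha)/(n+alpha)
        log_w.append(-math.log(tau) + log_marginal(alpha, x, xi))
        values.append((n * tau - 1.0) / (n - 1.0))   # equals alpha/(n+alpha)
    m = max(log_w)                                    # stabilize the exponentials
    w = [math.exp(v - m) for v in log_w]
    return sum(wi * vi for wi, vi in zip(w, values)) / sum(w)

# Mendel's pea counts from Section 6 with the 9:3:3:1 hypothesized values.
print(exact_lambda([315, 101, 108, 32], [9/16, 3/16, 3/16, 1/16]))
```

For Mendel's counts this should land close to the values reported in Section 6. We have not checked that it reproduces the quoted exact figure digit for digit, because that depends on which prior was actually used.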

We see from (1.2) that the Bayes estimate of $\theta_j$ shrinks the standard estimate $p_j$ a constant proportion $\lambda$ of the distance towards the prior mean $\xi_j$. A variety of frequentist alternatives have been suggested for $\lambda$. For example, Good (1965, pp. 32-33) suggests replacing $\lambda$ by

$$\lambda^* = \min\{(s - 1)/X^2,\ 1\}, \qquad (1.5)$$

where

$$X^2 = n \sum_j \xi_j^{-1} (p_j - \xi_j)^2. \qquad (1.6)$$

We will show in the next section that, as $n \to \infty$ with $s$ and the $p_j$ fixed, (1.5) is always smaller than an approximation to the Bayesian quantity in (1.3). This suggests that, compared with the Bayes solution, $\lambda^*$ might not shrink the $p_j$ quite far enough towards the $\xi_j$ in this limiting case.

© Journal of the American Statistical Association, December 1977, Volume 72, Number 360, Theory and Methods Section
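Good's shrinkage rule is simple enough to state directly in code. The following sketch computes $X^2$ from (1.6), Good's $\lambda^*$ from (1.5), and the weighted average (1.2). The function names are illustrative only.

```python
def chi_square_stat(x, xi):
    """X^2 = n * sum_j (p_j - xi_j)^2 / xi_j, as in (1.6)."""
    n = sum(x)
    return n * sum((xj / n - xij) ** 2 / xij for xj, xij in zip(x, xi))

def good_lambda(x, xi):
    """Good's replacement for lambda in (1.5): min{(s - 1)/X^2, 1}."""
    X2 = chi_square_stat(x, xi)
    s = len(x)
    return 1.0 if X2 == 0 else min((s - 1) / X2, 1.0)

def smooth(x, xi, lam):
    """Weighted average (1.2): (1 - lam) * p_j + lam * xi_j."""
    n = sum(x)
    return [(1 - lam) * xj / n + lam * xij for xj, xij in zip(x, xi)]

x, xi = [315, 101, 108, 32], [9/16, 3/16, 3/16, 1/16]
lam_star = good_lambda(x, xi)
print(round(chi_square_stat(x, xi), 2), lam_star)   # prints: 0.47 1.0
print(smooth(x, xi, lam_star))
```

Because $X^2 < s - 1$ for Mendel's data, $\lambda^* = 1$, and Good's estimates collapse onto the hypothesized values.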

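Sutherland et al. (1974) base their estimates on the first and second moments of Good's sample repeat rate statistic,

$$\sum_j x_j(x_j - 1) \big/ n(n - 1),$$

and replace $\lambda$ in (1.3) by

$$\hat\lambda = \hat\alpha/(n + \hat\alpha), \qquad (1.7)$$

where $\hat\alpha$ is the ratio unbiased estimate

$$\hat\alpha = \Big(n^2 - \sum_j x_j^2\Big) \Big/ \Big\{\sum_j x_j^2 - 2(n - 1)\sum_j x_j\xi_j + n(n - 1)\sum_j \xi_j^2 - n\Big\}. \qquad (1.8)$$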

In the symmetric case, where all the $\xi_j$ equal $s^{-1}$, this reduces to another estimate proposed by Good; Fienberg and Holland suggest various adjustments. The quantity in (1.7) tends to fall outside the sensible range [0, 1], e.g., when each $p_j = x_j/n$ is close to the corresponding $\xi_j$. It has been suggested (e.g., by Baranchik 1964) that $\hat\lambda$ should be constrained to [0, 1] by replacing $\hat\alpha^{-1}$ in (1.8) by zero whenever it is negative. Our Bayesian approximation will lie in the interval [0, 1] without the assistance of informal constraints.
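The Sutherland et al. estimate can be sketched as follows. This follows our reading of the printed formula. We take the denominator of (1.8) to be $\sum_j x_j^2 - 2(n-1)\sum_j x_j\xi_j + n(n-1)\sum_j \xi_j^2 - n$, which is an unbiased estimate of $n(n-1)\sum_j(\theta_j-\xi_j)^2$. Following the truncation described in the text, $\hat\alpha^{-1}$ is replaced by zero whenever it is negative. The function name is hypothetical.

```python
def sutherland_lambda(x, xi):
    """lambda-hat = alpha-hat/(n + alpha-hat) of (1.7), with alpha-hat from (1.8).

    Works with 1/alpha-hat and truncates it at zero, so a negative alpha-hat
    gives lambda-hat = 1 (estimates equal to the xi_j).
    """
    n = sum(x)
    sum_x2 = sum(xj * xj for xj in x)
    num = n * n - sum_x2                       # numerator of (1.8)
    den = (sum_x2
           - 2 * (n - 1) * sum(xj * xij for xj, xij in zip(x, xi))
           + n * (n - 1) * sum(xij * xij for xij in xi)
           - n)                                # denominator of (1.8), as read here
    if num <= 0:
        raise ValueError("all observations fall in one cell; (1.8) is undefined")
    inv_alpha = max(den / num, 0.0)            # 1/alpha-hat, constrained nonnegative
    return 1.0 / (1.0 + n * inv_alpha)         # alpha/(n + alpha) = 1/(1 + n/alpha)

print(sutherland_lambda([315, 101, 108, 32], [9/16, 3/16, 3/16, 1/16]))
```

For Mendel's counts the denominator is negative (about -299). The truncation therefore sets $\hat\lambda = 1$, which matches the statement in Section 6 that this estimate coincides with the prior values.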

2. APPROXIMATIONS TO THE BAYES ESTIMATES

We employ a $\chi^2$ approximation to the marginal distribution, conditional on $\alpha$, of the statistic $X^2$ in (1.6). The marginal distribution of the $x_j$ given $\alpha$ is obtained by combining the multinomial sampling distribution, given the $\theta_j$, with the Dirichlet first stage of the prior distribution of the $\theta_j$. The marginal expectation of $p_j = x_j/n$ is then clearly the prior mean $\xi_j$, and the marginal covariance of $p_j$ and $p_k$ is given by

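$$\operatorname{cov}(p_j, p_k \mid \alpha) = \operatorname{cov}(\theta_j, \theta_k \mid \alpha) + E\{\operatorname{cov}(p_j, p_k \mid \theta) \mid \alpha\}, \qquad (2.1)$$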

where the expectation on the right side is taken with respect to the Dirichlet distribution of the $\theta_j$, given $\alpha$. Using the standard results

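$$\operatorname{cov}(\theta_j, \theta_k \mid \alpha) = (\xi_j\delta_{jk} - \xi_j\xi_k)/(\alpha + 1)$$

and

$$\operatorname{cov}(p_j, p_k \mid \theta) = (\theta_j\delta_{jk} - \theta_j\theta_k)/n, \qquad (2.2)$$

with $\delta_{jk}$ denoting the Kronecker delta function, it follows from (2.1) that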

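$$\begin{aligned}
\operatorname{cov}(p_j, p_k \mid \alpha) &= \operatorname{cov}(\theta_j, \theta_k \mid \alpha) + n^{-1}\xi_j\delta_{jk} - n^{-1}E(\theta_j\theta_k \mid \alpha) \\
&= \operatorname{cov}(\theta_j, \theta_k \mid \alpha) + n^{-1}\xi_j\delta_{jk} - n^{-1}\xi_j\xi_k - n^{-1}\operatorname{cov}(\theta_j, \theta_k \mid \alpha) \\
&= (\xi_j\delta_{jk} - \xi_j\xi_k)/n\tau, \qquad (2.3)
\end{aligned}$$

where

$$(n\tau)^{-1} = (\alpha + 1)^{-1} + n^{-1} - n^{-1}(\alpha + 1)^{-1},$$

so that

$$\tau = (1 + \alpha)/(n + \alpha). \qquad (2.4)$$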
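The algebra leading from (2.1) and (2.2) to (2.3) and (2.4) can be confirmed in exact rational arithmetic. The check below is a minimal sketch that tests every pair $(j, k)$ for one choice of $\xi$, $\alpha$ and $n$. It is not a proof. The function name is hypothetical.

```python
from fractions import Fraction

def covariance_identity_holds(xi, alpha, n):
    """Check (2.1)-(2.4) exactly in rational arithmetic for every pair (j, k)."""
    tau = (1 + alpha) / (n + alpha)                               # (2.4)
    for j in range(len(xi)):
        for k in range(len(xi)):
            d = 1 if j == k else 0                                # Kronecker delta
            cov_theta = (xi[j] * d - xi[j] * xi[k]) / (alpha + 1) # Dirichlet covariance
            e_theta_jk = cov_theta + xi[j] * xi[k]                # E(theta_j theta_k | alpha)
            # (2.1): cov of conditional means + expected multinomial covariance (2.2)
            cov_p = cov_theta + (xi[j] * d - e_theta_jk) / n
            if cov_p != (xi[j] * d - xi[j] * xi[k]) / (n * tau):   # (2.3)
                return False
    return True

xi = [Fraction(9, 16), Fraction(3, 16), Fraction(3, 16), Fraction(1, 16)]
print(covariance_identity_holds(xi, Fraction(7, 2), 556))   # prints: True
```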

The expression in (2.3) is remarkably similar to the covariance structure in (2.2), which is based on the multinomial sampling distribution for the $p_j$. Kendall and Stuart (1958, p. 355) show that the classical $\chi^2$ statistic, $n\sum_j \theta_j^{-1}(p_j - \theta_j)^2$, possesses a sampling distribution which is approximately $\chi^2$ with $s - 1$ degrees of freedom. Their method depends only upon the first two moments of the multinomial distribution. The multinomial-Dirichlet distribution has the similar covariance structure in (2.3), so exactly the same arguments give an approximation to the marginal distribution of $X^2$ in (1.6). The result is that the quantity $\tau X^2$ is approximately $\chi^2$ distributed with $s - 1$ degrees of freedom, i.e.,

$$\tau X^2 \mid \tau \sim \chi^2_{s-1}. \qquad (2.5)$$
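A quick Monte Carlo check of (2.5) is easy to set up. From (2.3), $E(X^2 \mid \alpha) = (s-1)/\tau$ exactly, so the simulated mean of $\tau X^2$ should be close to $s - 1$. The sketch below checks only this first moment, not the full $\chi^2$ shape. It draws the Dirichlet first stage from normalized gamma variates and the multinomial by categorical sampling. The values of $\alpha$, $n$, the number of replications and the seed are arbitrary illustrative choices.

```python
import random

def simulate_tau_x2(xi, alpha, n, reps, seed=1):
    """Simulate tau * X^2 under the two-stage model of Section 1 with alpha fixed."""
    rng = random.Random(seed)
    s = len(xi)
    tau = (1 + alpha) / (n + alpha)          # (2.4)
    draws = []
    for _ in range(reps):
        # theta ~ Dirichlet(alpha * xi), via normalized gamma variates
        g = [rng.gammavariate(alpha * xij, 1.0) for xij in xi]
        total = sum(g)
        theta = [gj / total for gj in g]
        # x ~ Multinomial(n, theta), via n categorical draws
        counts = [0] * s
        for cell in rng.choices(range(s), weights=theta, k=n):
            counts[cell] += 1
        X2 = n * sum((c / n - xij) ** 2 / xij for c, xij in zip(counts, xi))   # (1.6)
        draws.append(tau * X2)
    return draws

draws = simulate_tau_x2([9/16, 3/16, 3/16, 1/16], alpha=20.0, n=200, reps=3000)
print(sum(draws) / len(draws))   # should be close to s - 1 = 3
```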

If we ignore any information the data possess about $\tau$ that is not summarized by the statistic $X^2$, the marginal likelihood of $\tau$ is approximated by

$$f^*(\tau \mid x) \propto \tau^{\frac{1}{2}(s-1)} \exp\{-\tfrac{1}{2}\tau X^2\} \qquad (n^{-1} < \tau < 1). \qquad (2.6)$$

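We note in passing that the value of $\tau$ maximizing $f^*(\tau \mid x)$ is given by

$$\hat\tau = \begin{cases} (s - 1)/X^2 & \text{if } (s - 1) < X^2 < n(s - 1), \\ n^{-1} & \text{if } X^2 > n(s - 1), \\ 1 & \text{if } X^2 < (s - 1). \end{cases} \qquad (2.7)$$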

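Since $\alpha/(n + \alpha) = (n\tau - 1)/(n - 1)$, this yields the following (non-Bayesian) alternative to the expression for $\lambda$ in (1.3):

$$\tilde\lambda = \begin{cases} \{n(s - 1) - X^2\}/(n - 1)X^2 & \text{if } (s - 1) < X^2 < n(s - 1), \\ 0 & \text{if } X^2 > n(s - 1), \\ 1 & \text{if } X^2 < (s - 1). \end{cases} \qquad (2.8)$$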

The expression in (2.8) provides an adjustment to Good's suggestion in (1.5). Remarkably, this maximization procedure gives exactly the same answer as that suggested by Stone (1974, p. 513), who used a completely different (predictive) approach. He concludes that this alternative may have better frequentist properties than Good's. Note, however, that Stone's estimates do not possess a straightforward Bayesian justification.
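Equations (2.7) and (2.8) translate directly into a piecewise rule. The sketch below computes $\hat\tau$ and then maps it to $\lambda$ through $\alpha/(n+\alpha) = (n\tau - 1)/(n-1)$. This reproduces the closed form in (2.8) on each branch. The function names are hypothetical.

```python
def tau_hat(X2, n, s):
    """Maximizer (2.7) of the approximate marginal likelihood (2.6) over [1/n, 1]."""
    if X2 <= s - 1:
        return 1.0
    if X2 >= n * (s - 1):
        return 1.0 / n
    return (s - 1) / X2

def stone_lambda(X2, n, s):
    """(2.8): map tau-hat to lambda through alpha/(n + alpha) = (n tau - 1)/(n - 1)."""
    return (n * tau_hat(X2, n, s) - 1) / (n - 1)

print(stone_lambda(0.47, 556, 4))    # X^2 below s - 1, so lambda = 1
print(stone_lambda(10.0, 556, 4))    # equals (556*3 - 10)/(555*10)
```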

We now approximate the posterior means in (1.2) under the particular choice $\pi(\tau) \propto \tau^{-1}$ of prior for $\tau$ at the second stage of the prior model. This choice is made for technical convenience. The alternatives $\pi(\tau) \propto 1$ or $\pi(\tau) \propto \tau^{-2}$ would have the effect of increasing or decreasing the degrees of freedom $s - 1$ by one in our posterior arguments, and any prior of the form $\pi(\tau) \propto \tau^{r}$ would lead to a simple analysis. Note that our prior distribution is proper, since $\tau$ is already confined to the range $[n^{-1}, 1]$.


Under this choice of prior, the posterior density of $\tau$ is approximated by

$$\pi^*(\tau \mid x) \propto \pi(\tau) f^*(\tau \mid x) \propto \tau^{\frac{1}{2}(s-3)} \exp\{-\tfrac{1}{2}\tau X^2\} \qquad (n^{-1} < \tau < 1). \qquad (2.9)$$

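The integral of the expression on the right side of (2.9) is given by

$$J_{s-3} = \int_{n^{-1}}^{1} \tau^{\frac{1}{2}(s-3)} \exp\{-\tfrac{1}{2}\tau X^2\}\, d\tau = (\tfrac{1}{2}X^2)^{-\frac{1}{2}(s-1)}\, I\{\tfrac{1}{2}(s - 1), X^2\},$$

where

$$I(q, y) = \gamma(q, \tfrac{1}{2}y) - \gamma(q, \tfrac{1}{2}n^{-1}y), \qquad (2.10)$$

with

$$\gamma(q, y) = \int_0^y e^{-t} t^{q-1}\, dt \qquad (2.11)$$

denoting the incomplete Gamma function.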

denoting the incomplete Gamma function. The posterior mean of r is, therefore, approximated by

E* (T Ix) = J_11/J8-3 = X-2g(X2) , (2.12)

where

g(X2) = 2I{ 2 (s + 1), X2}/I (s - 1), X2} * (2.13)

Substituting (2.12) for $E(\tau \mid x)$ in the expression

$$\lambda = \{nE(\tau \mid x) - 1\}/(n - 1), \qquad (2.14)$$

and then substituting the resulting value of $\lambda$ into (1.2), gives explicit Bayesian alternatives to the frequentist estimates defined by (1.5), (1.8), and (2.8).
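Equations (2.10)-(2.14) require only the lower incomplete gamma function. The sketch below evaluates that function by its standard positive-term series, $\gamma(q,y) = y^q e^{-y}\sum_{k\ge 0} y^k/\{q(q+1)\cdots(q+k)\}$, rather than by a library routine. It then forms $g(X^2)$, $E^*(\tau \mid x)$ and $\lambda$ in turn. It assumes the prior $\pi(\tau) \propto \tau^{-1}$ used above. The function names are hypothetical.

```python
import math

def lower_gamma(q, y):
    """Lower incomplete gamma function gamma(q, y) of (2.11).

    Uses the positive-term series y^q e^{-y} sum_k y^k / (q (q+1) ... (q+k)).
    """
    if y <= 0.0:
        return 0.0
    term = 1.0 / q
    total = term
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= y / (q + k)
        total += term
    return math.exp(q * math.log(y) - y) * total

def incomplete_diff(q, X2, n):
    """I(q, X^2) = gamma(q, X^2/2) - gamma(q, X^2/(2n)), as in (2.10)."""
    return lower_gamma(q, X2 / 2) - lower_gamma(q, X2 / (2 * n))

def approx_lambda(X2, n, s):
    """Approximate lambda from (2.12)-(2.14) under the prior pi(tau) ~ 1/tau."""
    g = 2 * incomplete_diff((s + 1) / 2, X2, n) / incomplete_diff((s - 1) / 2, X2, n)  # (2.13)
    e_tau = g / X2                                  # (2.12)
    return (n * e_tau - 1) / (n - 1)                # (2.14)

# Mendel's data: X^2 = 0.47, s - 1 = 3, n = 556 (Section 6).
print(round(approx_lambda(0.47, 556, 4), 4))
```

With Mendel's values this should be close to the approximate value 0.5831 quoted in Section 6. Looping over $X^2 = 0.6, 1.0, \dots, 3.8$ gives values near the tabulation there. Our recomputation agrees to within a few units in the third decimal place.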

Unlike (1.7), (2.14) cannot fall outside the interval [0, 1], since the approximate posterior mean in (2.12) always lies in the interior of the interval $[n^{-1}, 1]$. The alternatives in (1.5) and (2.8) have the possible disadvantage of taking the same value of $\lambda$ over a range of values of $X^2$; our Bayesian approach also removes this difficulty. There are similarities with the estimates proposed by Leonard (1976) for the means of several normal distributions.

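Consider now the limiting situation as $n \to \infty$ with $s$ and the $p_j$ fixed. In this case, we see from (1.6) that $X^2 \to \infty$, but $A = n^{-1}X^2$ stays fixed. We therefore find, from (2.10) and (2.11), that the function $I\{\frac{1}{2}(s - 1), X^2\}$ converges to the limit

$$I\{\tfrac{1}{2}(s - 1), \infty\} = \int_{\frac{1}{2}A}^{\infty} e^{-t} t^{\frac{1}{2}(s-3)}\, dt.$$

A simple integration by parts gives the result

$$I\{\tfrac{1}{2}(s + 1), \infty\} = e^{-\frac{1}{2}A} (\tfrac{1}{2}A)^{\frac{1}{2}(s-1)} + \tfrac{1}{2}(s - 1)\, I\{\tfrac{1}{2}(s - 1), \infty\},$$

so that the ratio in (2.13) converges to

$$g(\infty) = (s - 1) + b(A), \qquad (2.15)$$

where

$$b(A) = 2e^{-\frac{1}{2}A} (\tfrac{1}{2}A)^{\frac{1}{2}(s-1)} \big/ I\{\tfrac{1}{2}(s - 1), \infty\}. \qquad (2.16)$$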

Therefore, in this limiting situation, our Bayesian alternative to Good's expression in (1.5) is given by

$$X^{-2} g(\infty) = \{(s - 1) + b(n^{-1}X^2)\}/X^2. \qquad (2.17)$$

The expression in (2.17) is always greater than that in (1.5). This suggests that Good's shrinkages of the $p_j$ towards the prior means $\xi_j$ will be more conservative than ours in this limiting case. However, the numerical example in Section 6 will show that Good's shrinkages may be more radical in nonlimiting situations.
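The integration-by-parts step behind (2.15) and (2.16) can be checked numerically. In the limit, $I\{q, \infty\}$ is the upper incomplete gamma function $\Gamma(q, \frac{1}{2}A)$. The sketch computes it as $\Gamma(q)$ minus the lower series, which is adequate only for moderate $A$. It then compares the direct ratio with $(s-1) + b(A)$. The function names are hypothetical.

```python
import math

def upper_gamma(q, a):
    """Upper incomplete gamma Gamma(q, a) = Gamma(q) - gamma(q, a), for moderate a > 0."""
    term = 1.0 / q
    total = term
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= a / (q + k)
        total += term
    return math.gamma(q) - math.exp(q * math.log(a) - a) * total

def b_of_A(A, s):
    """b(A) from (2.16), with I{(s-1)/2, infinity} = Gamma((s-1)/2, A/2)."""
    q = (s - 1) / 2
    return 2 * math.exp(-A / 2) * (A / 2) ** q / upper_gamma(q, A / 2)

def g_limit(A, s):
    """Limit of the ratio (2.13) as n -> infinity with A = X^2/n fixed, computed directly."""
    return 2 * upper_gamma((s + 1) / 2, A / 2) / upper_gamma((s - 1) / 2, A / 2)

for A in (0.5, 2.0, 5.0):
    print(A, g_limit(A, 4), 3 + b_of_A(A, 4))   # the two columns should agree, as in (2.15)
```

Because $b(A) > 0$, the limiting Bayesian value in (2.17) always exceeds Good's $(s-1)/X^2$.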

3. PRELIMINARY TESTS OF SIGNIFICANCE

We now consider a rather different conceptual situation. Instead of wishing to smooth the raw proportions, the statistician would like to decide whether to accept or reject previously hypothesized values for the cell probabilities. Before observing the data, he may have a null hypothesis $H_0$ that $\theta_1, \dots, \theta_s$ are, respectively, equal to $\xi_1, \dots, \xi_s$. The classical $\chi^2$ goodness-of-fit test is commonly used in this context to help the statistician decide whether or not to accept $H_0$. If the test suggests acceptance, the $\theta_j$ are estimated by the corresponding $\xi_j$. Rejection would lead many statisticians to estimate each $\theta_j$ by the corresponding proportion $p_j = x_j/n$. This standard procedure is equivalent to estimating $\theta_j$ by $\hat\theta_j$, where

$$\hat\theta_j = \begin{cases} p_j & \text{for } x \in C, \\ \xi_j & \text{for } x \notin C, \end{cases} \qquad (3.1)$$

with the critical region $C$ approximately defined by those $x$ such that

$$X^2 > d, \qquad (3.2)$$

where $X^2$ is the statistic in (1.6), and $d$ is an appropriate percentage point of the $\chi^2$ distribution with $s - 1$ degrees of freedom, depending upon the statistician's choice of significance level.
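The classical pretest estimator (3.1) with critical region (3.2) can be written as a short sketch. The cutoff `d` is supplied by the caller. The value 7.81 used below is the 95 percent point for 3 degrees of freedom quoted in Section 6. The function name is hypothetical.

```python
def pretest_estimate(x, xi, d):
    """Classical pretest (3.1)-(3.2): raw proportions if X^2 > d, else the xi_j."""
    n = sum(x)
    X2 = n * sum((xj / n - xij) ** 2 / xij for xj, xij in zip(x, xi))   # (1.6)
    if X2 > d:                       # x falls in the critical region C: reject H0
        return [xj / n for xj in x]
    return list(xi)                  # accept H0 and use the hypothesized values

xi = [9/16, 3/16, 3/16, 1/16]
print(pretest_estimate([315, 101, 108, 32], xi, d=7.81))   # Mendel: X^2 = 0.47, accept H0
```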

A significance test of this type is often completed as a preliminary to the estimation, with the primary purpose of simplifying the estimates; it may then be referred to as a preliminary test of significance. The whole method may be referred to as pretesting, and we also use the term estimation shortcut, as the procedure helps us to decide between a simplified choice of estimates.

One difficulty of this approach is that the final estimates for the $\theta_j$ will always depend upon the statistician's choice of significance level, which may be somewhat arbitrary. Also, the $\chi^2$ test does not appear to us to have any reasonable formal justification when judged within an estimation framework. Our main motivation for this investigation is our view that fixed-size tests have no obvious relevance in estimation shortcut situations. This aspect has been previously discussed by Leonard and Ord (1976) in proposing an alternative to the F test for the one-way ANOVA situation.

We will introduce some plausible underlying assumptions. These lead to a sensibly formulated alternative to the $\chi^2$ test as a preliminary test of significance, and they help us to assess the adequacy of the $\chi^2$ test in this context. Our new procedure will not depend upon a choice of significance level, and it will have some remarkably different properties. It should be appropriate for situations where

(a) The accurate estimation of the $\theta_j$ is of major concern (by accuracy, we mean adequacy of performance with respect to an appropriate loss function; we will make the particular choice in (4.1)).

(b) The statistician wishes to restrict himself to the simple subclass of estimates in (3.1).

(c) The Dirichlet mixture described in Section 1 is a reasonable prior distribution for the $\theta_j$.

We investigate the consequences of (b), since it describes a criterion that significance testers so frequently adopt as an estimation shortcut. Our results and conclusions depend upon the assumption, implicit in (3.1), that the only possible alternative to the null hypothesis is to estimate the $\theta_j$ by the corresponding raw proportions. However, we feel justified in making this assumption for comparison purposes with the classical procedure previously described.

4. AN ALTERNATIVE TO THE $\chi^2$ TEST

There may be some situations involving the $\chi^2$ test where the statistician's primary objective is to decide whether or not the null hypothesis $H_0$ is true. It may be costly to surrender his working hypothesis, in which case a high cost might be formally assigned to the Type I error (e.g., Lindley (1953), p. 59; and Dickey (1967) in related contexts). In pretesting situations, however, it may be more reasonable to assume that the loss function does not discriminate against Type I errors. We then simply judge the accuracy of the final estimates for the $\theta_j$ by their performance with respect to the quadratic loss function

$$L(\hat\theta, \theta) = \sum_j (\hat\theta_j - \theta_j)^2. \qquad (4.1)$$

This criterion will help us to choose a critical region $C$, leading to estimates of the form defined by (3.1) which do not depend upon a choice of significance level. The function $L(\hat\theta, \theta) = \sum_j \theta_j^{-1}(\hat\theta_j - \theta_j)^2$ is another possibility, which would lead to qualitatively similar results.

Suppose that $\theta_1, \dots, \theta_s$ possess the two-stage prior distribution described in Section 1. This permits us to employ the hypothesized values $\xi_1, \dots, \xi_s$ as prior estimates of the corresponding $\theta_1, \dots, \theta_s$. Attention is restricted here to situations where the hypothesized values are completely specified.

Good (1967, 1976) and Good and Crook (1974) employ the maximum of the marginal likelihood contribution to (1.4) and seek approximations to its sampling distribution. Our alternative will be somewhat simpler; it is, of course, only recommended for pretesting situations.

Consider the Bayes estimates for the $\theta_j$ when attention is restricted to the special class of estimates defined by (3.1). Here, however, $C$ is instead a region obtained optimally by minimizing the posterior expectation of the loss function in (4.1). Whenever $x \in C$, so that $\hat\theta_j = p_j$, the posterior expected loss is equal to

$$\sum_j \{p_j - E(\theta_j \mid x)\}^2 + \sum_j \operatorname{var}(\theta_j \mid x) = \lambda^2 \sum_j (p_j - \xi_j)^2 + \sum_j \operatorname{var}(\theta_j \mid x), \qquad (4.2)$$

where $\lambda$ is given in (1.3) and $\operatorname{var}(\theta_j \mid x)$ is the posterior variance of $\theta_j$. Similarly, whenever $x \notin C$, so that $\hat\theta_j = \xi_j$, the posterior expected loss is equal to

$$(1 - \lambda)^2 \sum_j (p_j - \xi_j)^2 + \sum_j \operatorname{var}(\theta_j \mid x). \qquad (4.3)$$

Comparing (4.2) and (4.3), we see that the Bayes procedure under the restriction in (3.1) takes $x \in C$ whenever $\lambda^2 < (1 - \lambda)^2$. Since $0 \le \lambda \le 1$, the optimal critical region $C$ therefore consists of those $x$ satisfying

$$\lambda = \lambda(x) < \tfrac{1}{2}. \qquad (4.4)$$

We have proposed here a remarkably simple exact alternative to the $\chi^2$ goodness-of-fit test, which does not, for example, depend upon a choice of significance level. The corresponding estimates for the $\theta_j$ in (3.1) are restricted Bayes and are also admissible within this restricted class.
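The decision in (4.2)-(4.4) compares two posterior expected losses that share the same variance term. The sketch below therefore compares only the squared-bias parts, and it confirms that the choice reduces to $\lambda < \frac{1}{2}$. Here $\lambda$ is supplied by the caller, for example from (1.3) or (2.14). The function name is hypothetical.

```python
def restricted_bayes_estimate(x, xi, lam):
    """Choose between p_j and xi_j by the posterior expected losses (4.2) and (4.3).

    The common term sum_j var(theta_j | x) cancels, so only the squared-bias
    terms lam^2 * D and (1 - lam)^2 * D are compared, with D = sum_j (p_j - xi_j)^2.
    """
    n = sum(x)
    p = [xj / n for xj in x]
    D = sum((pj - xij) ** 2 for pj, xij in zip(p, xi))
    loss_raw = lam ** 2 * D            # estimate by p_j (x in C)
    loss_null = (1 - lam) ** 2 * D     # estimate by xi_j (x not in C)
    return p if loss_raw < loss_null else list(xi)   # same as: lam < 1/2, (4.4)

x, xi = [315, 101, 108, 32], [9/16, 3/16, 3/16, 1/16]
print(restricted_bayes_estimate(x, xi, 0.5842))   # lambda > 1/2: accept H0
```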

It is straightforward, though slightly tedious, to compute $\lambda(x)$ numerically using the integration defined in (1.3). To compare our method with the $\chi^2$ procedure, however, it is useful to consider our approximations. Under the approximation in (2.12), we find from (2.14) and (4.4) that $C$ is approximated by the region where $X^2$ in (1.6) satisfies

$$X^{-2} g(X^2) < \tfrac{1}{2}(1 + n^{-1}), \qquad (4.5)$$

with $g(X^2)$ defined in (2.13). This region is generally completely different from that suggested by a $\chi^2$ test. Consider, for example, the limiting case as $n \to \infty$ while $s$ and the $p_j$ remain fixed. The contribution $g(X^2)$ to (4.5) now converges to the limit in (2.15), with $A = n^{-1}X^2$. In this limiting case, therefore, we would accept our null hypothesis $H_0$ whenever

$$X^2 < 2\{(s - 1) + b(n^{-1}X^2)\}, \qquad (4.6)$$

where $b(A)$ is the nonnegative quantity defined in (2.16). The null hypothesis $H_0$ should, therefore, certainly be accepted whenever $X^2 < 2(s - 1)$. This surprising result suggests an entirely different procedure from that involving the $\chi^2$ test. It does, of course, depend upon our assumption that the only possible alternative to $H_0$ is to estimate the $\theta_j$ by the corresponding $p_j$. Under $H_0$, classical theory takes $X^2$ to possess a $\chi^2$ sampling distribution with mean $s - 1$ and variance $2(s - 1)$. We are therefore recommending that $H_0$ be accepted in situations where $X^2$ exceeds the mean by up to $(s - 1)/\{2(s - 1)\}^{1/2} = \{\frac{1}{2}(s - 1)\}^{1/2}$ standard deviations. It follows that, as $s$ gets large, we would accept $H_0$ even though classical hypothesis testers would reject it at any sensible significance level. The numerical example in Section 6 will suggest that this result holds only when $x_1, \dots, x_s$ are very large. However, the striking difference in this limiting situation adds evidence to the claim made by Leonard and Ord (1976) that fixed-size significance tests have no obvious relevance in the classical pretesting situations.

5. A FREQUENTIST APPROACH

Brown and Muentz (1976) have recently considered preliminary testing for two-way contingency tables, and their independent conclusions are similar in spirit to those described at the end of the previous section. Our Bayesian results suggest to us that their frequentist alternatives will probably be fairly plausible in our limiting situation. We nevertheless obtain substantial differences in both this and nonlimiting situations, e.g., in the numerical example of Section 6.

We now describe a frequentist approach which serves to illustrate why the spirit of our results is not dependent upon Bayesianism.

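Under the loss function in (4.1), the risk function for the estimates $p_1, \dots, p_s$ of $\theta_1, \dots, \theta_s$, respectively, is given by

$$E\{L(p, \theta) \mid \theta\} = E\Big\{\sum_j (p_j - \theta_j)^2 \,\Big|\, \theta\Big\} = n^{-1} Q_1(\theta), \qquad (5.1)$$

where

$$Q_1(\theta) = \sum_j \theta_j(1 - \theta_j).$$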

This should be compared with the risk function

$$Q_2(\theta) = E\{L(\xi, \theta) \mid \theta\} = \sum_j (\theta_j - \xi_j)^2 \qquad (5.2)$$

of the hypothesized values $\xi_1, \dots, \xi_s$. In the idealized situation where $Q_1(\theta)$ and $Q_2(\theta)$ are known, this suggests that we should reject our null hypothesis $H_0$ whenever

$$n^{-1} Q_1(\theta) - Q_2(\theta) < 0. \qquad (5.3)$$

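When $Q_1(\theta)$ and $Q_2(\theta)$ are unknown, it might be reasonable to replace the quantity on the left side of (5.3) by its unbiased estimate. Since $E(1 - \sum_j p_j^2 \mid \theta) = (1 - n^{-1})Q_1(\theta)$ and $E\{\sum_j (p_j - \xi_j)^2 \mid \theta\} = Q_2(\theta) + n^{-1}Q_1(\theta)$, this estimate is

$$2(n - 1)^{-1}\Big(1 - \sum_j p_j^2\Big) - \sum_j (p_j - \xi_j)^2,$$

leading to the rejection of $H_0$ whenever

$$(n - 1) \sum_j (p_j - \xi_j)^2 > 2\Big(1 - \sum_j p_j^2\Big). \qquad (5.4)$$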

The intuitive rule in (5.4) is similar in spirit to the more sophisticated alternatives of Section 4, and it does not depend upon a choice of prior distribution. Consider, for example, the special case where none of the $\xi_j$ and $p_j$ differ too strongly from $s^{-1}$. The left and right sides of (5.4) could then be approximated by $s^{-1}(1 - n^{-1})X^2$ and $2(1 - s^{-1})$, respectively, where $X^2$ is given in (1.6). This suggests that the rule $X^2 > 2n(s - 1)/(n - 1)$ might serve as an approximation.
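Both the rule in (5.4) and its approximate form can be sketched directly. The function names are hypothetical. The approximate rule is only meant for cells whose $\xi_j$ and $p_j$ are near $s^{-1}$.

```python
def frequentist_reject(x, xi):
    """Rule (5.4): reject H0 when (n - 1) sum_j (p_j - xi_j)^2 > 2 (1 - sum_j p_j^2)."""
    n = sum(x)
    p = [xj / n for xj in x]
    lhs = (n - 1) * sum((pj - xij) ** 2 for pj, xij in zip(p, xi))
    rhs = 2 * (1 - sum(pj * pj for pj in p))
    return lhs > rhs

def approx_reject(X2, n, s):
    """Approximate form of (5.4) when all xi_j and p_j are near 1/s."""
    return X2 > 2 * n * (s - 1) / (n - 1)

print(frequentist_reject([315, 101, 108, 32], [9/16, 3/16, 3/16, 1/16]))   # prints: False
```

For Mendel's counts the left side of (5.4) is about 0.067 and the right side about 1.21, so this rule also retains $H_0$.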

6. NUMERICAL EXAMPLE: MENDEL'S PEA BREEDING EXPERIMENT

We now use a numerical example to demonstrate that, in nonlimiting situations, a small observed value of the $\chi^2$ statistic $X^2$ does not necessarily lead to acceptance of the null hypothesis $H_0$ under our own formulation.

We reanalyze Mendel's data on plant hybridization described, e.g., in Bishop, Fienberg, and Holland (1975, p. 328). In experiments on pea breeding, Mendel obtained the following frequencies of seeds: 315 round and yellow (RY), 101 wrinkled and yellow (WY), 108 round and green (RG), and 32 wrinkled and green (WG), a total of $n = 556$. Genetical theory predicts that the frequencies should be in the proportions 9:3:3:1. The value $X^2 = 0.47$ for the $\chi^2$ statistic suggests that, given the adequacy of the sampling model, the theoretical hypothesis should be accepted at any sensible significance level.

As discussed by Bishop et al. (1975) and Fisher (1936), the multinomial sampling model is generally accepted as correct in this situation. There is, however, substantial though controversial evidence that most of Mendel's experiments were falsified so as to agree closely with Mendel's expectations. We have no wish to enter into the general controversy. We simply wish to make the point that, in situations like this, with $X^2$ smaller than the degrees of freedom, the raw proportions could still give better estimates than the hypothesized values, irrespective of the results of a fixed-size $\chi^2$ test.

We used the quantity in (2.12) to calculate our approximation to the value of $\lambda$ in (2.14). Our approximate value $\lambda = 0.5831$ compares with the exact value $\lambda = 0.5842$, calculated from (1.3), which suggests that our approximations are adequate in this case. Since $\lambda > \frac{1}{2}$, our analysis suggests that the null hypothesis $H_0$ should be accepted, but the evidence is not nearly as conclusive as we might have expected. With such a low value for $X^2$, the classical statistician might have expected $\lambda$ to lie much closer to one.

We used (2.12) and (2.14) to compute approximations to $\lambda$ for different values of $X^2$, still with $s - 1 = 3$ and $n = 556$. The results are summarized in the following tabulation.

$X^2$        0.6    1.0    1.4    1.8    2.2    2.6    3.0    3.4    3.8
$\lambda$    0.579  0.564  0.550  0.534  0.521  0.507  0.492  0.478  0.464

If, for example, $X^2$ were instead equal to its sampling expectation $s - 1 = 3$, we would reject $H_0$, since then $\lambda < \frac{1}{2}$. Classical hypothesis testing theory would typically accept $H_0$ comfortably in that case. This result is opposite in spirit to that obtained in the limiting situation described at the end of Section 4. The corresponding 95 percent point of the $\chi^2$ distribution is 7.81; we have replaced this by a value of about 2.8, which does not depend upon a choice of significance level.
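The cutoff of about 2.8 is the value of $X^2$ at which the approximate $\lambda$ from (2.12)-(2.14) equals $\frac{1}{2}$, i.e., where (4.5) holds with equality. The sketch below locates it by bisection. It repeats a series implementation of the incomplete gamma function so that it stays self-contained. It assumes, as the tabulation above suggests, that $\lambda$ decreases in $X^2$ over the bracketing interval. The function names and the bracket [0.1, 10] are illustrative choices.

```python
import math

def lower_gamma(q, y):
    """Lower incomplete gamma gamma(q, y), by its positive-term series."""
    if y <= 0.0:
        return 0.0
    term = total = 1.0 / q
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= y / (q + k)
        total += term
    return math.exp(q * math.log(y) - y) * total

def approx_lambda(X2, n, s):
    """lambda from (2.12)-(2.14) under pi(tau) proportional to 1/tau."""
    def incomplete(q):  # I(q, X^2) of (2.10)
        return lower_gamma(q, X2 / 2) - lower_gamma(q, X2 / (2 * n))
    e_tau = 2 * incomplete((s + 1) / 2) / incomplete((s - 1) / 2) / X2
    return (n * e_tau - 1) / (n - 1)

def critical_x2(n, s, lo=0.1, hi=10.0, tol=1e-10):
    """Solve approx_lambda(X2) = 1/2 by bisection; H0 is rejected above the root."""
    def excess(X2):
        return approx_lambda(X2, n, s) - 0.5
    if excess(lo) <= 0 or excess(hi) >= 0:
        raise ValueError("root not bracketed; widen [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(critical_x2(556, 4))   # about 2.8 for Mendel's n and s
```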

Instead of restricting himself to a choice between two alternative sets of estimates for the $\theta_j$, the statistician may prefer to smooth the raw proportions via the weighted averages in (1.2). In other words, he might wish to place more emphasis on accurate estimation of the $\theta_j$ than on simplifying the estimates via (3.1). The raw proportions, theoretical values, and smoothed estimates are given in the table.

Observed, Theoretical, and Smoothed Proportions for Four Types of Pea

Type of pea    RY      WY      RG      WG
Observed       0.5666  0.1817  0.1942  0.0576
Theoretical    0.5625  0.1875  0.1875  0.0625
Smoothed       0.5642  0.1851  0.1903  0.0604

The smoothed estimates compromise between the observed proportions and the theoretical prior values in a meaningful way. Our exact and approximate Bayes estimates were virtually identical to the accuracy given. The estimates of Good and of Sutherland et al., defined by (1.5) and (1.7), respectively, were both identical to the prior values; in the latter case $\hat\alpha^{-1}$ in (1.8) was constrained to be nonnegative. This shows that the frequentist estimates shrink more radically, rather than more conservatively, in this nonlimiting situation.
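The smoothed row of the table follows from (1.2) with the exact value $\lambda = 0.5842$ reported above. The sketch below recomputes it, using an illustrative function name.

```python
def smoothed(x, xi, lam):
    """Smoothed estimates (1.2): (1 - lam) * p_j + lam * xi_j."""
    n = sum(x)
    return [(1 - lam) * xj / n + lam * xij for xj, xij in zip(x, xi)]

x = [315, 101, 108, 32]            # RY, WY, RG, WG
xi = [9/16, 3/16, 3/16, 1/16]      # 9:3:3:1 genetical hypothesis
print([round(v, 4) for v in smoothed(x, xi, 0.5842)])   # prints: [0.5642, 0.1851, 0.1903, 0.0604]
```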

[Received August 1975. Revised June 1977.]

REFERENCES

Baranchik, A.J. (1964), "Multiple Regression and Estimation of the Mean of a Multivariate Normal Distribution," Technical Report No. 51, Department of Statistics, Stanford University, Stanford, California.

Bishop, Yvonne M.M., Fienberg, Stephen E., and Holland, Paul W. (1975), Discrete Multivariate Analysis, Cambridge, Mass.: MIT Press.

Brown, Charles C., and Muentz, Larry R. (1976), "Reduced Mean Square Error Estimation in Contingency Tables," Journal of the American Statistical Association, 71, 176-182.

Dickey, James M. (1967), "A Bayesian Hypothesis-Decision Procedure," Annals of Mathematical Statistics, 19, 367-369.

Fienberg, Stephen E., and Holland, Paul W. (1973), "Simultaneous Estimation of Multinomial Cell Probabilities," Journal of the American Statistical Association, 68, 683-689.

Fisher, Ronald A. (1936) "Has Mendel's Work Been Rediscovered?" Annals of Science, 1, 115-137.

Good, Irving J. (1965), The Estimation of Probabilities, Cambridge, Mass.: MIT Press.

Good, Irving J. (1967), "A Bayesian Significance Test for Multinomial Distributions (with Discussion)," Journal of the Royal Statistical Society, Ser. B, 29, 399-431.

Good, Irving J., and Crook, James F. (1974), "The Bayes/Non-Bayes Compromise and the Multinomial Distribution," Journal of the American Statistical Association, 69, 711-720.

Good, Irving J. (1976), "On the Application of Symmetric Dirichlet Distributions and Their Mixtures to Contingency Tables," Annals of Statistics, 4, 1159-1189.

Kendall, Maurice G., and Stuart, Alan (1958), The Advanced Theory of Statistics, Vol. 1: Distribution Theory, London: Charles W. Griffin & Co.

Leonard, Tom (1973), "A Bayesian Method for Histograms," Biometrika, 60, 297-308.

Leonard, Tom (1976), "Some Alternative Approaches to Multi-Parameter Estimation," Biometrika, 63, 69-75.

Leonard, Tom, and Ord, John K. (1976), "Investigation of the F test Procedure as an Estimation Short-Cut," Journal of the Royal Statistical Society, Ser. B, 38, 95-98.

Lindley, Dennis V. (1953), "Statistical Inference (with Discussion)," Journal of the Royal Statistical Society, Ser. B, 15, 30-40.

Stone, Mervyn (1974), "Cross-Validation and Multinomial Prediction," Biometrika, 61, 509-515.

Sutherland, Michael, Fienberg, Stephen E., and Holland, Paul W. (1974), "Combining Bayes and Frequency Approaches to Estimate a Multinomial Parameter," Studies in Bayesian Econometrics and Statistics, eds. Stephen E. Fienberg and Arnold Zellner, Amsterdam: North Holland Publishing Co.
