
Statistics & Probability Letters 15 (1992) 375-379

North-Holland

8 December 1992

Existence and uniqueness of the maximum likelihood estimator for the two-parameter negative binomial distribution

Jorge Aragón *, David Eberly ** and Shelly Eberly ***
Division of Mathematics, Computer Science, and Statistics, University of Texas, San Antonio, TX, USA

Received September 1991

Revised February 1992

Abstract: Given a sample with mean $\bar{x}$ and second moment $s^2$, Anscombe in 1950 conjectured that the maximum likelihood equations for the two-parameter negative binomial distribution have a unique solution if and only if $s^2 > \bar{x}$. We give a proof of his conjecture.

Keywords: Maximum likelihood estimator; negative binomial distribution; Newton’s method

1. Introduction

The negative binomial distribution has been applied widely in Biology, Psychology, Communications, Insurance, Economics, Medicine, the Military, etc. The following parametrization is used here:

$$f(x) = \binom{x+k-1}{x} p^k (1-p)^x, \qquad x = 0, 1, 2, \ldots$$

We treat $k$ as a continuous parameter with $k \in (0, \infty)$ and refer to the distribution as NB$(k, p)$.
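As an illustrative sketch (not part of the original paper), this parametrization coincides with that of scipy.stats.nbinom, which can be used to evaluate $f(x)$ and to check the overdispersion property $\sigma^2 > \mu$ used later in the paper; the values of $k$ and $p$ below are made up.

```python
# Sketch (not from the paper): the pmf above matches scipy.stats.nbinom(k, p),
# whose pmf is C(x+k-1, x) p^k (1-p)^x; k and p below are made-up values.
import numpy as np
from scipy.stats import nbinom

k, p = 2.5, 0.4                 # k is treated as continuous in (0, inf)
dist = nbinom(k, p)

x = np.arange(5)
print(dist.pmf(x))              # f(0), ..., f(4)

mean, var = dist.mean(), dist.var()
print(mean, var)                # k(1-p)/p and k(1-p)/p**2
assert var > mean               # the negative binomial is overdispersed: sigma^2 > mu
```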

Correspondence to: David Eberly, Computer Science Department, CB 3175, Sitterson Hall, University of North Carolina, Chapel Hill, NC 27599-3175, USA. e-mail: [email protected].

* Research partially supported by NIH grant AI-07358.

Current address: Biostatistics Department, Harvard School of Public Health, Boston, MA 02115, USA.

** Research partially supported by NSF Grant DMS-9003037.

*** Research supported by a NASA/Texas Space Grant Consortium Fellowship. Current address: Statistics Unit, Cornell University, Ithaca, NY 14853, USA.

As a result of the frequent application of the negative binomial distribution, an increasing number of papers on estimation have appeared in the literature (Fisher, 1941; Haldane, 1941; Wise, 1946; Anscombe, 1949, 1950; Bliss and Fisher, 1953; Bliss and Owen, 1958; Shah, 1961; Katti and Gurland, 1962; O'Carroll, 1962; Shenton and Wallington, 1962; Shenton, 1963; Martin and Katti, 1965; Shenton and Myers, 1965; Johnson and Kotz, 1969; Pahl, 1969; Shenton and Bowman, 1967; Pieters, Gates, Matis and Sterling, 1977; Nedelman, 1983; Bowman, 1984; Willson, Folks and Young, 1986; Ross and Preece, 1985; Binet, 1986; Kemp and Kemp, 1987; Binns and Bostanian, 1988; Lam, Shenton and Bowman, 1988; and Piegorsch, 1990; among others). A survey of the articles on the topic can be found in Clark and Perry (1989).

For a sample $x_1, \ldots, x_n$, Anscombe (1950) conjectured that the maximum likelihood estimator exists and is unique when the second sample moment $s^2 = \sum_{i=1}^{n} x_i^2/n - \bar{x}^2$ is greater than the sample mean $\bar{x} = \sum_{i=1}^{n} x_i/n$, and that no maximum likelihood estimator exists when $s^2 < \bar{x}$. Proofs of existence of the maximum likelihood estimator when $s^2 > \bar{x}$ were given by Johnson and Kotz (1969), although the book contains some misprints, and by Willson, Folks and Young (1986). Several issues pertaining to the negative binomial distribution have been addressed in the literature, such as what to do when $s^2 < \bar{x}$, how to treat estimates of $k$ less than one, and comparison of different estimation methods for small samples. However, the questions of uniqueness when $s^2 > \bar{x}$ and of existence when $s^2 < \bar{x}$ have gone unanswered. We answer these remaining questions by proving the following:

Theorem. Let $x_i$, $i = 1, \ldots, n$, be a random sample from NB$(k, p)$. The maximum likelihood estimator of $(k, p)$ exists if and only if $s^2 > \bar{x}$. Moreover, if the maximum likelihood estimator exists, then it must be unique.

The nonexistence of the maximum likelihood estimator when $s^2 < \bar{x}$ fits in well with the nonexistence of the method-of-moments estimator and the fact that $\sigma^2 > \mu$ for the negative binomial distribution.

2. Formulation of the problem

Let $M = \max_i x_i$, so $0 \le x_i \le M$. If $f_j$ is the proportion of the sample values equal to $j$, then

$$\bar{x} = \sum_{j=1}^{M} j f_j$$

and

$$s^2 = \sum_{j=1}^{M} j^2 f_j - \Big(\sum_{j=1}^{M} j f_j\Big)^2.$$

We wish to compute the maximum likelihood estimators $(\hat{k}, \hat{p})$ for the sample. The maximum likelihood estimator for $p$ is given by $\hat{p} = \hat{k}/(\bar{x} + \hat{k})$, where $\hat{k}$ is a solution to

$$g(k) = \sum_{j=1}^{M} \frac{F_j}{k + j - 1} - \log\!\Big(1 + \frac{\bar{x}}{k}\Big) = 0, \qquad (1)$$

and where $F_j = \sum_{i=j}^{M} f_i$ is the proportion of the sample values greater than or equal to $j$.
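As an illustrative sketch (not part of the original paper; the sample counts and the trial value of $k$ are made up), the quantities $\bar{x}$, $s^2$, $f_j$, $F_j$ and the left-hand side of equation (1) can be computed directly from a sample:

```python
# Sketch with made-up counts and a made-up trial k: the sample quantities
# x_bar, s^2, f_j, F_j, and g(k) from equation (1).
import numpy as np

x = np.array([0, 0, 1, 1, 2, 3, 3, 5, 7, 9])             # hypothetical sample
M = x.max()

x_bar = x.mean()                                         # sample mean
s2 = (x**2).mean() - x_bar**2                            # second sample moment about the mean

f = np.array([(x == j).mean() for j in range(M + 1)])    # f_j, j = 0, ..., M
F = np.array([f[j:].sum() for j in range(1, M + 1)])     # F_j = sum_{i >= j} f_i, j = 1, ..., M

def g(k):
    """Left-hand side of (1); k_hat solves g(k) = 0."""
    j = np.arange(1, M + 1)
    return np.sum(F / (k + j - 1)) - np.log(1.0 + x_bar / k)

k_trial = 1.5                                            # made-up trial value
print(g(k_trial))
print(k_trial / (x_bar + k_trial))                       # p_hat would be k_hat/(x_bar + k_hat)
```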

The approach we took to finding the maximum likelihood estimator was motivated by Eberly (1991). In that thesis, data sets were generated from a length-biased truncated negative binomial distribution $1 + \mathrm{NB}(k+1, p)$ with $k > 0$. To construct a maximum likelihood estimator, we used equation (1) with $k$ replaced by $k+1$. In applying Newton's method to (1), we had problems with the zero at infinity for $g(k)$. With an inappropriate initial guess, the iterates tended towards infinity. To avoid this problem, we defined $z = 1/k$ and $G(z) = g(k)$, so

$$G(z) = \sum_{j=1}^{M} \frac{z F_j}{(j-1)z + 1} - \log(1 + \bar{x} z), \qquad z \in (0, \infty). \qquad (2)$$

This reparametrization has been used by Lam, Shenton and Bowman (1988) and Clark and Perry (1989). For $k \ge 1$ we need only consider (2) for $z \in (0, 1]$. One has much more control over the behavior of Newton's method on this finite interval. We observed that the graphs of $G$ were of two types; see Figure 1. The function $G$ has the properties $G(0) = G'(0) = 0$ and $G(z)/z \to F_1 > 0$ as $z \to \infty$ (so $G(z)$ must be positive for $z$ large), and the convexity or concavity at $z = 0$ determines the shape of the graph. Computing $G''(0)$ and using the definition of $F_j$, we have

$$G''(0) = \bar{x}^2 - 2\sum_{j=1}^{M} (j-1) F_j = \bar{x} - s^2.$$

If $G''(0) < 0$, then eventually the graph of $G$ must intersect the $z$-axis, thereby providing a solution $\hat{z}$ to $G(z) = 0$ and a maximum likelihood estimator $\hat{k} = 1/\hat{z}$. This reproduces the results in Johnson and Kotz (1969) and Willson, Folks and Young (1986), which show the existence of at least one solution when $s^2 > \bar{x}$. We now give a more detailed analysis to show uniqueness of the zero when $G''(0) < 0$ and the nonexistence of zeros when $G''(0) \ge 0$.
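A minimal sketch of the Newton iteration on $G(z)$ of (2) follows; the sample, the moment-based starting value $z_0 = (s^2 - \bar{x})/\bar{x}^2$ (that is, $1/k_0$ with $k_0 = \bar{x}^2/(s^2 - \bar{x})$), and the stopping rule are illustrative assumptions rather than choices specified in the paper.

```python
# Newton's method on G(z) from equation (2); the data, the moment-based start
# z0 = (s2 - x_bar)/x_bar**2, and the tolerance are illustrative assumptions.
import numpy as np

x = np.array([0, 0, 1, 1, 2, 3, 3, 5, 7, 9])             # hypothetical sample
M = x.max()
x_bar = x.mean()
s2 = (x**2).mean() - x_bar**2
f = np.array([(x == j).mean() for j in range(M + 1)])
F = np.array([f[j:].sum() for j in range(1, M + 1)])
j = np.arange(1, M + 1)

def G(z):
    return np.sum(z * F / ((j - 1) * z + 1)) - np.log(1.0 + x_bar * z)

def G_prime(z):
    return np.sum(F / ((j - 1) * z + 1) ** 2) - x_bar / (1.0 + x_bar * z)

# G''(0) = x_bar - s2, so a root z_hat > 0 is available only when s2 > x_bar.
if s2 > x_bar:
    z = (s2 - x_bar) / x_bar**2            # z0 = 1/k0 with k0 = x_bar**2/(s2 - x_bar)
    for _ in range(50):
        step = G(z) / G_prime(z)
        z -= step
        if abs(step) < 1e-12:
            break
    k_hat = 1.0 / z
    print(k_hat, k_hat / (x_bar + k_hat))  # k_hat and p_hat
else:
    print("s^2 <= x_bar: no maximum likelihood estimator")
```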


Fig. 1. Graphs for $G(z)$: one with a single root and one with no roots.

3. The main results

We analyze the problem in three cases.

Case 1. Let $M = 1$; then

$$G(z) = F_1 z - \log(1 + F_1 z),$$

$$G'(z) = F_1 - F_1/(1 + F_1 z)$$

and

$$G''(z) = [F_1/(1 + F_1 z)]^2.$$

Since $G''(z) > 0$ and $G(0) = G'(0) = 0$, the graph of $G$ never intersects the positive $z$-axis. Thus, there is no root $z > 0$. Note that $x_i \in \{0, 1\}$ implies $s^2 = \bar{x} - \bar{x}^2 < \bar{x}$.
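A quick numerical illustration of Case 1 (not from the paper; the 0/1 sample is made up): $G(z)$ stays positive for $z > 0$, and $s^2 = \bar{x} - \bar{x}^2 < \bar{x}$.

```python
# Case 1 check with made-up 0/1 data: G(z) = F1*z - log(1 + F1*z) never returns
# to zero for z > 0, and s^2 = x_bar - x_bar**2 < x_bar.
import numpy as np

x = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 1])   # hypothetical 0/1 sample (M = 1)
x_bar = x.mean()
s2 = (x**2).mean() - x_bar**2
F1 = x_bar                                      # for 0/1 data, F_1 is the proportion of ones

z = np.linspace(1e-6, 20.0, 2000)
G = F1 * z - np.log(1.0 + F1 * z)
print(np.all(G > 0), s2 < x_bar)                # expect: True True
```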

Case 2. Let $M = 2$. The essential ideas are illustrated by this case. Our approach requires the change of variables $u = F_1$ and $\bar{x} = F_1 + F_2$.

Note that $1 \ge F_1 \ge F_2 > 0$ implies

$$(u, \bar{x}) \in D = \{(u, \bar{x}) : 1 \ge u \ge \bar{x} - u > 0\}.$$

We will show that $G(z; u, \bar{x}) = 0$ implicitly defines a unique function $z = \zeta(u, \bar{x})$ on some maximal set of values $E = \mathrm{Domain}(\zeta)$. By doing so we will have shown that for each $(u, \bar{x}) \in E$ there is a unique $z = \zeta(u, \bar{x})$ such that $G(\zeta(u, \bar{x}); u, \bar{x}) = 0$. The MLE is then given by $\hat{k} = 1/\zeta(u, \bar{x})$. The natural approach in constructing $z = \zeta(u, \bar{x})$ is to find a set $E \subset D$ such that for each $(u, \bar{x}) \in E$ there is a unique $z$ corresponding to it.


We use a slightly modified approach by selecting $z$ first, constructing all pairs $(u, \bar{x})$ corresponding to it, and showing that each such pair $(u, \bar{x})$ cannot be mapped to any other $z$. The equation $G(z; u, \bar{x}) = 0$ can be solved to obtain

$$u = u(z, \bar{x}) = \frac{(z+1)\log(1 + \bar{x} z) - \bar{x} z}{z^2}, \qquad z > 0. \qquad (3)$$

Taking the limit as $z \to 0^{+}$ yields $u(0, \bar{x}) = \bar{x} - \bar{x}^2/2$, which is equivalent to $G''(0; F_1, F_2) = 0$.

We show that the subregion

$$E = D \cap \{(u, \bar{x}) : u < u(0, \bar{x})\}$$

is the domain of $\zeta$. Figure 2 illustrates the sets $D$ and $E$, and the graphs of $u(0, \bar{x})$ and $u(1, \bar{x})$ in $D$. For each value of $z > 0$ the graph of (3) in $D$ represents all pairs $(u, \bar{x})$ for which $G$ has a zero at $z$. We prove that $E$ is a disjoint union of these graphs. A consequence is that $G$ has a zero if and only if $F_1 < u(0, \bar{x})$, or equivalently, if and only if $s^2 > \bar{x}$. Moreover, since the graphs are disjoint in $D$, a given pair $(u, \bar{x})$ has exactly one graph containing it and the corresponding zero $z$ of $G$ is unique.

We now prove that $E$ is a disjoint union of graphs. Differentiate $u$ to obtain

$$u_{z\bar{x}} = -(1 - \bar{x})\bar{x}/(1 + \bar{x} z)^2 < 0$$


Fig. 2. Region of existence.

for $\bar{x} \in (0, 1)$. Integrate in $z$ from $z_1$ to $z_2$ to obtain $u_{\bar{x}}(z_2, \bar{x}) - u_{\bar{x}}(z_1, \bar{x}) < 0$, where the strict inequality holds since the integral of a continuous negative function is negative. Integrate in $\bar{x}$ from $0$ to $\bar{x}$ and use $u(z, 0) = 0$ to obtain $u(z_2, \bar{x}) < u(z_1, \bar{x})$. Therefore, the graphs are ordered as claimed. As $z \to \infty$, the graphs of $u(z, \bar{x})$ in $D$ approach the graph of $u(\infty, \bar{x}) = 0$ in $D$, which is the single point $(0, 0)$, and so $E$ is a disjoint union of the graphs.
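The ordering of the graphs can also be illustrated numerically (an added check, not part of the paper; the value of $\bar{x}$ is made up): for fixed $\bar{x} \in (0, 1)$, the function $u(z, \bar{x})$ of (3) decreases strictly in $z$ and approaches $\bar{x} - \bar{x}^2/2$ as $z \to 0^{+}$.

```python
# Numerical illustration of the ordering: for fixed x_bar in (0, 1), u(z, x_bar)
# from equation (3) is strictly decreasing in z; its z -> 0 limit is x_bar - x_bar**2/2.
import numpy as np

def u(z, x_bar):
    return ((z + 1.0) * np.log(1.0 + x_bar * z) - x_bar * z) / z**2

x_bar = 0.7                                     # made-up value in (0, 1)
z = np.linspace(1e-4, 50.0, 5000)
vals = u(z, x_bar)

print(np.all(np.diff(vals) < 0))                # graphs for larger z lie below: True
print(vals[0], x_bar - x_bar**2 / 2)            # close to the limit u(0, x_bar)
```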

Case 3. Let $M \ge 3$ and make the change of variables $u = F_1$, $\bar{x} = \sum_{j=1}^{M} F_j$. Denote $F' = (F_2, \ldots, F_{M-1})$. The conditions $1 \ge F_1 \ge \cdots \ge F_M > 0$ imply that

$$(u, \bar{x}, F') \in D = \{(u, \bar{x}, F') : 1 \ge u \ge F_2 \ge \cdots \ge F_{M-1} \ge \bar{x} - (u + F_2 + \cdots + F_{M-1}) > 0\}.$$

The equation $G(z; u, \bar{x}, F') = 0$ can be solved to obtain

$$u = u(z, \bar{x}, F') = \frac{[(M-1)z + 1]\log(1 + \bar{x} z) - \bar{x} z}{(M-1) z^2} - \frac{1}{M-1} \sum_{j=2}^{M-1} \frac{(M-j) F_j}{(j-1)z + 1} \qquad (4)$$

for $z > 0$. Taking the limit as $z \to 0^{+}$ yields

$$u(0, \bar{x}, F') = \bar{x} - \frac{\bar{x}^2}{2(M-1)} - \frac{1}{M-1} \sum_{j=2}^{M-1} (M-j) F_j,$$

which is equivalent to $G''(0; u, \bar{x}, F') = 0$. As in Case 2, for each $z > 0$ the graph of (4) in the $(u, \bar{x}, F')$ domain represents all triples $(u, \bar{x}, F')$ for which $G$ has a zero at $z$. We show that

$$E = D \cap \{(u, \bar{x}, F') : u < u(0, \bar{x}, F')\}$$

is a disjoint union of these graphs. Existence of a zero for $G$ is guaranteed if and only if $F_1 < u(0, \bar{x}, F')$, or equivalently, if and only if $s^2 > \bar{x}$. The disjointness of the union implies that for a given triple $(u, \bar{x}, F') \in E$ there is exactly one graph which passes through it, and so $G$ has exactly one zero $\hat{z}$.

Differentiate $u$ to obtain

$$u_{z\bar{x}} = -(M - 1 - \bar{x})\bar{x}/[(M-1)(1 + \bar{x} z)^2] < 0$$

for $\bar{x} \in (0, M-1)$. Note that the graph of $u(0, \bar{x}, F')$ exits $D$ when $\bar{x} = M - 1$. Integrate in $z$ from $z_1$ to $z_2$ to obtain

$$u_{\bar{x}}(z_2, \bar{x}, F') - u_{\bar{x}}(z_1, \bar{x}, F') < 0.$$

Finally, integrate in $\bar{x}$ from $0$ to $\bar{x}$ to obtain

$$u(z_2, \bar{x}, F') - u(z_2, 0, F') < u(z_1, \bar{x}, F') - u(z_1, 0, F').$$

Unlike Case 2, we have two extra terms (where $\bar{x} = 0$) which may affect the ordering of the graphs. However, restricting our attention to the domain $D$, when $\bar{x} = 0$ all the parameters must be zero, $u = 0$ and $F' = 0$, so on $D$ we have $u(z_2, \bar{x}, F') < u(z_1, \bar{x}, F')$ and the graphs are ordered as claimed. As $z \to \infty$, the graphs of $u(z, \bar{x}, F')$ in $D$ approach the graph of $u(\infty, \bar{x}, F') = 0$ in $D$, which is the single point $\mathbf{0}$, so $E$ is the disjoint union of the graphs.
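A closing numerical check of the Case 3 threshold (not from the paper; the counts are made up): the gap $F_1 - u(0, \bar{x}, F')$ equals $(\bar{x} - s^2)/(2(M-1))$, so the existence condition $F_1 < u(0, \bar{x}, F')$ is the same as $s^2 > \bar{x}$.

```python
# Case 3 check with made-up counts: F_1 - u(0, x_bar, F') equals
# (x_bar - s2)/(2*(M - 1)), so F_1 < u(0, x_bar, F')  <=>  s2 > x_bar.
import numpy as np

x = np.array([0, 1, 1, 2, 2, 2, 3, 4, 4, 6])             # hypothetical sample, M >= 3
M = x.max()
x_bar = x.mean()
s2 = (x**2).mean() - x_bar**2

f = np.array([(x == j).mean() for j in range(M + 1)])
F = np.array([f[j:].sum() for j in range(1, M + 1)])     # F_1, ..., F_M

j = np.arange(2, M)                                      # j = 2, ..., M-1
u0 = x_bar - x_bar**2 / (2 * (M - 1)) - np.sum((M - j) * F[j - 1]) / (M - 1)

print(F[0] - u0, (x_bar - s2) / (2 * (M - 1)))           # identical up to rounding
print(F[0] < u0, s2 > x_bar)                             # existence condition both ways
```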



References

Anscombe, F.J. (1949), The statistical analysis of insect counts based on the negative binomial distribution, Biometrics 5, 165-173.
Anscombe, F.J. (1950), Sampling theory of the negative binomial and logarithmic series distributions, Biometrika 37, 358-382.
Binet, F.E. (1986), Fitting the negative binomial distribution, Biometrics 42, 989-992.
Binns, M.R. and N.J. Bostanian (1988), Binomial and censored sampling in estimation and decision making for the negative binomial distribution, Biometrics 44, 473-483.
Bliss, C.I. and R.A. Fisher (1953), Fitting the negative binomial distribution to biological data, Biometrics 9, 176-200.
Bliss, C.I. and A.R.C. Owen (1958), Negative binomial distributions with a common k, Biometrika 45, 36-58.
Bowman, K.O. (1984), Extended moment series and the parameters of the negative binomial distribution, Biometrics 40, 249-252.
Clark, S.J. and J.N. Perry (1989), Estimation of the negative binomial parameter k by maximum quasi-likelihood, Biometrics 45, 309-316.
Eberly, S. (1991), Inferences from length-biased distributions, Thesis, Univ. of Texas (San Antonio, TX).
Fisher, R.A. (1941), The negative binomial distribution, Ann. Eugenics London 11, 182-187.
Haldane, J.B.S. (1941), The fitting of the binomial distributions, Ann. Eugenics London 11, 179-181.
Johnson, N.L. and S. Kotz (1969), Distributions in Statistics, Vol. I: Discrete Distributions (Wiley, New York).
Katti, S.K. and J. Gurland (1962), Efficiency of certain methods of estimation for the negative binomial and the Neyman type A distribution, Biometrika 49, 215-226.
Kemp, A.W. and C.D. Kemp (1987), A rapid and efficient estimation for the negative binomial distribution, Biometric J. 29, 856-863.
Lam, H.K., L.R. Shenton and K.O. Bowman (1988), Some properties of a moment estimator for the index parameter of the negative binomial distribution, ASA Proc. of Statist. Comput., pp. 365-367.
Martin, D.C. and S.K. Katti (1965), Fitting of certain contagious distributions to some available data by the maximum likelihood method, Biometrics 21, 34-48.
Nedelman, J. (1983), A negative binomial model for sampling mosquitoes in a malaria survey, Biometrics 39, 1009-1020.
O'Carroll, F.M. (1962), Fitting a negative binomial distribution to coarsely grouped data by maximum likelihood, Appl. Statist. 11, 196-201.
Pahl (1969), On testing for goodness-of-fit of the negative binomial distribution when expectations are small, Biometrics 25, 143-151.
Piegorsch, W.W. (1990), Maximum likelihood estimation for the negative binomial dispersion parameter, Biometrics 46, 863-867.
Pieters, E.P., C.E. Gates, J.H. Matis and W.L. Sterling (1977), Small-sample comparison of different estimators of negative binomial parameters, Biometrics 33, 718-723.
Ross, G.J.S. and D.A. Preece (1985), The negative binomial distribution, The Statistician 34, 323-336.
Shah, S.M. (1961), The asymptotic variances of method of moments estimates of the parameters of the truncated binomial and negative binomial distributions, J. Amer. Statist. Assoc. 56, 880-994.
Shenton, L.R. (1963), A note on bounds for the asymptotic sampling variance of the maximum likelihood estimator of a parameter in the negative binomial distribution, Ann. Inst. Math. Statist. Tokyo 15, 145-151.
Shenton, L.R. and K.O. Bowman (1967), Remarks on large-sample estimators for some discrete distributions, Technometrics 9, 587-598.
Shenton, L.R. and R. Myers (1965), Comments on estimation for the negative binomial distribution, in: G.P. Patil, ed., Classical and Contagious Discrete Distributions (Statist. Pub. Society, Calcutta) pp. 241-262.
Shenton, L.R. and P.A. Wallington (1962), The bias of moment estimators with an application to the negative binomial distribution, Biometrika 49, 193-204.
Willson, L.J., J.L. Folks and J.H. Young (1986), Complete sufficiency and maximum likelihood estimation for the two-parameter negative binomial distribution, Metrika 33, 349-362.
Wise, M.E. (1946), The use of the negative binomial distribution in an industrial sampling problem, J. Roy. Statist. Soc. Ser. B 8, 202-211.
