Statistics & Probability Letters 15 (1992) 375-379
North-Holland

8 December 1992

Existence and uniqueness of the maximum likelihood estimator for the two-parameter negative binomial distribution

Jorge Aragón *, David Eberly ** and Shelly Eberly ***

Division of Mathematics, Computer Science, and Statistics, University of Texas, San Antonio, TX, USA

Received September 1991

Revised February 1992

Abstract: Given a sample with mean $\bar{x}$ and second moment $s^2$, Anscombe in 1950 conjectured that the maximum likelihood equations for the two-parameter negative binomial distribution have a unique solution if and only if $s^2 > \bar{x}$. We give a proof of his conjecture.


Keywords: Maximum likelihood estimator; negative binomial distribution; Newton’s method

1. Introduction

The negative binomial distribution has been applied widely in Biology, Psychology, Communications, Insurance, Economics, Medicine, Military, etc. The following parametrization is used here:

$$f(x) = \binom{x + k - 1}{x}\, p^k (1 - p)^x, \qquad x = 0, 1, 2, \ldots$$

We treat $k$ as a continuous parameter with $k \in (0, \infty)$ and refer to the distribution as NB($k$, $p$).
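As an illustration, the mass function above can be evaluated for non-integer $k$ by writing the binomial coefficient with gamma functions. The following Python sketch is not part of the original paper; the name `nb_pmf` and the parameter values are chosen here only for illustration.

```python
import math

def nb_pmf(x: int, k: float, p: float) -> float:
    """P(X = x) for NB(k, p) with real k > 0 (hypothetical helper).

    The coefficient binom(x + k - 1, x) is computed as
    Gamma(x + k) / (Gamma(k) * x!) on the log scale.
    """
    log_coeff = math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
    return math.exp(log_coeff + k * math.log(p) + x * math.log1p(-p))

# Illustrative parameter values (assumptions, not from the paper).
k, p = 2.5, 0.4

# Truncated sums; the neglected tail is negligible for these values.
total = sum(nb_pmf(x, k, p) for x in range(400))
mean = sum(x * nb_pmf(x, k, p) for x in range(400))
```

Under this parametrization `total` should be close to 1 and `mean` close to $k(1 - p)/p$.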

As a result of the frequent application of the negative binomial distribution, an increasing number of papers on estimation have appeared in the literature (Fisher, 1941; Haldane, 1941; Wise, 1946; Anscombe, 1949 and 1950; Bliss and Fisher, 1953; Bliss and Owen, 1958; Shah, 1961; Katti and Gurland, 1962; O'Carroll, 1962; Shenton and Wallington, 1962; Shenton, 1963; Martin and Katti, 1965; Shenton and Myers, 1965; Johnson and Kotz, 1969; Pahl, 1969; Shenton and Bowman, 1967; Pieters, Gates, Matis and Sterling, 1977; Nedelman, 1983; Bowman, 1984; Willson, Folks and Young, 1986; Ross and Preece, 1985; Binet, 1986; Kemp and Kemp, 1987; Binns and Bostanian, 1988; Lam, Shenton and Bowman, 1988; and Piegorsch, 1990; among others). A survey of the articles on the topic can be found in Clark and Perry (1989).

For a sample $x_1, \ldots, x_n$, Anscombe (1950) conjectured that the maximum likelihood estimator exists and is unique when the second sample moment $s^2 = \sum_{i=1}^{n} x_i^2/n - \bar{x}^2$ is greater than the sample mean $\bar{x} = \sum_{i=1}^{n} x_i/n$ and that no maximum likelihood estimator exists when $s^2 \le \bar{x}$. Proofs for existence of the maximum likelihood estimator when $s^2 > \bar{x}$ were given by Johnson and Kotz (1969), although the book contains some misprints, and by Willson, Folks and Young (1986). Several issues pertaining to the negative binomial distribution have been addressed in the literature, such as what to do when $s^2 \le \bar{x}$, how to treat estimates of $k$ less than one, and comparison of different estimation methods for small samples. However, the questions of uniqueness when $s^2 > \bar{x}$ and of existence when $s^2 \le \bar{x}$ have gone unanswered. We answer these remaining questions by proving the following:

Theorem. Let $x_i$, $i = 1, \ldots, n$, be a random sample from NB($k$, $p$). The maximum likelihood estimator of $(k, p)$ exists if and only if $s^2 > \bar{x}$. Moreover, if the maximum likelihood estimator exists, then it must be unique.

The nonexistence of the maximum likelihood estimator when $s^2 \le \bar{x}$ fits in well with the nonexistence of the method-of-moments estimator and the fact that $\sigma^2 > \mu$ for the negative binomial distribution.

2. Formulation of the problem


Let

$$M = \max_i x_i,$$

so $0 \le x_i \le M$. If $f_j$ is the proportion of the sample values equal to $j$, then

$$\bar{x} = \sum_{j=1}^{M} j f_j$$

and

$$s^2 = \sum_{j=1}^{M} j^2 f_j - \Bigl(\sum_{j=1}^{M} j f_j\Bigr)^2.$$
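The summaries above can be computed directly from the proportions $f_j$. The sketch below is an illustration only; the helper name `summarize` and the sample values are hypothetical and do not come from the paper.

```python
from collections import Counter

def summarize(sample):
    """Return (M, f, xbar, s2) where f[j] is the proportion of values equal to j."""
    n = len(sample)
    M = max(sample)
    counts = Counter(sample)
    f = [counts.get(j, 0) / n for j in range(M + 1)]
    xbar = sum(j * f[j] for j in range(1, M + 1))
    s2 = sum(j * j * f[j] for j in range(1, M + 1)) - xbar ** 2
    return M, f, xbar, s2

# Hypothetical count data used only to exercise the formulas.
sample = [0, 1, 1, 2, 5, 0, 3, 7]
M, f, xbar, s2 = summarize(sample)
overdispersed = s2 > xbar  # the condition appearing in the Theorem
```

For this hypothetical sample the second moment exceeds the mean, so the Theorem would place it in the case where the estimator exists.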



We wish to compute the maximum likelihood estimators $(\hat{k}, \hat{p})$ for the sample. The maximum likelihood estimator for $p$ is given by $\hat{p} = \hat{k}/(\bar{x} + \hat{k})$, where $\hat{k}$ is a solution to

$$g(k) = \sum_{j=1}^{M} \frac{F_j}{k + j - 1} - \log\Bigl(1 + \frac{\bar{x}}{k}\Bigr) = 0 \tag{1}$$

and where $F_j = \sum_{i=j}^{M} f_i$ is the proportion of the sample values greater than or equal to $j$.
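A minimal sketch of equation (1), assuming the form $g(k) = \sum_j F_j/(k + j - 1) - \log(1 + \bar{x}/k)$, which is how the equation is read here for consistency with (2). The function names and the sample are hypothetical. As a sanity check, `g` is compared with a finite-difference slope of the average log-likelihood after substituting $p = k/(\bar{x} + k)$.

```python
import math

def tail_proportions(sample):
    """F[j] for j = 1..M: proportion of values >= j (F[0] is unused padding)."""
    n, M = len(sample), max(sample)
    return [0.0] + [sum(1 for x in sample if x >= j) / n for j in range(1, M + 1)]

def g(k, sample):
    """Left-hand side of equation (1) as read in this sketch."""
    F = tail_proportions(sample)
    xbar = sum(sample) / len(sample)
    return sum(F[j] / (k + j - 1) for j in range(1, len(F))) - math.log1p(xbar / k)

def profile_loglik(k, sample):
    """Average log-likelihood with p = k / (xbar + k); terms free of k are dropped."""
    n = len(sample)
    xbar = sum(sample) / n
    gam = sum(math.lgamma(x + k) - math.lgamma(k) for x in sample) / n
    return gam + k * math.log(k / (xbar + k)) + xbar * math.log(xbar / (xbar + k))

sample = [0, 1, 1, 2, 5, 0, 3, 7]  # hypothetical data
F = tail_proportions(sample)
xbar = sum(sample) / len(sample)

k, h = 1.5, 1e-5
slope = (profile_loglik(k + h, sample) - profile_loglik(k - h, sample)) / (2 * h)
```

The values `slope` and `g(k, sample)` should agree to within finite-difference error, and the $F_j$ should sum to $\bar{x}$.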

The approach we took to finding the maximum likelihood estimator was motivated by Eberly (1991). In this thesis, data sets were generated from a length-biased truncated negative binomial distribution $1 + \mathrm{NB}(k + 1, p)$ where $k > 0$. To construct a maximum likelihood estimator, we used equation (1) with $k$ replaced by $k + 1$. In applying Newton's method to (1), we had problems with the zero at infinity for $g(k)$. With an inappropriate initial guess, the iterates tended towards infinity. To avoid this problem, we defined $z = 1/k$ and $G(z) = g(k)$, so

$$G(z) = \sum_{j=1}^{M} \frac{z F_j}{(j - 1)z + 1} - \log(1 + \bar{x} z), \qquad z \in (0, \infty). \tag{2}$$

This reparametrization has been used by Lam, Shenton and Bowman (1988) and Clark and Perry (1989). For $k > 1$ we need only consider (2) for $z \in (0, 1)$. One has much more control on the behavior of Newton's method on this finite interval. We observed that the graphs of $G$ were of two types; see Figure 1. The function $G$ has the properties $G(0) = G'(0) = 0$ and $G(z)/z \to F_1 > 0$ as $z \to \infty$ (so $G(z)$ must be positive for $z$ large), and the convexity or concavity at $z = 0$ determines the shape of the graph. Computing $G''(0)$ and using the definition of $F_j$, we have

$$G''(0) = \bar{x}^2 - 2\sum_{j=1}^{M} (j - 1)F_j = \bar{x} - s^2.$$
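The computation of $G''(0)$ can be spelled out as follows. This expansion is a reading aid worked out here from (2) and the definition of $F_j$; it is not reproduced from the paper.

```latex
\begin{align*}
G(z) &= \sum_{j=1}^{M} F_j z\,[1 - (j-1)z]
        - \Bigl(\bar{x} z - \tfrac{1}{2}\bar{x}^2 z^2\Bigr) + O(z^3)
      = \Bigl(\tfrac{1}{2}\bar{x}^2 - \sum_{j=1}^{M} (j-1)F_j\Bigr) z^2 + O(z^3),
      \quad\text{since } \sum_{j=1}^{M} F_j = \bar{x}, \\
2\sum_{j=1}^{M} (j-1)F_j
     &= 2\sum_{i=1}^{M} f_i \sum_{j=1}^{i} (j-1)
      = \sum_{i=1}^{M} i(i-1) f_i
      = (s^2 + \bar{x}^2) - \bar{x}, \\
G''(0) &= \bar{x}^2 - (s^2 + \bar{x}^2 - \bar{x}) = \bar{x} - s^2 .
\end{align*}
```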

If $G''(0) < 0$, then eventually the graph of $G$ must intersect the $z$-axis, thereby providing a solution $\hat{z}$ to $G(z) = 0$ and a maximum likelihood estimator $\hat{k} = 1/\hat{z}$. This reproduces the results in Johnson and Kotz (1969) and Willson, Folks and Young (1986), which show the existence of at least one solution when $s^2 > \bar{x}$. We now give a more detailed analysis to show uniqueness of the zero when $G''(0) < 0$ and the nonexistence of zeros when $G''(0) \ge 0$.
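The sign pattern described above ($G$ negative near $0$ when $s^2 > \bar{x}$, positive for large $z$) suggests a simple bracketing scheme. The sketch below uses bisection in place of the Newton iteration mentioned earlier, and it relies on the uniqueness claim of the Theorem. The names `make_G` and `mle_k`, the tolerances, and the sample are assumptions made for illustration.

```python
import math

def make_G(sample):
    """Build G from equation (2) for a given sample."""
    n, M = len(sample), max(sample)
    xbar = sum(sample) / n
    # F[i] holds F_{i+1}, the proportion of values >= i + 1.
    F = [sum(1 for x in sample if x >= j) / n for j in range(1, M + 1)]
    def G(z):
        # enumerate index i equals j - 1 in the paper's notation
        return sum(z * Fj / (i * z + 1) for i, Fj in enumerate(F)) - math.log1p(xbar * z)
    return G

def mle_k(sample, tol=1e-12, max_steps=200):
    """Return k_hat = 1 / z_hat, or None when s^2 <= xbar (no MLE by the Theorem)."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum(x * x for x in sample) / n - xbar ** 2
    if s2 <= xbar:
        return None
    G = make_G(sample)
    lo = hi = 1.0
    for _ in range(max_steps):      # move lo left until G(lo) < 0
        if G(lo) < 0:
            break
        lo /= 2
    for _ in range(max_steps):      # move hi right until G(hi) > 0
        if G(hi) > 0:
            break
        hi *= 2
    for _ in range(max_steps):      # bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if G(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * hi:
            break
    return 2.0 / (lo + hi)

sample = [0, 1, 1, 2, 5, 0, 3, 7]   # hypothetical data with s^2 > xbar
k_hat = mle_k(sample)
residual = make_G(sample)(1.0 / k_hat)
no_mle = mle_k([0, 1, 0, 1, 1])     # values in {0, 1}, so s^2 < xbar
```

Bisection is slower than Newton's method but cannot drift off to infinity, which is the difficulty the reparametrization was introduced to avoid.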

Fig. 1. Graphs for G(z). Panels: no roots; one root.

3. The main results

We analyze the problem in three cases.

Case 1. Let $M = 1$; then

$$G(z) = F_1 z - \log(1 + F_1 z)$$

and

$$G''(z) = [F_1/(1 + F_1 z)]^2.$$

Since $G''(z) > 0$ and $G(0) = G'(0) = 0$, the graph of $G$ never intersects the positive $z$-axis. Thus, there is no root $z > 0$. Note that $x_i \in \{0, 1\}$ implies $s^2 = \bar{x} - \bar{x}^2 < \bar{x}$.
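The step from convexity to the absence of a positive root can be made explicit with Taylor's formula with integral remainder. This short derivation is added as a reading aid and is not taken from the paper.

```latex
G(z) = G(0) + G'(0)\,z + \int_0^z (z - t)\,G''(t)\,dt
     = \int_0^z (z - t)\left[\frac{F_1}{1 + F_1 t}\right]^2 dt > 0
\qquad \text{for } z > 0 .
```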

Case 2. Let $M = 2$. The essential ideas are illustrated by this case. Our approach requires the change of variables $u = F_1$ and $\bar{x} = F_1 + F_2$. Note that $1 \ge F_1 \ge F_2 > 0$ implies

$$(u, \bar{x}) \in D = \{(u, \bar{x}) : 1 \ge u \ge \bar{x} - u\}.$$

We will show that $G(z; u, \bar{x}) = 0$ implicitly defines a unique function $z = \zeta(u, \bar{x})$ on some maximal set of values $E = \mathrm{Domain}(\zeta)$. By doing so we will have shown that for each $(u, \bar{x}) \in E$ there is a unique $z = \zeta(u, \bar{x})$ such that $G(\zeta(u, \bar{x}); u, \bar{x}) = 0$. The MLE is then given by $\hat{k} = 1/\zeta(u, \bar{x})$. The natural approach in constructing $z = \zeta(u, \bar{x})$ is to find a set $E \subset D$ such that for each $(u, \bar{x}) \in E$ there is a unique $z$ corresponding to it. We use a slightly modified approach by selecting $z$ first, constructing all pairs $(u, \bar{x})$ corresponding to it, and showing that each such pair $(u, \bar{x})$ cannot be mapped to any other $z$.

The equation $G(z; u, \bar{x}) = 0$ can be solved to obtain

$$u = u(z, \bar{x}) = \frac{(z + 1)\log(1 + \bar{x} z) - \bar{x} z}{z^2}, \qquad z > 0. \tag{3}$$

Taking the limit as $z \to 0^+$ yields $u(0, \bar{x}) = \bar{x} - \tfrac{1}{2}\bar{x}^2$, which is equivalent to $G''(0; F_1, F_2) = 0$.
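Equation (3) and its limit can be checked numerically. In the sketch below the values of $z$ and $\bar{x}$ are arbitrary illustrative choices, and `G` is equation (2) specialized to $M = 2$; none of the names come from the paper.

```python
import math

def u(z, xbar):
    """Right-hand side of equation (3): the F_1 for which G vanishes at z (M = 2)."""
    return ((z + 1) * math.log1p(xbar * z) - xbar * z) / z ** 2

def G(z, F1, F2):
    """Equation (2) with M = 2, so that xbar = F1 + F2."""
    return z * F1 + z * F2 / (z + 1) - math.log1p((F1 + F2) * z)

# Arbitrary illustrative values.
xbar, z = 0.6, 0.8
F1 = u(z, xbar)
F2 = xbar - F1

residual = G(z, F1, F2)                                  # should vanish
limit_gap = abs(u(1e-4, xbar) - (xbar - xbar ** 2 / 2))  # small for z near 0
```

With these values the pair $(F_1, F_2)$ also satisfies $1 \ge F_1 \ge F_2 > 0$, so it corresponds to an admissible sample.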

We show that the subregion

$$E = D \cap \{(w, \bar{x}) : w < u(0, \bar{x})\}$$

is the domain for $\zeta$. Figure 2 illustrates the sets $D$ and $E$, and the graphs of $u(0, \bar{x})$ and $u(1, \bar{x})$ in $D$. For each value of $z > 0$ the graph of (3) in $D$ represents all pairs $(u, \bar{x})$ for which $G$ has a zero at $z$. We prove that $E$ is a disjoint union of these graphs. A consequence is that $G$ has a zero if and only if $F_1 < u(0, \bar{x})$, or equivalently, if and only if $s^2 > \bar{x}$. Moreover, since the graphs are disjoint in $D$, a given pair $(u, \bar{x})$ has exactly one graph containing it and the corresponding zero $z$ for $G$ is unique.

We now prove that $E$ is a disjoint union of graphs. Differentiate $u$ to obtain

$$u_{\bar{x} z} = -(1 - \bar{x})\bar{x}/(1 + \bar{x} z)^2 < 0$$

for $\bar{x} \in (0, 1)$. Integrate in $z$ from $z_1$ to $z_2$, where $z_1 < z_2$, to obtain $u_{\bar{x}}(z_2, \bar{x}) - u_{\bar{x}}(z_1, \bar{x}) < 0$, where the strict inequality holds since the integral of a continuous negative function is negative. Integrate in $\bar{x}$ from $0$ to $\bar{x}$ and use $u(z, 0) = 0$ to obtain $u(z_2, \bar{x}) < u(z_1, \bar{x})$. Therefore, the graphs are ordered as claimed. As $z \to \infty$, the graphs of $u(z, \bar{x})$ in $D$ approach the graph of $u(\infty, \bar{x}) = 0$ in $D$, which is the single point $(0, 0)$, and so $E$ is a disjoint union of the graphs.

Fig. 2. Region of existence.
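The ordering of the graphs can be spot-checked on a grid. This is only a numerical illustration of the argument above, not a substitute for it; the grid values are arbitrary, and `u_xbar` is the first partial derivative $(1 - \bar{x})/(1 + \bar{x} z)$ worked out here from (3) rather than quoted from the paper.

```python
import math

def u(z, xbar):
    """Equation (3) for the case M = 2."""
    return ((z + 1) * math.log1p(xbar * z) - xbar * z) / z ** 2

def u_xbar(z, xbar):
    """Partial derivative of u in xbar (derived for this sketch)."""
    return (1 - xbar) / (1 + xbar * z)

# Arbitrary grids: increasing z values and xbar values inside (0, 1).
zs = [0.1, 0.5, 1.0, 2.0, 10.0, 100.0]
xbars = [i / 10 for i in range(1, 10)]

# u(z2, xbar) < u(z1, xbar) whenever z1 < z2, for each xbar on the grid.
ordered = all(u(z2, xb) < u(z1, xb) for xb in xbars for z1, z2 in zip(zs, zs[1:]))

# Central differences as a check on the derived formula for u_xbar.
h = 1e-6
fd_error = max(
    abs((u(z, xb + h) - u(z, xb - h)) / (2 * h) - u_xbar(z, xb))
    for z in zs for xb in xbars
)
```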

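Case 3. Let $M \ge 3$ and make the change of variables $u = F_1$, $\bar{x} = \sum_{j=1}^{M} F_j$. Denote

$$\vec{F} = (F_2, \ldots, F_{M-1}).$$

The conditions $1 \ge F_1 \ge \cdots \ge F_M > 0$ imply that

$$(u, \bar{x}, \vec{F}) \in D = \{(u, \bar{x}, \vec{F}) : 1 \ge u \ge F_2 \ge \cdots \ge F_{M-1} \ge \bar{x} - (u + F_2 + \cdots + F_{M-1})\}.$$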

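The equation $G(z; u, \bar{x}, \vec{F}) = 0$ can be solved to obtain

$$u = u(z, \bar{x}, \vec{F}) = \frac{[(M - 1)z + 1]\log(1 + \bar{x} z) - \bar{x} z}{(M - 1)z^2} - \frac{1}{M - 1}\sum_{j=2}^{M-1} \frac{(M - j)F_j}{(j - 1)z + 1} \tag{4}$$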

for $z > 0$. Taking the limit as $z \to 0^+$ yields

$$u(0, \bar{x}, \vec{F}) = \bar{x} - \frac{\bar{x}^2}{2(M - 1)} - \frac{1}{M - 1}\sum_{j=2}^{M-1} (M - j)F_j,$$

which is equivalent to $G''(0; u, \bar{x}, \vec{F}) = 0$.

As in Case 2, for each $z > 0$ the graph of (4) in the $(u, \bar{x}, \vec{F})$ domain represents all triples $(u, \bar{x}, \vec{F})$ for which $G$ has a zero at $z$. We show that

$$E = D \cap \{(w, \bar{x}, \vec{F}) : w < u(0, \bar{x}, \vec{F})\}$$

is a disjoint union of these graphs. Existence of a zero for $G$ is guaranteed if and only if $F_1 < u(0, \bar{x}, \vec{F})$, or equivalently, if and only if $s^2 > \bar{x}$. The disjointness of the union implies that for a given triple $(u, \bar{x}, \vec{F}) \in E$ there is exactly one graph which passes through it, and so $G$ has exactly one zero $\hat{z}$.
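The general-$M$ formula for $u$ can be checked against $G$ in the same way as in Case 2. The sketch below assumes the form $u = \{[(M-1)z + 1]\log(1 + \bar{x}z) - \bar{x}z\}/[(M-1)z^2] - \frac{1}{M-1}\sum_{j=2}^{M-1}(M-j)F_j/[(j-1)z + 1]$, obtained here by solving $G = 0$ for $F_1$ after eliminating $F_M$. The function names and numerical values are illustrative assumptions.

```python
import math

def u_case3(z, xbar, F_mid):
    """Value of F_1 for which G vanishes at z; F_mid = [F_2, ..., F_{M-1}]."""
    M = len(F_mid) + 2
    lead = (((M - 1) * z + 1) * math.log1p(xbar * z) - xbar * z) / ((M - 1) * z ** 2)
    corr = sum(
        (M - j) * Fj / ((j - 1) * z + 1) for j, Fj in enumerate(F_mid, start=2)
    ) / (M - 1)
    return lead - corr

def G(z, F):
    """Equation (2) with F = [F_1, ..., F_M] and xbar equal to the sum of the F_j."""
    xbar = sum(F)
    return sum(
        z * Fj / ((j - 1) * z + 1) for j, Fj in enumerate(F, start=1)
    ) - math.log1p(xbar * z)

# Arbitrary illustrative values with M = 4.
z, xbar, F_mid = 0.7, 1.2, [0.3, 0.2]
F1 = u_case3(z, xbar, F_mid)
FM = xbar - F1 - sum(F_mid)
residual = G(z, [F1] + F_mid + [FM])   # should vanish

# Limit as z -> 0+, compared with the closed form for u(0, xbar, F).
M = len(F_mid) + 2
u0 = xbar - xbar ** 2 / (2 * (M - 1)) - sum(
    (M - j) * Fj for j, Fj in enumerate(F_mid, start=2)
) / (M - 1)
limit_gap = abs(u_case3(1e-4, xbar, F_mid) - u0)
```

For these values the recovered proportions satisfy $1 \ge F_1 \ge F_2 \ge F_3 \ge F_4 > 0$, so the triple lies in $D$.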

Differentiate $u$ to obtain

$$u_{\bar{x} z} = -\frac{(M - 1 - \bar{x})\bar{x}}{(M - 1)(1 + \bar{x} z)^2} < 0$$

for $\bar{x} \in (0, M - 1)$. Note that the graph of $u(0, \bar{x}, \vec{F})$ exits $D$ when $\bar{x} = M - 1$. Integrate in $z$ from $z_1$ to $z_2$ to obtain

$$u_{\bar{x}}(z_2, \bar{x}, \vec{F}) - u_{\bar{x}}(z_1, \bar{x}, \vec{F}) < 0.$$

Finally, integrate in $\bar{x}$ from $0$ to $\bar{x}$ to obtain

$$u(z_2, \bar{x}, \vec{F}) - u(z_2, 0, \vec{F}) < u(z_1, \bar{x}, \vec{F}) - u(z_1, 0, \vec{F}).$$

Unlike Case 2, we have two extra terms (where $\bar{x} = 0$) which may affect the ordering of the graphs. However, restricting our attention to the domain $D$, when $\bar{x} = 0$ all the parameters must be zero, $u = 0$ and $\vec{F} = 0$, so on $D$ we have $u(z_2, \bar{x}, \vec{F}) < u(z_1, \bar{x}, \vec{F})$ and the graphs are ordered as claimed. As $z \to \infty$, the graphs of $u(z, \bar{x}, \vec{F})$ in $D$ approach the graph of $u(\infty, \bar{x}, \vec{F}) = 0$ in $D$, which is the single point $0$, so $E$ is the disjoint union of the graphs.


