Estimating Components of Univariate Gaussian Mixtures Using Prony's Method



IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. PAMI-9, NO. 1, JANUARY 1987

task. Although the former is, in fact, a special case of the latter, it deserves a separate treatment because it can be presented in an especially simple formulation which lends itself to mathematical analysis much more than the general conjugate gradient algorithm. Thus, the formal results on the gradient algorithm serve as a basis for a more heuristic evaluation of the conjugate gradient algorithm.

Computationally, these algorithms are very similar to the Shlein power method [16]. However, their application can significantly reduce the number of iterations and the computation cost. In addition, unlike the power algorithm, these algorithms are equally suitable for the partial SVD associated with minimal eigenvalues as well as with the one associated with maximal eigenvalues.

REFERENCES

[1] V. C. Klema and A. J. Laub, "The singular value decomposition: Its computation and some applications," IEEE Trans. Automat. Contr., vol. AC-25, pp. 164-176, 1980.

[2] G. Golub and W. Kahan, "Calculating the singular values and pseudo-inverse of a matrix," SIAM J. Numer. Anal., Ser. B, vol. 2, pp. 205-224, 1965.

[3] G. H. Golub and C. Reinsch, "Singular value decomposition and least squares solutions," Numer. Math., vol. 14, pp. 403-420, 1970.

[4] J. A. Cadzow, "Singular value decomposition approach to time series modelling," Proc. IEE, Part F, vol. 130, pp. 202-210, 1983.

[5] D. W. Tufts and R. Kumaresan, "Singular value decomposition for improved frequency estimation using linear prediction," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-30, pp. 671-675, 1982.

[6] K. Konstantinides and K. Yao, "Applications of singular value decomposition to system modelling in signal processing," in Proc. Int. Conf. Acoust., Speech, Signal Processing (ICASSP-84), San Diego, CA, vol. 1, 1984, pp. 5.7.1-5.7.4.

[7] J. J. Gerbrands, "On the relationships between SVD, KLT and PCA," Pattern Recognition, vol. 14, pp. 375-386, 1981.

[8] P. A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach. Englewood Cliffs, NJ: Prentice-Hall, 1982.

[9] E. Oja, Subspace Methods of Pattern Recognition. Letchworth, England: Research Studies Press, 1983.

[10] H. Murakami and B. V. K. Vijaya Kumar, "Efficient calculation of primary images from a set of images," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-4, pp. 511-515, 1982.

[11] V. F. Pisarenko, "On the estimation of spectra by means of non-linear functions of the covariance matrix," Geophys. J. Roy. Astron. Soc., vol. 18, p. 511, 1972.

[12] --, "The retrieval of harmonics from a covariance function," Geophys. J. Roy. Astron. Soc., vol. 33, pp. 347-366, 1973.

[13] K. Fukunaga, Introduction to Statistical Pattern Recognition. New York and London: Academic, 1972.

[14] D. R. Fuhrmann and B. Liu, "An iterative algorithm for locating the minimal eigenvector of a symmetric matrix," in Proc. 1984 IEEE Int. Conf. Acoust., Speech, Signal Processing, San Diego, CA, Mar. 1984, pp. 45.8.1-45.8.4.

[15] H. Chen, T. K. Sarkar, S. A. Dianat, and J. D. Brule, "Adaptive spectral estimation by the conjugate gradient method," in Proc. 1985 IEEE Int. Conf. Acoust., Speech, Signal Processing, Tampa, FL, Mar. 1985, pp. 81-84.

[16] A. Shlein, "A method for computing the partial singular value decomposition," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-4, pp. 671-676, 1982.

[17] H. Rutishauser, "Computational aspects of Bauer's simultaneous iteration method," Numer. Math., vol. 13, pp. 4-13, 1969.

[18] A. R. Gourlay and G. A. Watson, Computational Methods for Matrix Eigenproblems. London and New York: Wiley, 1973.

[19] J. Karhunen, "Adaptive algorithms for estimating eigenvectors of correlation type matrices," in Proc. 1984 IEEE Int. Conf. Acoust., Speech, Signal Processing, San Diego, CA, Mar. 1984, pp. 14.7.1-14.7.4.

[20] R. Haimi-Cohen, Ph.D. dissertation, Appendix E, submitted to Ben-Gurion Univ., Beer-Sheva, Israel.

[21] D. G. Luenberger, Introduction to Linear and Nonlinear Programming. Reading, MA: Addison-Wesley, 1973.

Estimating Components of Univariate Gaussian Mixtures Using Prony's Method

HALUK DERIN

Abstract-A new technique for estimating the component parameters of a mixture of univariate Gaussian distributions using the method of moments is presented. The method of moments basically involves equating the sample moments to the corresponding mixture moments expressed in terms of component parameters and solving these equations for the unknown parameters. These moment equations, however, are nonlinear in the unknown parameters, and heretofore, an analytic solution of these equations has been obtained only for two-component mixtures [2]. Numerical solutions also tend to be unreliable for more than two components, due to the large number of nonlinear equations and parameters to be solved for. In this correspondence, under the condition that the component distributions have equal variances or equal means, the nonlinear moment equations are transformed into a set of linear equations using Prony's method. The solution of these equations for the unknown parameters is analytically feasible and numerically reliable for mixtures with several components. Numerous examples using the proposed technique for two-, three-, and four-component mixtures are presented.

Index Terms-Estimating components of Gaussian mixtures, Gaussian mixtures, method of moments, Prony's method.

I. INTRODUCTION

In this correspondence, a new solution to the classical problem of estimating the components of a mixture of univariate Gaussian distributions that extends the previously known results is presented. The problem of mixture distributions has been extensively studied since 1894 (Pearson [1]) by researchers from a wide variety of areas, as it has applications in different areas from animal biology to pattern classification. The two aspects of the mixture distribution problem are identifiability of the mixture distribution and estimation of the distribution parameters. Of the two, the parameter estimation aspect has attracted most of the attention in the literature.

Many methods have been proposed and used for estimating the parameters of mixture distributions, ranging from the method of moments (Pearson [1] and Cohen [2]) through maximum likelihood approaches (Day [3] and Kazakos [4]) to graphical techniques (Bhattacharya [5] and Postaire and Vasseur [6]). There are also studies on estimating the parameters in multivariate Gaussian mixtures (Cooper and Cooper [7] and Fukunaga and Flick [8]) and studies using the decision-directed approach (Katopis and Schwartz [9] and Kazakos and Davisson [10]). Young and Coraluppi [11], on the other hand, made use of a stochastic approximation algorithm for estimating the parameters of Gaussian mixtures. The above-mentioned studies are only a representative few. For an extensive presentation on the methods used in the estimation of finite mixture distributions, the reader is referred to a monograph on the subject by Everitt and Hand [12]. Despite all the work done heretofore, the interest in the problem persists, as indicated by the many

Manuscript received June 10, 1985; revised March 21, 1986. Recommended for acceptance by J. Kittler. This work was supported in part by the National Science Foundation under Grant ECS-84-03685 and in part by the Office of Naval Research under Grant N00014-K-0059. This paper was presented at the Nineteenth Annual Conference on Information Sciences and Systems, The Johns Hopkins University, Baltimore, MD, March 1985.

The author is with the Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA 01003.

IEEE Log Number 8609398.

0162-8828/87/0100-0142$01.00 © 1987 IEEE


recent studies, perhaps because a completely satisfactory solution under general assumptions still does not exist.

In the present work, the interest in the problem stemmed from a need to estimate the parameters of noisy image data. It is assumed that the image consists of c regions, each of which has a uniform/constant brightness (gray) level μ_i, and that this image is corrupted by additive i.i.d. Gaussian noise with zero mean and variance σ². The resulting image data, then, are modeled as a sample from the Gaussian mixture with c components, the ith component having mean μ_i and variance σ_i²; hence the assumption σ_i² = σ² is made. The relative sizes of the regions are represented by p_i, the weight of the ith component in the mixture. The problem we wanted to solve was, assuming the number of components c is known, to estimate the parameters σ², μ_i, and p_i, using the noisy image data as the sample. For purposes of this estimation problem, we have assumed that the image data consist of independent samples.

In this study, the method of moments has been employed as a general approach to estimate the component parameters, and the existing results have been extended in terms of applicability and the number of components in the mixture. Most of the existing methods deal with only two-component mixtures, whereas here a solution using the method of moments and Prony's method, for mixtures with several components, is presented. The proposed method is applied on several two-, three-, and four-component mixtures, yielding very good estimates of the unknown parameters. A treatment of Prony's method [13] can also be found in Hildebrand [14].

In the next section, the problem is formally stated, and the method of moments is briefly described. In Section III, a solution of the moment equations using Prony's method is described. Several numerical examples of two-, three-, and four-component mixtures are presented in Section IV. A discussion on the potential and the limitations of the proposed method is presented in the final section. A brief formulation of the mixture moments and the moments of a Gaussian r.v. are presented in Appendixes A and B.

II. PROBLEM STATEMENT AND METHOD OF MOMENTS

The problem can simply be stated as follows. Based on a set of N independent samples from the mixture distribution (pdf)

    f(x) = \sum_{i=1}^{c} p_i f(x; \mu_i, \sigma_i^2)    (1)

where c is the number of components, p_i is the weight of the ith component such that p_i ≥ 0 for all i and Σ_i p_i = 1, and f(x; μ_i, σ_i²) is the univariate Gaussian pdf with mean μ_i and variance σ_i², it is desired to estimate the unknown parameters of the mixture distribution. The parameters of concern are c, p_i, μ_i, and σ_i². In certain cases, some of these parameters may be known, or, if unknown, there may be some restrictions on them, such as that they be equal. We are primarily interested in the case where c is known and the σ_i² are unknown but equal (say, to σ²). So, based on the sample {x_1, x_2, …, x_N}, we wish to estimate the parameters p_i, μ_i, and σ². The formulation presented in the next section is also suitable for the case where c is known and the μ_i are unknown but equal (say, to m). So in that case, p_i, m, and σ_i² are to be estimated. In these two cases, if the common variance σ² (or the common mean m) is also known, the solution of the estimation problem is further simplified.

The method of moments approach is based on equating the sample moments computed from the data to the corresponding mixture moments, each of which is expressible in terms of p_i, μ_i, and σ_i², the parameters to be estimated. As presented in Appendix A, equating sample and mixture moments yields

    M^{(k)} = m^{(k)}(p_1, \ldots, p_c, \mu_1, \ldots, \mu_c, \sigma_1^2, \ldots, \sigma_c^2)
            = \sum_{i=1}^{c} p_i m_i^{(k)}(\mu_i, \sigma_i^2), \qquad k = 1, 2, \ldots, K    (2)

where M^{(k)} is the kth sample moment, m^{(k)} is the kth moment of the mixture distribution, and m_i^{(k)} is the kth moment of the ith Gaussian component, N(μ_i, σ_i²). The kth moment of a Gaussian r.v. N(μ, σ²) is readily expressed in terms of μ and σ². See Appendix B for a recursive formulation of Gaussian moments.
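Appendix B is not reproduced in this excerpt; a standard recursion for the raw moments of N(μ, σ²) is m_k = μ m_{k-1} + (k-1) σ² m_{k-2}, with m_0 = 1 and m_1 = μ. The sketch below uses that assumed recursion together with the sample and mixture moments of (2):

```python
import numpy as np

def gaussian_raw_moments(mu, var, K):
    """Raw moments m^{(k)} = E[X^k], k = 0..K, of X ~ N(mu, var), via the
    standard recursion m_k = mu*m_{k-1} + (k-1)*var*m_{k-2}."""
    m = [1.0, float(mu)]
    for k in range(2, K + 1):
        m.append(mu * m[-1] + (k - 1) * var * m[-2])
    return m[:K + 1]

def sample_moments(x, K):
    """Sample moments M^{(k)} = (1/N) sum_t x_t^k, k = 0..K."""
    x = np.asarray(x, dtype=float)
    return [float(np.mean(x ** k)) for k in range(K + 1)]

def mixture_moments(ps, mus, var, K):
    """kth mixture moment = p_i-weighted sum of component moments, cf. (2)."""
    comps = [gaussian_raw_moments(u, var, K) for u in mus]
    return [sum(pi * m[k] for pi, m in zip(ps, comps)) for k in range(K + 1)]
```

For example, `gaussian_raw_moments(2.0, 9.0, 4)` reproduces E[X^4] = μ⁴ + 6μ²σ² + 3σ⁴ = 475 for N(2, 9).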

For each moment, (2) yields a nonlinear equation in the parameters to be estimated. Solutions of these nonlinear equations for the unknown parameters are taken as the estimates of these parameters. As many such equations as the number of unknown parameters, or as desired, can be generated. Solution of these equations, however, poses the main difficulty. For the two-component case, the set of nonlinear equations is solved through a sequence of intricate and tedious substitutions and changes of variables [2]. The algebraic manipulations are so complicated that a straightforward extension to the case of more than two components is quite hopeless.

In the next section, through a transformation and subsequent use of Prony's method, the nonlinear moment equations of (2) are transformed into a set of linear equations which is readily solved.

III. PRONY'S METHOD SOLUTION

It is noted that m_i^{(k)}(μ_i, σ_i²) in (2), the kth moment of the ith Gaussian component, is expressible as

    m_i^{(k)}(\mu_i, \sigma_i^2) = \sum_{j=0}^{[k/2]} \gamma_{kj}\, \mu_i^{k-2j} \sigma_i^{2j}, \qquad i = 1, \ldots, c    (3)

where [·] denotes the integer part of its argument and the γ_kj are known coefficients of the moment expansion. This expression for the kth moment of a Gaussian r.v. is given in Appendix B in a more specific form. Now, substituting (3) into (2), the moment equations become

    M^{(k)} = \sum_{i=1}^{c} \sum_{j=0}^{[k/2]} p_i \gamma_{kj}\, \mu_i^{k-2j} \sigma_i^{2j}, \qquad k = 1, \ldots, K.    (4)

To proceed further, we now consider two special cases for the components: 1) equal variances and 2) equal means.

A. Equal Component Variances: σ_i² = σ², ∀i

Assuming that c, the number of components, is known and that all component variances are equal to σ², the moment equations in (4) can be expressed as

    M^{(k)} = \sum_{j=0}^{[k/2]} \gamma_{kj}\, \theta_{k-2j}\, \sigma^{2j}, \qquad k = 1, \ldots, K    (5)

where

    \theta_n = \sum_{i=1}^{c} p_i \mu_i^n, \qquad n = 0, 1, \ldots, K.    (6)

The moment equations of (5) form a set of linear equations in the θ_n with σ² as an unknown parameter. Hence, these equations are readily solved in terms of σ² to yield

    \theta_n(\sigma^2) = \sum_{j=0}^{[n/2]} \tilde{\gamma}_{nj}\, M^{(n-2j)} \sigma^{2j}, \qquad n = 0, 1, \ldots, K    (7)

where the \tilde{\gamma}_{nj} are known coefficients obtained from the inversion. The problem is now reduced to solving the nonlinear equations

    \sum_{i=1}^{c} p_i \mu_i^n = \theta_n(\sigma^2), \qquad n = 0, 1, \ldots, K    (8)

for the unknowns σ², p_i, and μ_i. Note that there are 2c + 1 unknowns, but not all p_i's are independent since Σ_i p_i = 1 holds. This corresponds to n = 0 in (8). Hence, K ≥ 2c must hold to solve for the unknowns. A point on notation: all symbols (e.g., μ_i, p_i, σ², θ_n) represent the true values of the quantities, if they are known, or their estimates, if they are unknown. We propose to solve (8) using Prony's method. Consider the

polynomial with the μ_i as its roots:

    (m - \mu_1)(m - \mu_2) \cdots (m - \mu_c) = 0.    (9)
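Equations (5)-(8) are straightforward to mechanize. The sketch below inverts (5) to obtain θ_n(σ²) from moments M^{(k)} at a trial value of σ², assuming the coefficient form γ_kj = C(k, 2j)(2j-1)!! (an assumption, since Appendix B is not reproduced here); the demo feeds it exact moments of a hypothetical two-component mixture, so the recovered θ_n are the power sums of (6):

```python
from math import comb

def _odd_dfact(n):                      # (2j-1)!!, with (-1)!! = 1
    r = 1
    for t in range(1, n + 1, 2):
        r *= t
    return r

def theta_from_moments(M, var, K):
    """Invert (5): given moments M[0..K] and a trial common variance var,
    return theta_n(var), n = 0..K, cf. (7). Each theta_k follows from M[k]
    and lower-order thetas, so the system is triangular."""
    theta = []
    for k in range(K + 1):
        t = M[k]
        for j in range(1, k // 2 + 1):
            t -= comb(k, 2 * j) * _odd_dfact(2 * j - 1) * theta[k - 2 * j] * var ** j
        theta.append(t)
    return theta

def _raw(m, v, K):                      # Gaussian raw moments, recursively
    r = [1.0, m]
    for k in range(2, K + 1):
        r.append(m * r[-1] + (k - 1) * v * r[-2])
    return r

# demo: exact moments of a hypothetical mixture, p = (0.6, 0.4), mu = (75, 125)
p, mu, var = [0.6, 0.4], [75.0, 125.0], 156.25
M = [sum(pi * _raw(ui, var, 4)[k] for pi, ui in zip(p, mu)) for k in range(5)]
theta = theta_from_moments(M, var, 4)   # -> power sums sum_i p_i mu_i^n
```

At the true variance, `theta` equals [1, 95, 9625, …], the power sums Σ p_i μ_i^n of the chosen parameters.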


TABLE I
ESTIMATING PARAMETERS IN TWO-COMPONENT GAUSSIAN MIXTURES (SAMPLE SIZE: 2000)

Desc.  σ      μ1     μ2      p1     p2     Fig.
A      12.50  75.0   125.0   0.5    0.5
B      12.45  75.04  125.03  0.498  0.502
C      -      75.06  125.00  0.498  0.502

A      14.0   84.0   116.0   0.5    0.5
B      13.88  83.83  115.94  0.491  0.509
C      -      83.92  115.83  0.491  0.509

A      15.0   90.0   110.0   0.5    0.5
B      15.11  89.52  109.11  0.456  0.544
C      -      89.38  109.30  0.458  0.542

A      12.5   75.0   125.0   0.6    0.4    1(a)
B      12.48  75.22  125.15  0.601  0.399
C      -      75.23  125.14  0.601  0.399

A      14.0   84.0   116.0   0.6    0.4    1(b)
B      14.06  84.30  116.05  0.602  0.399
C      -      84.23  116.08  0.601  0.400

A      15.0   90.0   110.0   0.6    0.4    1(c)
B      15.06  90.20  109.72  0.592  0.408
C      -      90.08  109.78  0.590  0.411

A      12.5   75.0   125.0   0.8    0.2
B      12.53  75.31  124.41  0.800  0.200
C      -      75.28  124.37  0.799  0.201

A      14.0   84.0   116.0   0.8    0.2
B      14.13  84.45  115.18  0.801  0.199
C      -      84.25  114.95  0.795  0.205

A      15.0   90.0   110.0   0.8    0.2    1(d)
B      15.51  91.49  109.60  0.852  0.148
C      -      89.96  107.65  0.762  0.238

A = specified values. B = estimates with unknown σ². C = estimates with specified σ².

Multiplying out (9),

    m^c - q_1 m^{c-1} - q_2 m^{c-2} - \cdots - q_{c-1} m - q_c = 0    (10)

is obtained. The objective is to find the coefficients {q_i} and subsequently the roots {μ_i} of this polynomial.

For each l, l = 0, 1, …, K - c, θ_{i+l}(σ²) in (8) is multiplied by q_{c-i} for i = 0, 1, …, c - 1, and θ_{c+l}(σ²) is multiplied by -1, and then all are added. Making use of (10), this sum then becomes

    \sum_{i=0}^{c-1} \theta_{i+l}(\sigma^2)\, q_{c-i} = \theta_{c+l}(\sigma^2)    (11)

for l = 0, 1, …, K - c. Equation (11) is a set of linear equations in {q_i}, i = 1, …, c, with σ² as parameter. First, c of these equations, for l = 0, 1, …, c - 1, are solved for the q_i in terms of σ²; then, all of these are substituted into the (c + 1)st equation, for l = c, and solved for the estimate of the unknown parameter σ². Substituting for σ², the coefficients q_i of the polynomial and also the θ_n are numerically determined. Solving for the roots of the polynomial in (10), estimates of the component means μ_i are calculated.

Having determined estimates of the μ_i and θ_n, the set of linear equations in (8) is solved for the estimates of the p_i. Thus, estimates for all unknown parameters are determined. For the above scheme, it suffices to have K = 2c; that is, sample moments up to order 2c, where c is the number of components in the mixture, need to be considered. A point of concern is that the (c + 1)st equation in (11), which

is to be solved for σ², is a polynomial in σ² and will possibly yield multiple real roots as values for σ² (especially for c > 2). All of these solutions are valid, but only the smallest value for σ² will yield a mixture which truly has c components. The next largest one yields a solution where one component has zero weight, that is, in effect a solution with c - 1 components. The next largest value for σ², if it exists, will yield a (c - 2)-component solution. Thus, in searching for c-component mixtures, the technique yields the c-component solution as well as solutions with fewer components, if they are compatible with the given set of sample moments. All this is due to the inherent characteristic of the method of moments that a given finite set of sample moments may not uniquely specify a mixture. In summary, to find the c-component mixture, we choose the smallest real root of the polynomial as the value of σ². Other computational concerns are discussed in the next section.
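For c = 2 the whole procedure can be carried out numerically. In this sketch (a hypothetical two-component example with exact moments, and with the assumed coefficients γ_kj = C(k, 2j)(2j-1)!! in (3)), eliminating q_1 and q_2 from the three equations of (11) leaves a cubic in σ²; per the rule above, its smallest positive real root is taken as the common variance, after which the means come from the roots of (10) and the weights from (8):

```python
import numpy as np
from numpy.polynomial import polynomial as P

def _raw(m, v, K):                          # Gaussian raw moments, recursively
    r = [1.0, m]
    for k in range(2, K + 1):
        r.append(m * r[-1] + (k - 1) * v * r[-2])
    return r

# exact moments M[0..4] of a hypothetical mixture: p=(0.6,0.4), mu=(75,125), var=156.25
p_true, mu_true, v_true = [0.6, 0.4], [75.0, 125.0], 156.25
M = [sum(pi * _raw(ui, v_true, 4)[k] for pi, ui in zip(p_true, mu_true))
     for k in range(5)]

# theta_n as polynomials in the unknown v = sigma^2 (ascending coefficients), cf. (7)
t1 = np.array([M[1]])
t2 = np.array([M[2], -1.0])                 # M2 - v
t3 = np.array([M[3], -3.0 * M[1]])          # M3 - 3 M1 v
t4 = np.array([M[4], -6.0 * M[2], 3.0])     # M4 - 6 M2 v + 3 v^2

# eliminate q1, q2 from (11): residual * det = t2^3 - 2 t1 t2 t3 + t3^2 - t4 (t2 - t1^2)
N = P.polysub(
    P.polyadd(P.polymul(P.polymul(t2, t2), t2), P.polymul(t3, t3)),
    P.polyadd(P.polymul(P.polymul(2.0 * t1, t2), t3),
              P.polymul(t4, P.polysub(t2, P.polymul(t1, t1)))))

cands = [r.real for r in P.polyroots(N)
         if abs(r.imag) < 1e-6 * (1.0 + abs(r)) and r.real > 0.0]
v_hat = min(cands)                          # smallest real root, per the text

th = [1.0] + [float(P.polyval(v_hat, t)) for t in (t1, t2, t3, t4)]
q = np.linalg.solve(np.array([[th[0], th[1]], [th[1], th[2]]]),
                    np.array([th[2], th[3]]))       # [q2, q1], cf. (11)
mus = np.sort(np.roots([1.0, -q[1], -q[0]]).real)   # roots of (10)
ps = np.linalg.solve(np.vander(mus, 2, increasing=True).T, np.array(th[:2]))
```

With exact input moments the sketch recovers σ² = 156.25, means (75, 125), and weights (0.6, 0.4).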

With the procedure described above, the need to solve 2c (for K = 2c) nonlinear equations (4) for 2c unknowns is transformed into a need to solve two sets of c linear equations and to find the roots of a polynomial. Thus, the computational complexity is drastically reduced.

Another point of interest is that if the common variance σ² is known, then the parametric solution of a set of linear equations is not necessary, and neither is solving for the roots of a polynomial in σ². In this case, estimating the μ_i and p_i simply involves the solution of two sets of c linear equations.
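When σ² is known, the procedure collapses to two c-by-c linear solves plus one polynomial root extraction. A sketch for a hypothetical three-component mixture with exact moments (again assuming γ_kj = C(k, 2j)(2j-1)!! in (3)):

```python
import numpy as np
from math import comb

def dfact(n):                       # (2j-1)!!, with (-1)!! = 1
    r = 1
    for t in range(1, n + 1, 2):
        r *= t
    return r

def gauss_raw(mu, var, K):          # raw moments of N(mu, var), recursively
    m = [1.0, mu]
    for k in range(2, K + 1):
        m.append(mu * m[-1] + (k - 1) * var * m[-2])
    return m

def estimate_known_var(M, var, c):
    """Given moments M[0..2c] and the common variance, recover means/weights."""
    th = []                          # invert (5) to get theta_n(var), cf. (7)
    for k in range(len(M)):
        t = M[k]
        for j in range(1, k // 2 + 1):
            t -= comb(k, 2 * j) * dfact(2 * j - 1) * th[k - 2 * j] * var ** j
        th.append(t)
    # Prony step, cf. (11): Hankel system for the polynomial coefficients q
    A = np.array([[th[l + i] for i in range(c)] for l in range(c)])
    b = np.array([th[c + l] for l in range(c)])
    q_rev = np.linalg.solve(A, b)    # q_rev[i] = q_{c-i}
    mus = np.sort(np.roots(np.r_[1.0, -q_rev[::-1]]).real)   # roots of (10)
    # weights from the first c equations of (8) (a Vandermonde system)
    V = np.vander(mus, c, increasing=True).T
    ps = np.linalg.solve(V, np.array(th[:c]))
    return mus, ps

# hypothetical three-component example (parameter values as in Table II, first row)
p_true, mu_true, var = [0.333, 0.333, 0.334], [60.0, 100.0, 140.0], 100.0
M = [sum(pi * gauss_raw(ui, var, 6)[k] for pi, ui in zip(p_true, mu_true))
     for k in range(7)]
mus, ps = estimate_known_var(M, var, 3)
```

With exact moments, the returned means and weights match the specified values to floating-point accuracy.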

Numerous examples with two-, three-, and four-component mixtures, for both unknown and known σ², are reported in the next section.

B. Equal Component Means: μ_i = μ, ∀i

In this case, the problem has exactly the same character and a solution similar to that for the equal-component-variance case. With


Fig. 1. Sample distributions with two components. [Histogram panels (a)-(d); plot data not reproduced in this transcript.]

TABLE II
ESTIMATING PARAMETERS IN THREE-COMPONENT GAUSSIAN MIXTURES

Desc.  σ      μ1     μ2      μ3      p1     p2     p3     Sample Size  Fig.
A      10.0   60.0   100.0   140.0   0.333  0.333  0.334  2000         2(a)
B      10.01  60.55  100.38  139.76  0.337  0.328  0.335
C      -      60.55  100.38  139.77  0.337  0.328  0.335

A      12.0   72.0   100.0   128.0   0.333  0.333  0.334  2000         2(b)
B      13.59  74.47  90.27   123.26  0.315  0.234  0.451
C      -      71.79  97.13   127.02  0.308  0.330  0.362

A      12.0   72.0   100.0   128.0   0.333  0.333  0.334  10 000
B      12.07  71.89  98.13   127.31  0.325  0.323  0.353
C      -      71.71  98.19   127.45  0.323  0.327  0.350

A      10.0   60.0   100.0   140.0   0.45   0.4    0.15   2000         2(c)
B      9.54   59.68  99.41   140.48  0.440  0.410  0.150
C      -      60.27  100.05  140.28  0.451  0.401  0.149

A      12.0   72.0   100.0   128.0   0.45   0.4    0.15   10 000       2(d)
B      11.97  71.42  97.79   126.32  0.432  0.391  0.177
C      -      71.49  97.85   126.28  0.434  0.389  0.177

A = specified values. B = estimates with unknown σ². C = estimates with specified σ².

μ_i = μ for all i, and c known, the moment equations in (4) can be expressed as

    M^{(k)} = \sum_{j=0}^{[k/2]} \gamma_{kj}\, \mu^{k-2j} \lambda_j, \qquad k = 1, 2, \ldots, K    (12)

where

    \lambda_j = \sum_{i=1}^{c} p_i \sigma_i^{2j}, \qquad j = 0, 1, \ldots    (13)
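When the common mean m is known (the simplification noted earlier applies here as well), (12) gives the λ_j directly, and by (13) the λ_j are power sums of the σ_i² weighted by p_i, so the same Prony step as in Section III-A recovers the component variances and weights. A sketch with hypothetical parameters, starting from exact λ_j:

```python
import numpy as np

# hypothetical equal-means mixture: weights and component variances
p_true = [0.7, 0.3]
vars_true = [25.0, 400.0]
c = 2

# lambda_j = sum_i p_i (sigma_i^2)^j, j = 0..2c-1, cf. (13); exact here
lam = [sum(pi * vi ** j for pi, vi in zip(p_true, vars_true))
       for j in range(2 * c)]

# Prony step on the power sums, exactly as in the equal-variance case
A = np.array([[lam[l + i] for i in range(c)] for l in range(c)])
b = np.array([lam[c + l] for l in range(c)])
q_rev = np.linalg.solve(A, b)                 # q_rev[i] = q_{c-i}
vars_hat = np.sort(np.roots(np.r_[1.0, -q_rev[::-1]]).real)   # sigma_i^2
V = np.vander(vars_hat, c, increasing=True).T
p_hat = np.linalg.solve(V, np.array(lam[:c]))
```

The recovered roots are the component variances (25, 400) and the weights (0.7, 0.3).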


Fig. 2. Sample distributions with three components. [Histogram panels (a)-(d); plot data not reproduced in this transcript.]

TABLE III
ESTIMATING PARAMETERS IN FOUR-COMPONENT GAUSSIAN MIXTURES

Desc.  σ     μ1    μ2    μ3     μ4     p1     p2     p3     p4      Sample Size  Fig.
A      6.0   36.0  78.0  121.0  164.0  0.25   0.25   0.25   0.25    2000
B      6.6   36.4  77.1  118.9  163.5  0.252  0.235  0.255  0.258
C      -     35.9  76.6  119.9  164.0  0.246  0.243  0.259  0.252

A      8.0   48.0  82.0  117.0  152.0  0.25   0.25   0.25   0.25    2000         3(a)
B      9.4   49.4  74.9  107.7  149.6  0.254  0.159  0.297  0.290
C      -     47.4  77.4  114.1  151.8  0.232  0.232  0.280  0.257

A      10.0  60.0  86.0  113.0  140.0  0.25   0.25   0.25   0.25    2000         3(b)
B      11.3  63.7  99.9  136.7  225.7  0.328  0.351  0.322  1.6e-6
C      -     60.6  88.9  115.8  140.3  0.263  0.272  0.224  0.241

A      6.0   36.0  78.0  121.0  164.0  0.3    0.1    0.3    0.3     2000
B      3.5   34.6  77.5  124.8  165.8  0.283  0.135  0.308  0.274
C      -     36.4  84.5  123.0  164.1  0.306  0.116  0.282  0.296

A      8.0   48.0  82.0  117.0  152.0  0.3    0.1    0.3    0.3     10 000
B      9.0   47.7  59.2  110.9  150.2  0.244  0.105  0.318  0.334
C      -     47.1  70.4  114.6  151.7  0.270  0.104  0.318  0.308

A = specified values. B = estimates with unknown σ². C = estimates with specified σ².

Equation (12) is solved for the λ_j in terms of μ as parameter. Substituting for the λ_j, (13) can be solved using Prony's method for p_i, σ_i², and μ. The mechanics of the solution is very similar to that of the equal variance case.

IV. EXAMPLES

In this section, several examples of estimating the component parameters from the sample moments using the technique described in the previous section are presented. All examples are of the case


Fig. 3. Sample distributions with four components. [Histogram panels (a)-(b); plot data not reproduced in this transcript.]

TABLE IV
ESTIMATING PARAMETERS IN TWO-COMPONENT GAUSSIAN MIXTURES WITH SMALL-SIZE SAMPLES

Desc.  σ      μ1     μ2      p1     p2     Sample Size
A      10.0   60.0   140.0   0.5    0.5    100
B      8.98   59.61  141.48  0.497  0.503
C      -      59.84  141.24  0.495  0.505

A      10.0   60.0   140.0   0.6    0.4    100
B      8.91   59.57  142.15  0.598  0.402
C      -      59.90  142.03  0.600  0.400

A      10.0   60.0   140.0   0.8    0.2    100
B      8.81   59.93  142.99  0.797  0.203
C      -      60.40  143.53  0.803  0.197

A      10.0   60.0   140.0   0.5    0.5    500
B      9.55   60.49  140.07  0.503  0.497
C      -      60.60  139.96  0.503  0.497

A      10.0   60.0   140.0   0.6    0.4    500
B      9.55   60.25  140.26  0.603  0.397
C      -      60.40  140.21  0.603  0.397

A      10.0   60.0   140.0   0.8    0.2    500
B      9.91   60.18  139.22  0.799  0.201
C      -      60.22  139.27  0.800  0.200

A      12.5   75.0   125.0   0.6    0.4    500
B      11.82  75.30  125.58  0.607  0.393
C      -      75.75  125.43  0.612  0.383

A      14.0   84.0   116.0   0.6    0.4    500
B      13.03  83.95  116.83  0.607  0.393
C      -      85.08  116.51  0.625  0.375

A      15.0   80.0   110.0   0.6    0.4    500
B      13.74  88.68  110.61  0.571  0.429
C      -      91.03  109.64  0.621  0.379

A = specified values. B = estimates with unknown σ². C = estimates with specified σ².

where component variances are equal. Mixtures with two, three, and four components are considered. Samples from the mixture distribution are generated with the random number generator on a VAX-11/780, which is a multiplicative congruential generator, and Gaussian random numbers are generated by summing 12 uniformly distributed random numbers and properly normalizing the sum. Different sample sizes were tried. Some observations on this are reported below.

The critical factors in obtaining quality estimates are the number of components, the sample size, and the mean-separation-standard-deviation (MS-SD) ratio. The effects of these factors are interdependent, and they are investigated through numerous examples. Due to space considerations, only a representative few of these examples are reported here, and for some of these examples the sample distributions are presented. The observed effects of the three factors are discussed below.
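The 12-uniform-sum Gaussian generator described above can be sketched as follows; this is a minimal illustration of the scheme, not the VAX library routine, and the function name is ours:

```python
import random

def gauss_12sum(mu, sigma):
    # The sum of 12 independent U(0,1) variates has mean 6 and variance 1,
    # so subtracting 6 yields an approximate N(0, 1) sample by the central
    # limit theorem; scale and shift to get N(mu, sigma^2).
    z = sum(random.random() for _ in range(12)) - 6.0
    return mu + sigma * z
```

A sample from the mixture itself is then drawn by first selecting a component with probability p_i and sampling from it. Note that z is confined to [-6, 6], which slightly distorts the tails and hence the higher-order moments, a point relevant to the observations below.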

The implementation of the technique on examples is carried out using the program package MACSYMA running on a VAX-11/780 computer. This program package is very useful and convenient in finding parametric solutions to sets of linear equations and finding the real roots of a polynomial.

In Table I, some parameter estimation results are presented for two-component mixtures, of which the sample distributions for some are depicted in Fig. 1. Specified parameter values, their estimates with σ² unknown, and their estimates with σ² known to have the specified value are presented for each example. Similarly, examples on three-component mixtures are presented in Table II and Fig. 2, and examples on four-component mixtures in Table III and Fig. 3. Some examples on two-component mixtures, performed with relatively smaller sample sizes of 100 and 500, are presented in Table IV. Comparison of the results in Tables I and IV demonstrates the effect of sample size on the accuracy of the estimates. The samples from the mixture distribution are discretized for purposes of obtaining a sample distribution, and the sample distribution plots in the figures are normalized with respect to the highest frequency in each sample distribution. Some observations on the examples presented here, and on those that are not, are discussed below.
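The raw ingredients of the estimation are the sample moments, which are simply averaged powers of the observations; a one-line sketch (our naming):

```python
def sample_moment(xs, k):
    # kth raw sample moment: (1/N) * sum_j xs[j]**k
    return sum(x ** k for x in xs) / len(xs)
```

The reliability of these averages degrades as k grows, since x**k amplifies tail values, which is the effect behind several of the observations that follow.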

A. Observations

1) As c, the number of components in the mixture, increases, the quality of estimates decreases, since higher-order sample moments are needed in the estimation of more components, and higher-order sample moments are less reliable. This effect is perhaps due to the particular random number generation scheme used.

2) For c = 2, the estimates are excellent, even with small sample sizes (down to 100) (see Table IV) and relatively low MS-SD ratios (down to 1). For c = 3, the estimates are quite good, and for c = 4, they are reasonably good.

3) In almost all cases, the estimates with known σ² are better than those where σ² is unknown and is also estimated.

4) Small sample sizes work well for easy problems, such as those with c = 2 or high MS-SD ratios; only slight improvement results from increasing the sample size. For the difficult problems, however (for example, with c = 4 and low MS-SD ratios), the estimates are not very accurate, and increasing the sample size does not necessarily improve the estimates.

5) In using the method, there are no theoretical limitations on the number of components. On the other hand, numerical calculations may become cumbersome for more than four components. Moreover, the high-order moments are less reliable, and hence yield less accurate estimates.

6) The restrictions observed in the accuracy of estimations for the c = 4 case are, in our opinion, partly due to the imperfect random number generation scheme and consequently due to deviations of the sample moments from their expected values.

7) The claim in the last observation is supported by the fact that for c = 4, and even with a low MS-SD ratio, very accurate estimates result when the exact (expected) moments are used in place of the sample moments.
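The core Prony step underlying these estimates — recovering nodes and weights from a sequence of power sums h_k = Σ_i p_i λ_i^k — can be sketched generically as follows. This is the textbook form of Prony's method (cf. [14]), not the paper's exact equations (12)-(13), and the function name is ours:

```python
import numpy as np

def prony_nodes_weights(h, c):
    """Given power sums h[k] = sum_i p_i * lam_i**k for k = 0..2c-1,
    recover the c nodes lam_i and weights p_i."""
    # Solve the Hankel system for the monic polynomial whose roots are the nodes.
    H = np.array([[h[i + j] for j in range(c)] for i in range(c)], dtype=float)
    rhs = -np.array(h[c:2 * c], dtype=float)
    coeffs = np.linalg.solve(H, rhs)              # c_0, ..., c_{c-1}
    poly = np.concatenate(([1.0], coeffs[::-1]))  # x^c + c_{c-1} x^{c-1} + ... + c_0
    nodes = np.roots(poly)
    # Weights follow from the Vandermonde system V w = h[:c].
    V = np.vander(nodes, c, increasing=True).T    # V[k, i] = nodes[i]**k
    weights = np.linalg.solve(V, np.array(h[:c], dtype=float))
    return np.real(nodes), np.real(weights)
```

In the mixture setting the power sums are built from the sample moments, so any unreliability in the higher-order moments propagates directly into the Hankel system, which is consistent with observations 1) and 5) above.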

V. CONCLUSION

A new technique that employs the method of moments for estimating the parameters of a Gaussian mixture is presented. This technique, which uses Prony's method, results in drastic simplification in determining estimates. It is restricted to the equal component variances and equal component means cases. Its applicability is demonstrated on examples with two-, three-, and four-component mixtures. It is noted, however, that this technique is only as good as the method of moments is in determining estimates of the mixture parameters. As would be expected, inaccurate sample moments result in inaccurate estimates.

APPENDIX A

The kth mixture moment can be expressed as

m^{(k)}(p_1, \ldots, p_c; \mu_1, \ldots, \mu_c; \sigma_1^2, \ldots, \sigma_c^2) = \int_{-\infty}^{\infty} x^k f(x) \, dx    (A1)

where the p_i, \mu_i, and \sigma_i^2 are the relative weights, means, and variances of the mixture components, respectively, and f(x) is the mixture pdf. Substituting f(x) from (1) into (A1) and expressing the kth mixture moment [left-hand side of (A1)] by m^{(k)},

m^{(k)} = \sum_{i=1}^{c} p_i \int_{-\infty}^{\infty} x^k f(x; \mu_i, \sigma_i^2) \, dx = \sum_{i=1}^{c} p_i m_i^{(k)}(\mu_i, \sigma_i^2)    (A2)

is obtained, where m_i^{(k)}(\mu_i, \sigma_i^2) is the kth moment of the ith component in the mixture.
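As a concrete check of (A2), the first two raw moments of a mixture follow directly from the component moments. A small illustration with hand-coded Gaussian component moments (names are ours):

```python
def mixture_raw_moment(k, ps, mus, s2s):
    # (A2): m^(k) = sum_i p_i * m_i^(k).  For a Gaussian component,
    # m^(1) = mu and m^(2) = mu^2 + sigma^2.
    comp = {1: lambda m, v: m, 2: lambda m, v: m * m + v}[k]
    return sum(p * comp(m, v) for p, m, v in zip(ps, mus, s2s))
```

For the first mixture of Table IV, p = (0.5, 0.5), μ = (60, 140), σ² = 100, this gives m^(1) = 100 and m^(2) = 0.5(3600 + 100) + 0.5(19600 + 100) = 11700.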

APPENDIX B

For x, a Gaussian r.v. with mean \mu and variance \sigma^2, i.e., x \sim N(\mu, \sigma^2), the nth moment of x, m^{(n)}, is given by the following recursive expressions in terms of \mu and \sigma^2:

m^{(2k)} = \mu^{2k} + \alpha_{k,1} \mu^{2k-2} \sigma^2 + \alpha_{k,2} \mu^{2k-4} \sigma^4 + \cdots + \alpha_{k,k} \sigma^{2k}

m^{(2k+1)} = \mu^{2k+1} + \beta_{k,1} \mu^{2k-1} \sigma^2 + \beta_{k,2} \mu^{2k-3} \sigma^4 + \cdots + \beta_{k,k} \mu \sigma^{2k}.    (B1)

Recursive relationships for the \alpha_{k,i} and \beta_{k,i} are

\beta_{k,i} = \alpha_{k,i} + 2[k - (i - 1)] \alpha_{k,i-1},    i = 1, \ldots, k

\alpha_{k+1,i} = \beta_{k,i} + \{2[k - (i - 1)] + 1\} \beta_{k,i-1},    i = 1, \ldots, k + 1    (B2)

with \alpha_{k,0} = \beta_{k,0} = 1 and \alpha_{i,j} = \beta_{i,j} = 0 for j > i. These recursive expressions are readily obtained from the characteristic function of x.
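The recursions (B1)-(B2) translate directly into code; this sketch (our naming) builds the α and β tables and evaluates Gaussian raw moments:

```python
def gauss_moment_coeffs(K):
    # alpha[k][i] and beta[k][i] per (B2); entries with i > k stay 0,
    # and alpha[k][0] = beta[k][0] = 1.
    alpha = [[0] * (K + 2) for _ in range(K + 2)]
    beta = [[0] * (K + 2) for _ in range(K + 2)]
    for k in range(K + 2):
        alpha[k][0] = beta[k][0] = 1
    for k in range(K + 1):
        for i in range(1, k + 1):
            beta[k][i] = alpha[k][i] + 2 * (k - (i - 1)) * alpha[k][i - 1]
        for i in range(1, k + 2):
            alpha[k + 1][i] = beta[k][i] + (2 * (k - (i - 1)) + 1) * beta[k][i - 1]
    return alpha, beta

def gauss_raw_moment(n, mu, s2):
    # Evaluate (B1): m^(2k) or m^(2k+1) as a polynomial in mu and sigma^2.
    k = n // 2
    alpha, beta = gauss_moment_coeffs(k)
    if n % 2 == 0:
        return sum(alpha[k][i] * mu ** (2 * k - 2 * i) * s2 ** i
                   for i in range(k + 1))
    return sum(beta[k][i] * mu ** (2 * k + 1 - 2 * i) * s2 ** i
               for i in range(k + 1))
```

For example, with μ = 2 and σ² = 3 this yields m^(3) = μ³ + 3μσ² = 26 and m^(4) = μ⁴ + 6μ²σ² + 3σ⁴ = 115, matching the closed-form Gaussian moments.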

REFERENCES

[1] K. Pearson, "Contributions to the mathematical theory of evolution," Phil. Trans. Roy. Soc. A, vol. 185, pp. 71-110, 1894.

[2] A. C. Cohen, "Estimation in mixtures of two normal distributions," Technometrics, vol. 9, pp. 15-28, Feb. 1967.

[3] N. E. Day, "Estimating the components of a mixture of normal distributions," Biometrika, vol. 56, pp. 463-474, 1969.

[4] D. Kazakos, "Recursive estimation of prior probabilities using a mixture," IEEE Trans. Inform. Theory, vol. IT-23, pp. 203-211, Mar. 1977.

[5] C. G. Bhattacharya, "A simple method of resolution of a distribution into Gaussian components," Biometrics, vol. 23, pp. 115-135, Mar. 1967.

[6] J. G. Postaire and C. P. A. Vasseur, "An approximate solution to normal mixture identification," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-3, pp. 163-179, Mar. 1981.

[7] D. B. Cooper and P. W. Cooper, "Non-supervised adaptive signal detection and pattern recognition," Inform. Contr., vol. 7, pp. 416-444, Sept. 1964.

[8] K. Fukunaga and T. E. Flick, "Estimation of the parameters of a Gaussian mixture using the method of moments," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-5, pp. 410-416, July 1983.

[9] A. Katopis and S. C. Schwartz, "Decision-directed learning using stochastic approximation," in Proc. Modelling Simulation Conf., 1972, pp. 473-481.

[10] D. Kazakos and L. D. Davisson, "An improved decision-directed detector," IEEE Trans. Inform. Theory, vol. IT-26, pp. 113-115, Jan. 1980.

[11] T. Y. Young and G. Coraluppi, "Stochastic estimation of a mixture of normal density functions using an information criterion," IEEE Trans. Inform. Theory, vol. IT-16, pp. 258-263, May 1970.

[12] B. S. Everitt and D. J. Hand, Finite Mixture Distributions. London: Chapman and Hall, 1981.

[13] R. de Prony, "Essai expérimental et analytique," J. École Polytech. (Paris), vol. 1, pp. 24-76, Dec. 1795.

[14] F. B. Hildebrand, Introduction to Numerical Analysis. New York: McGraw-Hill, 1956, pp. 378-382.

