Post on 03-Aug-2020
HOW MANY MODES CAN A TWO COMPONENT MIXTURE HAVE?
Surajit Ray and Dan Ren
Boston University
Abstract: The main result of this article states that one can get as many as D + 1 modes from a two-component normal mixture in D dimensions. Multivariate mixture models are widely used for modeling homogeneous populations and for cluster analysis. Either the components directly or modes arising from these components are often used to extract individual clusters. Though these strategies work well in lower dimensions, our results show that high-dimensional mixtures are often very complex and researchers should take extra precautions when using mixtures for cluster analysis. Even in the simplest case of mixing only two normal components in D dimensions, we can show that the mixture can have a maximum of D + 1 modes. When we mix more components, or if the components are non-normal, the number of modes might be even higher, which might lead us to wrong inference on the number of clusters. Further analyses show that the number of modes depends on the component means and the eigenvalues of the ratio of the two component covariance matrices, which in turn provides a clear guideline as to when one can use mixture analysis for clustering high-dimensional data.
Key words and phrases: Mixture, modal cluster, multivariate mode, clustering, dimension reduction, topography, manifold
1 Introduction
1.1 Number of modes of a normal mixture
Multivariate normal mixtures provide a flexible method of fitting high-dimensional data. This fit often provides a primary data reduction through the number, location and shape of its components. However, a more interesting question relates to how the components interact to describe an overall pattern of density. Of particular interest is the number of modes the density displays. The relation between the number of modes and the number of components is not one-to-one. Modes are often used to determine the number of homogeneous groups in a population (Li et al., 2007; McLachlan and Peel, 2000; Titterington et al., 1985). Modes of densities are also widely used to summarize posterior distributions in Bayesian analysis (Berger, 1985; Lehmann and Casella, 1998) and to build Bayesian inferential frameworks.
The main result of this paper is summarized in the following theorem:
Theorem 1. A D dimensional normal mixture of two components has at most D + 1 modes
and a mixture with D + 1 modes always exists in D dimensions.
In one dimension a two-component normal mixture can display one or two modes, but the density shapes become complex in higher dimensions. For example, a two-component normal mixture in two dimensions can give rise to one, two or three modes (see Ray and Lindsay, 2005, for a three-mode example). Ray and Lindsay (2005) provide more examples in two and three dimensions where the number of modes is greater than the number of mixing components. But beyond these pathological examples there has been no result on the upper bound of the number of modes that a mixture of normals can display. This paper provides the first set of results on the upper bound for the number of modes of a two-component normal mixture. We also show that this bound is tight, i.e., we can provide numerical values for a mixture that attains this upper bound.
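The one-dimensional statement is easy to check numerically. The following is a minimal sketch, assuming NumPy and a fine evaluation grid; the helper names (`mixture_pdf_1d`, `count_grid_peaks`) are hypothetical illustrations, not code from the paper. It uses the classical equal-weight, equal-variance case, in which the mixture is bimodal exactly when the two means are more than two standard deviations apart, and counts local maxima of the density on the grid. A grid count is only a heuristic: it can miss modes that lie closer together than the grid spacing.

```python
import numpy as np

def normal_pdf_1d(x, mu, sigma):
    """Univariate normal density N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_pdf_1d(x, pi, mu1, s1, mu2, s2):
    """pi * N(mu1, s1^2) + (1 - pi) * N(mu2, s2^2)."""
    return pi * normal_pdf_1d(x, mu1, s1) + (1.0 - pi) * normal_pdf_1d(x, mu2, s2)

def count_grid_peaks(values):
    """Number of interior local maxima of a sampled curve (a tie of two equal values counts once)."""
    v = np.asarray(values)
    return int(np.sum((v[1:-1] > v[:-2]) & (v[1:-1] >= v[2:])))

x = np.linspace(-6.0, 9.0, 30001)
# Equal weights and unit variances: bimodal iff |mu2 - mu1| > 2.
modes_close = count_grid_peaks(mixture_pdf_1d(x, 0.5, 0.0, 1.0, 1.5, 1.0))
modes_far = count_grid_peaks(mixture_pdf_1d(x, 0.5, 0.0, 1.0, 3.0, 1.0))
print(modes_close, modes_far)  # 1 2
```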
It is well known that the topography of a mixture of distributions, in the sense of its key features as a density, is often extremely complex. Among the different features of the topography we are especially interested in the number of modes the density displays, referred to from here on as the modality of the density. Ray and Lindsay (2005) provide a detailed understanding of the topography of mixtures of normal distributions in terms of the means and variances of the component distributions. But how these density shapes respond to rotation or scaling based on the component covariances is not well studied. For example, it is not clear whether rotation and scaling retain all the modes after transformation. In this paper we present a set of results showing the invariance of the modality of normal mixtures under translation, scaling and rotation. These results allow us to show that the modality of a two-component mixture of normals with arbitrary variance-covariance matrices is mathematically equivalent to the modality of a mixture of normals in which one component has a spherical covariance and the other has an appropriate diagonal covariance matrix of the same dimension. A follow-up analysis shows that the number of modes is closely related to the number of unique eigenvalues of the ratio of the covariance matrices, in a matrix sense (the inverse of one matrix multiplied by the other). Finally, we use these results to arrive at the main result on the tight upper bound on the number of modes.
1.2 Relevant Literature
Studies of the number of modes of normal mixtures date back to the beginning of the twentieth century, but until recently the results have focused primarily on univariate mixtures. In fact, there is a simple description of modality when one is mixing two univariate normal components. Helguero (1904) determined necessary and sufficient conditions for bimodality in the mixture of two univariate normals with equal variances and mixing proportions. More research on univariate mixtures followed. For example, Eisenberger (1964) investigated the conditions for bimodality in the mixture of two univariate normals with arbitrary variances and mixing proportions, and Behboodian (1970) derived a sufficient condition for unimodal mixture densities. Kakiuchi (1981) and Kemperman (1991) then extended the problem to mixtures of non-normal distributions and derived corresponding necessary and sufficient conditions. In the context of multivariate normal mixtures, a recent result by Carreira-Perpinan and Williams (2003) shows that for any D-dimensional normal mixture, the number of modes cannot exceed the number of components if each component has the same covariance matrix up to a scalar scaling factor. The most recent and comprehensive results in this area of research are provided by Ray and Lindsay (2005), who present the most general modality results for arbitrary dimensions, numbers of components and component variance structures.
The key result in Ray and Lindsay (2005) shows that the topography of multivariate mixtures, in the sense of their key features as a density, can be analyzed rigorously in lower dimensions by use of a ridgeline manifold that contains all critical points as well as the ridges of the density. This important topographical result allows them to solve for the number of modes both analytically and numerically. Besides solving for the number of modes, Ray and Lindsay (2005) provide pathological examples of more modes than components in more than one dimension. A comprehensive summary of the above results is available in Fruhwirth-Schnatter (2006) and in a recent review paper by Melnykov and Maitra (2010). Much of the modality theory discussed in Ray and Lindsay (2005) has been widely used for developing clustering techniques by Ray and Lindsay (2008), Coretto and Hennig (2010), Hennig (2010b) and Hennig (2010a), and for the advancement of likelihood-based inference for normal mixtures by Chen and Tan (2009), Holzmann and Vollmer (2008), Dannemann and Holzmann (2008) and Lindsay et al. (2008). Applications of these results are found in new areas of research such as signal processing (Li, 2007; Scott et al., 2009) and image retrieval (Sfikas et al., 2005).
Using the modality theorem in the special case of a two-component normal mixture, Ray and Lindsay (2005) provide examples of three modes in two dimensions and four modes in three dimensions. These mixtures have unequal covariance matrices, but the matrices are limited to being diagonal. Providing an upper bound on the number of modes for mixtures in arbitrary dimensions with arbitrary component variance-covariance matrices has remained an unresolved problem.
1.3 Our Results
The main contribution of this paper is to provide a tight upper bound for the number of modes of a two-component normal mixture for arbitrary dimension and arbitrary component variance-covariance matrices.
Let us denote the dimension of the multivariate normal density by D and the number of components of the mixture by K. In this paper we only consider two-component normal mixtures, i.e., K = 2, and the corresponding parameters of each normal density are its mean µi and variance-covariance matrix Σi, i = 1, 2. Let π and $\bar{\pi} = 1 - \pi$ be the respective proportions of the two densities. It can be shown that, for specified means and variances, the number of modes depends on the mixing proportions. In fact, Ray and Lindsay (2005) provide examples of mixtures where different ranges of π display one, two and three modes for the same means and variance-covariance matrices. Note, however, that the specification of π is irrelevant for determining the maximal number of modes displayed by a mixture of two components. In other words, we are asking the following question: given a pair of component means and covariance matrices, what is the maximum number of modes the mixture can display if one has complete freedom in choosing the mixing proportion π? Hence we will ignore the parameter π in our analysis, and for notational ease we will denote a D-dimensional mixture of two components with means µ1 and µ2 and variances Σ1 and Σ2 by NM(µ1, Σ1, µ2, Σ2)D. Our main result shows that the number of modes of this mixture is bounded above by D + 1 and that the bound is achievable for any D. In fact, we provide a recursive algorithm to construct the parameters of component densities that attain this bound.
Modes are defined as the local maxima of the density, and understanding the modes requires understanding the topography of the density along with its higher-order features. Many of the results we use in this paper are based on these higher-order features of normal mixtures, defined in terms of the Π-function (different from the omitted parameter π) and the curvature function introduced in Ray and Lindsay (2005). So, in Section 2 we first define the terminology and state some of the important results from Ray and Lindsay (2005) that will be used in this paper. In particular, we present the concepts of the Π-function and the curvature function of a mixture, which have the advantage of being expressed explicitly in terms of the means and variances of the components while retaining full information about the topography, and hence the number of modes, of a mixture. Moreover, the Π-function and curvature function take a very simple form for a two-component normal mixture. This simplification of the curvature function allows us to show that the number of modes of a two-component mixture is explicitly determined by the number of roots of the curvature function within the range [0, 1].
But the roots of the curvature function defined in Section 2 are very difficult to study for arbitrary mixtures. Ray and Lindsay (2005) explore the roots of curvature functions only in the case of diagonal covariance matrices up to three dimensions. In this paper we seek to generalize the modality results to arbitrary dimensions and component variance-covariance matrices. To arrive at these results, in Section 3 we first show that the modality of an arbitrary D-dimensional normal mixture NM(µ1, Σ1, µ2, Σ2)D remains unchanged under any translation and under a specified scaling and rotation of the random variable. These results are enormously helpful, as they allow us to study the topography of an arbitrary D-dimensional normal mixture by exploring the topography of a simplified class of normal mixtures, with the first component being a standard normal and the second component having a diagonal covariance matrix. We denote this class by NM(0, I, µ, Λ)D, where 0 and µ are both D-dimensional means, I is the identity matrix and Λ is a diagonal matrix of dimension D. These results are derived analytically, and examples are provided to illustrate them.
In Section 4 we explore the modality of normal mixtures of the form NM(0, I, µ, Λ)D. We show that the maximum number of modes is constrained by d, the number of distinct diagonal entries in Λ. In fact, the number of modes of such a mixture with d distinct diagonal entries is at most d + 1. It is easy to check that d can be equal to the dimension D, and thus we arrive at the first part of our result, showing that any D-dimensional two-component normal mixture can have at most D + 1 modes. The tightness of the stated bound is established by providing a recursive method for constructing two-component normal mixtures that achieve this bound. In this section we also show that many previous modality results can be stated as special cases of our generalized result. For D = 1, this can be used to prove the univariate results in Helguero (1904) and Robertson and Fryer (1969). For D = 2 and D = 3, our results show that the examples in Ray and Lindsay (2005) achieve the upper limit of the number of modes in their respective dimensions.
Section 5 provides some discussion and further research directions regarding the number of modes of multivariate normal mixtures with more than two components. Generalizations of the modality results for mixtures of multivariate normals to multivariate t densities, and ultimately to multivariate elliptical distributions, are also discussed in that section.
2 Topography of multivariate normals
In this section we state some important results from Ray and Lindsay (2005) that will be
extensively used in this paper. The rest of the paper will use the notations defined in this
section. Readers familiar with the results in Ray and Lindsay (2005) may skip this section.
Ray and Lindsay (2005) present a unified theory for understanding the topography of
high dimensional normal mixtures. Their main result shows that the topography of mixtures,
in the sense of their key features as a density, can be analyzed rigorously in lower dimensions
by use of a ridgeline manifold that contains all critical points as well as the ridges of the
density.
A K-component mixture of D-dimensional normals can be represented by the probability density function

$$g(x) = \pi_1\phi(x;\mu_1,\Sigma_1) + \pi_2\phi(x;\mu_2,\Sigma_2) + \cdots + \pi_K\phi(x;\mu_K,\Sigma_K), \qquad x \in \mathbb{R}^D,$$

where πj is the mixing proportion of component j, $\pi_j \in [0,1]$, $\sum_{j=1}^{K}\pi_j = 1$, and φ(x; µ, Σ) is the density of a multivariate normal distribution with mean µ and variance Σ. We will sometimes use φj(x) as shorthand notation for φ(x; µj, Σj), and call φj the jth component density.
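For readers who want to experiment, the mixture density above can be evaluated directly. This is a minimal sketch assuming NumPy; `normal_pdf` and `mixture_pdf` are hypothetical helper names, and the density is computed from its textbook formula rather than through any particular statistics library.

```python
import numpy as np

def normal_pdf(x, mean, cov):
    """Multivariate normal density phi(x; mean, cov) at a single point x."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    diff = x - mean
    quad = diff @ np.linalg.solve(cov, diff)          # (x - mu)' Sigma^{-1} (x - mu)
    norm_const = np.sqrt((2.0 * np.pi) ** mean.size * np.linalg.det(cov))
    return float(np.exp(-0.5 * quad) / norm_const)

def mixture_pdf(x, weights, means, covs):
    """g(x) = sum_j pi_j * phi(x; mu_j, Sigma_j) for a K-component normal mixture."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("mixing proportions must be non-negative and sum to one")
    return sum(w * normal_pdf(x, m, c) for w, m, c in zip(weights, means, covs))

# Example: a two-component mixture in D = 2 evaluated at the origin.
g0 = mixture_pdf([0.0, 0.0], [0.3, 0.7],
                 [[0.0, 0.0], [3.0, -1.0]],
                 [np.eye(2), np.diag([2.0, 0.5])])
```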
2.1 The K-1 dimensional ridgeline manifold
Definition 1. The (K − 1)-dimensional set of points

$$S_K = \left\{\alpha \in \mathbb{R}^K : \alpha_i \in [0,1],\ \sum_{i=1}^{K}\alpha_i = 1\right\}$$

will be called the unit simplex. The function x*(α) from S_K into ℝ^D defined by

$$x^*(\alpha) = \left[\alpha_1\Sigma_1^{-1} + \alpha_2\Sigma_2^{-1} + \cdots + \alpha_K\Sigma_K^{-1}\right]^{-1}\left[\alpha_1\Sigma_1^{-1}\mu_1 + \alpha_2\Sigma_2^{-1}\mu_2 + \cdots + \alpha_K\Sigma_K^{-1}\mu_K\right]$$

will be called the ridgeline function. It will sometimes be written as $x^*_\alpha$. The image of this map will be denoted by M and called the ridgeline surface or manifold. If K = 2, it will be called the ridgeline, as it is a one-dimensional curve.
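The ridgeline function of Definition 1 is a precision-weighted average of the component means, which the following minimal sketch makes explicit. It assumes NumPy; `ridgeline` is a hypothetical helper name. Two easy sanity checks follow from the formula: at a vertex of the simplex the function returns the corresponding mean, and when all covariances are equal it reduces to the convex combination of the means.

```python
import numpy as np

def ridgeline(alpha, means, covs):
    """Ridgeline function x*(alpha) of Definition 1 for a point alpha of the unit simplex S_K."""
    alpha = np.asarray(alpha, dtype=float)
    if np.any(alpha < 0) or not np.isclose(alpha.sum(), 1.0):
        raise ValueError("alpha must lie in the unit simplex S_K")
    precisions = [np.linalg.inv(np.asarray(c, dtype=float)) for c in covs]
    # Weighted precision: sum_k alpha_k Sigma_k^{-1}
    A = sum(a * P for a, P in zip(alpha, precisions))
    # Weighted precision-times-mean: sum_k alpha_k Sigma_k^{-1} mu_k
    b = sum(a * (P @ np.asarray(m, dtype=float)) for a, P, m in zip(alpha, precisions, means))
    return np.linalg.solve(A, b)

means = [[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]]
covs = [np.eye(2), np.diag([2.0, 0.5]), np.array([[1.0, 0.3], [0.3, 1.0]])]
vertex_point = ridgeline([0.0, 1.0, 0.0], means, covs)            # equals mu_2
equal_cov_point = ridgeline([0.2, 0.3, 0.5], means, [np.eye(2)] * 3)  # 0.2 mu_1 + 0.3 mu_2 + 0.5 mu_3
```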
Theorem 2. (Ray and Lindsay, 2005) Let g(x) be the density of a K-component mixture of multivariate normal densities as given above. Then all of the critical points of g(x), and hence its modes, antimodes and saddle points, are points in M.
The previous result states that, instead of exploring the whole space ℝ^D to find modes, we only need to concentrate on the ridgeline manifold M, the image of the (K − 1)-dimensional unit simplex. In this paper we only deal with two components, and for K = 2 the ridgeline can be represented as

$$x^*(\alpha) = S_\alpha^{-1}\left[\bar\alpha\,\Sigma_1^{-1}\mu_1 + \alpha\,\Sigma_2^{-1}\mu_2\right], \quad \text{where } S_\alpha = \bar\alpha\,\Sigma_1^{-1} + \alpha\,\Sigma_2^{-1}, \qquad (1)$$

α ∈ [0, 1] and $\bar\alpha = 1 - \alpha$. As α varies from 0 to 1, the image of the function x*(α) defines a curve from µ1 to µ2, and the critical points of the D-dimensional mixture can be explored by evaluating the height of the density along the curve x*(α). Thus we next consider the diagnostic properties of the elevation plot along the curve x*(α), defined by

$$h(\alpha) = g\left(x^*(\alpha)\right).$$
We will call h(α) the ridgeline elevation function. Analytically, the number of peaks of h(α) is exactly the maximum number of modes the mixture can display. In some cases a visual inspection of h(α), or numerical root-finding methods applied to h′(α), might allow us to enumerate the critical points of h(α) and hence the number of modes. But, depending on the resolution, numerical methods can miss some zero crossings. Moreover, numerical solutions do not serve the purpose of this paper, which is to determine an upper bound on the number of modes. Hence we focus our attention on finding analytical solutions for the critical points of h(α) in order to find the number of modes of the mixture.
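The numerical route mentioned above can be sketched in a few lines: evaluate h(α) on a grid and count its interior peaks. This is a minimal sketch assuming NumPy; the helper names are hypothetical, and it takes the weight ᾱ on Σ1^{-1} and α on Σ2^{-1} so that the curve runs from µ1 at α = 0 to µ2 at α = 1. As the text warns, a finite grid can miss closely spaced peaks, so the count is only a lower-resolution check, not a proof. The two test mixtures have equal spherical covariances and equal weights, where the known answer is one mode for a mean separation below 2 and two modes above it.

```python
import numpy as np

def normal_pdf(x, mean, cov):
    """Multivariate normal density at a single point x."""
    diff = x - mean
    quad = diff @ np.linalg.solve(cov, diff)
    return float(np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** mean.size * np.linalg.det(cov)))

def ridgeline_2(alpha, mu1, cov1, mu2, cov2):
    """x*(alpha) = S_alpha^{-1}[(1 - alpha) Sigma_1^{-1} mu1 + alpha Sigma_2^{-1} mu2]."""
    P1, P2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
    S = (1.0 - alpha) * P1 + alpha * P2
    return np.linalg.solve(S, (1.0 - alpha) * (P1 @ mu1) + alpha * (P2 @ mu2))

def ridgeline_elevation(alphas, pi, mu1, cov1, mu2, cov2):
    """h(alpha) = g(x*(alpha)) with g = pi * phi_1 + (1 - pi) * phi_2."""
    h = []
    for a in alphas:
        x = ridgeline_2(a, mu1, cov1, mu2, cov2)
        h.append(pi * normal_pdf(x, mu1, cov1) + (1.0 - pi) * normal_pdf(x, mu2, cov2))
    return np.array(h)

def count_grid_peaks(h):
    """Interior local maxima of the sampled elevation curve; a coarse grid can miss peaks."""
    return int(np.sum((h[1:-1] > h[:-2]) & (h[1:-1] >= h[2:])))

alphas = np.linspace(0.0, 1.0, 4001)
I2, mu1 = np.eye(2), np.zeros(2)
peaks_far = count_grid_peaks(ridgeline_elevation(alphas, 0.5, mu1, I2, np.array([3.0, 0.0]), I2))
peaks_near = count_grid_peaks(ridgeline_elevation(alphas, 0.5, mu1, I2, np.array([1.0, 0.0]), I2))
```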
2.2 The curvature function
To find the number of modes, first note that α is a critical point of h(α) if it satisfies

$$h'(\alpha) = \pi\,\phi_1(x^*(\alpha))' + \bar\pi\,\phi_2(x^*(\alpha))' = 0,$$

where the prime ′ denotes differentiation with respect to α. Writing φj(α) for φj(x*(α)), solving the last displayed equation for π and turning it into a function of α, we get

$$\Pi(\alpha) = \frac{\phi_2'(\alpha)}{\phi_2'(\alpha) - \phi_1'(\alpha)}.$$
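Numerically, Π(α) can be approximated from finite differences of the two component densities along the ridgeline, and by construction the mixing weight π = Π(α0) makes α0 a critical point of h. The sketch below assumes NumPy, central differences with a hypothetical step of 1e-6, the ᾱ-on-Σ1 weighting of the ridgeline, and uses the diagonal mixture that appears later as Example 3 purely as test input; all helper names are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, cov):
    """Multivariate normal density at a single point x."""
    diff = x - mean
    quad = diff @ np.linalg.solve(cov, diff)
    return float(np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** mean.size * np.linalg.det(cov)))

def ridgeline_2(alpha, mu1, cov1, mu2, cov2):
    """Two-component ridgeline with weight (1 - alpha) on Sigma_1^{-1} and alpha on Sigma_2^{-1}."""
    P1, P2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
    S = (1.0 - alpha) * P1 + alpha * P2
    return np.linalg.solve(S, (1.0 - alpha) * (P1 @ mu1) + alpha * (P2 @ mu2))

def component_derivatives(alpha, mu1, cov1, mu2, cov2, step=1e-6):
    """Central-difference estimates of (phi_1'(alpha), phi_2'(alpha)) along the ridgeline."""
    def phis(a):
        x = ridgeline_2(a, mu1, cov1, mu2, cov2)
        return np.array([normal_pdf(x, mu1, cov1), normal_pdf(x, mu2, cov2)])
    return (phis(alpha + step) - phis(alpha - step)) / (2.0 * step)

def Pi_function(alpha, *params):
    """Pi(alpha) = phi_2' / (phi_2' - phi_1'): the weight pi making alpha a critical point of h."""
    d1, d2 = component_derivatives(alpha, *params)
    return d2 / (d2 - d1)

params = (np.zeros(2), np.eye(2), np.array([4.472, -1.0]), np.diag([20.0, 0.05]))
alpha0 = 0.3
pi0 = Pi_function(alpha0, *params)
d1, d2 = component_derivatives(alpha0, *params)
h_prime = pi0 * d1 + (1.0 - pi0) * d2   # h'(alpha0) with pi = Pi(alpha0): zero up to rounding
```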
As we are only interested in the number of modes, we can examine the number of up-and-down oscillations of the function Π. Section 4 of Ray and Lindsay (2005) shows that the number of up-down oscillations of Π is determined by n, the number of zeroes of

$$\Pi'(\alpha) = -\,\frac{\phi_2''(\alpha)\,\phi_1'(\alpha) - \phi_1''(\alpha)\,\phi_2'(\alpha)}{\left(\phi_2'(\alpha) - \phi_1'(\alpha)\right)^2}.$$
In general, to determine the sign changes of Π′ we can use any function of α with the same numerator $\phi_2''(\alpha)\phi_1'(\alpha) - \phi_1''(\alpha)\phi_2'(\alpha)$, provided the denominator is a positive function of α. Using the denominator φ1(α)φ2(α) instead of $(\phi_2'(\alpha) - \phi_1'(\alpha))^2$, the curvature function κ(α) is defined as

$$\kappa(\alpha) = \frac{\phi_2''(\alpha)}{\phi_2(\alpha)}\,\frac{\phi_1'(\alpha)}{\phi_1(\alpha)} - \frac{\phi_1''(\alpha)}{\phi_1(\alpha)}\,\frac{\phi_2'(\alpha)}{\phi_2(\alpha)}. \qquad (2)$$
We use κ(α) as it results in a simple expression for any distribution belonging to the exponential family. It is closely related to the mixture curvature measures given by Lindsay (1983).
2.3 Properties of the Curvature function κ(α)
We now study the curvature function κ(α) more closely, as it will be extensively used to
prove the results in Section 3 and Section 4.
The following result provides a simple expression for the curvature function of a mixture of normals.
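Theorem 3. (Ray and Lindsay, 2005) Let g(x) be the mixture of two multivariate normal densities. Then the curvature function in (2) is given by

$$\kappa(\alpha) = [p(\alpha)]^2\left[1 - \alpha\bar\alpha\,p(\alpha)\right],$$

where

$$p(\alpha) = (\mu_2 - \mu_1)'\,\Sigma_1^{-1}S_\alpha^{-1}\Sigma_2^{-1}S_\alpha^{-1}\Sigma_2^{-1}S_\alpha^{-1}\Sigma_1^{-1}\,(\mu_2 - \mu_1). \qquad (3)$$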
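By the expression above, p(α) is always positive when µ1 ≠ µ2, since the matrix in the quadratic form (3) is positive definite. Thus the zeroes of κ(α) are the same as the zeroes of $1 - \alpha\bar\alpha\,p(\alpha)$. For notational ease, let us denote

$$q(\alpha) = 1 - \alpha\bar\alpha\,p(\alpha). \qquad (4)$$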
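Theorem 3 can be checked numerically by computing κ(α) straight from definition (2), with the logarithmic derivatives of φ1 and φ2 along the ridgeline obtained by finite differences, and comparing it with the closed form p(α)²q(α). This is a minimal sketch assuming NumPy, a hypothetical step h = 1e-4, an arbitrary test mixture of our own choosing, and the ridgeline convention with weight ᾱ on Σ1^{-1}; the tolerance is scaled by (1 + p)³ because κ is built from terms of that order. It uses the identities φ′/φ = (log φ)′ and φ″/φ = (log φ)″ + ((log φ)′)².

```python
import numpy as np

def ridgeline_2(alpha, mu1, cov1, mu2, cov2):
    """x*(alpha) with weight (1 - alpha) on Sigma_1^{-1} and alpha on Sigma_2^{-1}."""
    P1, P2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
    S = (1.0 - alpha) * P1 + alpha * P2
    return np.linalg.solve(S, (1.0 - alpha) * (P1 @ mu1) + alpha * (P2 @ mu2))

def log_normal_pdf(x, mean, cov):
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (diff @ np.linalg.solve(cov, diff) + logdet + mean.size * np.log(2.0 * np.pi))

def p_of_alpha(alpha, mu1, cov1, mu2, cov2):
    """p(alpha) from (3), with S_alpha = (1 - alpha) Sigma_1^{-1} + alpha Sigma_2^{-1}."""
    P1, P2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
    S_inv = np.linalg.inv((1.0 - alpha) * P1 + alpha * P2)
    d = mu2 - mu1
    return float(d @ P1 @ S_inv @ P2 @ S_inv @ P2 @ S_inv @ P1 @ d)

def kappa_closed_form(alpha, *params):
    """Theorem 3: kappa = p^2 * q with q = 1 - alpha (1 - alpha) p."""
    p = p_of_alpha(alpha, *params)
    return p ** 2 * (1.0 - alpha * (1.0 - alpha) * p)

def kappa_from_definition(alpha, mu1, cov1, mu2, cov2, h=1e-4):
    """Definition (2), with phi_j'/phi_j and phi_j''/phi_j from finite differences of log phi_j."""
    def logs(a):
        x = ridgeline_2(a, mu1, cov1, mu2, cov2)
        return np.array([log_normal_pdf(x, mu1, cov1), log_normal_pdf(x, mu2, cov2)])
    lo, mid, hi = logs(alpha - h), logs(alpha), logs(alpha + h)
    d1 = (hi - lo) / (2.0 * h)             # (log phi_j)' = phi_j'/phi_j
    d2 = (hi - 2.0 * mid + lo) / h ** 2    # (log phi_j)''
    ratio2 = d2 + d1 ** 2                  # phi_j''/phi_j
    return ratio2[1] * d1[0] - ratio2[0] * d1[1]

params = (np.array([0.0, 0.0]), np.array([[2.0, 0.3], [0.3, 1.0]]),
          np.array([2.0, 1.0]), np.array([[1.0, -0.2], [-0.2, 0.5]]))
for a in (0.2, 0.5, 0.8):
    print(a, kappa_from_definition(a, *params), kappa_closed_form(a, *params))
```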
By direct calculation, q(0) = q(1) = 1, and hence κ takes positive values at the two extremes α = 0 and α = 1. Thus there is an even number of sign changes of the function κ(α) in the range [0, 1], as also indicated by the nature of Π. In particular, at the first zero α1 of κ the function Π has a maximum, at the next zero α2 a minimum, and so forth. Thus we arrive at the following result relating the number of solutions of q(α) = 0 to the modality of the mixture.
Result 1. Let n be the number of solutions of q(α) = 0 in the range [0, 1]. Then the corresponding mixture will display n/2 + 1 modes.
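Result 1 turns mode counting into root counting, which can be sketched numerically: evaluate q(α) from (3) and (4) on a fine grid, count sign changes n, and report n/2 + 1. This is a minimal sketch assuming NumPy, the ridgeline convention with weight ᾱ on Σ1^{-1}, and a grid fine enough to separate all roots (a grid can miss a pair of nearly coincident roots, so this is a check rather than a proof); `max_modes_via_result1` is a hypothetical name. The test inputs are two equal-covariance mixtures and the reduced mixture of Example 3.

```python
import numpy as np

def p_of_alpha(alpha, mu1, cov1, mu2, cov2):
    """p(alpha) from (3), with S_alpha = (1 - alpha) Sigma_1^{-1} + alpha Sigma_2^{-1}."""
    P1, P2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
    S_inv = np.linalg.inv((1.0 - alpha) * P1 + alpha * P2)
    d = mu2 - mu1
    return float(d @ P1 @ S_inv @ P2 @ S_inv @ P2 @ S_inv @ P1 @ d)

def q_of_alpha(alpha, mu1, cov1, mu2, cov2):
    """q(alpha) = 1 - alpha (1 - alpha) p(alpha), eq. (4)."""
    return 1.0 - alpha * (1.0 - alpha) * p_of_alpha(alpha, mu1, cov1, mu2, cov2)

def max_modes_via_result1(mu1, cov1, mu2, cov2, n_grid=10001):
    """Count sign changes n of q on a grid over [0, 1]; return (n/2 + 1, n)."""
    alphas = np.linspace(0.0, 1.0, n_grid)
    q = np.array([q_of_alpha(a, mu1, cov1, mu2, cov2) for a in alphas])
    n = int(np.sum(np.sign(q[:-1]) * np.sign(q[1:]) < 0))
    return n // 2 + 1, n

I2, zero = np.eye(2), np.zeros(2)
close_pair = max_modes_via_result1(zero, I2, np.array([1.0, 0.0]), I2)
far_pair = max_modes_via_result1(zero, I2, np.array([3.0, 0.0]), I2)
example3 = max_modes_via_result1(zero, I2, np.array([4.472, -1.0]), np.diag([20.0, 0.05]))
print(close_pair, far_pair, example3)
```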
We note that both p(α) and q(α) uniquely determine the number of modes. We will use p(α) to show the invariance in the proof of Theorem 5, and later use q(α) to find the number of modes while providing the proofs of the other theorems.
3 Invariance of modality under scaling and rotation
Studying the modality of arbitrary normal mixtures directly through the curvature function κ(α) is a very complex undertaking. Instead, in this section we show that the curvature function, which defines the modal features of a two-component normal mixture, remains unchanged under certain transformations. We use these transformations to show that the topography of an arbitrary D-dimensional normal mixture can be examined by exploring the topography of a simplified class of normal mixtures, given by the mixture of a spherical normal and a normal with a diagonal covariance matrix. We arrive at this result in two steps, described in the following two subsections.
3.1 Invariance of modality under scaling
First we state the theorem which shows that, in D dimensions, the modal properties of an arbitrary two-component normal mixture can be fully examined by studying the modality of a mixture of two components, one of which is the standard normal in D dimensions.
Theorem 4. For an arbitrary mixture of two multivariate normals, the modality of NM(µ1, Σ1, µ2, Σ2)D is the same as that of NM(0, I, µ2*, Σ2*)D, where

$$\mu_2^* = (\Sigma_2^*)^{1/2}\,\Sigma_2^{-1/2}(\mu_2 - \mu_1), \qquad \Sigma_2^* = \Sigma_2^{1/2}\,\Sigma_1^{-1}\,\Sigma_2^{1/2}.$$
Proof. See Appendix
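Since modality is governed by p(α), one way to see Theorem 4 at work is to check numerically that p(α) from (3) is unchanged by the transformation. This is a minimal sketch assuming NumPy, random positive definite test matrices of our own, and symmetric (principal) square roots for every Σ^{1/2} in the theorem; the helper names are hypothetical, and the ridgeline convention takes weight ᾱ on Σ1^{-1}.

```python
import numpy as np

def sym_sqrt(A):
    """Symmetric positive-definite square root via an eigendecomposition."""
    w, V = np.linalg.eigh(0.5 * (A + A.T))
    return (V * np.sqrt(w)) @ V.T

def p_of_alpha(alpha, mu1, cov1, mu2, cov2):
    """p(alpha) from (3), with S_alpha = (1 - alpha) Sigma_1^{-1} + alpha Sigma_2^{-1}."""
    P1, P2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
    S_inv = np.linalg.inv((1.0 - alpha) * P1 + alpha * P2)
    d = mu2 - mu1
    return float(d @ P1 @ S_inv @ P2 @ S_inv @ P2 @ S_inv @ P1 @ d)

def theorem4_transform(mu1, cov1, mu2, cov2):
    """(mu2*, Sigma2*) of Theorem 4, taking all matrix square roots to be symmetric."""
    root2 = sym_sqrt(cov2)
    cov2_star = root2 @ np.linalg.inv(cov1) @ root2
    mu2_star = sym_sqrt(cov2_star) @ np.linalg.solve(root2, mu2 - mu1)
    return mu2_star, cov2_star

rng = np.random.default_rng(0)
D = 3
A1, A2 = rng.normal(size=(D, D)), rng.normal(size=(D, D))
cov1, cov2 = A1 @ A1.T + np.eye(D), A2 @ A2.T + np.eye(D)
mu1, mu2 = rng.normal(size=D), rng.normal(size=D) + 3.0
mu2_star, cov2_star = theorem4_transform(mu1, cov1, mu2, cov2)

alphas = np.linspace(0.0, 1.0, 11)
p_orig = np.array([p_of_alpha(a, mu1, cov1, mu2, cov2) for a in alphas])
p_star = np.array([p_of_alpha(a, np.zeros(D), np.eye(D), mu2_star, cov2_star) for a in alphas])
print(np.max(np.abs(p_orig - p_star) / p_orig))   # relative discrepancy, rounding level
```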
Remark 1. First note that the above transformation is not equivalent to the regular standardization of the first component alone. Using a regular standardization, a single component can be transformed to a standard normal, but the resulting parameters of the second component lose their symmetry, which is crucial for equating the curvature functions of the two mixtures, as detailed in the proof of Theorem 4. Also note that $\mu_2^*$ and $\Sigma_2^*$ in Theorem 4 are well-defined, because the variance matrices Σ1 and Σ2 are both positive definite.
Note that the two components are interchangeable and the strategy is to scale the whole
mixture by the covariance of the component whose mean is translated to the origin. Before
moving on to the next result, we provide an application of Theorem 4. For easy visualization
we will use contour plots of a two dimensional mixture. This example will also serve the
purpose of providing a geometric intuition of the proof of Theorem 4. First, note that it
is easy to check that geometrically shifting the means of both the components by the same
vector is equivalent to changing the origin of the reference frame of the contour plot. This
implies that the modal features and hence the number of modes remain unchanged after
simple translation. So we concentrate on the changes of the contour plot strictly under the
operation of scaling defined in Theorem 4 by taking µ1 = 0.
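Example 1. Consider the mixture density with the following parameters:

$$\mu_1 = \begin{pmatrix}0\\0\end{pmatrix},\quad \Sigma_1 = \begin{pmatrix}3.899 & -4.691\\ -4.691 & 5.698\end{pmatrix},\quad \mu_2 = \begin{pmatrix}4\\-4\end{pmatrix},\quad \Sigma_2 = \begin{pmatrix}1.04 & -0.3\\ -0.3 & 0.29\end{pmatrix}.$$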
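Applying the transformation defined in Theorem 4, the parameters of the two components after scaling are given by

$$\mu_1^* = \begin{pmatrix}0\\0\end{pmatrix},\quad \Sigma_1^* = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix},\quad \mu_2^* = \begin{pmatrix}4.272\\-0.394\end{pmatrix},\quad \Sigma_2^* = \begin{pmatrix}18.80 & 4.743\\ 4.743 & 1.25\end{pmatrix}.$$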
Figure 1 gives the density contour plots before (left panel) and after (right panel) the transformation. Clearly, though the contour shapes and the locations of the modes have changed, the number of modes and the number of saddle points remain unchanged.
Note that under the transformation both components are scaled; in this example the component centered at zero is scaled to have the identity covariance, and the covariance of the other component is scaled appropriately. This is easily visible from the contour plots in Figure 1, where the elongated elliptical component in the left panel, centered at the origin, is transformed into a spherical component with the same center. Of course, the changes in the means and covariances of the components have changed the locations of the three modes, but, as the theorem suggests, the number of modes is strictly preserved between the mixtures.

Figure 1: Contour plots for the bivariate normal mixture of Example 1 in (a) the original parameters and (b) the transformed parameters (axes x and y).
Contour plots such as those in Figure 1 are only available when D = 2, so we provide an alternative graphical display showing the invariance of modes: we compare the ridgeline elevations of the two mixtures in Example 1. Recall that the ridgeline elevation of a two-component mixture is simply the height of the mixture density along the ridgeline defined in (1), but it carries the full modality information for mixtures in any dimension. Figure 2 displays the ridgeline elevation plots before and after the transformation. Again note that, though the shapes of the elevation plots differ, the number of up-down oscillations of the curves in the left and right panels of Figure 2 is exactly the same. In both cases the ridgeline elevation plot confirms the presence of three modes.
3.2 Invariance of modality under rotation
By Theorem 4 the topography of any D dimensional mixture can be studied using mixtures
of the form NM(µ1 = 0, Σ1 = I,µ2, Σ2). But uncovering the topography, even when one
component has an arbitrary covariance matrix, is difficult. In this section we seek to provide
a further simplification, which will allow us to find the number of modes of an arbitrary
mixture by studying the modes of another mixture, one component of which is a standard
normal and the other component is a normal with diagonal covariance matrix.
Before we state the result, recall that the maximum number of modes of a two-component normal mixture is uniquely determined by the number of roots of q(α) in (4) between 0 and 1, and that for any mixture q(α) is uniquely determined by p(α). So we will first provide a simplification of the expression for p(α) for mixtures of the form NM(0, I, µ2, Σ2)D and then state the rotation invariance theorem.

Figure 2: Ridgeline elevation plotted against arc length for the bivariate normal mixture of Example 1 in (a) the original parameters and (b) the transformed parameters (vertical axis: density).
Result 2. For mixtures of the form NM(0, I, µ2, Σ2)D, the term p(α) in (3) can be expressed in terms of the eigenvalues and eigenvectors of Σ2 in the following way:

$$p(\alpha) = \sum_{i=1}^{D}\frac{c_i}{\left[\alpha(\lambda_i - 1) + 1\right]^3}, \qquad (5)$$

where $c_i = \lambda_i(\mu_2'\xi_i)^2$, and the λi's and ξi's are the eigenvalues and corresponding eigenvectors of the matrix Σ2.
Proof. See Appendix.
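The eigen-expansion (5) can be evaluated with a symmetric eigendecomposition. The sketch below assumes NumPy and random test inputs of our own; `p_result2` and `p_matrix_form` are hypothetical names. As a consistency check it compares (5) with the quadratic form µ2′Σ2[I + α(Σ2 − I)]^{-3}µ2, which is our own restatement of the same sum (it follows because Σ2 and I + α(Σ2 − I) share the eigenvectors ξi); it is not taken from the text.

```python
import numpy as np

def p_result2(alpha, mu2, cov2):
    """Eigen-expansion (5): sum_i c_i / [alpha (lambda_i - 1) + 1]^3 with c_i = lambda_i (mu2' xi_i)^2."""
    lam, xi = np.linalg.eigh(cov2)          # columns of xi are orthonormal eigenvectors
    c = lam * (xi.T @ mu2) ** 2
    return float(np.sum(c / (alpha * (lam - 1.0) + 1.0) ** 3))

def p_matrix_form(alpha, mu2, cov2):
    """Same quantity written as mu2' Sigma2 [I + alpha (Sigma2 - I)]^{-3} mu2."""
    I = np.eye(len(mu2))
    B_inv = np.linalg.inv(I + alpha * (cov2 - I))
    return float(mu2 @ cov2 @ np.linalg.matrix_power(B_inv, 3) @ mu2)

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4))
cov2 = A @ A.T + 0.5 * np.eye(4)
mu2 = rng.normal(size=4)
alphas = np.linspace(0.0, 1.0, 11)
vals_eigen = np.array([p_result2(a, mu2, cov2) for a in alphas])
vals_matrix = np.array([p_matrix_form(a, mu2, cov2) for a in alphas])
```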
We will now state the following property of invariance of mixture modality under rotation.
Theorem 5. The modality of the mixture NM(0, I, µ2, Σ2)D is the same as that of the mixture NM(0, I, µ0, Λ)D, with $\mu_0' = (\mu_2'\xi_1, \mu_2'\xi_2, \ldots, \mu_2'\xi_D)$ and $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_D)$, where (λi, ξi), i = 1, ..., D, are the eigenvalue-eigenvector pairs of Σ2.
Proof. Using µ0 and Λ in Result 2, it is easy to check that the p(α) of the mixtures NM(0, I, µ2, Σ2)D and NM(0, I, µ0, Λ)D have the same expression, and hence the same number of roots, which implies that the two mixtures have the same modality.
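The rotation in Theorem 5 amounts to replacing µ2 by P′µ2 and Σ2 by Λ, where P holds the eigenvectors of Σ2 as columns. The sketch below, assuming NumPy and random test inputs of our own, checks that p(α) computed from the general formula (3) is the same before and after the rotation; `rotate_to_diagonal` is a hypothetical name, and the ridgeline convention takes weight ᾱ on Σ1^{-1}.

```python
import numpy as np

def p_of_alpha(alpha, mu1, cov1, mu2, cov2):
    """p(alpha) from (3), with S_alpha = (1 - alpha) Sigma_1^{-1} + alpha Sigma_2^{-1}."""
    P1, P2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
    S_inv = np.linalg.inv((1.0 - alpha) * P1 + alpha * P2)
    d = mu2 - mu1
    return float(d @ P1 @ S_inv @ P2 @ S_inv @ P2 @ S_inv @ P1 @ d)

def rotate_to_diagonal(mu2, cov2):
    """Theorem 5: mu0 = P' mu2 and Lambda = diag(lambda_i), P = orthonormal eigenvectors of Sigma2."""
    lam, P = np.linalg.eigh(cov2)
    return P.T @ mu2, np.diag(lam), P

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
cov2 = A @ A.T + 0.5 * np.eye(3)
mu2 = rng.normal(size=3)
mu0, Lam, P = rotate_to_diagonal(mu2, cov2)

zero, I3 = np.zeros(3), np.eye(3)
alphas = np.linspace(0.0, 1.0, 11)
p_before = [p_of_alpha(a, zero, I3, mu2, cov2) for a in alphas]
p_after = [p_of_alpha(a, zero, I3, mu0, Lam) for a in alphas]
```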
For illustration, we will now apply the rotation described in Theorem 5 to the scaled version
of Example 1 whose first component is a standard normal. Example 1 gives the numerical
values of the parameters after scaling and Figure 3 shows the contour plots of the mixtures
before and after rotation.
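Example 2. (Continuation of Example 1) Applying the rotation transformation described in Theorem 5 to the mixture with parameters

$$\mu_1 = \begin{pmatrix}0\\0\end{pmatrix},\quad \Sigma_1 = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix},\quad \mu_2 = \begin{pmatrix}4.272\\-0.394\end{pmatrix},\quad \Sigma_2 = \begin{pmatrix}18.80 & 4.743\\ 4.743 & 1.25\end{pmatrix},$$

we get the mixture with parameters

$$\mu_1 = \begin{pmatrix}0\\0\end{pmatrix},\quad \Sigma_1 = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix},\quad \mu_0 = \begin{pmatrix}4.472\\-1\end{pmatrix},\quad \Lambda = \begin{pmatrix}20 & 0\\ 0 & 0.05\end{pmatrix}. \qquad (6)$$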
The contour plot in Figure 3(a) depicts the unrotated mixture NM(0, I, µ2, Σ2), whereas Figure 3(b) shows the contours of the rotated mixture NM(0, I, µ0, Λ).
Algebraically, the rotation that achieves the diagonal covariance of the second component is equivalent to using the orthonormal matrix P, whose columns are the eigenvectors of the covariance matrix Σ2, to rotate the random variable. In two dimensions it has a very simple interpretation: we simply rotate the mixture contours around the origin (0, 0) so that the major axis of the elliptical contours of the second component is parallel to the x-axis. This automatically sets the minor axis parallel to the y-axis, resulting in a diagonal covariance matrix for the second component (see Figure 3). Note that this rotation does not affect the covariance matrix of the first component, as it remains the identity matrix.
Finally we combine Theorem 4 and Theorem 5 to state the following corollary.
Corollary 1. The modality of any two-component normal mixture is equal to that of another mixture of the form NM(0, I, µ0, Λ), where Λ is diagonal.
Proof. First apply Theorem 4 to scale any mixture to the form NM(0, I,µ, Σ) and then
apply Theorem 5 to rotate it to the form NM(0, I,µ0, Λ).
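Chaining the two steps gives a direct recipe for the canonical form NM(0, I, µ0, Λ). The sketch below assumes NumPy, symmetric square roots in Theorem 4, random test inputs of our own, and the ridgeline convention with weight ᾱ on Σ1^{-1}; `canonical_form` is a hypothetical name. It checks two things: the diagonal of Λ coincides with the eigenvalues of Σ1^{-1}Σ2 (the "ratio" of covariance matrices mentioned in the introduction), and p(α) from (3) is the same for the original and the reduced mixture.

```python
import numpy as np

def sym_sqrt(A):
    """Symmetric positive-definite square root via an eigendecomposition."""
    w, V = np.linalg.eigh(0.5 * (A + A.T))
    return (V * np.sqrt(w)) @ V.T

def p_of_alpha(alpha, mu1, cov1, mu2, cov2):
    """p(alpha) from (3), with S_alpha = (1 - alpha) Sigma_1^{-1} + alpha Sigma_2^{-1}."""
    P1, P2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
    S_inv = np.linalg.inv((1.0 - alpha) * P1 + alpha * P2)
    d = mu2 - mu1
    return float(d @ P1 @ S_inv @ P2 @ S_inv @ P2 @ S_inv @ P1 @ d)

def canonical_form(mu1, cov1, mu2, cov2):
    """Corollary 1: scale (Theorem 4), then rotate (Theorem 5), giving mu0 and diag(Lambda)."""
    root2 = sym_sqrt(cov2)
    cov2_star = root2 @ np.linalg.inv(cov1) @ root2
    mu2_star = sym_sqrt(cov2_star) @ np.linalg.solve(root2, mu2 - mu1)
    lam, P = np.linalg.eigh(0.5 * (cov2_star + cov2_star.T))
    return P.T @ mu2_star, lam

rng = np.random.default_rng(7)
D = 3
A1, A2 = rng.normal(size=(D, D)), rng.normal(size=(D, D))
cov1, cov2 = A1 @ A1.T + np.eye(D), A2 @ A2.T + np.eye(D)
mu1, mu2 = rng.normal(size=D), rng.normal(size=D) + 2.0
mu0, lam = canonical_form(mu1, cov1, mu2, cov2)

alphas = np.linspace(0.0, 1.0, 11)
p_orig = np.array([p_of_alpha(a, mu1, cov1, mu2, cov2) for a in alphas])
p_canon = np.array([p_of_alpha(a, np.zeros(D), np.eye(D), mu0, np.diag(lam)) for a in alphas])
lam_ratio = np.sort(np.linalg.eigvals(np.linalg.inv(cov1) @ cov2).real)
```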
4 Number of modes of a two-component multivariate normal mixture
In this section we first focus our attention on exploring the modality of normal mixtures of the simplified form NM(0, I, µ, Λ)D. We restrict ourselves to this small class of mixtures because we have already shown in Section 3 that the modality of any two-component normal mixture is equivalent to the modality of a corresponding mixture of the form NM(0, I, µ, Λ)D.
Figure 3: Contour plots for the bivariate normal mixture of Example 2 (a) before and (b) after rotation (axes x and y).
First we will show that the maximum number of modes is a function of d, the number of
distinct diagonal entries in Λ, by first showing that the maximum number of modes is less
than or equal to (d + 1), and then by showing that the upper bound (d + 1) is achievable. It
is easy to check that d can be equal to the dimension D and thus we arrive at the final result
on the upper bound of the number of modes of an arbitrary D dimensional mixture.
4.1 Upper bound on the number of modes of a two-component normal mixture
Recall that the number of modes can be directly enumerated using the number of solutions of q(α) = 1 − α(1 − α)p(α) = 0 within the range [0, 1]. Using the simplified form of p(α) given in (5) for mixtures of the form NM(0, I, µ, Λ)D, we can simplify q(α) as

$$q(\alpha) = 1 - \alpha(1-\alpha)\sum_{i=1}^{D}\frac{c_i}{\left[\alpha(\lambda_i - 1) + 1\right]^3} = 0,$$

where the λi's are the diagonal elements of Λ and $c_i = \lambda_i\mu_i^2$.
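To find the roots of q(α), we first state the following lemma.
Lemma 1. The number of solutions of

$$q(\alpha) = 1 - \alpha(1-\alpha)\sum_{i=1}^{D}\frac{c_i}{\left[\alpha(\lambda_i - 1) + 1\right]^3} = 0,$$

where α ∈ [0, 1], is exactly equal to the number of non-negative solutions of the equation

$$q^*(t) = 1 - t(t+1)\sum_{i=1}^{D}\frac{c_i}{(t + \lambda_i)^3} = 0.$$

Proof. Define $\alpha = \frac{1}{t+1}$; then t ∈ [0, ∞) corresponds to α ∈ (0, 1], and since $\alpha(1-\alpha) = \frac{t}{(t+1)^2}$ and $\alpha(\lambda_i - 1) + 1 = \frac{t + \lambda_i}{t+1}$, it is easy to check that q(α) = q*(t). The endpoint α = 0 is never a solution because q(0) = 1, so no solutions are lost.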
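The change of variable in Lemma 1 is easy to confirm numerically. This minimal sketch assumes NumPy and random positive values for the c_i and λ_i chosen by us purely as test input; `q_alpha` and `q_star` are hypothetical names. For each t it evaluates q at α = 1/(t + 1) and q* at t, which should agree up to rounding.

```python
import numpy as np

def q_alpha(alpha, c, lam):
    """q(alpha) = 1 - alpha (1 - alpha) sum_i c_i / [alpha (lambda_i - 1) + 1]^3."""
    c, lam = np.asarray(c, dtype=float), np.asarray(lam, dtype=float)
    return 1.0 - alpha * (1.0 - alpha) * np.sum(c / (alpha * (lam - 1.0) + 1.0) ** 3)

def q_star(t, c, lam):
    """q*(t) = 1 - t (t + 1) sum_i c_i / (t + lambda_i)^3."""
    c, lam = np.asarray(c, dtype=float), np.asarray(lam, dtype=float)
    return 1.0 - t * (t + 1.0) * np.sum(c / (t + lam) ** 3)

rng = np.random.default_rng(2)
c = rng.uniform(0.1, 5.0, size=4)
lam = rng.uniform(0.05, 10.0, size=4)
ts = (0.0, 0.01, 0.7, 3.0, 250.0)
pairs = [(q_alpha(1.0 / (t + 1.0), c, lam), q_star(t, c, lam)) for t in ts]
for t, (qa, qs) in zip(ts, pairs):
    print(t, qa, qs)
```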
This simple change of variable from α to t allows us to relate the number of modes to
the positive solutions of q∗(t) instead of the more difficult problem of finding solutions in
the restricted interval [0, 1] for q(α). This simplification will enable us to find the upper
bound of the number of modes and also allow us to recursively construct extra modes in
extra dimensions.
We will now use the mixture given in (6) to illustrate the result in Lemma 1.
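Example 3. (Continuation of Examples 1 and 2) After scaling and rotation, the modality of Example 1 is equivalent to that of the mixture with parameters

$$\mu_1 = \begin{pmatrix}0\\0\end{pmatrix},\quad \Sigma_1 = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix},\quad \mu_2 = \begin{pmatrix}4.472\\-1\end{pmatrix},\quad \Sigma_2 = \begin{pmatrix}20 & 0\\ 0 & 0.05\end{pmatrix}.$$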
For the above mixture,

$$q(\alpha) = 1 - \alpha(1-\alpha)\left[\frac{400}{(19\alpha + 1)^3} + \frac{0.05}{(-0.95\alpha + 1)^3}\right].$$

Using the change of variable α = 1/(t + 1), we have

$$q^*(t) = 1 - t(t+1)\left[\frac{0.05}{(t + 0.05)^3} + \frac{400}{(t + 20)^3}\right].$$
Solving the equation q(α) = 0, the four solutions in the range [0, 1] are
α1 = 0.0029474, α2 = 0.1391142, α3 = 0.8608858, α4 = 0.9970526,
while the equation q*(t) = 0 also has four non-negative solutions, which are
t1 = 337.6199, t2 = 6.189139, t3 = 0.1615732, t4 = 0.00296281.
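The roots above can be reproduced approximately by bracketing sign changes on a grid and refining each bracket by bisection. This is a minimal sketch assuming NumPy; `bracketed_roots` is a hypothetical helper, the t-grid is log-spaced to cover the wide range of t, and c1 = 400 is taken from the displayed q(α) (with µ1 = 4.472 exactly, λ1µ1² is 399.98), so the computed roots should be close to, though not necessarily identical with, the values reported above. Because λ1λ2 = 1 and c2 = c1/λ1³ here, the t-roots come in reciprocal pairs, which is the symmetry in log(t) noted below.

```python
import numpy as np

c = np.array([400.0, 0.05])    # c_i = lambda_i * mu_i^2, as displayed for Example 3
lam = np.array([20.0, 0.05])   # diagonal of Lambda

def q_alpha(alpha):
    return 1.0 - alpha * (1.0 - alpha) * np.sum(c / (alpha * (lam - 1.0) + 1.0) ** 3)

def q_star(t):
    return 1.0 - t * (t + 1.0) * np.sum(c / (t + lam) ** 3)

def bracketed_roots(f, grid, iters=200):
    """Locate sign changes of f on a grid and refine each bracket by bisection."""
    vals = np.array([f(x) for x in grid])
    roots = []
    for i in np.nonzero(np.sign(vals[:-1]) * np.sign(vals[1:]) < 0)[0]:
        lo, hi, f_lo = grid[i], grid[i + 1], vals[i]
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            f_mid = f(mid)
            if np.sign(f_mid) == np.sign(f_lo):
                lo, f_lo = mid, f_mid
            else:
                hi = mid
        roots.append(0.5 * (lo + hi))
    return np.array(roots)

alpha_roots = bracketed_roots(q_alpha, np.linspace(0.0, 1.0, 20001))
t_roots = bracketed_roots(q_star, np.logspace(-6, 6, 20001))   # ascending order
print(alpha_roots)
print(t_roots)
```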
As a visual aid, we also present the curves q(α) and q*(t) along with their zero crossings in Figure 4. As we are only interested in the positive solutions of q*(t), we plot q*(t) against log(t) to accommodate the wide range of t. In fact, the solutions for Example 3 on the log scale are symmetric about zero:
log(t1) = 5.821, log(t2) = 1.822, log(t3) = −1.822, log(t4) = −5.821
Figure 4: Plots of (a) q(α) against α and (b) q*(t) against log(t) for the mixture given in Example 3.
Now we state the key result relating the number of non-negative solutions of $q^*(t) = 0$, and hence the number of modes, to the number of distinct diagonal entries of $\Lambda$, which equals the number of distinct eigenvalues of $\Sigma_2$.
Lemma 2. Consider mixtures of type $NM(0, I, \mu, \Sigma_2)_D$. If $\Sigma_2$ has $d$ ($d \le D$) distinct eigenvalues, then, irrespective of the value of $\mu$, the corresponding equation $q^*(t) = 0$ has at most $2d$ non-negative solutions.
Proof. Let the $d$ distinct eigenvalues of $\Sigma_2$ be $\lambda_1, \dots, \lambda_d$ (terms of $q^*(t)$ that share an eigenvalue are merged, so the sum has $d$ terms). Denote an upper bound on the number of real roots of $q^*(t)$ by $O$ and a lower bound on the number of its negative roots by $N$. Then $O - N$ is an upper bound on the number of non-negative roots. We compute the two bounds in two separate steps. Within each step we consider two cases: one where all the eigenvalues differ from 1, and one where one of the $d$ distinct eigenvalues equals 1.
• Step 1. To bound the number of real roots of the rational function $q^*(t)$, we transform it into a polynomial, whose roots are easier to enumerate.
Case 1: If $\lambda_i \neq 1$ for all $i = 1, \dots, d$, the multiplier that converts $q^*(t) = 0$ into a polynomial equation is $\prod_{i=1}^{d}(t+\lambda_i)^3$. Since the polynomial $q^*(t)\prod_{i=1}^{d}(t+\lambda_i)^3$ has degree $3d$, we have $O = 3d$.
Case 2: If instead $\lambda_i = 1$ for one $i \in \{1, \dots, d\}$, the multiplier is $\prod_{i=1}^{d}(t+\lambda_i)^3/(t+1)$, because the factor $t+1$ in $t(t+1)$ cancels one power of $(t+1)^3$. The polynomial $q^*(t)\prod_{i=1}^{d}(t+\lambda_i)^3/(t+1)$ now has degree $3d - 1$, giving $O = 3d - 1$.
Hence the equation $q^*(t) = 0$ has at most $O$ solutions, where
$$O = \begin{cases} 3d & \text{if } \lambda_i \neq 1 \ \forall i \in \{1, \dots, d\};\\ 3d - 1 & \text{if } \lambda_i = 1 \text{ for one } i \in \{1, \dots, d\}. \end{cases} \tag{7}$$
• Step 2. To find a lower bound on the number of negative roots, first note that, using $1/(t(t+1)) = 1/t - 1/(t+1)$,
$$q^*(t) = 0 \implies \frac{1}{t(t+1)} = \sum_{i=1}^{D}\frac{c_i}{(t+\lambda_i)^3} \implies \frac{1}{t} = \frac{1}{t+1} + \sum_{i=1}^{D}\frac{c_i}{(t+\lambda_i)^3}.$$
Thus the solutions of $q^*(t) = 0$ are the crossings of the two curves $1/t$ and
$$r(t) = \frac{1}{t+1} + \sum_{i=1}^{D}\frac{c_i}{(t+\lambda_i)^3}$$
(see Figure 5 for an illustration). Denote the right limit of a function $f$ at a point $t$, $\lim_{x \to t^+} f(x)$, by $f(t^+)$, and similarly the left limit $\lim_{x \to t^-} f(x)$ by $f(t^-)$. Notice that $r(t)$ is a rational function with $c_i \ge 0$ and $\lambda_i > 0$. Thus for each $i = 1, 2, \dots, d$ with $c_i > 0$ we have a vertical asymptote, i.e., $r((-\lambda_i)^+) = +\infty$ and $r((-\lambda_i)^-) = -\infty$; a term with $c_i = 0$ simply drops out, and the same argument applies with a smaller $d$. Additionally, $r((-1)^+) = +\infty$ and $r((-1)^-) = -\infty$. [See the dashed lines representing the asymptotes in Figure 5.] Hence $r(t)$ has several disjoint branches. Each branch lying between two consecutive vertical asymptotes runs from $+\infty$ at its left end to $-\infty$ at its right end; since $1/t$ is continuous and bounded on that interval (which lies to the left of 0), the branch must cross the curve $1/t$ at least once. We now discuss the two cases.
Case 1: If $\lambda_i \neq 1$ for all $i = 1, \dots, d$, the graph of $r(t)$ has $d + 1$ vertical asymptotes, one each at $-\lambda_1, \dots, -\lambda_d$ and $-1$. These give rise to $d + 2$ disjoint branches, of which the $d$ intermediate branches each cross the curve $1/t$ at least once. This yields at least $d$ negative roots of $q^*(t)$, and hence $N = d$.
Case 2: If instead $\lambda_i = 1$ for one $i \in \{1, \dots, d\}$, there are only $d - 1$ distinct eigenvalues different from 1, so the graph of $r(t)$ has $d$ vertical asymptotes and $d + 1$ branches. The $d - 1$ intermediate branches give rise to at least $d - 1$ negative solutions, and hence $N = d - 1$.
Hence the equation $q^*(t) = 0$ has at least $N$ negative solutions, where
$$N = \begin{cases} d & \text{if } \lambda_i \neq 1 \ \forall i \in \{1, \dots, d\};\\ d - 1 & \text{if } \lambda_i = 1 \text{ for one } i \in \{1, \dots, d\}. \end{cases} \tag{8}$$
Combining (7) and (8), in both cases there can be at most $O - N = 2d$ non-negative solutions of the equation $q^*(t) = 0$.
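The two bounds can be illustrated for the function $r(t)$ shown in Figure 5, which corresponds to $c_i = 1$ and $\lambda = (2, 4, 8, 9)$, so $d = 4$ and no eigenvalue equals 1 (Case 1). The sketch below, with hypothetical helper names of our own, clears denominators to read off $O$ as a polynomial degree, and counts the gaps between consecutive poles of $r(t)$ on which $1/t - r(t)$ changes sign, which gives $N$. It handles Case 1 only.

```python
# Polynomials are coefficient lists in increasing order of degree: [a0, a1, a2, ...].
def poly_mul(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q, sign=1.0):
    """Return p + sign * q."""
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p))
    q = q + [0.0] * (n - len(q))
    return [a + sign * b for a, b in zip(p, q)]


def degree(p, tol=1e-12):
    return max(k for k, a in enumerate(p) if abs(a) > tol)


def cleared_q_star(c, lam):
    """Case 1 only (no lambda_i equal to 1): q*(t) * prod_i (t + lambda_i)^3 as a polynomial."""
    def cube(l):
        return poly_mul(poly_mul([l, 1.0], [l, 1.0]), [l, 1.0])

    result = [1.0]
    for l in lam:
        result = poly_mul(result, cube(l))
    for i, ci in enumerate(c):
        term = [0.0, ci, ci]  # c_i * t * (t + 1)
        for j, l in enumerate(lam):
            if j != i:
                term = poly_mul(term, cube(l))
        result = poly_add(result, term, sign=-1.0)
    return result


def r(t, c, lam):
    return 1.0 / (t + 1.0) + sum(ci / (t + l) ** 3 for ci, l in zip(c, lam))


def forced_negative_roots(c, lam, eps=1e-6):
    """Count gaps between consecutive poles of r(t) on which 1/t - r(t) changes sign."""
    poles = sorted({-1.0} | {-l for l in lam})
    count = 0
    for left, right in zip(poles, poles[1:]):
        g_left = 1.0 / (left + eps) - r(left + eps, c, lam)
        g_right = 1.0 / (right - eps) - r(right - eps, c, lam)
        if g_left * g_right < 0.0:
            count += 1
    return count


# The r(t) plotted in Figure 5: c_i = 1 and lambda = (2, 4, 8, 9), so d = 4.
c, lam = [1.0, 1.0, 1.0, 1.0], [2.0, 4.0, 8.0, 9.0]
O = degree(cleared_q_star(c, lam))
N = forced_negative_roots(c, lam)
print(f"O = {O}, N = {N}, at most {O - N} non-negative roots")
```

Here $O = 3d = 12$ and $N = d = 4$, so at most $2d = 8$ non-negative roots remain, matching Case 1 of (7) and (8).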
[Figure 5 image: the curves 1/t and r(t) plotted against t ∈ [−10, 0].]
Figure 5: Plots showing the vertical asymptotes of $r(t) = \frac{1}{t+1} + \frac{1}{(t+2)^3} + \frac{1}{(t+4)^3} + \frac{1}{(t+8)^3} + \frac{1}{(t+9)^3}$ and its crossings with the curve $1/t$.
Next we state the theorem giving the upper bound on the number of modes of a mixture of two normal components.
Theorem 6. The number of modes of the normal mixture $NM(\mu_1, \Sigma_1, \mu_2, \Sigma_2)_D$ is at most $d + 1$, where $d$ is the number of distinct eigenvalues of the matrix $\Sigma_2^* = \Sigma_2^{1/2}\Sigma_1^{-1}\Sigma_2^{1/2}$, which is also the number of distinct eigenvalues of the ratio of the covariance matrices $\Sigma_2$ and $\Sigma_1$, denoted by $\Sigma_1^{-1}\Sigma_2$.
Proof. By Theorem 4 the modality of the mixture $NM(\mu_1, \Sigma_1, \mu_2, \Sigma_2)_D$ is the same as that of the mixture $NM(0, I, \mu_2^*, \Sigma_2^{1/2}\Sigma_1^{-1}\Sigma_2^{1/2})_D$, where $\mu_2^*$ is a vector of dimension $D$. By Lemma 2, the corresponding $q^*(t)$, and hence $q(\alpha)$, has at most $2d$ roots. Finally, using Result 1, $NM(\mu_1, \Sigma_1, \mu_2, \Sigma_2)_D$ has at most $2d/2 + 1 = d + 1$ modes.
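To show the second part, note that if $\lambda$ is an eigenvalue of the matrix $\Sigma_2^* = \Sigma_2^{1/2}\Sigma_1^{-1}\Sigma_2^{1/2}$, then $\lambda$ satisfies the equation $|\Sigma_2^* - \lambda I| = 0$. On the other hand, since $\Sigma_2\Sigma_1^{-1} = \Sigma_2^{1/2}\Sigma_2^*\Sigma_2^{-1/2}$,
$$|\Sigma_2\Sigma_1^{-1} - \lambda I| = |\Sigma_2^{1/2}\Sigma_2^*\Sigma_2^{-1/2} - \lambda I| = |\Sigma_2^{1/2}|\cdot|\Sigma_2^* - \lambda I|\cdot|\Sigma_2^{-1/2}| = |\Sigma_2^* - \lambda I|.$$
Hence $\lambda$ is an eigenvalue of $\Sigma_2^*$ if and only if it is an eigenvalue of $\Sigma_2\Sigma_1^{-1}$, and therefore of its transpose $\Sigma_1^{-1}\Sigma_2$, which implies the second part of the theorem.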
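For $D = 2$ the second part of Theorem 6 can be checked directly, because two $2 \times 2$ matrices have the same eigenvalues when they share trace and determinant. The sketch below uses hypothetical covariance matrices and a closed-form principal square root valid for $2 \times 2$ symmetric positive definite matrices, $A^{1/2} = (A + sI)/\sqrt{\operatorname{tr}A + 2s}$ with $s = \sqrt{\det A}$, which follows from the Cayley-Hamilton theorem. The helper names are our own.

```python
import math


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mat_inv(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


def sqrtm_spd(A):
    """Principal square root of a 2x2 SPD matrix: (A + sI) / sqrt(tr A + 2s), s = sqrt(det A)."""
    s = math.sqrt(A[0][0] * A[1][1] - A[0][1] * A[1][0])
    t = math.sqrt(A[0][0] + A[1][1] + 2.0 * s)
    return [[(A[0][0] + s) / t, A[0][1] / t], [A[1][0] / t, (A[1][1] + s) / t]]


def trace_det(A):
    return A[0][0] + A[1][1], A[0][0] * A[1][1] - A[0][1] * A[1][0]


def n_distinct_eigenvalues(A, tol=1e-9):
    """For a 2x2 matrix with real eigenvalues: 1 if the discriminant vanishes (up to tol), else 2."""
    tr, det = trace_det(A)
    disc = tr * tr - 4.0 * det
    return 1 if disc <= tol * max(1.0, tr * tr) else 2


# Hypothetical covariance matrices (symmetric positive definite, not proportional).
Sigma1 = [[2.0, 0.5], [0.5, 1.0]]
Sigma2 = [[1.0, 0.3], [0.3, 3.0]]

root2 = sqrtm_spd(Sigma2)
Sigma_star = mat_mul(mat_mul(root2, mat_inv(Sigma1)), root2)  # Sigma2^{1/2} Sigma1^{-1} Sigma2^{1/2}
ratio = mat_mul(Sigma2, mat_inv(Sigma1))                       # Sigma2 Sigma1^{-1}

print("trace, det of Sigma2*:         ", trace_det(Sigma_star))
print("trace, det of Sigma2 Sigma1^-1:", trace_det(ratio))
d = n_distinct_eigenvalues(ratio)
print(f"d = {d}, so at most {d + 1} modes")
```

For matrices that are not proportional, the two eigenvalues are distinct, so Theorem 6 gives the bound $d + 1 = 3 = D + 1$.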
Theorem 7. Any $D$-dimensional normal mixture $NM(\mu_1, \Sigma_1, \mu_2, \Sigma_2)_D$ has at most $D + 1$ modes.
Proof. $\Sigma_2^* = \Sigma_2^{1/2}\Sigma_1^{-1}\Sigma_2^{1/2}$ has $D$ eigenvalues, hence $d \le D$. Using this inequality in Theorem 6 completes the proof.
4.2 Existence of D + 1 modes in D dimensions
In this subsection we will show that it is always possible to find a mixture in any dimension
which will attain D + 1 modes. First we provide two examples for D=2 and D=3 where the
upper bound is achieved.
Remark 2. Example 1, with $D = 2$ and eigenvalues 20 and 0.05, has three modes and thus achieves the upper bound on the number of modes for two-dimensional mixtures.
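Example 4. Consider the three-dimensional example with 4 modes given in Ray and Lindsay (2005), with parameters
$$\mu_1 = \begin{pmatrix}0\\0\\0\end{pmatrix}, \quad \Sigma_1 = \begin{pmatrix}1 & 0 & 0\\0 & 1 & 0\\0 & 0 & 0.05\end{pmatrix}, \quad \mu_2 = \begin{pmatrix}1/\sqrt{2}\\2\\1/\sqrt{2}\end{pmatrix}, \quad \Sigma_2 = \begin{pmatrix}0.05 & 0 & 0\\0 & 1 & 0\\0 & 0 & 1\end{pmatrix}. \tag{9}$$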
A straightforward calculation based on Theorem 4 shows that $\Sigma_2^*$ has eigenvalues 0.05, 1 and 20, i.e., $D = d = 3$. This mixture density has 4 modes, which again achieves the upper bound $D + 1$.
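Because every matrix in Example 4 is diagonal, the calculation based on Theorem 4 reduces to elementwise arithmetic. The sketch below assumes the transformation $\Sigma_2^* = \Sigma_2^{1/2}\Sigma_1^{-1}\Sigma_2^{1/2}$ and $\mu_2^* = (\Sigma_2^*)^{1/2}\Sigma_2^{-1/2}(\mu_2 - \mu_1)$ from the proof of Theorem 4 in Appendix A.1, and computes the eigenvalues and the coefficients $c_i = \lambda_i\mu_i^2$ that reappear in Example 5.

```python
import math

# Example 4: every covariance matrix is diagonal, so only the diagonals are stored.
mu1 = [0.0, 0.0, 0.0]
mu2 = [1.0 / math.sqrt(2.0), 2.0, 1.0 / math.sqrt(2.0)]
s1 = [1.0, 1.0, 0.05]  # diag(Sigma1)
s2 = [0.05, 1.0, 1.0]  # diag(Sigma2)

# Sigma2* = Sigma2^{1/2} Sigma1^{-1} Sigma2^{1/2} reduces to s2 / s1 elementwise.
lam = [b / a for a, b in zip(s1, s2)]
# mu2* = (Sigma2*)^{1/2} Sigma2^{-1/2} (mu2 - mu1), again elementwise.
mu_star = [math.sqrt(l / b) * (m2 - m1) for l, b, m1, m2 in zip(lam, s2, mu1, mu2)]
c = [l * m ** 2 for l, m in zip(lam, mu_star)]

d = len({round(l, 9) for l in lam})  # number of distinct eigenvalues
print("eigenvalues of Sigma2*:", lam)
print("mu2*:", mu_star)
print("c_i:", c)
print(f"d = {d}, so at most {d + 1} modes")
```

The resulting $\mu_2^* = (1/\sqrt{2}, 2, \sqrt{10})$ and $c = (0.025, 4, 200)$ are the values used in (11) and in Example 5.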
Though we have found examples achieving the upper bound in two and three dimensions, it is not easy to come up with such pathological examples in higher dimensions. Hence we design a construction method that produces one extra mode from each additional dimension. Starting from the fact that one can construct a mixture with two modes in one dimension (or from the examples for $D = 2$ and $D = 3$), one can use the recursive relation to construct the parameters of a mixture in $D$ dimensions that has $D + 1$ modes.
Recall that, by Lemma 2 (since $d \le D$), the equation $q^*(t) = 0$ can have at most $2D$ non-negative solutions in $D$ dimensions, which in turn implies (Theorem 6) that the corresponding mixture has at most $D + 1$ modes. Therefore, to obtain one extra mode in $D + 1$ dimensions, we just need to choose the parameters of the mixture so that the corresponding equation $q^*(t) = 0$ acquires two extra non-negative solutions. The following lemma provides a method for constructing these two extra solutions starting from any dimension $D$.
Lemma 3. Let $\{(c_i, \lambda_i), i = 1, 2, \dots, D\}$, with $c_i \ge 0$ and $\lambda_i > 0$, be such that the equation
$$y(t, D) = \frac{1}{t(t+1)} - \sum_{i=1}^{D}\frac{c_i}{(t+\lambda_i)^3} = 0$$
has $2D$ positive solutions; for $t > 0$ we have $y(t, D) = q^*(t)/[t(t+1)]$, so these are exactly the positive solutions of $q^*(t) = 0$. Then one can always find a pair of scalars $(c_{D+1}, \lambda_{D+1})$ such that
$$y(t, D+1) = \frac{1}{t(t+1)} - \sum_{i=1}^{D+1}\frac{c_i}{(t+\lambda_i)^3} = 0$$
has $2D + 2$ positive solutions.
Proof. Since $y(t, D) = 0$ has $2D$ positive solutions, $y(t, D) \to +\infty$ as $t \to 0^+$, and $y(t, D) > 0$ for all sufficiently large $t$ (it behaves like $1/t^2$), the function $y(t, D)$ changes sign $2D$ times on the positive axis. Let $y(t, D)$ be positive at points $t_0, t_2, \dots, t_{2D} = a$ and negative at points $t_1, t_3, \dots, t_{2D-1}$, where
$$0 < t_0 < t_1 < t_2 < \dots < t_{2D-1} < t_{2D} = a.$$
First choose $y_0 > 0$ such that $y_0(a+\lambda)^3 < y(t_j, D)(t_j+\lambda)^3$ for every even $j$ and every $\lambda \ge \max\{\lambda_1, \dots, \lambda_D\}$. Such a $y_0$ exists because $(t_j+\lambda)/(a+\lambda)$ is non-decreasing in $\lambda$ for $t_j \le a$, so it suffices to satisfy the inequality at $\lambda = \max\{\lambda_1, \dots, \lambda_D\}$.
Then choose $t_{2D+1} > a$ such that $\frac{1}{t_{2D+1}(t_{2D+1}+1)} < \frac{y_0}{8}$, and then choose $\lambda_{D+1} > \max\{\lambda_1, \dots, \lambda_D\}$ such that $\frac{t_{2D+1}+\lambda_{D+1}}{a+\lambda_{D+1}} < 2$, which ensures that
$$\frac{(t_{2D+1}+\lambda_{D+1})^3}{t_{2D+1}(t_{2D+1}+1)(a+\lambda_{D+1})^3} < y_0. \tag{10}$$
Now define $c_{D+1} = y_0(a+\lambda_{D+1})^3$. With this choice of $(c_{D+1}, \lambda_{D+1})$ we have
$$Y(t_j) = y(t_j, D) - \frac{c_{D+1}}{(t_j+\lambda_{D+1})^3} \begin{cases} > 0 & \text{for } j \text{ even};\\ < 0 & \text{for } j \text{ odd}, \end{cases}$$
i.e., $Y(t) = y(t, D) - c_{D+1}/(t+\lambda_{D+1})^3 = y(t, D+1)$ has the same sign as $y(t, D)$ at the points $t_0, t_1, \dots, t_{2D}$, which means that $Y(t)$ has at least $2D$ positive solutions, all less than $a = t_{2D}$. On the other hand, since $y(t, D) \le 1/(t(t+1))$,
$$Y(t_{2D+1}) = y(t_{2D+1}, D) - \frac{c_{D+1}}{(t_{2D+1}+\lambda_{D+1})^3} \le \frac{1}{t_{2D+1}(t_{2D+1}+1)} - \frac{y_0(a+\lambda_{D+1})^3}{(t_{2D+1}+\lambda_{D+1})^3} < 0,$$
where the last inequality holds because of (10). Hence $Y(t)$ is negative at the point $t_{2D+1} > a$ but positive for all sufficiently large $t$, so $Y(t) = y(t, D+1) = 0$ has two more solutions than $y(t, D) = 0$, both of which are greater than $a$.
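The proof of Lemma 3 is constructive, so it translates directly into code. The following sketch makes one concrete choice at each step (the factor 1/2 in $y_0$, $t_{2D+1} = \max(2a, \sqrt{8/y_0}) + 1$, and $\lambda_{D+1} = \max(\max_i\lambda_i, t_{2D+1} - 2a) + 1$); these choices and the helper names are ours, not the authors', and they give different numbers from the hand-picked ones in Example 5 below. Roots are counted as sign changes of $y(t, D)$ on a log grid over $[10^{-4}, 10^{8}]$, which assumes no two roots fall within one grid cell.

```python
import math


def y(t, c, lam):
    """y(t, D) = 1/(t(t+1)) - sum_i c_i/(t + lambda_i)^3, which shares its positive zeros with q*(t)."""
    return 1.0 / (t * (t + 1.0)) - sum(ci / (t + li) ** 3 for ci, li in zip(c, lam))


def count_positive_roots(f, lo=1e-4, hi=1e8, n=6000):
    """Count sign changes of f on a log-spaced grid over [lo, hi]."""
    step = (math.log(hi) - math.log(lo)) / n
    vals = [f(math.exp(math.log(lo) + k * step)) for k in range(n + 1)]
    return sum(1 for u, v in zip(vals, vals[1:]) if u * v < 0.0)


def add_dimension(c, lam, tpts):
    """One concrete choice of (c_{D+1}, lambda_{D+1}) following the proof of Lemma 3.

    tpts = [t_0, ..., t_{2D}], with y(., D) > 0 at even and < 0 at odd indices; a = t_{2D}.
    """
    a, lam_max = tpts[-1], max(lam)
    # (t_j + lam)/(a + lam) is non-decreasing in lam, so lam = lam_max is the binding case.
    y0 = 0.5 * min(y(tj, c, lam) * ((tj + lam_max) / (a + lam_max)) ** 3 for tj in tpts[0::2])
    t_next = max(2.0 * a, math.sqrt(8.0 / y0)) + 1.0  # gives 1/(t(t+1)) < y0/8
    lam_new = max(lam_max, t_next - 2.0 * a) + 1.0    # gives (t_next + lam)/(a + lam) < 2
    c_new = y0 * (a + lam_new) ** 3
    return c + [c_new], lam + [lam_new]


# Starting point: the three-dimensional parameters and sign-test points of Example 5.
c3, lam3 = [0.025, 4.0, 200.0], [0.05, 1.0, 20.0]
tpts = [0.005, 0.1, 0.3, 1.0, 3.0, 30.0, 200.0]

c4, lam4 = add_dimension(c3, lam3, tpts)
before = count_positive_roots(lambda t: y(t, c3, lam3))
after = count_positive_roots(lambda t: y(t, c4, lam4))
print(f"lambda_4 = {lam4[-1]:.4g}, c_4 = {c4[-1]:.4g}")
print(f"positive roots: {before} -> {after}")
```

Starting from the three-dimensional parameters of Example 5, this reports six positive roots before and eight after the added dimension.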
Remark 3. The proof of the above lemma provides only one method of constructing the two extra non-negative solutions; the choice of $(c_{D+1}, \lambda_{D+1})$ is not unique.
The following corollary provides the recursive method for constructing an extra mode when the dimension of the mixture is increased by one.
Corollary 2. If a mixture of two normals in $D$ dimensions has $D + 1$ modes, one can choose the parameters of the extra dimension such that the resulting $(D+1)$-dimensional mixture has $D + 2$ modes.
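Proof. Use Theorems 4 and 5 to re-parametrize the mixture into the form $NM(0, I, \mu, \Lambda)_D$, where $\mu = (\mu_1, \dots, \mu_D)$ and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_D)$, and then use Lemma 3 with $c_i = \lambda_i\mu_i^2$ to compute $(c_{D+1}, \lambda_{D+1})$. The new mixture
$$NM\big(0, I, \mu = (\mu_1, \dots, \mu_D, \mu_{D+1}), \Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_D, \lambda_{D+1})\big)_{D+1},$$
with $\mu_{D+1} = \sqrt{c_{D+1}/\lambda_{D+1}}$, has $D + 2$ modes.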
We now apply the method described in Corollary 2 to construct a 4-dimensional example
with 5 modes, starting from the 3-dimensional case in Example 4.
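Example 5. We first apply Theorem 3 to transform the 3-dimensional normal mixture given in (9) into the form $NM(0, I, \mu_2, \Lambda)_{D=3}$, where
$$\mu_2 = \begin{pmatrix}1/\sqrt{2}\\2\\\sqrt{10}\end{pmatrix}, \quad \Sigma_2 = \begin{pmatrix}0.05 & 0 & 0\\0 & 1 & 0\\0 & 0 & 20\end{pmatrix}. \tag{11}$$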
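$\Sigma_2$ has $d = 3$ distinct eigenvalues, $\lambda_1 = 0.05$, $\lambda_2 = 1$, $\lambda_3 = 20$, with corresponding $c_i$'s given by $c_1 = 0.025$, $c_2 = 4$, $c_3 = 200$.
The equation
$$y(t, 3) = \frac{1}{t(t+1)} - \sum_{i=1}^{3}\frac{c_i}{(t+\lambda_i)^3} = 0,$$
whose positive solutions coincide with those of $q^*(t) = 0$, has 6 positive solutions:
0.00723058, 0.148304, 0.444807, 2.24817, 6.74291 and 138.301.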
Now we take $0 < t_0 = 0.005 < t_1 = 0.1 < t_2 = 0.3 < t_3 = 1 < t_4 = 3 < t_5 = 30 < t_6 = 200 = a$, so that $y(t, 3)$ is positive at $t_0, t_2, t_4, t_6$ and negative at $t_1, t_3, t_5$. Next choose $y_0 = 7 \times 10^{-10}$; then $y_0(a+\lambda)^3 < y(t_j, 3)(t_j+\lambda)^3$ for every even $j$ and every $\lambda \ge \max_i \lambda_i = 20$. Now take $t_7 = 107000 > a = 200$, so that $\frac{1}{t_7(t_7+1)} < \frac{y_0}{8}$. Let $\lambda_4 = 120000$; then $\frac{t_7+\lambda_4}{a+\lambda_4} < 2$. Let $c_4 = y_0(a+\lambda_4)^3 = 1215658$, so that the last component of the new 4-dimensional mean is $\mu_4 = \sqrt{c_4/\lambda_4} = 3.182842$.
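This gives a 4-dimensional normal mixture $NM(0, I, \mu_2^{new}, \Sigma_2^{new})_{D=4}$ with
$$\mu_2^{new} = \begin{pmatrix}1/\sqrt{2}\\2\\\sqrt{10}\\3.182842\end{pmatrix}, \quad \Sigma_2^{new} = \begin{pmatrix}0.05 & 0 & 0 & 0\\0 & 1 & 0 & 0\\0 & 0 & 20 & 0\\0 & 0 & 0 & 120000\end{pmatrix}.$$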
The corresponding equation
$$q^*(t) = 1 - t(t+1)\sum_{i=1}^{4}\frac{c_i}{(t+\lambda_i)^3} = 0$$
has eight positive solutions:
0.00723058, 0.148304, 0.444807, 2.24817, 6.74291, 138.304, 82616.8 and 799211,
which implies the existence of five modes.
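The numbers in Example 5 can be re-checked with a short script. This sketch recomputes $c_4$ and $\mu_4$ from $y_0$, $a$ and $\lambda_4$, verifies the two inequalities used to choose $t_7$ and $\lambda_4$, and locates the positive roots of $y(t, 3)$ and $y(t, 4)$ by a log-grid scan with bisection; the grid range $[10^{-4}, 10^{7}]$, the grid size and the helper names are our own assumptions.

```python
import math


def y(t, c, lam):
    return 1.0 / (t * (t + 1.0)) - sum(ci / (t + li) ** 3 for ci, li in zip(c, lam))


def positive_roots(f, lo=1e-4, hi=1e7, n=6000, iters=100):
    """Sign changes of f on a log-spaced grid over [lo, hi], each refined by bisection."""
    step = (math.log(hi) - math.log(lo)) / n
    grid = [math.exp(math.log(lo) + k * step) for k in range(n + 1)]
    roots = []
    for a, b in zip(grid, grid[1:]):
        if f(a) * f(b) < 0.0:
            for _ in range(iters):
                m = 0.5 * (a + b)
                if f(a) * f(m) <= 0.0:
                    b = m
                else:
                    a = m
            roots.append(0.5 * (a + b))
    return roots


# Values taken from Example 5.
c3, lam3 = [0.025, 4.0, 200.0], [0.05, 1.0, 20.0]
a, y0, t7, lam4 = 200.0, 7e-10, 107000.0, 120000.0

print("1/(t7(t7+1)) < y0/8:   ", 1.0 / (t7 * (t7 + 1.0)) < y0 / 8.0)
print("(t7+lam4)/(a+lam4) < 2:", (t7 + lam4) / (a + lam4) < 2.0)
c4 = y0 * (a + lam4) ** 3
mu4 = math.sqrt(c4 / lam4)
print(f"c4 = {c4:.1f}, mu4 = {mu4:.6f}")

roots3 = positive_roots(lambda t: y(t, c3, lam3))
roots4 = positive_roots(lambda t: y(t, c3 + [c4], lam3 + [lam4]))
print(len(roots3), [f"{r:.6g}" for r in roots3])
print(len(roots4), [f"{r:.6g}" for r in roots4])
```

Since $y(t, D) = q^*(t)/[t(t+1)]$ for $t > 0$, scanning $y$ instead of $q^*$ changes nothing about the roots; it only keeps the values on a comparable scale.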
Figure 6 shows $q^*(t)$ for the four-dimensional example along with its eight non-negative zero crossings. Of these, the two on the right are obtained using the construction method of Corollary 2.
Remark 4. The construction in Lemma 3 is designed to add two positive solutions to the equation $q^*(t) = 0$ when the dimension is increased, by adding another term to the summation without perturbing the original non-negative solutions too much. In Example 5 we started with six roots in three dimensions and constructed two extra roots in four dimensions. Of the six original roots, the first five remained exactly the same (to the reported precision), and the sixth shifted only slightly, from 138.301 to 138.304.
[Figure 6 image: q∗(t) plotted against log(t), with its eight zero crossings.]
Figure 6: Plot of $q^*(t)$, which has eight positive roots, along with its zero crossings. Here $q^*(t)$ is plotted against $\log(t)$ because of the wide range of $t$.
Finally we arrive at the main theorem of the paper, Theorem 1, which establishes the tightness of the bound given in Theorem 7 by the following argument.
Proof of Theorem 1. The upper bound has already been shown in Theorem 7. To show that this bound can be attained, we construct mixtures with $D + 1$ modes in every dimension. In one dimension, two normals with equal variance and equal mixing weights have two modes if the distance between their means is more than twice the common standard deviation. One can then apply Corollary 2 repeatedly to construct one extra mode per dimension, resulting in exactly $D + 1$ modes in $D$ dimensions.
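The one-dimensional base case can be checked numerically. The sketch below counts local maxima of the equal-weight mixture $0.5\,N(0, 1) + 0.5\,N(\delta, 1)$ on a fine grid; the grid limits, the resolution and the helper names are assumptions, and grid-based counting is only a heuristic check, not a proof.

```python
import math


def mixture_density(x, delta, w=0.5, sigma=1.0):
    """Density of w N(0, sigma^2) + (1 - w) N(delta, sigma^2)."""
    def phi(z):
        return math.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    return w * phi(x) + (1.0 - w) * phi(x - delta)


def count_modes(delta, lo=-6.0, hi=10.0, n=16001):
    """Count grid points that rise from the left and do not rise to the right."""
    xs = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    f = [mixture_density(x, delta) for x in xs]
    return sum(1 for i in range(1, n - 1) if f[i - 1] < f[i] >= f[i + 1])


for delta in (1.5, 3.0):  # mean separations below and above 2 sigma
    print(f"delta = {delta}: {count_modes(delta)} mode(s)")
```

With a separation of $1.5\sigma$ the mixture is unimodal, while $3\sigma$ gives two modes, consistent with the $2\sigma$ threshold for equal weights.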
4.3 Special Cases
The result given in Theorem 1 is the most general modality theorem available for a two-
component normal mixture. Many previous modality results can be stated as special cases of
this generalized result. In the corollaries which follow we show that our modality result can
be used to duplicate some of the univariate and multivariate results found in the literature.
The study of the case when D = 1, i.e., the mixture of two univariate normals, can be
traced back to the early 20th century. For example, Helguero (1904) discussed the equal
variance case, and Robertson and Fryer (1969) discussed the unequal variance case, and they
both showed that there exist at most 2 modes for the univariate normal mixture. Note that in one dimension any two variances are trivially proportional to one another, so $d = 1$ and our result also shows that at most two modes are achievable. Some results
on the mixture of two higher-dimensional normals with equal or proportional variances have
also been developed later. A recent result from Ray and Lindsay (2005) shows that for any
dimension, a two-component normal mixture with proportional variances can have at most
two modes. Our result confirms the result from Ray and Lindsay (2005), however with a
different methodology.
Corollary 3. In any dimension the mixture of two normal components with equal or propor-
tional variance (Σ2 = cΣ1 for a scalar c > 0), can have at most two modes.
Proof. By Theorem 6 the maximum number of modes is one more than the number $d$ of distinct eigenvalues of $\Sigma_2^* = \Sigma_2^{1/2}\Sigma_1^{-1}\Sigma_2^{1/2}$. For the equal or proportional case,
$$\Sigma_2^* = \begin{cases} I & \text{if } \Sigma_2 = \Sigma_1,\\ cI & \text{if } \Sigma_2 = c\Sigma_1. \end{cases}$$
In both cases all the eigenvalues are equal, so $d = 1$ and the mixture has at most two modes.
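A quick numerical illustration of Corollary 3 for $D = 2$: with $\Sigma_2 = c\Sigma_1$, the ratio $\Sigma_2\Sigma_1^{-1}$ has a single distinct eigenvalue, so Theorem 6 gives at most two modes. The matrix $\Sigma_1$ and the values of $c$ below are hypothetical, and distinct eigenvalues are detected from the trace and determinant with a floating-point tolerance of our choosing.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mat_inv(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


def n_distinct_eigenvalues(A, tol=1e-9):
    """2x2 matrix with real eigenvalues: 1 if tr^2 - 4 det vanishes (up to tol), else 2."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return 1 if tr * tr - 4.0 * det <= tol * max(1.0, tr * tr) else 2


Sigma1 = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical
results = {}
for c in (1.0, 3.0):  # equal (c = 1) and proportional (c = 3) covariances
    Sigma2 = [[c * v for v in row] for row in Sigma1]
    d = n_distinct_eigenvalues(mat_mul(Sigma2, mat_inv(Sigma1)))
    results[c] = d + 1
    print(f"c = {c}: d = {d}, at most {d + 1} modes")
```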
Now we discuss some of the examples stated in Ray and Lindsay (2005). Both the two-dimensional example with three modes, whose parameters are given in Example 3, and the three-dimensional example with four modes in Example 4 were presented there merely as examples of the existence of more than two modes. Our results show that they actually attain the upper bound within their respective dimensions. Moreover, the construction of the examples in Ray and Lindsay (2005) was not easily generalizable to higher dimensions, whereas the construction algorithm described in Lemma 3 provides an easy strategy for constructing such examples.
5 Conclusion and discussion
In this paper we have developed a powerful theory for understanding the topography of a
multivariate normal mixture model. The results on the upper bound are mainly focused on
the two-component case, where we can provide the clear upper bound of D + 1 for any D
dimensional normal mixtures. Moreover, for any dimension one can produce a mixture which
attains the upper bound. In this paper, we have also verified that the number of modes of a two-component $D$-dimensional normal mixture $NM(\mu_1, \Sigma_1, \mu_2, \Sigma_2)_D$ is bounded above by one plus the number of distinct eigenvalues of the ratio matrix $\Sigma_2\Sigma_1^{-1}$, irrespective of the means.
In the process of doing this analysis, we have not discussed how these new bounds and
construction methods might be used for statistical purposes. We think that there is a wide
area of application for these results. Given a parameter structure, one can easily compute the upper bound on the number of modes, which might be of enormous help for many clustering methods. The construction method might also prove handy for Bayesian prior elicitation.
Finally the results give us a clear understanding of the interplay of component means and
variances in shaping up the topography of mixtures which may be easily generalizable to
mixtures of other distributions.
We also note that there are still a number of open mathematical questions. For example, mixtures of t-distributions are often used as a robust alternative to mixtures of normals, but there are no available results on the number of modes of a mixture of t's. One should note that the contours of the t and the normal, which determine the number of modes, display very similar topographical structure, so one might be able to borrow the results on the topography of normal mixtures to explore the topography of t mixtures. Following this intuition, one may then be able to generalize the results to other elliptical distributions.
Finally, our results on upper bound are mainly derived for K = 2. It would therefore
be useful to establish relationships between the modality structure of the pairs of densities
in a mixture and the overall modality of the entire mixture of K > 2 components. This
generalization becomes challenging even when K = 3, where the ridgeline manifold is two-dimensional and locating the modes may require finding the roots of an equation in two variables.
Acknowledgments: We thank Dr. David Fried of the Department of Mathematics and
Statistics at Boston University for his assistance in solving the algebraic problems for this
paper.
A Proof of Theorems and Results
A.1 Proof of Theorem 4
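We only need to check that the function $p(\alpha)$ is the same for the two mixtures $NM(\mu_1, \Sigma_1, \mu_2, \Sigma_2)_D$ and $NM(0, I, \mu_2^*, \Sigma_2^*)_D$.
First note that, writing $\bar\alpha = 1 - \alpha$,
$$S_\alpha = \alpha\Sigma_1^{-1} + \bar\alpha\Sigma_2^{-1} = \Sigma_2^{-1/2}\left(\alpha\Sigma_2^{1/2}\Sigma_1^{-1}\Sigma_2^{1/2} + \bar\alpha I\right)\Sigma_2^{-1/2}.$$
Thus
$$S_\alpha^{-1} = \Sigma_2^{1/2}\left(\alpha\Sigma_2^{1/2}\Sigma_1^{-1}\Sigma_2^{1/2} + \bar\alpha I\right)^{-1}\Sigma_2^{1/2},$$
which implies
$$\Sigma_2^{-1/2}S_\alpha^{-1}\Sigma_2^{-1/2} = \left(\alpha\Sigma_2^{1/2}\Sigma_1^{-1}\Sigma_2^{1/2} + \bar\alpha I\right)^{-1}.$$
Now for the mixture $NM(\mu_1, \Sigma_1, \mu_2, \Sigma_2)_D$,
$$\begin{aligned} p(\alpha) &= (\mu_2 - \mu_1)'\Sigma_1^{-1}S_\alpha^{-1}\Sigma_2^{-1}S_\alpha^{-1}\Sigma_2^{-1}S_\alpha^{-1}\Sigma_1^{-1}(\mu_2 - \mu_1)\\ &= (\mu_2 - \mu_1)'\Sigma_1^{-1}\Sigma_2^{1/2}\left(\Sigma_2^{-1/2}S_\alpha^{-1}\Sigma_2^{-1/2}\right)^3\Sigma_2^{1/2}\Sigma_1^{-1}(\mu_2 - \mu_1)\\ &= (\mu_2 - \mu_1)'\Sigma_1^{-1}\Sigma_2^{1/2}\left(\alpha\Sigma_2^{1/2}\Sigma_1^{-1}\Sigma_2^{1/2} + \bar\alpha I\right)^{-3}\Sigma_2^{1/2}\Sigma_1^{-1}(\mu_2 - \mu_1). \end{aligned}$$
For the transformed mixture $NM(0, I, \mu_2^*, \Sigma_2^*)_D$, substituting $\mu_2^* = (\Sigma_2^*)^{1/2}\Sigma_2^{-1/2}(\mu_2 - \mu_1)$ and $\Sigma_2^* = \Sigma_2^{1/2}\Sigma_1^{-1}\Sigma_2^{1/2}$, so that $(\Sigma_2^*)^{1/2}\mu_2^* = \Sigma_2^*\Sigma_2^{-1/2}(\mu_2 - \mu_1) = \Sigma_2^{1/2}\Sigma_1^{-1}(\mu_2 - \mu_1)$, gives
$$\begin{aligned} p^*(\alpha) &= (\mu_2^*)'(\Sigma_2^*)^{1/2}\left(\alpha\Sigma_2^* + \bar\alpha I\right)^{-3}(\Sigma_2^*)^{1/2}\mu_2^*\\ &= (\mu_2 - \mu_1)'\Sigma_1^{-1}\Sigma_2^{1/2}\left(\alpha\Sigma_2^{1/2}\Sigma_1^{-1}\Sigma_2^{1/2} + \bar\alpha I\right)^{-3}\Sigma_2^{1/2}\Sigma_1^{-1}(\mu_2 - \mu_1)\\ &= p(\alpha). \end{aligned}$$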
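The identity $p(\alpha) = p^*(\alpha)$ can be spot-checked numerically for $D = 2$. This sketch assumes the convention $S_\alpha = \alpha\Sigma_1^{-1} + \bar\alpha\Sigma_2^{-1}$ used above (the identity holds for any consistent pairing of the two weights), hypothetical parameter values, and the closed-form square root $A^{1/2} = (A + sI)/\sqrt{\operatorname{tr}A + 2s}$, $s = \sqrt{\det A}$, for $2 \times 2$ symmetric positive definite matrices. All helper names are our own.

```python
import math

I2 = [[1.0, 0.0], [0.0, 1.0]]


def mm(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mv(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


def inv(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


def lin(a, A, b, B):
    """Return a*A + b*B."""
    return [[a * A[i][j] + b * B[i][j] for j in range(2)] for i in range(2)]


def sqrtm(A):
    """Principal square root of a 2x2 SPD matrix via Cayley-Hamilton."""
    s = math.sqrt(A[0][0] * A[1][1] - A[0][1] * A[1][0])
    t = math.sqrt(A[0][0] + A[1][1] + 2.0 * s)
    return [[(A[0][0] + s) / t, A[0][1] / t], [A[1][0] / t, (A[1][1] + s) / t]]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


# Hypothetical two-component parameters.
Sigma1 = [[2.0, 0.5], [0.5, 1.0]]
Sigma2 = [[1.0, 0.3], [0.3, 3.0]]
diff = [1.0, -2.0]  # mu2 - mu1


def p_original(alpha):
    S_inv = inv(lin(alpha, inv(Sigma1), 1.0 - alpha, inv(Sigma2)))
    chain = [inv(Sigma1), S_inv, inv(Sigma2), S_inv, inv(Sigma2), S_inv, inv(Sigma1)]
    v = diff
    for M in reversed(chain):  # builds M1 M2 ... M7 diff from the right
        v = mv(M, v)
    return dot(diff, v)


R2 = sqrtm(Sigma2)
Sigma_star = mm(mm(R2, inv(Sigma1)), R2)
R_star = sqrtm(Sigma_star)
mu_star = mv(mm(R_star, inv(R2)), diff)


def p_transformed(alpha):
    B = inv(lin(alpha, Sigma_star, 1.0 - alpha, I2))
    w = mv(R_star, mu_star)
    return dot(w, mv(mm(mm(B, B), B), w))


alphas = [k / 10 for k in range(11)]
max_rel_err = max(abs(p_original(a) - p_transformed(a)) / abs(p_original(a)) for a in alphas)
print(f"largest relative discrepancy over alpha in [0, 1]: {max_rel_err:.1e}")
```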
A.2 Proof of Result 2
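Proof. Let $(\lambda_i, \xi_i)$, $i = 1, \dots, D$, be the eigenvalue-eigenvector pairs of $\Sigma_2$, with orthonormal eigenvectors $\xi_i$. As $\Sigma_2$ is a positive definite matrix, all $\lambda_i > 0$. Writing $\bar\alpha = 1 - \alpha$, the matrix $(\alpha\Sigma_2 + \bar\alpha I)$ has eigenvalue-eigenvector pairs $(\gamma_i, \xi_i)$, where $\gamma_i = \alpha\lambda_i + \bar\alpha = \alpha(\lambda_i - 1) + 1 > 0$. Similarly, $(\alpha\Sigma_2 + \bar\alpha I)^{-1}$ has eigenvectors $\xi_i$ with corresponding eigenvalues $1/\gamma_i$, and using the spectral decomposition of matrices we can write
$$(\alpha\Sigma_2 + \bar\alpha I)^{-1} = \sum_{i=1}^{D}\frac{1}{\gamma_i}\xi_i\xi_i'.$$
Moreover, as the eigenvectors $\xi_i$ are orthonormal and $(\alpha\Sigma_2 + \bar\alpha I)$ is symmetric, we have
$$\begin{aligned} p(\alpha) &= \mu_2'\Sigma_2^{1/2}(\alpha\Sigma_2 + \bar\alpha I)^{-3}\Sigma_2^{1/2}\mu_2 = \mu_2'\Sigma_2^{1/2}\Big\{\sum_{i=1}^{D}\frac{1}{\gamma_i}\xi_i\xi_i'\Big\}^3\Sigma_2^{1/2}\mu_2\\ &= \mu_2'\Sigma_2^{1/2}\Big(\sum_{i=1}^{D}\frac{1}{\gamma_i^3}\xi_i\xi_i'\Big)\Sigma_2^{1/2}\mu_2 = \sum_{i=1}^{D}\frac{1}{\gamma_i^3}(\mu_2'\Sigma_2^{1/2}\xi_i)^2\\ &= \sum_{i=1}^{D}\frac{(\mu_2'\Sigma_2^{1/2}\xi_i)^2}{[\alpha(\lambda_i - 1) + 1]^3} = \sum_{i=1}^{D}\frac{c_i}{[\alpha(\lambda_i - 1) + 1]^3}, \end{aligned}$$
where $c_i = (\mu_2'\Sigma_2^{1/2}\xi_i)^2 = \lambda_i(\mu_2'\xi_i)^2$, as $\Sigma_2^{1/2}$ has eigenvalues $\sqrt{\lambda_i}$ with corresponding eigenvectors $\xi_i$.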
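The spectral form of $p(\alpha)$ can likewise be compared with the direct matrix expression for a non-diagonal $2 \times 2$ matrix $\Sigma_2$. The parameter values are hypothetical; eigenvectors use the closed form $(b, \lambda - a)$ for a symmetric matrix with rows $(a, b)$ and $(b, d)$, $b \ne 0$, and the helper names are our own.

```python
import math


def mm(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mv(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


def inv(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


def sqrtm(A):
    """Principal square root of a 2x2 SPD matrix via Cayley-Hamilton."""
    s = math.sqrt(A[0][0] * A[1][1] - A[0][1] * A[1][0])
    t = math.sqrt(A[0][0] + A[1][1] + 2.0 * s)
    return [[(A[0][0] + s) / t, A[0][1] / t], [A[1][0] / t, (A[1][1] + s) / t]]


def eig_sym(A):
    """Eigenpairs of a symmetric 2x2 matrix [[a, b], [b, d]] with b != 0 (unit eigenvectors)."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    mid, rad = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    pairs = []
    for lam in (mid + rad, mid - rad):
        v = [b, lam - a]  # solves (A - lam I) v = 0
        n = math.hypot(v[0], v[1])
        pairs.append((lam, [v[0] / n, v[1] / n]))
    return pairs


# Hypothetical parameters of NM(0, I, mu2, Sigma2).
Sigma2 = [[1.5, 0.4], [0.4, 0.6]]
mu2 = [0.8, -1.3]


def p_matrix(alpha):
    """mu2' Sigma2^{1/2} (alpha Sigma2 + (1 - alpha) I)^{-3} Sigma2^{1/2} mu2."""
    M = [[alpha * Sigma2[i][j] + (1.0 - alpha) * float(i == j) for j in range(2)] for i in range(2)]
    B = inv(M)
    w = mv(sqrtm(Sigma2), mu2)
    u = mv(mm(mm(B, B), B), w)
    return w[0] * u[0] + w[1] * u[1]


def p_spectral(alpha):
    """sum_i c_i / [alpha(lambda_i - 1) + 1]^3 with c_i = lambda_i (mu2' xi_i)^2."""
    total = 0.0
    for lam, xi in eig_sym(Sigma2):
        c = lam * (mu2[0] * xi[0] + mu2[1] * xi[1]) ** 2
        total += c / (alpha * (lam - 1.0) + 1.0) ** 3
    return total


alphas = [k / 10 for k in range(11)]
max_rel_err = max(abs(p_matrix(a) - p_spectral(a)) / abs(p_matrix(a)) for a in alphas)
print(f"largest relative discrepancy over alpha in [0, 1]: {max_rel_err:.1e}")
```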
References
J. Behboodian. On the modes of a mixture of two normal distributions. Technometrics, 12:131–139, 1970.
J. O. Berger. Statistical decision theory and Bayesian analysis. Springer Series in Statistics. Springer-Verlag,
New York, second edition, 1985. ISBN 0-387-96098-8.
M. A. Carreira-Perpinan and C. K. I. Williams. On the number of modes of a Gaussian mixture. In Scale-Space
Methods in Computer Vision, Lecture Notes in Computer Science, volume 2695, pages 625–640. Springer-
Verlag, 2003.
J. Chen and X. Tan. Inference for multivariate normal mixtures. J. Multivariate Anal.,
100(7):1367–1383, 2009. ISSN 0047-259X. doi: 10.1016/j.jmva.2008.12.005. URL
http://dx.doi.org/10.1016/j.jmva.2008.12.005.
P. Coretto and C. Hennig. A simulation study to compare robust clustering methods based on mixtures.
Advances in Data Analysis and Classification, June 2010. ISSN 1862-5347. doi: 10.1007/s11634-010-0065-4.
URL http://dx.doi.org/10.1007/s11634-010-0065-4.
J. Dannemann and H. Holzmann. Likelihood ratio testing for hidden Markov models under non-standard
conditions. Scand. J. Statist., 35(2):309–321, 2008. ISSN 0303-6898. doi: 10.1111/j.1467-9469.2007.00587.x.
URL http://dx.doi.org/10.1111/j.1467-9469.2007.00587.x.
I. Eisenberger. Genesis of bimodal distributions. Technometrics, 6:357–363, 1964.
S. Fruhwirth-Schnatter. Finite mixture and Markov switching models. Springer Series in Statistics. Springer,
New York, 2006. ISBN 978-0-387-32909-3; 0-387-32909-9.
F. d. Helguero. Sui massimi delle curve dimorfiche. Biometrika, 3:85–98, 1904.
C. Hennig. Ridgeline plot and clusterwise stability as tools for merging Gaussian mixture components. Clas-
sification as a tool for research. Springer, Berlin, accepted for publication, 2010a.
C. Hennig. Methods for merging Gaussian mixture components. Advances in Data Analysis
and Classification, January 2010b. ISSN 1862-5347. doi: 10.1007/s11634-010-0058-3. URL
http://dx.doi.org/10.1007/s11634-010-0058-3.
H. Holzmann and S. Vollmer. A likelihood ratio test for bimodality in two-component mixtures with application
to regional income distribution in the EU. Advances in Statistical Analysis, 92(1):57–69, 2008.
I. Kakiuchi. Unimodality conditions of the distribution of a mixture of two distributions. Kobe University
Mathematics Seminar Notes, 9:315–325, 1981.
J. H. B. Kemperman. Mixtures with a limited number of modal intervals. The Annals of Statistics, 19:
2120–2144, 1991.
E. L. Lehmann and G. Casella. Theory of point estimation. Springer Texts in Statistics. Springer-Verlag, New
York, second edition, 1998. ISBN 0-387-98502-6.
J. Li, S. Ray, and B. G. Lindsay. A nonparametric statistical approach to clustering via mode identification.
J. Mach. Learn. Res., 8:1687–1723 (electronic), 2007. ISSN 1532-4435.
W. Li. A study of an active approach to speaker and task adaptation based on automatic analysis of vocabulary
confusability. PhD thesis, The University of Hong Kong, 2007.
B. G. Lindsay. The geometry of mixture likelihoods, Part II: The exponential family. The Annals of Statistics,
11:783–792, 1983.
B. G. Lindsay, M. Markatou, S. Ray, K. Yang, and S.-C. Chen. Quadratic distances on probabilities: a unified
foundation. Ann. Statist., 36(2):983–1006, 2008. ISSN 0090-5364.
G. McLachlan and D. Peel. Finite mixture models. Wiley Series in Probability and Statistics: Applied Prob-
ability and Statistics. Wiley-Interscience, New York, 2000. ISBN 0-471-00626-2. doi: 10.1002/0471721182.
URL http://dx.doi.org/10.1002/0471721182.
V. Melnykov and R. Maitra. Finite mixture models and model-based clustering. Statistics Surveys, 4:80–116,
2010.
S. Ray and B. G. Lindsay. The topography of multivariate normal mixtures. Ann.
Statist., 33(5):2042–2065, 2005. ISSN 0090-5364. doi: 10.1214/009053605000000417. URL
http://dx.doi.org/10.1214/009053605000000417.
S. Ray and B. G. Lindsay. Model selection in high dimensions: a quadratic-risk-based approach. Journal of the
Royal Statistical Society: Series B (Statistical Methodology), 70(1):95–118, 2008. doi: 10.1111/j.1467-9868.
2007.00623.x. URL http://www.blackwell-synergy.com/doi/abs/10.1111/j.1467-9868.2007.00623.x.
C. A. Robertson and J. G. Fryer. Some descriptive properties of normal mixtures. Skandinavisk Aktuarietid-
skrift, 69:137–146, 1969.
A. Scott et al. A POMDP framework for coordinated guidance of autonomous UAVs for multitarget tracking.
EURASIP Journal on Advances in Signal Processing, 2009, 2009.
G. Sfikas, C. Constantinopoulos, A. Likas, and N. Galatsanos. An analytic distance metric for Gaussian
mixture models with application in image retrieval. Artificial Neural Networks: Formal Models and Their
Applications-ICANN 2005, pages 835–840, 2005.
D. M. Titterington, A. F. M. Smith, and U. E. Makov. Statistical analysis of finite mixture distributions.
Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. John Wiley &
Sons Ltd., Chichester, 1985. ISBN 0-471-90763-4.
Surajit Ray
Department of Mathematics and Statistics
Boston University
111 Cummington Street, Boston, MA 02215, USA
E-mail: sray@bu.edu
Dan Ren
Department of Mathematics and Statistics
Boston University
111 Cummington Street, Boston, MA 02215, USA
E-mail: dren@bu.edu