Necessary and sufficient conditions for …...Necessary and sufficient conditions for representing...
Transcript of Necessary and sufficient conditions for …...Necessary and sufficient conditions for representing...
Necessary and sufficient conditions for representing
general distributions by Coxians
Takayuki Osogami� Mor Harchol-Balter�
February 24, 2003
CMU-CS-02-178
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
Abstract
A common analytical technique involves using a Coxian distribution to model a general distribution�,where the Coxian distribution agrees with� on the first three moments. This technique is motivated bythe analytical tracability of the Coxian distribution. Algorithms for mapping an input distribution� to aCoxian distribution largely hinge on knowing a priori the necessary and sufficient number of stages in therepresentative Coxian distribution. In this paper, we formally characterize the set of distributions� whichare well-represented by an�-stage Coxian distribution, in the sense that the Coxian distribution matchesthe first three moments of�. We also discuss a few common, practical examples. Lastly, we derivea partial characterization of the set of busy period durations which are well-represented by an�-stageCoxian distribution.
�Carnegie Mellon University, Computer Science Department. Email: [email protected]�Carnegie Mellon University, Computer Science Department. Email: [email protected]. This work was supported by
NSF Career Grant CCR-0133077, by NSF ITR Grant 99-167 ANI-0081396 and by Spinnaker Networks via Pittsburgh DigitalGreenhouse Grant 01-1.
Keywords: moment matching, Coxian distribution, busy period, phase-type distribution, normalized
moment, queueing, matrix analytic
1 Introduction
Background
Approximating general distributions by phase-type (PH) distributions has significant application in the
analysis of stochastic processes. Many fundamental problems in queueing theory are hard to solve when
general distributions are allowed as inputs. For example, the waiting time for an M/G/c queue has no nice
closed formula when� � �, while the waiting time for an M/M/c queue is trivially solved. Tractability
of M/M/c queues is attributed to the memoryless property of the exponential distribution. A popular
approach to analyzing queueing systems involving a general distribution� is to approximate� by a PH
distribution. A PH distribution is a very general mixture of exponential distributions, as shown in Figure 1
[19]. The Markovian nature of the PH distribution frequently allows a Markov chain representation of the
queueing system. Once the system is represented by a Markov chain, this chain can often be solved by
matrix-analytic methods [20, 16], or other means.
When fitting a general distribution� to a PH distribution, it is common to look for a PH distribution
which matches the first three moments of�. In this paper, we say that:
Definition 1 A distribution � is well-representedby a distribution � if � and � agree on their first three
moments.
It has been shown that matching three moments is sufficient for accurate modeling of many computer
systems [9, 23]. Matching fewer moments is less desirable since some queueing systems, e.g. the������
queue, have response times which are heavily dependent on the third moment of�� [31, 11].
Most existing algorithms for fitting a general distribution� to a PH distribution, restrict their attention
to a subset of PH distributions, since general PH distributions have so many parameters that it is difficult
to find time-efficient algorithms for fitting to them [30, 13, 12, 26, 18]. The most commonly chosen
subset is the class of Coxian distributions, shown in Figure 2. Coxian distributions have the advantage
of being much simpler than general PH distributions, while including a large subset of PH distributions
without needing additional stages. For example, for any acyclic PH distribution�, there exists a Coxian
distribution� with the same number of stages such that� and� have the same distribution function
[5]. In this paper we will restrict our attention to Coxian distributions.
Motivation and Goal
When finding a Coxian distribution which well-represents a given distribution�, it is desirable that
beminimal, i.e., the number of stages in be as small as possible. This is important because it minimizes
the additional states necessary in the resulting Markov chain for the queueing system. Unfortunately, it is
1
1µ 2µ 3µ 4µ p40
p13
p12 p23 p34
p43p32
p42p31
p21
p10
p20
p30
p01
p02
p03
p04
p00
p41
p14
p24
Exp Exp Exp Exp
Figure 1: A 4-stage PH distribution. There are � � � states, where the �th state has exponentially-distributed service time with rate ��. With probability �� we start in the �th state, and the next state isstate � with probability �� . Each state � has probability �� that it will be the last state. The value of thedistribution is the sum of the times spent in each of the states.
p p p2 3µ µ µ1 2 nn1p
11−p 2 31−p 1−p
Figure 2: An �-stage Coxian distribution. Observe the recursive definition: with probability � � �, thevalue is zero, and with probability �, the value is an exponential random variable with rate �� followedby an ��� ��-stage Coxian distribution.
not known what is the minimal number of phases necessary to represent a given distribution� by a Coxian
distribution. This makes it difficult to evaluate the effectiveness of different algorithms and also makes the
design of fitting algorithms open-ended.
Theprimary goal of this paper is to characterize the set of distributions which are well-represented by
an�-stage Coxian distribution, for each� � �� �� �� � � �.
Definition 2 Let ���� denote the set of distributions that are well-represented by an �-stage Coxian dis-
tribution for positive integer �.
Our characterization of������ � � �� will allow one to determine, for any distribution�, the minimal
number of stages that are needed to well-represent� by a Coxian distribution.� Such a characterization
will be a useful guideline for designing algorithms which fit general distributions to Coxian distributions.
Another application of this characterization is that some existing fitting algorithms, such as Johnson and
Taaffe’s nonlinear programming approach [13], require knowing the number of stages� in the minimal
�One might initially argue that����, the set of distributions well-represented by a two-stage Coxian distribution, shouldinclude all distributions, since a 2-stage Coxian distribution has four parameters (��, ��, ��, ��), whereas we only need to matchthree moments of�. A simple counter example shows this argument to be false: Let� be a distribution whose first three momentsare 1, 2, and 12. The system of equations gives two solutions for parameters (��, ��, ��) as functions of��. However, in bothsolutions, one of�� and�� is �� � �� �
����� � ��� � ����, which is not positive for all possible choices of��.
2
Coxian distribution. The current approach involves simply iterating over all choices for� [13], whereas
our characterization would immediately specify�.
A secondary goal of this paper is to specify the necessary and sufficient number of stages needed to
well-represent busy period durations by Coxian distributions. Fitting busy period durations to Coxian dis-
tributions has become relevant recently in the solution of common computer systems problems involving
cycle stealing, see [9, 23]. In [9, 23], transitions between states in a Markov chain represent busy period
durations, which are modeled via Coxian distributions for tractability. In addition to standard busy periods,
it is also common to model the busy period started by� jobs. This paper will specify the number of stages
needed to well-represent such busy periods by Coxian distributions.
Providing sufficient and necessary conditions for a distribution to be in���� does not always imme-
diately give one a sense ofwhich distributions satisfy those conditions, or of the magnitude of the set of
distributions which satisfy the condition. Athird goal of this paper is to provide examples of practical
distributions which are included in���� for particular integers�.
In finding simple characterizations of����, it will be very helpful to start by defining an alternative to
the standard moments, which we refer to asnormalized moments.
Definition 3 Let� be a distribution and ���� be the �-th moment of � for � � �� �� �. The normalized
second moment�� of � and the normalized third moment�� of � are defined to be
�� �����
�������� �� �
����
��������
Notice the correspondence to the squared coefficient of variability,�, and the skewness factor,�: �� �
� � and�� � ����.
Relevant previous work
All prior work on characterizing���� has focused only on characterizing�����
, where�����
is the set
of distributions which are well-represented by a 2-stage Coxian distribution, where the definition of the
two-stage Coxian distribution used is more restrictive than our definition – specifically, there is no mass
probability at zero, i.e. � � �. Observe����� � ����. Altiok [2] showed a sufficient condition for
a distribution� to be in�����
. More recently, Telek and Heindl [29] expanded Altiok’s condition and
proved the necessary and sufficient condition for a distribution� to be in�����
. While neither Altiok nor
Telek and Heindl expressed these conditions in terms of normalized moments, the results can be expressed
much simpler with our normalized moments, as shown in Theorem 1. In this paper, we extend these results
to characterize����, as well as characterizing����, for all integers� � �.
3
Our results
While the goal of the paper is to characterize the set����, this characterization turns out to be ugly. One
of the key ideas in the paper is that there is a set����� � ���� which is very close to���� in size, such
that����� has a very simple specification via normalized moments. Thus, much of the proofs in this paper
revolve around����� .
Definition 4 For integers � � �, let ����� denote the set of distributions with the following property on
their normalized moments: �� and��:
�� ��
�� ��� �� � � �
� ���� (1)
The main contribution of this paper is a derivation of the nested relationship between����� and���� for
all � � �. This relationship is illustrated in Figure 3 and proven in Section 3. There are three points to
observe: (i)���� is a proper subset of������ for all integers� � �, and likewise����� is a proper subset
of ������� ; (ii) ����� is contained in���� and close to���� in size; providing a simple characterization
for ����; (iii) ���� is almost contained in������� for all integers� � � (more precisely, we will show
���� � ������� �����, where���� is the set of distributions well-represented by an Erlang-n distribution).
This result yields a necessary number and a sufficient number of stages for a given distribution to be
well-represented by a Coxian distribution. Additional contributions of the paper are described below:
With respect to the set����, we derive the exact necessary and sufficient condition for a distribution�
to be in���� as a function of the normalized moments of�. This extends the results of Telek and Heindl,
who analyzed�����
, which is a subset of����. (See Section 2).
We next investigate the fitting of M/G/1 busy periods by Coxian distributions. Let� denote the
duration of an M/G/1 busy period where� is an arbitrary distribution with finite third moment and where
the size of the job starting the busy period is in����� . We prove that any such� has distribution in����� .
This is surprising in that the number of stages which suffice to represent the busy period is determined
solely by thefirst job starting the busy period, which may be a simple setup cost, and it is not required to
consider the distributions of the other jobs in the busy period. Furthermore, let�� denote the duration of
an M/G/1 busy period where� is an arbitrary distribution with finite third moment and where the busy
period is started by� jobs with service time distribution in����� where� is the number of Poisson arrivals
during a random variable with distribution in����� . We prove that any such�� is in ����� . (See Section 4).
Lastly, we provide a few examples of common, practical distributions included in the set����� �
����. All distributions we consider have finite third moment. The Pareto distribution and the Bounded
Pareto distribution (as defined in [7]) have been shown to fit many recent measurement of job service
4
Sv(4)
S(3) Sv(3)
Sv(2)
S(4)
S(2)
Figure 3: The main contribution of this paper: a simple characterization of ���� by ����� . Solid lines
delineate ���� (which is irregular) and dashed lines delineate ����� (which is regular – has a simple spec-
ification). Observe the nested structure of ���� and ����� . ����� is close to ���� in size and is contained in
����. ���� is almost contained in ������� .
requirement in computing systems, including HTTP requests [3, 4], UNIX jobs [17, 8], and the duration
of FTP transfers [24]. We show that the Bounded Pareto with high variability is in����. We also provide
conditions under which the Pareto and uniform distributions are in����� for each� � �. (See Section 5).�
2 Full characterization of ����
The Telek and Heindl [29] result may be expressed in terms of normalized moments as follows:
Theorem 1 (Telek, Heindl) � � ����� iff � is in the following union of sets: �
���� � �� � �
��������
��
��� �� � ��� � ��
��� �
�� �� � �
� ���� � � �� ��
� ��
��� � �� � � � ��
��
�Our results show that thefirst three moments of the Bounded Pareto distribution are matched by a two-stage Coxian distri-bution and thefirst three moments os the Pareto distribution with high variability are matched by a Coxian distribution with asmall number of stages. Note however that this does not necessarily imply that theshape of these distributions is well-matchedby a Coxian distribution with few stages, since the tail of these distributions is not exponential. Recently, fitting theshape ofheavy-tailed distributions by phase-type distributions such as hyperexponential distributions has been studied [6, 28, 15].
�Throughout this paper,�conditions on normalized moments� denotes the set of distributions that satisfy the conditions. Forexample,
����� � �� � � � ��
�denotes set
�� �
���
� � ��� and� � ��
�
�.
5
Since only an outline of a proof is given in [29], we derive our own proof of Theorem 1 in [21] for
completeness. We now show a simpler characterization of����:
Theorem 2 � � ���� iff � is in the following union of sets:
��
��� � �� �
���� � ��
��� �
�� �� � �
� ������ � (2)
where recall ����� is the set:����� �� � � ��
�.
A summary of Theorems 1 and 2 is shown in Figure 4. Figure 4(a) illustrates how close���� and����� are
in size. Figure 4(b) shows the distributions which are in���� but not�����
.
m2
m3
23/2
3
2
S(2)V
S SV(2) (2)
m2
m3
23/2
3
2
S(2)
S(2)*
Figure 4:(a) The thick solid lines delineate ����. The dashed lines (striped region) show ����� � ����. (b)Again, the thick solid lines delineate ����. The shaded area shows the region ���� � ����� .
Proof:[Theorem 2] The theorem will be proved by reducing���� to �����
and employing Theorem 1. The
proof hinges on the following observation: An arbitrary distribution� � ���� iff � is well-represented by
some distribution�, where
� �
� � with probability�
with probability�� �
for some� � ����� . It therefore suffices to show that� is in set (2).
Let��� be the normalized�-th moment of� and��� be the normalized�-th moment of� for � � �� �.
Observe that���� � ���� for � � �� �� � and��� �
��
for � � �� �.
By Theorem 1, since� � ����� ,� is in the following union of sets:
�� � �� � �
�������
��
�� � � �� � ��
�� �
�� � � �
� ��� � � � � � ��
� ��
�� � � � � � �
��
6
Thus,� is in the following union of sets:�
�� � �� � �
����� ��
��
��� � � �� � ��
��� �
�� � �
�
����
� ��
� �
� ��
����
��
� � �� � �
� �
�
�(3)
We want to show that� is in set (2). To do this, we rewrite set (2) as��
��� � �� � ��� � ��
��� �
�� �� � �
� � ��
��� � �� � �
��� � � � ��
� � ��
��� � �� � � � ��
�(4)
Observe that (3) and (4) are now in similar forms. We now prove that set (3) is a subset of set (4), and set
(4) is a subset of set (3). The technical details are postponed to Appendix B, Lemma 9.1.
3 A characterization of ����
In this section, we prove that����� is contained in����, where����� is the set of distributions whose
normalized moments satisfy (1) and that���� is almost contained in������� . Figure 6 provides a graphical
view of the����� sets with respect to the normalized moments. We prove the following theorem:
Theorem 3 ����� � ���� � ������� �����, where ���� is the set of distributions that are well-represented
by an Erlang-� distribution for integers � � �.
An Erlang-� distribution refers to the distribution shown in Figure 5. Notice that the normalized moments
µ µµ
n stages
Figure 5:An Erlang-� distribution.
of distributions in����,� ���
� and� ���
� , satisfy the following conditions:
�����
� �� �
��� �����
� �� �
�� (5)
Theorem 3 tells us that���� is “sandwiched between”����� and������� . From Figure 6, we see that�����
and������� are quite close for higher�. Thus we have a very accurate representation of����. Theorem 3
follows from the next two lemmas:
��conditions on normalized moments in terms of�� denotes the set of distributions that satisfy the conditions for some�. Forexample,
����� � �� � �
�� ��
�denotes set
���� s.t. � � � � � and �
���
� � ��� and �
�� ��
�
�.
7
Sv(32)
Sv(4)
Sv(3)
Sv(2)
m2
3m
E3
E2
E31
13/24/3
3/2
2
1 2
3
Figure 6:Depiction of ����� sets for � � �� �� �� �� as a function of the normalized moments. The outermostdotted lines (�� � � and �� � ��) delineate the set of all the possible nonnegative distributions (thatis, any nonnegative distribution � satisfies ��� � � and ��
� � ��� ) [14]. ����� for � � �� �� �� �� are
delineated by dashed lines.
Lemma 3.1 ���� � ������� �����.
Lemma 3.2 ����� � ����.
Proof:[Lemma 3.1] The proof proceeds by induction. When� � �, the lemma follows from (1), (5), and
Theorem 2. Assume that���� � ������� ����� for � �� �. For any distribution� � ����, there exists
a�-stage Coxian distribution� by which� is well-represented, where� can be expressed as
� �
� � ���� ����������� �
���� ����������� �� ��
and where� is an exponential distribution and� is a�����-stage Coxian distribution. By the assumption
of induction,� � ����� �������. We prove that (i) if� � ����� , then� � ������� and (ii) if � � ������,
then� � ������� �����.
(i) Suppose� � ����� : We first prove that��� �
����
. First observe that
��� �
� �� � � ��
��� � ����
� �� � ����� ��
��� � ����
where the inequality follows from� � ����� . The derivative of the right hand side with respect to��� is
��� �� �� � ���
��� � ���� � ����
8
which is minimized when��� � � � �. Thus,
��� �
� �
��� � �
��
Next, we prove that��
��� ���
��� for all��� �
����
. Notice that��
��
is independent of :
���
���
��� �� � �� �� � ����� � ��
�� �� � � �����
Since��
��
is an increasing function of��� �, it is minimized at��� � � ������
�� � �
�� , since� � ����� .
Thus,
���
���
� �� � ������ ��� � ��� ��� �� ��� ��� �� �� �� ��� ����
�� ��� ��� �� � � ����� (6)
Denoting the r.h.s. of (6) by����� ��, we now find��� � which minimizes����� ��. Since
��� ���
� ���
�� � ����� �� � �� ���� � ���� ��� ��� ��� ��� � ��
�� ��� ��� �� � � ����
�
the infimum of����� �� occurs at:
��� � � ��
���� ����� �� ��� �
� ���� �� ���� ��� ���
� � ���� �
��
By evaluating��
��
at��� � � �������
�, we have
���
���
� �� � ������ ���� � ��� ��� ���� � ���� � ��� ����� � ��� �� ���� ��� ��
��� �� ���� � �� ��� � ��� � �� ���
� �
By Lemma 9.4 in Appendix B,��
��
� ������ . By evaluating
��
��
at
� �� ���� ��� ��� � ��
� �� � �� ���� � ���
we have��
�
���
� � ���� � �� �� ���� �� ���
���� ���� � ��� � �
� ��
where the last inequality holds iff��� ���� . However,��� ��
�� holds if
��� ����� �� ��� �
� ���� �� ���� ��� ��
�
� � ���� ��
9
since
��� ��� ��� � ��
� �� � �� ���� � ��� �
� � �� ��
�� ��� �� � ��� ����� ��� � ���� � �� � ��� �� ��� ���� � �� �
and���� � and
�
���
� �
��
��� ���� ������ � �� ���
�� ����
for � � �.
(ii) Suppose� � ������: We will prove that (a) if��� � �� � ����� and � �, then� � ����,
and (b) if��� �� �� � ����� or � �, then� � ������� : For part (a), observe that if� � ������,
��� � �� � �����, and � �, then we have already seen that��� � ����
in part (i). It is also easy to
see that��� � ���
�, and hence� � ����. For part (b), if��� �� �� � ����� or � �, then first notice
that��� �
����
, since��� is minimized when��� � �� � ����� and � �. Also, since
� �� �� �
��
� �
� ��
��
��� ���
��� by part (i), and hence� � ������� .
Proof:[Lemma 3.2] When� � �, the lemma follows from Theorem 2. The remainder of the proof
assumes� � �. We prove that for any distribution� � ����� , there exists an�-stage Coxian� such that
the normalized moments of� and� agree. Notice that the mean of� is easily matched to� without
changing the normalizing moments of� by multiplying a constant to the rates,��, ...,��, of �. The proof
consists of two parts: (i) the case when the normalized moments of� satisfy��� � ���� ��; (ii) the case
when the normalized moments of� satisfy��� ���� � �.
(i) Suppose� � ����� and��� � ���
� � �: We need to show that� is well-represented by some�-
stage Coxian distribution. We will prove something stronger: that� is well-represented by a distribution
� where� � � � , and� is a particular two-stage Coxian distribution with no mass probability at
zero and� is a particular Erlang-(� � �) distribution. (For the intuition behind this particular way of
representing�, please refer to [22]). The normalized moments of� are chosen as follows:
��� �
��� ��� ��� ��� ��
��� ��� ��� ��� ��
�
��� �
���� ����
� � ��� �� �
��� ����� � ��� ��
� ���
���
���� ������ � ��
����� �����
� �� � ����� ����� ��� ����� ��
�
���
�
10
The mean of� is chosen as follows:��� � ��� ������ � �����. It is easy to see that the normalized
moments of� and� agree:
��� �
��� �� ��
� ��
�� ���� ��
� �
��� �
��� �
�� ���
� � ���� �
� ��� �
�� �
�
���� �� ��
� ����� ��
� ��� �
where��� � ���
��� and��� � �
��� are the normalized moments of� , and� � �� �� . Finally, we will
show that there exists a two-stage Coxian distribution with no mass probability at zero, with normalized
moments��� and��
� : By Theorem 1, it suffices to show that��� � � and��
� � ���
�� . The first
condition,��� � �, can be shown using�
��� � ��� , which follows from� � ����� . It can also be shown
that��� � ���
� � � � ���
�� using �
��� � ��� and��
� � ���� � �, which is the assumption that we
made at the beginning of (i).
(ii) Suppose:� � ����� and��� ���
� � �: We again must show that� is well-represented by an
�-stage Coxian distribution. We will show that� is well-represented by a distribution�:
� �
� � with probability�
with probability�� ��
where � ���
� ���
and the normalized moments of� satisfy��� � ��� and��
� � ��� . It is
easy to see that the normalized moments of� and� agree. Therefore, it suffices to show that� is well-
represented by an�-stage Coxian distribution� , since then� is well represented by an�-stage Coxian
distribution�:
� �
� � with probability�
with probability�� ��
We will prove that� is well-represented by an�-stage Coxian distribution� � � � , where� is
a two-stage Coxian distribution with no mass probability at zero and� is an Erlang-(� � �) distribution.
The normalized moments of� are chosen as follows:
��� �
��� ��� ��� ��� ��
��� ��� ��� ��� ��
�� ��� � ���
� � ��
the mean of� is chosen as follows:��� � ��� ������ � �����. It is easy to see that the normalized
moments of� and� agree:
��� �
��� �� ��
� ��
�� ���� �
� �
��� �
��� �
�� ���
� � ���� �
� ��� �
�� �
�
���� �� ��
� ����� ��
� ��� � � � �
� �
11
where��� � ���
��� and��� � �
��� are the normalized moments of� , and� � �� �� . Finally, we will show
that there exists a two-stage Coxian distribution with normalized moments��� and��� : By Theorem 2,
it suffices to show that�� ��� , since
�
���
� � ��� � ���
� � � � ����� � ��
���
�
Since� � ����� , ��� ���
������ . Thus,��
� � ��
��� ����
����
�
� ����� Finally, ��
� � �� follows from
��� � ���
�.
4 A characterization of busy periods
In this section we characterize the set of� /�/1 busy period durations which are in����� , and hence in
����. As explained earlier, the tractability of queueing problems often relies not just on representing
general distributions by Coxian distributions, but also on representing busy period durations by Coxian
distributions [9, 23]. This includes busy periods started by� jobs, where� is the number of arrivals
during a period of time. This section provides sufficient conditions on the number of stages needed to
represent common types of busy period durations by Coxian distributions. Formally, we will prove the
following theorems:
Theorem 4 Let � denote the duration of an M/G/1 busy period where � is an arbitrary distribution with
finite third moment and where the job starting the busy period has size � ����� . Then, � � ����� .
The above theorem states that the number of stages which suffice for a busy period duration to be well-
represented by a Coxian distribution is, surprisingly, determined solely by the distribution of thefirst job
in the busy period.
Lemma 4.1 Let �� �!��� be the distribution of the sum of � i.i.d. random variables with distribution
� � ����� where � is the number of Poisson arrivals with rate " during a random time ! � ����� . Then,
�� �!��� � ����� .
The following theorem follows from Theorem 4 and Lemma 4.1.
Theorem 5 Let ��� denote the duration of an M/G/1 busy period where � is an arbitrary distribution
with finite third moment and where the busy period is started by �� �!��� as defined in Lemma 4.1. Then,
��� � ����� .
12
We will now prove Theorem 4 and Lemma 4.1.
Proof:[Theorem 4] When# � �, � � � ����� , and hence the theorem is true. In the following we
assume� � # � �. Let $� ��� � �� �� and$� ���
� �� �� . We prove that$� � ���� and$� � ���
�$�� � �.
Observe that together these imply the conditions in (1). Notice that the first three moments of� are
��� ����
�� ��
���� �����
��� ���
�
��� ����������
����
���� �����
��� ���
��
��� �����������
���
�
��� ����������
���
���
��� �������������
�������
It is easy to see that$� � ���� :
�� � �� �
�� ������
���� �� �
�
�� ��
where�� � ��� � �� �� and�� �
��� � �� �� . Next, we prove that$� � ���
�$�� � �. Note that
�� � �� ��
�� �����
���
���
�
�� ���
����
���
��
���
��� ������
����
���
��
�
where�� � ��� � �� �� and�� �
��� � �� �� . Thus,
�� �� �
���� � �� �
� �
����
�� �
�
�
�� �����
���
���
�
�� ���
����
���
��
��� �
�
��
��� ������
����
���
��
� �� �� �
���� � �
Proof:[Lemma 4.1] Let � ��� ������ � ��� ����� �� and � ��� ������
� ��� ����� �� . We prove that � � ���� and
� � ���� �� � �.
Notice that the Laplace transform of�� �!��� is ��� �!��� � �! �"��� ���%���. Thus, the first three
moments of���!��� are
�� ������ � ��� ��� ��
�� ������� � ���� ���� �� ��� ������
�� ������� � ���� ���� �� ����� ���� ����� ��� ������
13
It is easy to see that � � ���� :
�� � �� ��
��� �� �� �
�
�� ��
where&� � �� � � �� �� and'� �
��� � �� �� . Next, we prove that � � ���
� �� � �. Note that
�� � �� �������� �
��
���� ����
where&� � �� � � �� �� and'� �
��� � �� �� . Thus,
�� �� �
���� �
��� �
� �
����
�
��� �
� �
�
�������� �
�� � ���
����
���� ���� �
5 Examples of some common distributions in ����
In this section, we give examples of distributions in����� � ����, and hence are well-represented by an
�-stage Coxian distribution. A summary is shown in Figure 7.
5.1 Distributions in ����
It is well-known that for all two-phase PH distributions�, there exists a two phase Coxian distribution�
such that� and� has the same distribution function, and hence� � ����. In the following, we show that
the Bounded Pareto distributions with high variability are also in����.
A Bounded Pareto distribution has density function
��'� � ('���� ��
����
�� �� ' ��
where� � ( � �. Bounded Pareto distributions have been empirically shown to fit many recent mea-
surement of computing workloads. These include Unix process CPU requirements measured at Bellcore:
� ( ���� [17], Unix process CPU requirements measured at UC Berkeley:( � � [8], sizes of files
transferred through the Web:��� ( ��� [3, 4], sizes of files stored in Unix filesystems [10], I/O times
[25], sizes of FTP transfers in the Internet:�� ( ��� [24], and Pittsburgh Supercomputing Center
14
m2
3m
Sv(2)
Sv(8)
Sv(4)
13/24/3
3/2
2
1 2
3
PARETO
UNIFORM/TRIANGULAR
BP
Figure 7: A summary of the results in Section 5. A few particular distributions are shown in relation to
����� . BP refers to the class of bounded Pareto distributions with high variability described in Definition 5.
All of these are contained in ����� . UNIFORM refers to the class of all uniform distributions describedin Definition 6. We find that the larger the range of the UNIFORM distribution, the fewer the number ofstages that suffices. TRIANGULAR refers to the set of symmetric triangular distributions, described inDefinition 7. These interestingly have the same behavior as the UNIFORM distribution. Finally, PARETOrefers to the class of Pareto(() distributions with finite third moment, described in Definition 8. For thisclass, we find that the lower the value of the (-parameter, the fewer the number of stages that are needed.
workloads for distributed servers consisting of Cray C90 and Cray J90 machines [27]. In this section, we
prove the necessary and sufficient condition for a Bounded Pareto distribution to be in����� . Formally, we
use the following definition:
Definition 5 � is a set of Bounded Pareto distributions satisfying � � ( � � and ) �
is greater than
the maximum of the two lines shown in Figure 8.
0 0.5 1 1.5 20
50
100
150
200
α
min
imum
p/k
m2>2
m3/m
2 ≥ 4/3
Figure 8:The maximum of the two lines illustrates the lower bound needed on ) �
in the definition ofthe BP distribution. These lines are derived from (7) and (8).
With this definition, the theorem proven in this section is stated:
Theorem 6 � � ����� and �� ����� � �, where �� is the set of Bounded Pareto distributions not
15
in � and � is an empty set.
Proof: Let��� be the normalized�-th moment of a distribution� � � for � � �� �. When( � �, the
moments of the Bounded Pareto are
�� � ���
�� ���
�
�� ���� � ��� �� ���� �
�
����� ���
Thus,
��� �
�� � ���
���� ���� ��
� ��� � ���� ��
�� �� �� ��
���
���
��� �� �� �
��� � ��� (7)
When� � ( � � or � � ( � �, the moments of the Bounded Pareto are
�� � ��
�� �
����
� � �
�����
� � ���� ��
�� �
�����
� � ��
�����
� � �� ���� ��
�� �
�����
� � ��
�����
� �
Thus,
��� �
��� ���
���� ��
�� � ����� � � �
�� � � ���� ��
� ���� ����� ��
���� ��
�� � ����� � � �
�� � � ���� � � �� (8)
and
���
���
���� ���
��� ����� ��
�� � � ���� � � �
��� � � ���
By Lemmas 9.5-9.8 in Appendix B, both��� and �
�
��
are increasing functions of). This makes
intuitive sense since the higher moments are likely to increase as the upper bound (and thus)) increases.
Thus, the minimum) such that��� � � and�
�
��
� �� can be obtained numerically for all( if there
exists such a finite). In the following we prove that there is such an) for � � ( � �. When( � �,
it is easy to see that as) goes to infinity, both��� and ��
��
go to infinity. Thus, there is a finite) such
that��� � � and�
�
��
� �� . Next, consider the case with� � ( � � or � � ( � �. By observing that
� � ( � �, it is easy to see that��� goes to infinity as) does. Thus, there is a finite) such that��� � �.
Next, we consider��
��
. Observe that
��!���
���
���
�
�
��� ����� ���� � ��"� � � � � ��
� ��"� �� � � � ���
Thus, there is a finite) such that��
��� �
� for � � ( � �. When� � ( � �, a finite) gives��
��� �
� if
16
and only if ���������������� �
�� . However, since
��� ���
��� ����� ���
�
��� � ��� � ��
There is a finite) such that��
��
� �� for � � ( � �, too.
5.2 Distributions in ����
In this section, we give examples of distributions in����. It is known that for all acyclic PH distributions
�, there exists a Coxian distribution� with the same number of phases as� such that� and� have
the same distribution function [5]. Therefore, all the�-phase acyclic PH distributions are in����.
In the rest of this section, we discuss uniform distributions, symmetric triangular distributions, and
Pareto distributions. In particular, we derive the necessary and sufficient condition for these distributions
to be in����� . Formally, we use the following definitions:
Definition 6 *�+�,-��.� /� refers to the distribution with lower bound . and upper bound / � �
having density function ��'� � ���� in the region . ' / and zero otherwise.
Definition 7 !-+0��*10-�.� /� is the distribution with density functions of the form
���� �
�������
�����
���� �� �# � � � � ���
�
��
����
���� � �# ���� � � �
���"���$"�
where � . / and / � �.
Definition 8 0-�!,�(� is the distribution with density functions of the form ��'� � ('����, where
( � �.
With these definitions, the three distributions are formally characterized as follows:
Theorem 7 The normalized moments of *�+�,-��.� /� satisfy � �� �� and �� � � � �
�for
all � . / and / � �.
Theorem 8 The normalized moments of !-+0��*10-�.� /� satisfy � �� � and �� � � � �
�
for all � . / and / � �.
17
Theorem 9 The normalized moments of distributions in 0-�!,�(� satisfy
� � �� ��
��� �� �
����� ��� ���� � ��
������ � ��
�� ���
for all ( � �.
Simple consequences of the theorems are:
Corollary 1 *�+�,-��.� /� � ����� if and only if � � ��������������������������������� , where ) � �
�. In particu-
lar, � � � if . � � and / � �, and � � � for all � � . /.
Corollary 2 !-+0��*10-�.� /� � ����� if and only if � � ��������������������������������� , where ) � �
�. In
particular, � � � if . � � and / � �, and � � � for all � � . /.
Corollary 3 0-�!,�(� � ����� if and only if � � �(� ���. In particular, � � � for all ( � �.
Below we prove Theorem 7. Theorems 8 and 9 can be proved very similarly (see Appendix A).
Proof:[Theorem 7] The first three moments of� � *�+�,-��.� /� are��� � ����������� , ���
� ������������ , and���� � �����
������ . and the normalized second and third moments of� are
��� �
�
�
� � ��
�� ����� ��
� ��
�
� � �� ��
�� ���� � ����
where) � ��
. Since �����
� � ��
��������� � for � ) �,��
� is a nonincreasing function of). So, the
minimum of��� is given by evaluating it at) � � and the maximum is given by evaluating it at) � �.
Thus,� � ��� �
�� . Also, it is easy to see that��
� and��� satisfy��
� � �� ��
�.
6 Conclusion
The contribution of this paper is a characterization of the set���� of distributions� which are well-
represented by an�-stage Coxian distribution. Prior work has only analyzed����� � ����, and this char-
acterization was messy. We introduce several ideas which help in creating a simple formulation of����.
The first is the concept of normalized moments. The second is the notion of����� , a nearly complete subset
of ���� with an extremely simple representation. The arguments required in proving the above results have
an elegant structure which repeatedly makes use of the recursive nature of the Coxian distributions.
Our characterization of���� provides the minimum number of necessary phases and the sufficient
number of phases for a given distribution to be well-represented by a Coxian distribution, and these bounds
18
are nearly tight. This result has several practical uses: First, in designing algorithms which fit general
distributions to Coxian distributions (fitting algorithms), the goal is to come up with aminimal (fewest
number of stages) Coxian distribution. Our characterization allows algorithm designers to determine how
close their Coxian distribution is to the minimal Coxian distribution, and provides intuition for coming
up with improved algorithms. We have ourselves benefitted from exactly this point: In a companion
paper [22], we develop an algorithm for finding a minimal Coxian distribution that well-represents a given
distribution. We find that the simple characterization of���� provided herein is very useful in this task.
Our results are also useful as an input to some existing fitting algorithms, such as Johnson and Taaffe’s
nonlinear programming approach [13], which require knowing a priori the number of stages� in the
minimal Coxian distribution.
In addition to characterizing those distributions in����, we also consider which M/G/1 busy periods
have durations in����. We find that the number of stages which suffice for a busy period duration to be
well-represented by a Coxian distribution is, surprisingly, determined solely by the distribution of thefirst
job in the busy period. Furthermore we classify a few examples of common and practical distributions as
being subsets of���� for some�.
Future work includes a simple characterization of the set of distributions that are well-represented
by general�-phase PH distributions. It is known that the Erlang distribution has the lowest normalized
second moment among all the�-phase PH distributions [1]. However, a lower bound on the normalized
third moment of�-phase PH distributions is not known.
References
[1] D. Aldous and L. Shepp. The least variable phase type distribution is Erlang.Communications in Statistics -Stochastic Models, 3:467 – 473, 1987.
[2] T. Altiok. On the phase-type approximations of general distributions.IIE Transactions, 17:110 – 116, 1985.
[3] M. E. Crovella and A. Bestavros. Self-similarity in World Wide Web traffic: Evidence and possible causes.IEEE/ACM Transactions on Networking, 5(6):835–846, December 1997.
[4] M. E. Crovella, M. S. Taqqu, and A. Bestavros. Heavy-tailed probability distributions in the world wide web.In A Practical Guide To Heavy Tails, chapter 1, pages 1–23. Chapman & Hall, New York, 1998.
[5] A. Cumani. On the canonical representation of homogeneous markov processes modelling failure-time distri-butions.Microelectronics and Reliability, 22:583 – 602, 1982.
[6] A. Feldmann and W. Whitt. Fitting mixtures of exponentials to long-tail distributions to analyze networkperformance models.Performance Evaluation, 32:245–279, 1998.
[7] M. Harchol-Balter. Task assignment with unknown duration.Journal of the ACM, 49(2), 2002.
[8] M. Harchol-Balter and A. Downey. Exploiting process lifetime distributions for dynamic load balancing. InProceedings of SIGMETRICS ’96, pages 13–24, 1996.
[9] M. Harchol-Balter, C. Li, T. Osogami, A. Scheller-Wolf, and M. Squillante. Task assignment with cycle stealingunder central queue. InTo appear in Proceedings of 23rd International Conference on Distributed ComputingSystems (ICDCS ’03)., May 2003. For a copy see http://www.cs.cmu.edu/harchol/.
19
[10] G. Irlam. Unix file size survey - 1993. Available athttp://www.base.com/gordoni/ufs93.html,September 1994.
[11] M. A. Johnson and M. F. Taaffe. A graphical investigation of error bounds for moment-based queueing ap-proximations.Queueing Systems, 8:295–312, 1991.
[12] M. A. Johnson and M. R. Taaffe. Matching moments to phase distributions: Density function shapes.Commu-nications in Statistics — Stochastic Models, 6:283–306, 1990.
[13] M. A. Johnson and M. R. Taaffe. Matching moments to phase distributions: Nonlinear programming ap-proaches.Communications in Statistics — Stochastic Models, 6:259–281, 1990.
[14] S. Karlin and W. Studden.Tchebycheff Systems: With Applications in Analysis and Statistics. John Wiley andSons, 1966.
[15] R. E. A. Khayari, R. Sadre, and B. Haverkort. Fitting world-wide web request traces with the EM-algorithm. InProceedings of SPIE: Internet Performance and Control of Network Systems II, volume 4523, pages 211–220,2001.
[16] G. Latouche and V. Ramaswami.Introduction to Matrix Analytic Methods in Stochastic Modeling. ASA-SIAM,Philadelphia, 1999.
[17] W. E. Leland and T. J. Ott. Load-balancing heuristics and process behavior. InProceedings of Performanceand ACM Sigmetrics, pages 54–69, 1986.
[18] R. Marie. Calculating equilibrium probabilities for����!"�!�!� queues. InProceedings of Performance 1980,pages 117 – 125, 1980.
[19] M. F. Neuts.Matrix-Geometric Solutions in Stochastic Models. Johns Hopkins University Press, 1981.
[20] M. F. Neuts.Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach. The Johns HopkinsUniversity Press, 1981.
[21] T. Osogami and M. Harchol-Balter. Necessary and sufficient conditions for representing general distributionsby Coxians. Technical Report CMU-CS-02-178, School of Computer Science, Carnegie Mellon University,2002.
[22] T. Osogami and M. Harchol-Balter. A closed-form solution for mapping general distributions to minimal PHdistributions. Technical Report CMU-CS-03-114, School of Computer Science, Carnegie Mellon University,2003.
[23] T. Osogami, M. Harchol-Balter, and A. Scheller-Wolf. Analysis of cycle stealing with switching cost. InToappear in ACM Sigmetrics 2003 Conference on Measurement and Modeling of Computer Systems (SIGMET-RICS ’03)., June 2003. For a copy see http://www.cs.cmu.edu/harchol/.
[24] V. Paxson and S. Floyd. Wide-are traffic: The failure of poisson modeling.IEEE/ACM Transactions onNetworking, pages 226 – 244, June 1995.
[25] D. L. Peterson and D. B. Adams. Fractal patterns in DASD I/O traffic. InCMG Proceedings, December 1995.
[26] C. Sauer and K. Chandy. Approximate analysis of central server models.IBM Journal of Research andDevelopment, 19:301 – 313, 1975.
[27] B. Schroeder and M. Harchol-Balter. Evaluation of task assignment policies for supercomputing servers: Thecase for load unbalancing and fairness. InProceedings of HPDC 2000, pages 211–219, 2000.
[28] D. Starobinski and M. Sidi. Modeling and analysis of power-tail distributions via classical teletraffic methods.Queueing Systems, 36:243–267, 2000.
[29] M. Telek and A. Heindl. Moment bounds for acyclic discrete and continuous phase type distributions of secondorder. InProceedings of UK Performance Evaluation Workshop, UKPEW 2002, 2002.
[30] W. Whitt. Approximating a point process by a renewal process: Two basic methods.Operations Research,30:125 – 147, 1982.
[31] W. Whitt. On approximations for queues, iii: Mixtures of exponential distributions.AT&T Bell LaboratoriesTechnical Journal, 63:163 – 175, 1984.
20
A Proof of Theorems 8 and 9
Proof:[Theorem 8] The first three moments of� � !-+0��*10-�.� /� are
�� � � �
�� ���� �
� � � � ���
��� �� ���� �
� � � �� � �� ���
���
and the normalized second and third moments are
��� �
� �� ���
��� ����� ��
� ���� �� ��� ����
�� ���� �� �����
where) � ��
.Since �
����
� � ��
��������� � for � ) �, ��
� is a nonincreasing function of). So, the minimum
of ��� is given by evaluating it at) � � and the supremum is given by evaluating at) � �. Thus,
� ��� �
. Also, it is easy to see that��� and��
� satisfy��� � �� �
��
.
Proof:[Theorem 9] The first three moments of� � 0-�!,�(� are
�� � ��
�� �� ���� �
�
�� �� �� ���� �
�
�� ��
and the normalized second and third moments are
��� �
��� ���
��� � ���� ��
� ���� ����� ��
��� � ���
Since �����
� � � �������������� � � for ( � �, ��
� is a decreasing function of(. So, the supremum of
��� is given by evaluating it at( � � and the infimum is given by letting( � �. Thus,� � ��� �
�� .
Also, it is easy to see that��� and��
� satisfy
��� �
������ �� ���
� ����� � ��
���
� ���� � ��
�� ����
�
B Technical lemmas
Lemma 9.1 This lemma proves that (3) and (4) are equivalent sets.
Proof: Recall that set (3) is the union of the following three sets:
#� �
����� � �� �
���� ����
��
����� �� �
����� � ��
����� �
��� �� �
�
�
��
#� �
��� �
�
�� �� �
�
�
��
#� �
��
��� � �� �
�
�� ��
��
21
set (4) is the union of the following three sets:
�� �
��
��� �� ���� � ��
�� �
� �� �
��
�� �
��
��� �� �
��� � � ��
��
�� �
��
��� � �� � � ��
�
It suffices to prove that (i)0� � �� � ��, (ii) 0� � �� � ��, and (iii) 0� � ��. (ii) and (iii) areimmediate from definition. To prove (i), we prove that0� � �� ��� and�� ��� � 0�.
Consider a distribution� � 0�. We first show that� � �� ���. Let/� � be the upper bound of��� :
��� ������
� � ��
�����
�
and let.� � be the lower bound of��� :
���� �������
� � � ���� ���
� ���
�����
�
�
Then,/� � and.� � are both continuous and increasing functions of for ���
�
��
�
by Lemmas
9.2 and 9.3. When��� �, the range of is �
���
�. Thus,
�
���
� � �
��
����
�� ��
� � ��� �����
� � ��
���
�
and hence� � ��. When� � ��� , the range of is �
���
��
�
. Thus,
�
���
� � �
��
����
�� ��
� �
��
���
��
�
���
� �
and hence� � ��. Therefore,0� � �� ���. However, since/� � and.� � are continuous functions of ,��
� can take any value between the lower and upper bounds. Therefore,�� ��� � 0�.
Lemma 9.2 Let 2 � �. Then, �� � � ����
is an increasing function of for � � � ��
.
Proof: Note that� �� � � �����
� �. The inequality holds when� � � ��.
Lemma 9.3 Let 2 � �. Then, �� � � ������������ ���
is an increasing function of for ��� �
�.
Proof:
� ���� ��� �$��
��� ��� $����� $��
��
��
22
Let �� � � �� �2 ���� ��� 2 ���� 2 � �� . Then,
����� � �$
��
�
��� $��
��� $����
� �
�� �$
��
�
��� $��
��� �� �
��
� �
�� �$
��
�
��� ��
��� �� �
��
� �
��
The first inequality follows from��� and the second inequality follows from �
�. So,�� � is a
non-decreasing function of . Thus,��� � � � ����2� ����� �.
Lemma 9.4 Let � � � and � � �. Then,
���� �� ��� ��
���� ���� � ��� ��� ���� � ���� ��� ����� � ���� ���� ����
��� �� ���� � �� ��� � ��� ����
� � � �
� ��
Proof: Let
���� �� � �� ������ ���� � ��� ��� ���� � ���� ��� ����� � ���� ���� ����
��� ��
��� ������ � �� ��� � ��� ���
���� ��
� ���� �� ����� � ��� �� ��� ����� � �� �� �� � ��� � ���� ��� ���� � ����
��
We prove that���� �� � �. Let
3��� �� � �� �� ��������� �� ��� ������ �� �� ����������� ��� ���� � �����
It suffices to prove3��� �� � �. Observe that���������
� � iff � ��������������
�����
���������� , where
%��� � �� ��� ���� ���� ���� ��� ��
Notice that%��� � �� �� �� ����� (9)
Thus,� �� ��� ���
�%���
��� �� ���� � �� ��� ��� �� �� �� ���
��� �� ���� �
for � � �. Therefore,3��� �� is minimized when
� �� �� ��� ���
�%���
��� �� ���� (10)
Let
&��� � '
���
� �� ��� ��� �%���
��� �� ���
�
������ ��� ���� ���%���� %���
�� � ���� ���� ����� ������ ����� �������
���� �� �����
It suffices to prove���� � �. Let���� be the numerator of����. It suffices to prove���� � �. Notice
23
that���� � �. Thus, it suffices to prove����� � � for � � �.
����� ���%���
� ����
where
� ��� � ����� ���� ������ ������ ����� ������ ��� ��� �����%���
���� ���� ����� ���� ���� ����%���
� ����� ���� ������ ������ ����� ������ ��� ��� ������ �� �� ���
���� ���� ����� ���� ���� ����%���
� ������� �� ������� ������� ����� ������ ����� ���� �����
� �
Lemma 9.5 ��)� � ������
����� ���is an increasing function for ) � �.
Proof: Note that
� ���� �� � �
����� ������ �� �� �� �� �� �
Note that ��������� ��� � � for ) � �, and let��)� � �� �) �� )� ��� ). Then,��)� is positive for) � �,
since���� � �, ���)� � �� ��� ) � �, ����� � �, and����)� � ���
��� �.
Lemma 9.6 ��)� � ������ ��� ) is an increasing function for ) � �.
Proof: Note that
� ���� ��� � �� �� � � �
�� � �����
Note that �������� � � for ) � �, and let��)� � )� � �) ��� ) � �. Then,��)� is positive for) � �, since
���� � �, ���)� � �) � � ��� ) � �, ����� � �, and����)� � �������
� �.
Lemma 9.7 Let � � ( � � or � � ( � �. Then, ��)� � �������������������� is an increasing function for
) � �.
Proof: Note that
� ���� ��� � ���
�� � � ����� ��� �� ��� ��� � �� ��
�
Note that�� � ���
�� � � ��
�� ���"� � � � ��� ���"� � � � � ��
24
for ) � �. Let ��)� � () ��� (� �(� ��)� � ()���. Then,
����
�� ���"� � � � ��� ���"� � � � � ��
for ) � �, since���� � �, ���)� � ( �(� ���� (�)��� ��� (�� (�)���, ����� � �, and
������ � ��� � ����� ��� ���� � ����� ���"� � � � ��� ���"� � � � � ���
Lemma 9.8 Let ��)� � ��������������������� . If � � ( � �, ��)� is an increasing function for ) � �. If
� � ( � �, ��)� is a decreasing function for ) � �.
Proof: Note that
� ��)� ��) � ��)���
�)� � )�������� (��� � )����� �(� ���) � )����
��
Since ���������
�������� � � for ) � �, it is easy to see that���)� � � for ) � � when� � ( � �. To see that
� ��)� � � for ) � � when� � ( � �, we let
��)� � ���� (��� � )����� �(� ���) � )�����
Then,��)� � � for ) � �, since���� � �, ���)� � �� � (��( � ���)����� �( � ���� � �( � ��)����,����� � �, and
����)� � ��(� ���( � ���(� ���)����)����) � ���� � ����� � � ( � ��� � ����� � � ( � ���
25