TIME SERIES OF CONTINUOUS PROPORTIONS
by
Gary K. Grunwald, Adrian Raftery, Peter Guttorp
TECHNICAL REPORT No. 164
*(date: month and year)
Department of Statistics, GN-22
University of Washington
Seattle, Washington 98195 USA
Time Series of Continuous Proportions
Gary K. Grunwald
Department of Statistics, University of Melbourne
Adrian E. Raftery
Departments of Statistics and Sociology, University of Washington
Peter Guttorp
Department of Statistics, University of Washington
ABSTRACT
A vector of continuous proportions consists of the proportions of some total accounted for by its constituent components. An example is the proportions of U.S. tax revenues from each of personal tax, corporate tax and social tax. We consider the situation where time series data are available and where interest focuses on the proportions rather than the actual amounts. Reasons for analyzing such time series include estimation of the underlying trend, estimation of the effect of covariates and interventions, and forecasting.
We develop a state space model for time series of continuous proportions. Conditionally on the unobserved state, the observations are assumed to follow the Dirichlet distribution, often considered to be the most natural distribution on the simplex. The state follows the Dirichlet-conjugate (DC) distribution which is introduced here. Thus the model, while based on the Dirichlet distribution, does not have its restrictive independence properties. The state transition distribution, or "state equation", allows for [...] covariates, trends, seasonality and [...] a model to [...] the tax [...]
[...] forecasting; [...] data; [...] distribution; Recursive updating; State [...]
Gary K. Grunwald is Lecturer (Temporary), Department of Statistics, University of Melbourne, Parkville, Victoria 3052, [...] Adrian E. Raftery is Associate [...] of [...] and Sociology, [...] Professor, [...] of Washington, Seattle, WA 98195. The research of Grunwald and [...] was supported by National Science Foundation Grant no. SES-8615541. The research of [...] was partially supported by a cooperative agreement between the United States Environmental Protection Agency and the Societal Institute for Mathematical Sciences. The authors are grateful to Michael Kahn for comments.
1. INTRODUCTION
The U.S. Government receives money from three main sources: personal tax and
non-tax receipts (referred to as personal tax), corporate tax and social tax
(contributions for social insurance). Table 1 gives the annual amounts of receipts in each of
the three categories for the 40-year post-war period 1946-1985. The total has grown
very rapidly, and has obscured the relative changes in the three sources, which are
often of interest. The most natural way to study these is to consider the proportions
of the total accounted for by each source.
Table 1. U.S. Federal Government Receipts
(billions of 1982 dollars)

Year  Personal  Corporate  Social     Year  Personal  Corporate  Social
1946      17.3       16.5     5.6     1966      61.6       46.9    33.1
1947      19.7       18.4     5.1     1967      67.4       46.2    36.7
1948      18.9       19.7     4.6     1968      79.7       54.2    40.8
1949      16.2       17.7     4.9     1969      94.7       55.1    46.9
1950      18.2       26.0     6.0     1970      92.2       49.9    49.8
1951      26.1       31.1     7.0     1971      89.9       53.9    54.8
1952      31.0       29.0     7.5     1972     108.3       56.5    62.7
1953      32.2       30.3     7.4     1973     114.8       63.7    79.4
1954      28.9       26.7     8.2     1974     131.3       67.7    89.4
1955      31.4       31.7     9.4     1975     125.6       63.3    93.5
1956      35.2       32.2    10.6     1976     147.4       77.9   106.5
1957      37.5       32.3    12.2     1977     170.1       86.6   118.5
1958      36.8       29.4    12.4     1978     194.9       99.5   137.2
1959      40.0       34.9    14.9     1979     230.6      103.6   159.6
1960      43.7       34.9    17.6     1980     257.6      109.3   173.9
1961      44.7       35.2    18.2     1981     298.7      121.8   204.0
1962      48.5       37.0    20.4     1982     304.4       97.2   [...]
1963      51.5       40.0    23.2     1983     294.5      113.0   [...]
1964      48.6       42.3    24.0     1984     309.3      131.7   [...]
1965      54.0       45.4    25.0     1985     345.6      129.6   311.6

Source: Survey of Current Business
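As a small illustration of the step from amounts to proportions, the following sketch converts two rows of Table 1 into vectors of shares. The function name `to_proportions` and the dictionary layout are illustrative choices made here, not part of the report.

```python
# Receipts for two sample years, copied from Table 1 (billions of 1982 dollars).
receipts = {
    1946: {"personal": 17.3, "corporate": 16.5, "social": 5.6},
    1985: {"personal": 345.6, "corporate": 129.6, "social": 311.6},
}


def to_proportions(amounts):
    """Divide each component by the total so that the shares sum to 1."""
    total = sum(amounts.values())
    return {source: value / total for source, value in amounts.items()}


proportions = {year: to_proportions(row) for year, row in receipts.items()}

for year, shares in proportions.items():
    print(year, {source: round(share, 3) for source, share in shares.items()})
```

Each resulting vector has positive components summing to one, which is the form of data the rest of the report works with.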
Figure 1 shows one common display of such information, the profile plot. This is
the time series version of stacked bar graphs. The general trend is [...]
While personal tax has consistently accounted for around 45% of the total, there has
been a trend in the trade-off between the other two sources. [...] 1952, [...]
over to about [...] seven [...] to [...]
presidential party affiliation, a war and several recessions. For analysis, the data are
best thought of as a time series of vectors, each of which has positive components
summing to 1. This sample space is called the simplex, and for three-component
proportions a graph of the series in this two-dimensional set (Figure 2) is often useful.
Again, the trend described above is evident.
Figure 1. The proportion of U.S. government receipts from three sources.
[Profile plot: proportion against year, 1940-1990, with bands for social, personal and corporate tax.]

Figure 2. Simplex plot of the proportion of U.S. government receipts from three sources.
[Simplex plot; one vertex is labeled social.]
Reasons for analyzing time series of continuous proportions such as that in
Figure 1 include estimation of the underlying trend, assessment of the effects of
covariates and interventions, and forecasting. For example, one may wish to produce a
predictive distribution of the composition of 1987 tax receipts, as a baseline for
assessing the effect of the Tax Reform Act of 1986. Alternatively, an estimate of the trend
noted above, or of the effect of economic growth on the composition, may be of
interest.
Such time series arise in many other areas of application. Examples include the
breakdown of household consumption by type of item in successive household budget
surveys (Aitchison, 1982), market shares in successive time periods, proportions of
time spent on different activities by individuals, groups or animals in successive time
periods, and changes in the chemical composition of rock samples taken from
successively deeper layers, corresponding to more distant time (Chayes, 1971). Although
the data constitute a multivariate time series, standard techniques such as multivariate
ARIMA modeling (Tiao and Box, 1981) and Kalman filtering (Kalman, 1960) are
not applicable to it because the values at a single time-point are positive and
constrained to sum to one.
In this paper we develop a methodology for modeling, forecasting, estimating
trends and seasonal effects, deseasonalizing, and assessing the effects of covariates and
interventions on time series of continuous proportions. We take a state-space
modeling approach based on recursive updating. Conditionally on the unobserved state, the
observations are assumed to follow the Dirichlet distribution. This is often considered to
be a natural distribution for continuous proportions, since its marginal distributions,
conditional distributions and the joint distributions of sums of components all have
[...] as [...] not suffer [...] a [...]
limitation.
A major advantage of our method is the direct interpretability of the results in
terms of the original proportions. The approach is fairly easily implemented and most
of our results are exact. In the few cases where approximation is necessary we
obtained good results using the accurate approximations of Tierney and Kadane
(1986).
In Section 2 we present the model, in Section 3 we use it to analyze the tax data,
and in Section 4 we review the literature and discuss unresolved questions.
2. THE MODEL
In this section we review the Dirichlet distribution which describes the
observations. We introduce and give some properties of the distribution conjugate to it, the
Dirichlet-conjugate (DC) distribution, which describes the state. We then define the
state space model, and show how it can incorporate covariates, trends, seasonality and
interventions. Finally, we consider forecasting, estimation, model checking and model
selection.
2.1 The Dirichlet distribution and the Dirichlet-conjugate (DC) distribution
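Let $y = (y_1, \ldots, y_{d+1})^T$ be a vector of continuous proportions, namely a vector
with positive components such that $y^T u = 1$, where $u = (1, \ldots, 1)^T$ is a $(d+1)$-vector
of ones. Then $y$ follows the Dirichlet distribution if it has the density

$$p(y \mid \alpha) = \prod_{j=1}^{d+1} y_j^{\alpha_j - 1} \, D(\alpha)^{-1}, \qquad (2.1)$$

where $\alpha_j > 0$ for $j = 1, \ldots, d+1$, and

$$D(\alpha) = \prod_{j=1}^{d+1} \Gamma(\alpha_j) \Big/ \Gamma\Big(\sum_{j=1}^{d+1} \alpha_j\Big).$$

In (2.1), [...] $y \sim \mathrm{Dir}(\alpha)$ [...] is the $d$-dimensional
simplex $S_d = \{y \in R_+^{d+1} : y^T u = 1\}$.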
We write (2.1) in exponential family form in the following way. Let $v = \log y$,
$\bar{v} = v^T u/(d+1)$ and $z = v - \bar{v}u$. We call $z$ the vector of symmetric logratios, and we
write $z = \mathrm{logit}(y)$, in multivariate analogy with the usual univariate logit. Also, let
$\theta = \alpha/\tau$, where $\tau = \alpha^T u$, so that $y \sim \mathrm{Dir}(\tau\theta)$. Then (2.1) becomes

$$p(z \mid \theta, \tau) = \exp\{\tau z^T \theta + \tau\bar{v} - \log D(\tau\theta)\}. \qquad (2.2)$$

The sample space is $H_d = \{z \in R^{d+1} : z^T u = 0\}$ and the parameter space is
$(\theta, \tau) \in S_d \times R_+$. The purpose of this reparametrization is to separate, so far as is
possible, the effects of location $\theta$ and spread $\tau$.
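The symmetric logratio transform and its inverse can be sketched in a few lines. This is a minimal illustration of the definitions above; the function names `logit` and `logit_inverse` are chosen here for convenience.

```python
import math


def logit(y):
    """Symmetric logratios: z = log(y) minus the mean of log(y)."""
    v = [math.log(p) for p in y]
    v_bar = sum(v) / len(v)
    return [vi - v_bar for vi in v]


def logit_inverse(z):
    """Map logratios back to proportions by exponentiating and normalizing."""
    w = [math.exp(zi) for zi in z]
    total = sum(w)
    return [wi / total for wi in w]


y = [0.45, 0.35, 0.20]
z = logit(y)            # components sum to zero, so z lies in H_d
y_back = logit_inverse(z)  # recovers the original proportions
```

The round trip works because exponentiating $z$ gives the proportions up to a common factor, which the normalization removes.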
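The moments of the proportion vector $y$ are

$$E[y \mid \theta, \tau] = \theta,$$
$$\mathrm{Var}[y \mid \theta, \tau] = \{\mathrm{diag}(\theta) - \theta\theta^T\}/(\tau + 1).$$

Thus $\theta$ determines the location of the distribution of $y$ in the simplex, and $\tau$ affects
only the dispersion. By exponential family theory the moments of $z$ are

$$E[z \mid \theta, \tau] = \psi - (\psi^T u)\,u/(d+1), \qquad (2.3)$$
$$\mathrm{Var}[z \mid \theta, \tau] = [(\psi'^T u)(uu^T)/(d+1) + (d+1)\,\mathrm{diag}\{\psi'\} - \psi' u^T - u\psi'^T]/(d+1). \qquad (2.4)$$

In (2.3) and (2.4), $\psi = \psi(\tau\theta)$ and $\psi' = \psi'(\tau\theta)$, where we adopt the convention that
$\psi(w) = (\psi(w_1), \ldots, \psi(w_{d+1}))^T$ and $\psi'(w) = (\psi'(w_1), \ldots, \psi'(w_{d+1}))^T$, $w$ being any
positive $(d+1)$-vector, $\psi$ the digamma function $\psi(w) = d\log\Gamma(w)/dw$ and $\psi'$ the
trigamma function $\psi'(w) = d\psi(w)/dw$.

[...] $\theta$ is [...] state [...] family of conjugate prior distributions for $\theta$, conditional on $\tau$, is

$$p(\theta \mid \sigma, \kappa, \tau) \propto \exp[\sigma\{\tau\kappa^T\theta - \log D(\tau\theta)\}], \qquad (2.5)$$

where $(\sigma, \kappa) \in R_+ \times H_d$. Because $\theta \in S_d$, this is a distribution on the simplex which does
not appear to have been written down before. The mode, $\hat\theta$, of the distribution (2.5)
satisfies the equation

$$\psi(\tau\hat\theta) - \bar\psi(\tau\hat\theta)\,u = \kappa, \qquad (2.6)$$

where $\bar\psi(w) = \psi(w)^T u/(d+1)$ denotes the mean of the components of $\psi(w)$, as in (2.3).
Using (2.6), $\hat\theta$ is readily found by Newton-Raphson iteration.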
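The report does not spell out the iteration, so the following is only one possible sketch of solving the mode equation (2.6), read here as $\psi(\tau\hat\theta_j)$ minus the average of the $\psi(\tau\hat\theta_k)$ equalling $\kappa_j$. It rewrites the equation as $\psi(\tau\hat\theta_j) = \kappa_j + c$ and applies Newton's method to the scalar $c$ so that the components sum to one. The pure-Python `digamma` and `trigamma` helpers are series approximations written for this example, and every function name is an assumption of the sketch.

```python
import math


def digamma(x):
    """psi(x) for x > 0: shift upward by recurrence, then use an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def trigamma(x):
    """psi'(x) for x > 0, using the same recurrence-plus-series approach."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + 1.0 / x + 0.5 * inv2 + (inv2 / x) * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))


def inverse_digamma(target, iterations=30):
    """Solve psi(x) = target by Newton's method from a standard starting guess."""
    if target >= -2.22:
        x = math.exp(target) + 0.5
    else:
        x = -1.0 / (target - digamma(1.0))
    for _ in range(iterations):
        x -= (digamma(x) - target) / trigamma(x)
    return x


def dc_mode(kappa, tau, iterations=50):
    """Mode of the DC distribution for given kappa (summing to zero) and tau > 0."""
    c = digamma(tau / len(kappa))  # exact when kappa = 0 (mode at the center)
    for _ in range(iterations):
        a = [inverse_digamma(k + c) for k in kappa]        # a_j = tau * theta_j for this c
        h = sum(a) - tau                                    # zero when theta sums to one
        h_prime = sum(1.0 / trigamma(aj) for aj in a)       # derivative of h with respect to c
        c -= h / h_prime
    a = [inverse_digamma(k + c) for k in kappa]
    total = sum(a)
    return [aj / total for aj in a]


# Round-trip check: build kappa from a known mode, then recover the mode.
theta_true = [0.45, 0.35, 0.20]
tau = 50.0
psi_values = [digamma(tau * th) for th in theta_true]
psi_mean = sum(psi_values) / len(psi_values)
kappa = [p - psi_mean for p in psi_values]
theta_hat = dc_mode(kappa, tau)
```

Note that $\sigma$ does not enter the mode equation, so it is not an argument of `dc_mode`.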
If (2.2) and (2.5) hold we say that $y$ follows the compound Dirichlet-conjugate-Dirichlet
(DCD) distribution, and we denote this situation by $y \sim \mathrm{DCD}(\sigma, \kappa, \tau)$. This
is also a new distribution on the simplex. It follows from Theorem 2 of Diaconis and
Ylvisaker (1979) that if $y \sim \mathrm{DCD}(\sigma, \kappa, \tau)$, then

$$E[z \mid \sigma, \kappa, \tau] = \kappa, \qquad (2.7)$$

so that $\kappa$ determines the location of the DCD distribution, and $\sigma$ and $\tau$ affect the
dispersion in different ways. The DCD distribution is a mixture of Dirichlet
distributions; $\tau$ is a common dispersion parameter for the individual Dirichlet distributions
being mixed, while $\sigma$ is a dispersion parameter for the DC mixing distribution.
2.2 The state space model
Now we consider a time series $\{y_t : t = 1, \ldots, T\}$ of continuous proportions,
where $y_t = (y_{1t}, \ldots, y_{d+1,t})^T$ $(t = 1, \ldots, T)$. We shall model the symmetric
logratios $z_t$ as defined in Section 2.1. The basic assumption of the state space model
is that there is an unobserved state $\theta_t$ such that

[...] (2.8)

Equation (2.8) is called the observation equation. The state $\{\theta_t\}$ is assumed to evolve
in time according to the "steady model" of Smith (1979, 1981), [...]

$$p(\theta_{t+1} \mid D_t) \propto p(\theta_t \mid D_t)^{\gamma}, \qquad (2.9)$$

[...] $D_t = \{[...]\}$ $(t \ge 1)$, $I_t = \{$[...] $0 < \gamma < 1$ [...]
and all other relevant information available at time $t$ but not $t-1\}$, and we denote by
$D_0$ all relevant information available at time $t = 0$. The "state equation", or, more
accurately, the state transition density (2.9), has the property that the distribution of
$(\theta_{t+1} \mid D_t)$ has mode unchanged from that of $(\theta_t \mid D_t)$ but has greater dispersion.
Smith (1979) argues that in the normal case there are many analogies between this
model and the random walk with observation error.
Knowledge about the state $\theta_t$ given data is specified by the standard recursive
updating scheme, which follows from Bayes' theorem (e.g. Kitagawa, 1987). It is
readily shown that this has the form

[...] (2.10)

for $s = t$ or $t+1$. The recursion starts with $p(\theta_t \mid D_t)$ and consists of two steps. The
first step, called the prediction step, consists of obtaining $p(\theta_{t+1} \mid D_t)$ from (2.9) and
(2.10). The second step, called the updating step, consists of obtaining $p(\theta_{t+1} \mid D_{t+1})$
using (2.8) and Bayes' theorem.
The prediction step consists of

$$\sigma_{t+1|t} = \gamma\,\sigma_{t|t}, \qquad (2.11)$$
$$\kappa_{t+1|t} = \kappa_{t|t}, \qquad (2.12)$$

where the notation is defined by (2.10). The updating step is

$$\sigma_{t+1|t+1} = \sigma_{t+1|t} + 1, \qquad (2.13)$$
$$\kappa_{t+1|t+1} = (1 - g_{t+1})\,\kappa_{t+1|t} + g_{t+1} z_{t+1}, \qquad (2.14)$$

where $g_{t+1} = 1/\sigma_{t+1|t+1}$ is analogous to the gain in the usual Kalman filter. In the
absence of specific prior information, the recursions are conveniently initialized by
setting $\sigma$ [...] variance [...]
over time. This is analogous to the Gaussian Kalman filter, in which the variance of
the observation distribution, conditional on the state, is assumed to be constant over
time. Define a new model parameter $\xi$ to be the value of $\tau_t$ when $\theta_t$ is at the center,
$u/(d+1)$, of the simplex. Let $\hat\theta_{t+1}$ be the mode of $p(\theta_{t+1} \mid D_t)$. Then by (2.4) and
(2.6) $\tau_{t+1}$ is the solution of the equation

[...] (2.15)

This one-dimensional equation is readily solved by Newton-Raphson iteration. While
it might initially seem that $\tau_t$ could simply be made constant over time, this does not
account for the relation between location and spread in equation (2.4).
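The prediction and updating steps for the pair $(\sigma, \kappa)$ can be sketched as follows. The sketch assumes that the prediction step discounts $\sigma$ by $\gamma$ and leaves $\kappa$ unchanged, and that the update weights the predicted $\kappa$ by $1 - g$ and the new logratio vector by $g = 1/\sigma_{t+1|t+1}$. It deliberately omits the adjustment of $\tau_t$ through (2.15). The function names and the numbers in the usage lines are illustrative only.

```python
def predict(sigma, kappa, gamma):
    """Prediction step: discount sigma by gamma, leave kappa unchanged."""
    return gamma * sigma, list(kappa)


def update(sigma_pred, kappa_pred, z_new):
    """Updating step: add one observation and move kappa toward the new logratios."""
    sigma_post = sigma_pred + 1.0
    gain = 1.0 / sigma_post  # plays the role of the Kalman gain
    kappa_post = [(1.0 - gain) * k + gain * z for k, z in zip(kappa_pred, z_new)]
    return sigma_post, kappa_post


# One cycle of the recursion with made-up values.
sigma, kappa = 4.0, [0.3, 0.0, -0.3]
sigma_pred, kappa_pred = predict(sigma, kappa, gamma=0.5)
sigma_post, kappa_post = update(sigma_pred, kappa_pred, [0.6, -0.3, -0.3])
```

A small $\gamma$ shrinks $\sigma$ sharply at each prediction step, so the gain stays close to one and the filter tracks the latest observation closely; a $\gamma$ near one lets $\sigma$ accumulate and the gain shrink.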
2.3 Covariates, trends, seasonality and interventions
We incorporate independent variables by changing the location of the predictive
state distribution $p(\theta_{t+1} \mid D_t)$ to take account of such information at time $t+1$. Let $\hat\theta_t$
be the mode of $p(\theta_t \mid D_t)$. Then we define a new mode, $\hat\theta^*_{t+1}$, for the predictive state
distribution by

$$g(\hat\theta^*_{t+1}) = f(B, \hat\theta_t, x_{t+1}), \qquad (2.16)$$

where $x_{t+1}$ is an $r$-vector of independent variables at time $t+1$ and $B$ is the matrix of
regression parameters. $f$ and $g$ are functions; $g$ is similar to the link function in
generalized linear models (McCullagh and Nelder, 1983).
Here we consider only a subset of the class of models defined by (2.16). This
consists of models which work with $\hat\theta^*_{t+1}$ on the logit scale and treat the covariates
linearly, namely

$$\mathrm{logit}\,\hat\theta^*_{t+1} = \mathrm{logit}\,\hat\theta_t + B x_{t+1}, \qquad (2.17)$$

[...] is defined as [...] 1. [...] matrix $B$ [...] regression parameters is
[...] $\times\, r$, [...] so as to ensure [...]
also. The recursion is still given by equations (2.11)-(2.15) with the exception that,
using (2.6), $\kappa_{t+1|t}$ is now specified by

$$\kappa_{t+1|t} = \psi(\tau_t\hat\theta^*_{t+1}) - \bar\psi(\tau_t\hat\theta^*_{t+1})\,u, \qquad (2.18)$$

where $\bar\psi(w)$ denotes the mean of the components of $\psi(w)$,
so (2.12) is replaced by (2.18).
For interpretation in terms of the original proportions, this prediction at time $t+1$
can be written in terms of relative odds for two categories $j$ and $k$ as

[...] (2.19)

As will be seen in Section 3, this gives an easily described interpretation of the effects
of independent variables.
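A relative-odds form can be recovered directly from the linear logit model (2.17). The step below is a sketch of that derivation, not a transcription of the report's own display. The symbol $b_j^T$, denoting the $j$th row of $B$, is a label introduced here for illustration. Because the difference of two symmetric logratios is an ordinary log ratio, $\mathrm{logit}(\theta)_j - \mathrm{logit}(\theta)_k = \log(\theta_j/\theta_k)$, subtracting component $k$ of (2.17) from component $j$ gives:

```latex
\log\frac{\hat\theta^*_{j,t+1}}{\hat\theta^*_{k,t+1}}
  = \log\frac{\hat\theta_{j,t}}{\hat\theta_{k,t}} + (b_j - b_k)^T x_{t+1}
\quad\Longrightarrow\quad
\frac{\hat\theta^*_{j,t+1}}{\hat\theta^*_{k,t+1}}
  = \frac{\hat\theta_{j,t}}{\hat\theta_{k,t}}\,
    \exp\{(b_j - b_k)^T x_{t+1}\}.
```

Under this reading, a unit increase in a single covariate multiplies the odds of category $j$ relative to category $k$ by the exponential of the difference between the corresponding entries of $B$, which is how the coefficients are interpreted in Section 3.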
The model (2.17) can represent trends, seasonality and interventions as well as
covariates. For a constant linear trend (on the logit scale), $x_t = 1$ $(t = 1, \ldots, T)$.
Seasonal effects may be represented by a set of dummy variables, one for each
season, or by a set of deterministic periodic functions such as sinusoids. Given an
estimated seasonal effect, $s_t$, at time $t$, a time series of continuous proportions may
be deseasonalized, for example, by forming the quantities $\mathrm{logit}^{-1}(\mathrm{logit}\,y_t - s_t)$ and
rescaling so that they sum to unity. An intervention may be represented by a dummy
variable (Box and Tiao, 1975).
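The deseasonalizing recipe can be sketched as below. The seasonal effect `s_t` is a made-up vector used only to exercise the code, and the helper names are choices made for this example.

```python
import math


def logit(y):
    """Symmetric logratios of a vector of proportions."""
    v = [math.log(p) for p in y]
    v_bar = sum(v) / len(v)
    return [vi - v_bar for vi in v]


def logit_inverse(z):
    """Exponentiate and rescale so that the result sums to one."""
    w = [math.exp(zi) for zi in z]
    total = sum(w)
    return [wi / total for wi in w]


def deseasonalize(y, s):
    """Subtract a seasonal effect on the logit scale and map back to proportions."""
    z = logit(y)
    return logit_inverse([zi - si for zi, si in zip(z, s)])


y_t = [0.50, 0.30, 0.20]
s_t = [0.10, -0.04, -0.06]  # hypothetical seasonal effect, summing to zero
adjusted = deseasonalize(y_t, s_t)
unchanged = deseasonalize(y_t, [0.0, 0.0, 0.0])
```

Removing a positive seasonal effect from the first component lowers its adjusted share, while the rescaling keeps the adjusted vector on the simplex.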
2.4 Forecasting, estimation, model checking and model comparison
The predictive distribution of $z_{t+1}$ given the past is

[...] (2.20)

[...] since it no [...] conditions on [...] unobserved state $\theta_{t+1}$. [...]
density is the basis for forecasting [...] no [...] expression for [...]
seems available, [...] we [...]
Section 3), which is both fast and accurate in this case.
The external parameters, or hyperparameters, in the model are $\gamma$, $\xi$ and, if there
are independent variables, $B$. These are all included in $D_t$. Given data at times
$t = 1, \ldots, T$, these may be estimated by maximum likelihood. The log-likelihood is

$$L(\gamma, \xi, B) = \sum_{t=1}^{T} \log p(z_t \mid D_{t-1}; \gamma, \xi, B),$$

and this can be maximized numerically. The log-likelihood is a smooth function of $\gamma$,
$\xi$ and $B$ provided that $B$ is written in terms of the $rd$ independent parameters it
contains, since each column is constrained to sum to zero. Thus, standard arguments
show that the maximum likelihood estimator is asymptotically normal with the usual
limiting distribution. Similar results have been established for the hyperparameters of
other, Gaussian, linear, state space models (e.g. Pagan, 1980; Los, 1984).
In order to compare models involving different covariates, we prefer to use an
approximation to the posterior odds as a measure of evidence. We do not use the
alternative approach of significance testing because the models are often non-nested
and multiple comparisons are involved. Suppose that we have models $M_i$ with
covariates $x^{(i)}$ of dimensions $r_i$ $(i = 0, 1)$. Then, given that the maximum likelihood
estimators of the hyperparameters have the usual limiting distribution, the arguments of
Schwarz (1978) show that

[...]

where $B_{01}$ is the posterior odds for $M_0$ against $M_1$, $L_i$ is the maximized log-likelihood
for $M_i$ $(i = 0, 1)$, and $\doteq$ denotes asymptotic equivalence in probability. If we are
[...] one [...]
$= -$ [...] $+$ [...]
is smallest. The rules of thumb of Jeffreys (1961, Appendix B) suggest that such a
preference should not be decisive unless the smallest value of BIC is exceeded by the
next smallest by at least $2\log 100 = 9.2$.
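The threshold in this rule of thumb is simple arithmetic, and the check can be sketched as follows. The BIC values are those reported in Table 2 of Section 3. The variable names and the decision logic are an illustration of the rule as stated above, not code from the report.

```python
import math

# Threshold from the rule of thumb: 2 * log(100), using the natural logarithm.
threshold = 2 * math.log(100)

# BIC values for the three fitted models, as reported in Table 2.
bic = {"steady": -389.4, "trend": -393.6, "covariate": -418.3}

ranked = sorted(bic, key=bic.get)          # smallest BIC first
best, runner_up = ranked[0], ranked[1]
gap = bic[runner_up] - bic[best]           # how far the next smallest exceeds the smallest
decisive = gap >= threshold

print(f"threshold = {threshold:.1f}, best = {best}, gap = {gap:.1f}, decisive = {decisive}")
```

The same comparison can be applied to any pair of candidate models by restricting the dictionary to those two entries.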
The model can be checked by examining the residuals

[...] (2.21)

by (2.7). Visual analysis of the residuals is important, and the use of dynamic
interactive graphics is helpful (e.g., the spin command in S-Plus (SSI, 1988), or the Data
Viewer (Hurley and Buja, 1988)). We shall give an example of residual analysis in the
next section.
3. THE TAX DATA
We now return to the tax data described in Section 1. The results of fitting
several of the models discussed in Section 2 to these data are shown in Table 2. For
the steady model of Section 2.2, the maximum likelihood estimators are $\hat\gamma = 0.02$ and
$\hat\xi$ = 25,800. There is more information in the likelihood surface, which is a ridge
aligned roughly along the curve $\gamma\xi = 516$. In the analogous normal Kalman filter
steady model (random walk with observation error), $\gamma\xi$ represents the limiting
forecast variance, and here this is well estimated. The very small value of $\hat\gamma$ is near the
edge of the parameter space (0,1), indicating that the likelihood is trying to select a
model outside the class under consideration. In other words, the steady model does
not fit very well. Also, the small value of $\hat\gamma$ gives a very flat predictive distribution,
indicating that with this model the past data are not providing much information about
[...] model is not [...]
more [...] 3 [...] as [...]
(2.21), tends to be greater than zero, corresponding to the underprediction of social
tax by the model.
Table 2. Maximum likelihood estimates and BIC

                                        Model
Parameter                    Steady     Trend    Covariate
$\gamma$                        .02       .20       .15
$\xi$                         25800      3100      6500
$\beta_{personal,1}$                    -.003     -.003
$\beta_{corporate,1}$                   -.028     -.028
$\beta_{social,1}$                      +.032     +.032
$\beta_{personal,2}$                              -.008
$\beta_{corporate,2}$                             +.013
$\beta_{social,2}$                                -.005
BIC                          -389.4    -393.6    -418.3
Figure 3. Residuals against time for the steady model fit to the tax data.
[Three panels: residuals against year, 1940-1990.]
The model incorporating a constant time trend does better, with the smaller BIC
indicating a significant improvement. Using (2.19), the quantitative information in
the parameter estimates can be described in terms of the odds as follows. The ratio of
personal to corporate tax has increased, on average, by a factor of about
$\exp(\beta_{personal,1} - \beta_{corporate,1}) = 1.026$, or 2.6%, per year. Similarly, the ratio of the
social to personal shares has increased by about 3.5% per year and the ratio of the
social to corporate shares has increased by about 6.2% per year.
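The yearly odds factors quoted above follow from differences of the trend coefficients. The sketch below recomputes them from the rounded estimates in Table 2, so the results agree with the quoted percentages only up to rounding. The function name `odds_factor` is an illustrative choice.

```python
import math

# Trend coefficients for the trend model, as printed (rounded) in Table 2.
beta_trend = {"personal": -0.003, "corporate": -0.028, "social": 0.032}


def odds_factor(beta, numerator, denominator):
    """Multiplicative yearly change in the ratio of two shares."""
    return math.exp(beta[numerator] - beta[denominator])


personal_vs_corporate = odds_factor(beta_trend, "personal", "corporate")
social_vs_personal = odds_factor(beta_trend, "social", "personal")
social_vs_corporate = odds_factor(beta_trend, "social", "corporate")

for label, factor in [
    ("personal/corporate", personal_vs_corporate),
    ("social/personal", social_vs_personal),
    ("social/corporate", social_vs_corporate),
]:
    print(f"{label}: factor {factor:.3f} per year")
```

Because the three coefficients sum (approximately) to zero, only their differences matter, which is why the results are stated as ratios of shares rather than as changes in a single share.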
In addition to the underlying trend, it seems reasonable to assume that the
general state of the U.S. economy might be a factor in accounting for the relative
proportions of the three sources of federal receipts. Figure 4 shows a plot of the residuals
against the change in the rate of growth of the economy, as measured by $\nabla G_t$, where
$G_t$ is the percent change in the Gross National Product (GNP) in year $t$
($G_t = 100(\mathrm{GNP}_t - \mathrm{GNP}_{t-1})/\mathrm{GNP}_{t-1}$), and $\nabla$ is the first difference operator, i.e.,
$\nabla G_t = G_t - G_{t-1}$. Figure 4 reveals a clear linear relationship between $\nabla G_t$ and the
residuals.
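The covariate used here is a first difference of the growth series. As a minimal sketch, the following computes it for the first few years of Table 3. The function name `first_difference` and the dictionary layout are illustrative assumptions.

```python
# Percent change in GNP for the first five years of Table 3.
growth = {1946: -19.0, 1947: -2.8, 1948: 3.9, 1949: 0.0, 1950: 8.5}


def first_difference(series):
    """Return the change from the previous year for every year that has a predecessor."""
    years = sorted(series)
    return {
        year: series[year] - series[prev]
        for prev, year in zip(years, years[1:])
    }


growth_change = first_difference(growth)

for year, change in growth_change.items():
    print(year, round(change, 1))
```

The first year of the series has no predecessor, so the differenced covariate starts one year later than the growth series itself.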
Table 3. Percent change in U.S. Gross National Product
(Constant Dollars)

Year   $G_t$     Year   $G_t$     Year   $G_t$     Year   $G_t$
1946   -19.0     1956     2.1     1966     5.8     1976     4.9
1947    -2.8     1957     1.7     1967     2.9     1977     4.7
1948     3.9     1958    -0.8     1968     4.2     1978     5.3
1949     0.0     1959     5.8     1969     2.4     1979     2.5
1950     8.5     1960     2.2     1970    -0.3     1980    -0.2
1951    10.3     1961     2.6     1971     2.8     1981     1.9
1952     3.9     1962     5.3     1972     5.0     1982    -2.5
1953     4.0     1963     4.1     1973     5.2     1983     3.5
1954    -1.3     1964     5.3     1974    -0.5     1984     6.5
1955     5.6     1965     5.8     1975    -1.3     1985     2.3

Source: Survey of Current Business
Figure 4. Residuals against change in GNP for the steady model fit to the tax data.
[Three panels: residuals against $\nabla G_t$, horizontal axis from -10 to 20.]
The inclusion of $\nabla G_t$ as a covariate again gives a very significant improvement in
the model. Quantitatively, a one percent increase in the growth rate from the
previous year is associated with a decrease of about 2.1% in the ratio of the personal to
corporate shares and a decrease of about 1.8% in the ratio of the social to corporate
shares. The ratio of social to personal shares did not change appreciably. Thus, in a
year that is economically better than the preceding one (as measured by the percent
change in GNP), the share of the tax burden assumed by corporations increases. This
seems quite reasonable, since GNP is in some sense a measure of corporate activity.
In practice, of course, this information would not be available for future times and a
forecast of economic growth would be used instead for forecasting. Our main
interest, though, is in getting a good picture of trends and relations that have held
over the past, and the above modeling presents such a picture.
4. DISCUSSION
To our knowledge, the only other approach to time series of continuous
proportions to have been developed in any detail is that of Smith and Brunsdon (1986) and
Brunsdon (1986), building on the proposals of Aitchison and Shen (1980) and
Aitchison (1982, 1987). This involves applying multivariate ARIMA models (Box and
Jenkins, 1976; Tiao and Box, 1981) to Aitchison's (1982) (asymmetric) log ratios
$\log(y_{it}/y_{d+1,t})$ $(i = 1, \ldots, d)$. In the univariate case ($d = 1$) there is no difficulty
about which category to use in the denominator. However, when $d \ge 2$ this may
present more problems, at least for the interpretation of the estimated parameters. A
possible solution, suggested by Aitchison (1983) in the context of principal
components, is to work instead with the symmetric logratios $z_t$. This idea is developed in
Grunwald (1987, Section 4.2), where the symmetric logratios are modeled using the
normal Kalman filter rather than multivariate ARIMA models.
A full comparison between the approach of Smith and Brunsdon (1986), that of
Grunwald (1987, Section 4.2) and the one proposed in this paper has yet to be made.
However, the present approach appears to have the advantages of working with
models that are readily interpretable in terms of the underlying data generating
mechanism, and of yielding results that are easily interpretable in terms of the odds
ratios among the components of the composition. It is also based on the Dirichlet
distribution, which many consider to be more natural for compositions. The present
approach also may well prove more robust to outliers because the Dirichlet
distribution has thicker tails than the logistic-normal distribution. For example, in the
univariate case ($d = 1$), for large $|z|$, the Dirichlet density (2.2) is $\log p(z) \propto -|z|$, while in
the logistic-normal case $\log p(z) \propto -z^2$. Thus, in contrast with what happens in the
usual unrestricted data case, we can perhaps both use a natural distribution and
achieve robustness: a pleasant surprise!
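The linear tail behavior in the univariate case can be checked from the exponential family form (2.2). The following is a sketch of that calculation under the notation of Section 2.1, writing the logratio vector as $(z, -z)$ and using the fact that $\bar{v} = -\log(e^{z} + e^{-z})$, which follows from the requirement that the proportions sum to one:

```latex
\log p(z \mid \theta, \tau)
  = \tau z(\theta_1 - \theta_2) - \tau\log\!\left(e^{z} + e^{-z}\right) - \log D(\tau\theta),
\qquad
\log\!\left(e^{z} + e^{-z}\right) = |z| + \log\!\left(1 + e^{-2|z|}\right).
```

Since $\theta_1 + \theta_2 = 1$, this gives $\log p(z) \approx -2\tau\theta_2\,|z|$ as $z \to +\infty$ and $\log p(z) \approx -2\tau\theta_1\,|z|$ as $z \to -\infty$, up to an additive constant. The log density therefore falls off linearly in $|z|$, in contrast with the quadratic decay of a normal density on the logit scale.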
[...] considered the use of the logistic transformation. However, these refer only to the
univariate case.
One difficulty which arises no matter what approach is used is the problem of
zeros. This is because in the Dirichlet distribution, as in the logistic-normal, an exact
zero in one of the categories is an event of probability zero. In applications, however,
one does encounter exact zeros, and they cannot be accommodated by any of the
methods discussed here. One possible solution is to allow a singular component of
the conditional distribution on the boundary of the simplex, with density proportional
to the (non-singular) limiting conditional density at the boundary. This idea, which
appears to be a refinement of an idea of Aitchison (1982, end of Section 7.4), may be
useful more generally for the analysis of continuous proportions with zeros, outside
the time series context.
One by-product of the present work is the development of two new distributions
on the simplex, the DC distribution (2.5) and the DCD distribution defined in Section
2.1. These are based on the Dirichlet distribution, but generalize it to allow for
dependence between the components. They have yet to be studied in detail, but the general
idea of using mixtures of Dirichlet distributions does seem worth investigating,
perhaps in conjunction with other generalizations of the Dirichlet, such as that of
Aitchison (1985).
REFERENCES
Aitchison, J. (1982) The statistical analysis of compositional data (with Discussion).
Journal of the Royal Statistical Society B 44 139-177.
Aitchison, J. (1983) Principal component analysis of compositional data. Biometrika 70
[...]
Aitchison, J. (1985) A general class of distributions on the simplex. Journal of the
Royal Statistical Society B 47 1[...]
[...]
Aitchison, J. (1986) The statistical analysis of compositional data. London: Chapman
and Hall.
Aitchison, J. and Shen, S. M. (1980) Logistic-normal distributions: some properties
and uses. Biometrika 67 261-272.
Azzalini, A. (1984) A Markov process with Beta marginal distribution. Statistica 44
241-243.
Box, G.E.P. and Jenkins, G.M. (1976) Time series analysis: forecasting and control (2nd
ed.). San Francisco: Holden-Day.
Box, G.E.P. and Tiao, G.C. (1975) Intervention analysis with applications to economic
and environmental problems. Journal of the American Statistical Association 70 70-79.
Brunsdon, T.M. (1986) Time series of compositional data. Unpublished Ph.D. thesis,
University of Southampton.
Chayes, F. (1971) Ratio Correlation. Chicago: University of Chicago Press.
Grunwald, G.K. (1987). Time series models for continuous proportions. Unpublished
Ph.D. dissertation, Department of Statistics, University of Washington.
Harrison, P. J. and Stevens, C. F. (1971) A Bayesian approach to short-term
forecasting. Operational Research Quarterly 22 341-362.
Harrison, P. J. and Stevens, C. F. (1976) Bayesian forecasting. Journal of the Royal
Statistical Society B 38 205-247.
Hurley, C. and Buja, A. (1988) Analyzing high-dimensional data with motion graphics.
Tech. Rep. STAT-88-03, Dept. of Statistics and Actuarial Science, University of
Waterloo.
Kitagawa, G. (1987) Non-Gaussian state-space modeling of nonstationary time series
(with Discussion). Journal of the American Statistical Association 82 1032-1063.
Los, C. A. (1984). Econometrics of models with evolutionary parameter structures.
Unpublished Ph.D. dissertation, Dept. of Economics, Columbia University.
McCullagh, P. and Nelder, J.A. (1983). Generalized Linear Models. London: Chapman
and Hall.
McKenzie, E. (1985) An autoregressive process for Beta random variables.
Management Science 31 988-997.
Pagan, A. (1980) Some identification and estimation results for regression models
with stochastically varying coefficients. Journal of Econometrics 12 341-363.
Schwarz, G. (1978) Estimating the dimension of a model. Annals of Statistics 6
461-464.
Smith, J. Q. (1979) A generalization of the Bayesian steady forecasting model.
Journal of the Royal Statistical Society B 41 375-387.
Smith, J. Q. (1981) The multiparameter steady model. Journal of the Royal Statistical
Society B 43 256-260.
Smith, T. F. M. and Brunsdon, T. M. (1986) Time series methods for small areas.
Technical report, University of Southampton.
SSI (1988) S-Plus reference manual. Seattle: Statistical Science Inc.
Tiao, G.C. and Box, G.E.P. (1981) Modeling multiple time series with applications.
Journal of the American Statistical Association 76 802-816.
Tierney, L. and Kadane, J. B. (1986) Accurate approximations for posterior moments
and marginal densities. Journal of the American Statistical Association 81 82-86.
West, M., Harrison, P. J. and Migon, H. S. (1985) Dynamic generalized linear models
and Bayesian forecasting (with Discussion). Journal of the American Statistical
Association 80 73-97.