TIME SERIES OF CONTINUOUS PROPORTIONS by Gary K. Grunwald, Adrian Raftery and Peter Guttorp. TECHNICAL REPORT No. 164. Department of Statistics, GN-22, University of Washington, Seattle, Washington 98195 USA


Time Series of Continuous Proportions

Gary K. Grunwald

Department of Statistics, University of Melbourne

Adrian E. Raftery

Departments of Statistics and Sociology, University of Washington

Peter Guttorp

Department of Statistics, University of Washington

ABSTRACT

A vector of continuous proportions consists of the proportions of some total accounted for by its constituent components. An example is the proportions of U.S. tax revenues from each of personal tax, corporate tax and social tax. We consider the situation where time series data are available and where interest focuses on the proportions rather than the actual amounts. Reasons for analyzing such time series include estimation of the underlying trend, estimation of the effect of covariates and interventions, and forecasting.

We develop a state space model for time series of continuous proportions. Conditionally on the unobserved state, the observations are assumed to follow the Dirichlet distribution, often considered to be the most natural distribution on the simplex. The state follows the Dirichlet-conjugate (DC) distribution which is introduced here. Thus the model, while based on the Dirichlet distribution, does not have its restrictive independence properties. The state transition distribution, or "state equation", allows for covariates, trends, seasonality and [...] model [...] to the tax [...].

[...] forecasting; [...] data; [...] distribution; Recursive updating; State [...]

Gary K. Grunwald is Lecturer (Temporary), Department of Statistics, University of Melbourne, Parkville, Victoria 3052, [...]. Adrian E. Raftery is Associate Professor of Statistics and Sociology, [...] Professor, University of Washington, Seattle, WA 98195. The research of Grunwald and [...] was supported by National Science Foundation Grant no. SES-&615S41. The research of [...] was partially supported by a cooperative agreement between the United States Environmental Protection Agency and the Societal Institute for Mathematical Sciences. The authors are grateful to Michael Kahn for comments.


1. INTRODUCTION

The U.S. Government receives money from three main sources: personal tax and non-tax receipts (referred to as personal tax), corporate tax and social tax (contributions for social insurance). Table 1 gives the annual amounts of receipts in each of the three categories for the 40-year post-war period 1946-1985. The total has grown very rapidly, and has obscured the relative changes in the three sources, which are often of interest. The most natural way to study these is to consider the proportions of the total accounted for by each source.

Table 1. U.S. Federal Government Receipts (billions of 1982 dollars)

Year  Personal  Corporate  Social      Year  Personal  Corporate  Social
1946    17.3      16.5       5.6       1966    61.6      46.9       33.1
1947    19.7      18.4       5.1       1967    67.4      46.2       36.7
1948    18.9      19.7       4.6       1968    79.7      54.2       40.8
1949    16.2      17.7       4.9       1969    94.7      55.1       46.9
1950    18.2      26.0       6.0       1970    92.2      49.9       49.8
1951    26.1      31.1       7.0       1971    89.9      53.9       54.8
1952    31.0      29.0       7.5       1972   108.3      56.5       62.7
1953    32.2      30.3       7.4       1973   114.8      63.7       79.4
1954    28.9      26.7       8.2       1974   131.3      67.7       89.4
1955    31.4      31.7       9.4       1975   125.6      63.3       93.5
1956    35.2      32.2      10.6       1976   147.4      77.9      106.5
1957    37.5      32.3      12.2       1977   170.1      86.6      118.5
1958    36.8      29.4      12.4       1978   194.9      99.5      137.2
1959    40.0      34.9      14.9       1979   230.6     103.6      159.6
1960    43.7      34.9      17.6       1980   257.6     109.3      173.9
1961    44.7      35.2      18.2       1981   298.7     121.8      204.0
1962    48.5      37.0      20.4       1982   304.4      97.2      [missing]
1963    51.5      40.0      23.2       1983   294.5     113.0      [missing]
1964    48.6      42.3      24.0       1984   309.3     131.7      [missing]
1965    54.0      45.4      25.0       1985   345.6     129.6      311.6

Source: Survey of Current Business
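As a small illustration of the step from amounts to proportions, the sketch below converts two rows of Table 1 into vectors on the simplex. The names `receipts`, `to_proportions` and `proportions` are hypothetical helpers introduced here and are not part of the report.

```python
# Two rows of Table 1: (personal, corporate, social) receipts.
receipts = {
    1946: (17.3, 16.5, 5.6),
    1985: (345.6, 129.6, 311.6),
}


def to_proportions(amounts):
    """Divide each component by the total so the result sums to 1."""
    total = sum(amounts)
    return tuple(a / total for a in amounts)


# Proportion vector for each year, in the order personal, corporate, social.
proportions = {year: to_proportions(row) for year, row in receipts.items()}

for year, p in proportions.items():
    print(year, [round(x, 3) for x in p])
```

For these two years the personal share is close to 0.44 in both, while the corporate share falls from about 0.42 to about 0.16 and the social share rises correspondingly.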

Figure 1 shows one common display of such information, the profile plot. This is the time series version of stacked bar graphs. The general trend is [...]. While personal tax has consistently accounted for around 45% of the total, there has been a trend in the trade-off between the other two sources. [...] 1952, [...] over to about [...] seven [...] to [...] presidential party affiliation, a war and several recessions. For analysis, the data are best thought of as a time series of vectors, each of which has positive components summing to 1. This sample space is called the simplex, and for three-component proportions a graph of the series in this two-dimensional set (Figure 2) is often useful. Again, the trend described above is evident.

Figure 1. The proportion of U.S. government receipts from three sources. [Profile plot of proportion against year, 1940-1990, with regions labeled Personal, Corporate and Social.]

Figure 2. Simplex plot of the proportion of U.S. government receipts from three sources.

Reasons for analyzing time series of continuous proportions such as that in Figure 1 include estimation of the underlying trend, assessment of the effects of covariates and interventions, and forecasting. For example, one may wish to produce a predictive distribution of the composition of 1987 tax receipts, as a baseline for assessing the effect of the Tax Reform Act of 1986. Alternatively, an estimate of the trend noted above, or of the effect of economic growth on the composition, may be of interest.

Such time series arise in many other areas of application. Examples include the breakdown of household consumption by type of item in successive household budget surveys (Aitchison, 1982), market shares in successive time periods, proportions of time spent on different activities by individuals, groups or animals in successive time periods, and changes in the chemical composition of rock samples taken from successively deeper layers, corresponding to more distant time (Chayes, 1971). Although the data constitute a multivariate time series, standard techniques such as multivariate ARIMA modeling (Tiao and Box, 1981) and Kalman filtering (Kalman, 1960) are not applicable because the values at a single time-point are positive and constrained to sum to one.

In this paper we develop a methodology for modeling, forecasting, estimating trends and seasonal effects, deseasonalizing, and assessing the effects of covariates and interventions on time series of continuous proportions. We take a state-space modeling approach based on recursive updating. Conditionally on the unobserved state, the observations are assumed to follow the Dirichlet distribution. This is often considered to be a natural distribution for continuous proportions, since its marginal distributions, conditional distributions and the joint distributions of sums of components all have [...] as [...] not suffer a [...] limitation.

A major advantage of our method is the direct interpretability of the results in

terms of the original proportions. The approach is fairly easily implemented and most

of our results are exact. In the few cases where approximation is necessary we

obtained good results using the accurate approximations of Tierney and Kadane

(1986).

In Section 2 we present the model, in Section 3 we use it to analyze the tax data,

and in Section 4 we review the literature and discuss unresolved questions.

2. THE MODEL

In this section we review the Dirichlet distribution, which describes the observations. We introduce and give some properties of the distribution conjugate to it, the Dirichlet-conjugate (DC) distribution, which describes the state. We then define the state space model, and show how it can incorporate covariates, trends, seasonality and interventions. Finally, we consider forecasting, estimation, model checking and model selection.

2.1 The Dirichlet distribution and the Dirichlet-conjugate (DC) distribution

Let $y = (y_1, \ldots, y_{d+1})^T$ be a vector of continuous proportions, namely a vector with positive components such that $y^T u = 1$, where $u = (1, \ldots, 1)^T$ is a $(d+1)$-vector of ones. Then $y$ follows the Dirichlet distribution if it has the density

$$p(y \mid \alpha) = \prod_{j=1}^{d+1} y_j^{\alpha_j - 1} \, D(\alpha)^{-1}, \tag{2.1}$$

where $\alpha_j > 0$ for $j = 1, \ldots, d+1$, and

$$D(\alpha) = \prod_{j=1}^{d+1} \Gamma(\alpha_j) \Big/ \Gamma(\alpha^T u).$$

In (2.1), $y$ is in the $d$-dimensional simplex $S^d = \{y \in R_+^{d+1} : y^T u = 1\}$.

We write (2.1) in exponential family form in the following way. Let $v = \log y$, $\bar{v} = v^T u / (d+1)$ and $z = v - \bar{v}$ (the scalar $\bar{v}$ being subtracted from each component). We call $z$ the vector of symmetric logratios, and we write $z = \operatorname{logit}(y)$, in multivariate analogy with the usual univariate logit. Also, let $\theta = \alpha / \tau$, where $\tau = \alpha^T u$, so that $y \sim \mathrm{Dir}(\tau\theta)$. Then (2.1) becomes

$$p(z \mid \theta, \tau) = \exp\{\tau z^T \theta + \tau \bar{v} - \log D(\tau\theta)\}. \tag{2.2}$$

The sample space is $H^d = \{z \in R^{d+1} : z^T u = 0\}$ and the parameter space is $(\theta, \tau) \in S^d \times R_+$. The purpose of this reparametrization is to separate, so far as is possible, the effects of location $\theta$ and spread $\tau$.
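The symmetric logratio transform and its inverse can be sketched in a few lines. This is an illustrative sketch only; the function names `logit` and `inv_logit` are chosen here to mirror the notation of this section and are not taken from any particular library.

```python
import math


def logit(y):
    """Symmetric logratios z = log y - mean(log y); the components sum to zero."""
    v = [math.log(p) for p in y]
    v_bar = sum(v) / len(v)
    return [vj - v_bar for vj in v]


def inv_logit(z):
    """Map z back to the simplex by exponentiating and renormalizing."""
    m = max(z)  # subtract the maximum for numerical stability
    w = [math.exp(zj - m) for zj in z]
    total = sum(w)
    return [wj / total for wj in w]


y = [0.439, 0.165, 0.396]   # an example composition that sums to 1
z = logit(y)                # a point in the hyperplane where components sum to 0
y_back = inv_logit(z)       # recovers y up to rounding error
```

Because `inv_logit` renormalizes, it also covers the "rescaling so that they sum to unity" step that arises whenever a vector on the logratio scale is shifted before being mapped back.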

The moments of the proportion vector $y$ are

$$E[y \mid \theta, \tau] = \theta,$$

$$\operatorname{Var}[y \mid \theta, \tau] = \{\operatorname{diag}(\theta) - \theta\theta^T\} / (\tau + 1).$$

Thus $\theta$ determines the location of the distribution of $y$ in the simplex, and $\tau$ affects only the dispersion. By exponential family theory the moments of $z$ are

$$E[z \mid \theta, \tau] = \psi - \psi^T u / (d+1), \tag{2.3}$$

$$\operatorname{Var}[z \mid \theta, \tau] = [(\psi'^T u)(u u^T)/(d+1) + (d+1)\operatorname{diag}\{\psi'\} - \psi' u^T - u \psi'^T] / (d+1). \tag{2.4}$$

In (2.3) and (2.4), $\psi = \psi(\tau\theta)$ and $\psi' = \psi'(\tau\theta)$, where we adopt the convention that $\psi(w) = (\psi(w_1), \ldots, \psi(w_{d+1}))^T$ and $\psi'(w) = (\psi'(w_1), \ldots, \psi'(w_{d+1}))^T$, $w$ being any positive $(d+1)$-vector, $\psi$ the digamma function $\psi(w) = d \log \Gamma(w) / dw$ and $\psi'$ the trigamma function $\psi'(w) = d\psi(w)/dw$.

The family of conjugate prior distributions for $\theta$, conditional on $\tau$, is

$$p(\theta \mid \sigma, \kappa, \tau) \propto \exp[\sigma\{\tau \kappa^T \theta - \log D(\tau\theta)\}], \tag{2.5}$$

where $(\sigma, \kappa) \in R_+ \times H^d$. Because $\theta \in S^d$, this is a distribution on the simplex which does not appear to have been written down before. The mode, $\hat\theta$, of the distribution (2.5) satisfies the equation

$$\psi(\tau\hat\theta) - \psi(\tau) u = \kappa. \tag{2.6}$$

Using (2.6), $\hat\theta$ is readily found by Newton-Raphson iteration.

If (2.2) and (2.5) hold we say that $y$ follows the compound Dirichlet-conjugate-Dirichlet (DCD) distribution, and we denote this situation by $y \sim \mathrm{DCD}(\sigma, \kappa, \tau)$. This is also a new distribution on the simplex. It follows from Theorem 2 of Diaconis and Ylvisaker (1979) that if $y \sim \mathrm{DCD}(\sigma, \kappa, \tau)$, then

$$E[z \mid \sigma, \kappa, \tau] = \kappa, \tag{2.7}$$

so that $\kappa$ determines the location of the DCD distribution, and $\sigma$ and $\tau$ affect the dispersion in different ways. The DCD distribution is a mixture of Dirichlet distributions; $\tau$ is a common dispersion parameter for the individual Dirichlet distributions being mixed, while $\sigma$ is a dispersion parameter for the DC mixing distribution.
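The logratio moments (2.3) and (2.4) involve only the digamma and trigamma functions, so they are cheap to evaluate. The following is a minimal sketch using only the Python standard library; `digamma` and `trigamma` are hand-rolled here from the standard recurrence and asymptotic series (an implementation choice made for this illustration, not the report's), and `mean_logratio` and `var_logratio` are hypothetical names.

```python
import math


def digamma(x):
    """psi(x) for x > 0: shift x up to at least 6, then use the asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x          # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # 1/(12 x^2) - 1/(120 x^4) + 1/(252 x^6)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return acc + math.log(x) - 0.5 / x - series


def trigamma(x):
    """psi'(x) for x > 0, by the same shift-then-expand approach."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)    # psi'(x) = psi'(x + 1) + 1/x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2 x^2) + 1/(6 x^3) - 1/(30 x^5) + 1/(42 x^7)
    tail = inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))
    return acc + inv + 0.5 * inv2 + tail


def mean_logratio(theta, tau):
    """E[z | theta, tau] as in (2.3): psi(tau * theta) minus its average."""
    psi = [digamma(tau * th) for th in theta]
    psi_bar = sum(psi) / len(psi)
    return [p - psi_bar for p in psi]


def var_logratio(theta, tau):
    """Var[z | theta, tau] as in (2.4), returned as a list of rows."""
    n = len(theta)  # n = d + 1
    tri = [trigamma(tau * th) for th in theta]
    s = sum(tri)
    return [
        [(s / n + (n * tri[i] if i == j else 0.0) - tri[i] - tri[j]) / n
         for j in range(n)]
        for i in range(n)
    ]
```

Two quick sanity checks follow from the constraint that $z$ sums to zero: the entries of `mean_logratio` sum to zero, and every row of `var_logratio` sums to zero.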

2.2 The state space model

Now we consider a time series $\{y_t : t = 1, \ldots, T\}$ of continuous proportions, where $y_t = (y_{1t}, \ldots, y_{d+1,t})^T$ $(t = 1, \ldots, T)$. We shall model the symmetric logratios $z_t$ as defined in Section 2.1. The basic assumption of the state space model is that there is an unobserved state $\theta_t$ such that

[observation equation not legible in the source] (2.8)

Equation (2.8) is called the observation equation. The state $\{\theta_t\}$ is assumed to evolve over time according to the "steady model" of Smith (1979, 1981), [...]

$$p(\theta_{t+1} \mid D_t) \propto \{p(\theta_t \mid D_t)\}^{\gamma} \quad (0 < \gamma < 1), \tag{2.9}$$

where $D_t = \{D_{t-1}, I_t\}$ $(t \geq 1)$, $I_t = \{z_t$ and all other relevant information available at time $t$ but not $t-1\}$, and we denote by $D_0$ all relevant information available at time $t = 0$. The "state equation", or, more accurately, the state transition density (2.9), has the property that the distribution of $(\theta_{t+1} \mid D_t)$ has mode unchanged from that of $(\theta_t \mid D_t)$ but has greater dispersion. Smith (1979) argues that in the normal case there are many analogies between this model and the random walk with observation error.

Knowledge about the state $\theta_t$ given data is specified by the standard recursive updating scheme, which follows from Bayes' theorem (e.g. Kitagawa (1987)). It is readily shown that this has the form

[equation not legible in the source] (2.10)

for $s = t$ or $t+1$. The recursion starts with $p(\theta_t \mid D_t)$ and consists of two steps. The first step, called the prediction step, consists of obtaining $p(\theta_{t+1} \mid D_t)$ from (2.9) and (2.10). The second step, called the updating step, consists of obtaining $p(\theta_{t+1} \mid D_{t+1})$ using (2.8) and Bayes' Theorem.

The prediction step consists of

$$\sigma_{t+1|t} = \gamma \sigma_{t|t}, \tag{2.11}$$

[second prediction equation not legible in the source] (2.12)

where the notation is defined by (2.10). The updating step is

$$\sigma_{t+1|t+1} = \sigma_{t+1|t} + 1, \tag{2.13}$$

$$\kappa_{t+1|t+1} = (1 - g_{t+1}) \kappa_{t+1|t} + g_{t+1} z_{t+1}, \tag{2.14}$$

where $g_{t+1} = 1/\sigma_{t+1|t+1}$ is analogous to the gain in the usual Kalman filter. In the absence of specific prior information, the recursions are conveniently initialized by setting $\sigma = 0$ [...] over time. This is analogous to the Gaussian Kalman filter, in which the variance of the observation distribution, conditional on the state, is assumed to be constant over time. Define a new model parameter $\xi$ to be the value of $\tau_t$ when $\theta_t$ is at the center, $u/(d+1)$, of the simplex. Let $\hat\theta_{t+1}$ be the mode of $p(\theta_{t+1} \mid D_t)$. Then by (2.4) and (2.6) $\tau_{t+1}$ is the solution of the equation

[equation not legible in the source] (2.15)

This one-dimensional equation is readily solved by Newton-Raphson iteration. While it might initially seem that $\tau_t$ could simply be made constant over time, this does not account for the relation between location and spread in equation (2.4).
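The parts of the recursion that do not depend on the dispersion parameter are simple enough to sketch directly. The code below illustrates the discounting of $\sigma$ in the prediction step and the conjugate update of $\sigma$ and $\kappa$; the function names are hypothetical. It assumes the update weights sum to one, which is what multiplying the DC density (2.5) by the observation density (2.2) gives. The predicted location is taken as an input rather than computed, since the corresponding equation is not legible in the source.

```python
def predict_sigma(sigma_filt, gamma):
    """Prediction step for sigma: discount by gamma, with 0 < gamma < 1."""
    return gamma * sigma_filt


def update(sigma_pred, kappa_pred, z_new):
    """Updating step for one new logratio vector z_new.

    kappa_pred is the predicted location, supplied by the caller.
    Returns the filtered sigma, the filtered kappa and the gain.
    """
    sigma_filt = sigma_pred + 1.0
    gain = 1.0 / sigma_filt
    kappa_filt = [(1.0 - gain) * k + gain * z
                  for k, z in zip(kappa_pred, z_new)]
    return sigma_filt, kappa_filt, gain


def limiting_gain(gamma, steps=200):
    """Iterate sigma -> gamma * sigma + 1 from sigma = 0 and return the final gain."""
    sigma = 0.0
    gain = 1.0
    for _ in range(steps):
        sigma = predict_sigma(sigma, gamma) + 1.0
        gain = 1.0 / sigma
    return gain
```

Starting from $\sigma = 0$, the first gain equals 1, so the first filtered location is the first observation itself. The iteration $\sigma \mapsto \gamma\sigma + 1$ has fixed point $1/(1-\gamma)$, so the gain settles at $1 - \gamma$: a small $\gamma$ means each update is dominated by the most recent observation.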

2.3 Covariates, trends, seasonality and interventions

We incorporate independent variables by changing the location of the predictive state distribution $p(\theta_{t+1} \mid D_t)$ to take account of such information at time $t+1$. Let $\hat\theta_t$ be the mode of $p(\theta_t \mid D_t)$. Then we define a new mode, $\hat\theta^*_{t+1}$, for the predictive state distribution by

$$g(\hat\theta^*_{t+1}) = f_B(\hat\theta_t, x_{t+1}), \tag{2.16}$$

where $x_{t+1}$ is an $r$-vector of independent variables at time $t+1$ and $B$ is the matrix of regression parameters. $f$ and $g$ are functions; $g$ is similar to the link function in generalized linear models (McCullagh and Nelder, 1983).

Here we consider only a subset of the class of models defined by (2.16). This consists of models which work with $\hat\theta_{t+1}$ on the logit scale and treat the covariates linearly, namely

$$\operatorname{logit} \hat\theta^*_{t+1} = \operatorname{logit} \hat\theta_t + B x_{t+1}, \tag{2.17}$$

where logit is defined as in Section 2.1. The matrix $B$ of regression parameters is $(d+1) \times r$, [...] so as to ensure [...] also. The recursion is still given by equations (2.11)-(2.15) with the exception that, using (2.6), $\kappa_{t+1|t}$ is now specified by

$$\kappa_{t+1|t} = \psi(\tau_t \hat\theta^*_{t+1}) - \psi(\tau_t) u, \tag{2.18}$$

so (2.12) is replaced by (2.18).

For interpretation in terms of the original proportions, this prediction at time $t+1$ can be written in terms of relative odds for two categories $j$ and $k$ as

$$\frac{\hat\theta^*_{j,t+1}}{\hat\theta^*_{k,t+1}} = \frac{\hat\theta_{j,t}}{\hat\theta_{k,t}} \exp\{(\beta_j - \beta_k)^T x_{t+1}\}, \tag{2.19}$$

where $\beta_j^T$ denotes the $j$th row of $B$. As will be seen in Section 3, this gives an easily described interpretation of the effects of independent variables.
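The logit-scale shift of the mode and its reading as a change in relative odds can be checked numerically. The sketch below is illustrative only: `shift_mode` is a hypothetical helper, and the matrix `B`, the covariate `x` and the starting mode `theta_hat` are made-up values, not estimates from the report.

```python
import math


def logit(y):
    """Symmetric logratios: log y minus the mean of log y."""
    v = [math.log(p) for p in y]
    v_bar = sum(v) / len(v)
    return [vj - v_bar for vj in v]


def inv_logit(z):
    """Exponentiate and renormalize so the result lies on the simplex."""
    w = [math.exp(zj) for zj in z]
    total = sum(w)
    return [wj / total for wj in w]


def shift_mode(theta_hat, B, x):
    """Add B x to the logratios of the current mode and map back to the simplex."""
    z = logit(theta_hat)
    bx = [sum(b * xi for b, xi in zip(row, x)) for row in B]
    return inv_logit([zj + s for zj, s in zip(z, bx)])


theta_hat = [0.5, 0.3, 0.2]        # hypothetical current mode
B = [[0.1], [-0.3], [0.2]]         # hypothetical (d+1) x r matrix, column sums to 0
x = [1.0]                          # hypothetical covariate value

theta_star = shift_mode(theta_hat, B, x)

# Odds of category 0 against category 1, before and after the shift.
odds_before = theta_hat[0] / theta_hat[1]
odds_after = theta_star[0] / theta_star[1]
odds_factor = math.exp(B[0][0] - B[1][0])   # exp of the difference in coefficients
```

Here `odds_after` equals `odds_before * odds_factor` up to rounding, which is the sense in which differences between rows of the coefficient matrix act multiplicatively on the odds between two categories.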

The model (2.17) can represent trends, seasonality and interventions as well as covariates. For a constant linear trend (on the logit scale), $x_t = 1$ $(t = 1, \ldots, T)$. Seasonal effects may be represented by a set of dummy variables, one for each season, or by a set of deterministic periodic functions such as sinusoids. Given an estimated seasonal effect, $s_t$, at time $t$, a time series of continuous proportions may be deseasonalized, for example, by forming the quantities $\operatorname{logit}^{-1}(\operatorname{logit} y_t - s_t)$ and rescaling so that they sum to unity. An intervention may be represented by a dummy variable (Box and Tiao, 1975).

2.4 Forecasting, estimation, model checking and model comparison

The predictive distribution of $z_{t+1}$ given the past is

[equation not legible in the source] (2.20)

[...] since it no [...] conditions on [...] unobserv[...] state $\theta_{t+1}$. [...] density is the basis for forecasting [...] no [...] expression for [...] seems available, [...] we [...] Section 3), which is both fast and accurate in this case.

The external parameters, or hyperparameters, in the model are $\gamma$, $\xi$ and, if there are independent variables, $B$. These are all included in $D_t$. Given data at times $t = 1, \ldots, T$, these may be estimated by maximum likelihood. The log-likelihood is

$$L(\gamma, \xi, B) = \sum_{t=1}^{T} \log p(z_t \mid D_{t-1}; \gamma, \xi, B),$$

and this can be maximized numerically. The log-likelihood is a smooth function of $\gamma$, $\xi$ and $B$ provided that $B$ is written in terms of the $rd$ independent parameters it contains, since each column is constrained to sum to zero. Thus, standard arguments show that the maximum likelihood estimator is asymptotically normal with the usual limiting distribution. Similar results have been established for the hyperparameters of other, Gaussian, linear, state space models (e.g. Pagan, 1980; Los, 1984).

In order to compare models involving different covariates, we prefer to use an approximation to the posterior odds as a measure of evidence. We do not use the alternative approach of significance testing because the models are often non-nested and multiple comparisons are involved. Suppose that we have models $M_i$ with covariates $x^{(i)}$ of dimensions $r_i$ $(i = 0, 1)$. Then, given that the maximum likelihood estimators of the hyperparameters have the usual limiting distribution, the arguments of Schwarz (1978) show that

[equation not legible in the source]

where $B_{01}$ is the posterior odds for $M_0$ against $M_1$, $L_i$ is the maximized log-likelihood for $M_i$ $(i = 0, 1)$, and $\doteq$ denotes asymptotic equivalence in probability. If we are [...] one [...] is smallest. The rules of thumb of Jeffreys (1961, Appendix B) suggest that such a preference should not be decisive unless the smallest value of BIC is exceeded by the next smallest by at least $2 \log 100 = 9.2$.

The model can be checked by examining the residuals

[equation not legible in the source] (2.21)

by (2.7). Visual analysis of the residuals is important, and the use of dynamic interactive graphics is helpful (e.g., the spin command in S-Plus (SSI 1988), or the Data Viewer (Hurley and Buja 1988)). We shall give an example of residual analysis in the next section.

3. THE TAX DATA

We now return to the tax data described in Section 1. The results of fitting several of the models discussed in Section 2 to these data are shown in Table 2. For the steady model of Section 2.2, the maximum likelihood estimators are $\gamma$ = 0.02 and $\xi$ = 25,800. There is more information in the likelihood surface, which is a ridge aligned roughly along the curve $\gamma\xi = 516$. In the analogous normal Kalman filter steady model (random walk with observation error), $\gamma\xi$ represents the limiting forecast variance, and here this is well estimated. The very small value of $\gamma$ is near the edge of the parameter space (0,1), indicating that the likelihood is trying to select a model outside the class under consideration. In other words, the steady model does not fit very well. Also, the small value of $\gamma$ gives a very flat predictive distribution, indicating that with this model the past data are not providing much information about [...] model is not [...] more [...] as [...] (2.21), tends to be greater than zero, corresponding to the underprediction of social tax by the model.

Table 2. Maximum likelihood estimates and BIC

                                  Model
Parameter                 Steady     Trend    Covariate
$\gamma$                    .02       .20       .15
$\xi$                     25800      3100      6500
$\beta_{personal,1}$                -.003     -.003
$\beta_{corporate,1}$               -.028     -.028
$\beta_{social,1}$                  +.032     +.032
$\beta_{personal,2}$                          -.008
$\beta_{corporate,2}$                         +.013
$\beta_{social,2}$                            -.005
BIC                      -389.4    -393.6    -418.3
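The BIC row of Table 2 can be read against the rule of thumb quoted in Section 2.4, under which the smallest BIC should beat the next smallest by at least 2 log 100 before the preference is treated as decisive. The short sketch below does that arithmetic; the variable names are hypothetical.

```python
import math

# BIC values from Table 2; smaller is better.
bic = {"steady": -389.4, "trend": -393.6, "covariate": -418.3}

# Cutoff quoted in Section 2.4 (natural logarithm).
threshold = 2 * math.log(100)

ranked = sorted(bic, key=bic.get)          # smallest BIC first
best, runner_up = ranked[0], ranked[1]
margin = bic[runner_up] - bic[best]        # how far the best model is ahead
decisive = margin >= threshold

trend_vs_steady = bic["steady"] - bic["trend"]
```

With these values the covariate model is ahead of the trend model by 24.7, well above the cutoff of about 9.2, while the trend model is ahead of the steady model by 4.2.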

Figure 3. Residuals against time for the steady model fit to the tax data. [Three panels of residuals plotted against year, 1940-1990.]

The model incorporating a constant time trend does better, with the smaller BIC indicating a significant improvement. Using (2.19), the quantitative information in the parameter estimates can be described in terms of the odds as follows. The ratio of personal to corporate tax has increased, on average, by a factor of about $\exp(\beta_{personal,1} - \beta_{corporate,1}) = 1.026$, or 2.6%, per year. Similarly, the ratio of the social to personal shares has increased by about 3.5% per year and the ratio of the social to corporate shares has increased by about 6.2% per year.
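The yearly odds factors quoted above can be approximately reproduced from the rounded trend coefficients in Table 2. The sketch below is illustrative; `beta_trend`, `yearly_odds_factor` and `percent_change` are hypothetical names, and the inputs are the three-decimal table entries rather than the unrounded estimates.

```python
import math

# Trend coefficients (second subscript 1) from Table 2, trend model.
beta_trend = {"personal": -0.003, "corporate": -0.028, "social": 0.032}


def yearly_odds_factor(numerator, denominator, beta=beta_trend):
    """Multiplicative yearly change in the odds numerator : denominator."""
    return math.exp(beta[numerator] - beta[denominator])


def percent_change(factor):
    """Express a multiplicative factor as a percent change."""
    return 100.0 * (factor - 1.0)


personal_vs_corporate = yearly_odds_factor("personal", "corporate")
social_vs_personal = yearly_odds_factor("social", "personal")
social_vs_corporate = yearly_odds_factor("social", "corporate")
```

With the rounded coefficients these come out at roughly 2.5%, 3.6% and 6.2% per year, close to the 2.6%, 3.5% and 6.2% quoted in the text; small discrepancies of this size are what one would expect from rounding in the table.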

In addition to the underlying trend, it seems reasonable to assume that the general state of the U.S. economy might be a factor in accounting for the relative proportions of the three sources of federal receipts. Figure 4 shows a plot of the residuals against the change in the rate of growth of the economy, as measured by $\nabla G_t$, where $G_t$ is the percent change in the Gross National Product (GNP) in year $t$ ($G_t = 100(GNP_t - GNP_{t-1})/GNP_{t-1}$), and $\nabla$ is the first difference operator, i.e., $\nabla G_t = G_t - G_{t-1}$. Figure 4 reveals a clear linear relationship between $\nabla G_t$ and the residuals.

Table 3. Percent change in U.S. Gross National Product (Constant Dollars)

Year    G_t      Year    G_t      Year    G_t      Year    G_t
1946   -19.0     1956    2.1      1966    5.8      1976    4.9
1947    -2.8     1957    1.7      1967    2.9      1977    4.7
1948     3.9     1958   -0.8      1968    4.2      1978    5.3
1949     0.0     1959    5.8      1969    2.4      1979    2.5
1950     8.5     1960    2.2      1970   -0.3      1980   -0.2
1951    10.3     1961    2.6      1971    2.8      1981    1.9
1952     3.9     1962    5.3      1972    5.0      1982   -2.5
1953     4.0     1963    4.1      1973    5.2      1983    3.5
1954    -1.3     1964    5.3      1974   -0.5      1984    6.5
1955     5.6     1965    5.8      1975   -1.3      1985    2.3

Source: Survey of Current Business
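The covariate used in the text is the first difference of the growth rates in Table 3. As a minimal sketch, the code below forms that difference for the first few years of the table; `growth`, `percent_change` and `first_difference` are hypothetical names introduced for this illustration.

```python
# First rows of Table 3: percent change in GNP, G_t, by year.
growth = {1946: -19.0, 1947: -2.8, 1948: 3.9, 1949: 0.0, 1950: 8.5}


def percent_change(current, previous):
    """G_t = 100 (GNP_t - GNP_{t-1}) / GNP_{t-1}."""
    return 100.0 * (current - previous) / previous


def first_difference(series):
    """Return {t: G_t - G_{t-1}} for every year whose predecessor is present."""
    return {t: series[t] - series[t - 1] for t in series if t - 1 in series}


delta_growth = first_difference(growth)
```

The first year of the series has no predecessor, so the differenced covariate starts one year later than the growth series itself.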

Figure 4. Residuals against change in [...] GNP for the steady model fit to the tax data. [Three panels of residuals plotted against $\nabla G_t$, with the horizontal axis running from -10 to 20.]

The inclusion of $\nabla G_t$ as a covariate again gives a very significant improvement in the model. Quantitatively, a one percent increase in the growth rate from the previous year is associated with a decrease of about 2.1% in the ratio of the personal to corporate shares and a decrease of about 1.8% in the ratio of the social to corporate shares. The ratio of social to personal shares did not change appreciably. Thus, in a year that is economically better than the preceding one (as measured by the percent change in GNP), the share of the tax burden assumed by corporations increases. This seems quite reasonable, since GNP is in some sense a measure of corporate activity. In practice, of course, this information would not be available for future times and a forecast of economic growth would be used instead for forecasting. Our main interest, though, is in getting a good picture of trends and relations that have held over the past, and the above modeling presents such a picture.

4. DISCUSSION

To our knowledge, the only other approach to time series of continuous proportions to have been developed in any detail is that of Smith and Brunsdon (1986) and Brunsdon (1986), building on the proposals of Aitchison and Shen (1980) and Aitchison (1982, 1987). This involves applying multivariate ARIMA models (Box and Jenkins, 1976; Tiao and Box, 1981) to Aitchison's (1982) (asymmetric) log ratios $\log(y_{it}/y_{d+1,t})$ $(i = 1, \ldots, d)$. In the univariate case ($d = 1$) there is no difficulty about which category to use in the denominator. However, when $d \geq 2$ this may present more problems, at least for the interpretation of the estimated parameters. A possible solution, suggested by Aitchison (1983) in the context of principal components, is to work instead with the symmetric logratios $z_t$. This idea is developed in Grunwald (1987, Section 4.2), where the symmetric logratios are modeled using the normal Kalman filter rather than multivariate ARIMA models.

A full comparison between the approach of Smith and Brunsdon (1986), that of Grunwald (1987, Section 4.2) and the one proposed in this paper has yet to be made. However, the present approach appears to have the advantages of working with models that are readily interpretable in terms of the underlying data generating mechanism, and of yielding results that are easily interpretable in terms of the odds ratios among the components of the composition. It is also based on the Dirichlet distribution, which many consider to be more natural for compositions. The present approach also may well prove more robust to outliers because the Dirichlet distribution has thicker tails than the logistic-normal distribution. For example, in the univariate case ($d = 1$), for large $|z|$, the Dirichlet density (2.2) is $\log p(z) \propto -|z|$, while in the logistic-normal case $\log p(z) \propto -z^2$. Thus, in contrast with what happens in the usual unrestricted data case, we can perhaps both use a natural distribution and achieve robustness, a pleasant surprise!
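The linear tail behaviour claimed for the univariate case can be checked directly from the exponential family form (2.2). The following is a sketch of that calculation, not a derivation taken from the report. It writes $z = (z_1, -z_1)^T$, uses $\theta_1 + \theta_2 = 1$, and uses the identity $\bar{v} = -\log \sum_j e^{z_j}$, which follows from $y_j = e^{z_j + \bar{v}}$ and $\sum_j y_j = 1$.

```latex
% Sketch for d = 1, with z = (z_1, -z_1)^T and \theta_1 + \theta_2 = 1.
\begin{align*}
\bar{v} &= -\log\left(e^{z_1} + e^{-z_1}\right) = -|z_1| + o(1)
  \quad (|z_1| \to \infty), \\
\log p(z \mid \theta, \tau)
  &= \tau(\theta_1 - \theta_2) z_1 + \tau \bar{v} - \log D(\tau\theta) \\
  &\approx -2\tau\theta_2 \, |z_1| + \text{const} \quad (z_1 \to +\infty), \\
  &\approx -2\tau\theta_1 \, |z_1| + \text{const} \quad (z_1 \to -\infty).
\end{align*}
```

In both directions the log density falls off linearly in $|z_1|$, whereas a normal density on the same scale has a log density that falls off quadratically.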

[...] considered the use of the logistic transformation. However, these refer only to the univariate case.

One difficulty which arises no matter what approach is used is the problem of zeros. This is because in the Dirichlet distribution, as in the logistic-normal, an exact zero in one of the categories is an event of probability zero. In applications, however, one does encounter exact zeros, and they cannot be accommodated by any of the methods discussed here. One possible solution is to allow a singular component of the conditional distribution on the boundary of the simplex, with density proportional to the (non-singular) limiting conditional density at the boundary. This idea, which appears to be a refinement of an idea of Aitchison (1982, end of Section 7.4), may be useful more generally for the analysis of continuous proportions with zeros, outside the time series context.

One by-product of the present work is the development of two new distributions on the simplex, the DC distribution (2.5) and the DCD distribution defined in Section 2.1. These are based on the Dirichlet distribution, but generalize it to allow for dependence between the components. They have yet to be studied in detail, but the general idea of using mixtures of Dirichlet distributions does seem worth investigating, perhaps in conjunction with other generalizations of the Dirichlet, such as that of Aitchison (1985).

REFERENCES

Aitchison, J. (1982) The statistical analysis of compositional data (with Discussion). Journal of the Royal Statistical Society B 44 139-177.

Aitchison, J. (1983) Principal component analysis [of] compositional [data]. Biometrika 70 [...].

Aitchison, J. (1985) A [...] on [...]. Journal of the Royal Statistical Society B 47 1[...].

[...]

Aitchison, J. (1986) The statistical analysis of compositional data. London: Chapman and Hall.

Aitchison, J. and Shen, S. M. (1980) Logistic-normal distributions: some properties and uses. Biometrika 67 261-272.

Azzalini, A. (1984) A Markov process with Beta marginal distribution. Statistica 44 241-243.

Box, G. E. P. and Jenkins, G. M. (1976) Time series analysis: forecasting and control (2nd ed.). San Francisco: Holden-Day.

Box, G. E. P. and Tiao, G. C. (1975) Intervention analysis with applications to economic and environmental problems. Journal of the American Statistical Association 70 70-79.

Brunsdon, T. M. (1986) Time series of compositional data. Unpublished Ph.D. thesis, University of Southampton.

Chayes, F. (1971) Ratio Correlation. Chicago: University of Chicago Press.

Grunwald, G. K. (1987) Time series models for continuous proportions. Unpublished Ph.D. dissertation, Department of Statistics, University of Washington.

Harrison, P. J. and Stevens, C. F. (1971) A Bayesian approach to short-term forecasting. Operational Research Quarterly 22 341-362.

Harrison, P. J. and Stevens, C. F. (1976) Bayesian forecasting. Journal of the Royal Statistical Society B 38 205-247.

Hurley, C. and Buja, A. (1988) Analyzing high-dimensional data with motion graphics. Tech. Rep. STAT-88-03, Dept. of Statistics and Actuarial Science, University of Waterloo.


Kitagawa, G. (1987) Non-Gaussian state-space modeling of nonstationary time series (with Discussion). Journal of the American Statistical Association 82 1032-1063.

Los, C. A. (1984) Econometrics of models with evolutionary parameter structures. Unpublished Ph.D. dissertation, Dept. of Economics, Columbia University.

McCullagh, P. and Nelder, J. A. (1983) Generalized Linear Models. London: Chapman and Hall.

McKenzie, E. (1985) An autoregressive process for Beta random variables. Management Science 31 988-997.

Pagan, A. (1980) Some identification and estimation results for regression models with stochastically varying coefficients. Journal of Econometrics 12 341-363.

Schwarz, G. (1978) Estimating the dimension of a model. Annals of Statistics 6 461-464.

Smith, J. Q. (1979) A generalization of the Bayesian steady forecasting model. Journal of the Royal Statistical Society B 41 375-387.

Smith, J. Q. (1981) The multiparameter steady model. Journal of the Royal Statistical Society B 43 256-260.

Smith, T. F. M. and Brunsdon, T. M. (1986) Time series methods for small areas. Technical report, University of Southampton.

SSI (1988) S-Plus reference manual. Seattle: Statistical Science Inc.

Tiao, G. C. and Box, G. E. P. (1981) Modeling multiple time series with applications. Journal of the American Statistical Association 76 802-816.

Tierney, L. and Kadane, J. B. (1986) Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association 81 82-86.


West, M., Harrison, P. J. and Migon, H. S. (1985) Dynamic generalized linear models and Bayesian forecasting (with Discussion). Journal of the American Statistical Association 80 73-97.