Flexible Bivariate INAR(1) Processes Using Copulas

19
This article was downloaded by: [University of Calgary] On: 30 April 2013, At: 08:23 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Communications in Statistics - Theory and Methods Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lsta20 Flexible Bivariate INAR(1) Processes Using Copulas Dimitris Karlis a & Xanthi Pedeli a a Department of Statistics, Athens University of Economics and Business, Athens, Greece Published online: 02 Jan 2013. To cite this article: Dimitris Karlis & Xanthi Pedeli (2013): Flexible Bivariate INAR(1) Processes Using Copulas, Communications in Statistics - Theory and Methods, 42:4, 723-740 To link to this article: http://dx.doi.org/10.1080/03610926.2012.754466 PLEASE SCROLL DOWN FOR ARTICLE Full terms and conditions of use: http://www.tandfonline.com/page/terms-and-conditions This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. The publisher does not give any warranty express or implied or make any representation that the contents will be complete or accurate or up to date. The accuracy of any instructions, formulae, and drug doses should be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings, demand, or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material.

Transcript of Flexible Bivariate INAR(1) Processes Using Copulas

Page 1: Flexible Bivariate INAR(1) Processes Using Copulas

This article was downloaded by: [University of Calgary]On: 30 April 2013, At: 08:23Publisher: Taylor & FrancisInforma Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House,37-41 Mortimer Street, London W1T 3JH, UK

Communications in Statistics - Theory and MethodsPublication details, including instructions for authors and subscription information:http://www.tandfonline.com/loi/lsta20

Flexible Bivariate INAR(1) Processes Using CopulasDimitris Karlis a & Xanthi Pedeli aa Department of Statistics, Athens University of Economics and Business, Athens, GreecePublished online: 02 Jan 2013.

To cite this article: Dimitris Karlis & Xanthi Pedeli (2013): Flexible Bivariate INAR(1) Processes Using Copulas,Communications in Statistics - Theory and Methods, 42:4, 723-740

To link to this article: http://dx.doi.org/10.1080/03610926.2012.754466

PLEASE SCROLL DOWN FOR ARTICLE

Full terms and conditions of use: http://www.tandfonline.com/page/terms-and-conditions

This article may be used for research, teaching, and private study purposes. Any substantial or systematicreproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form toanyone is expressly forbidden.

The publisher does not give any warranty express or implied or make any representation that the contentswill be complete or accurate or up to date. The accuracy of any instructions, formulae, and drug doses shouldbe independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims,proceedings, demand, or costs or damages whatsoever or howsoever caused arising directly or indirectly inconnection with or arising out of the use of this material.

Page 2: Flexible Bivariate INAR(1) Processes Using Copulas

Communications in Statistics—Theory and Methods, 42: 723–740, 2013Copyright © Taylor & Francis Group, LLCISSN: 0361-0926 print/1532-415X onlineDOI: 10.1080/03610926.2012.754466

Flexible Bivariate INAR(1) ProcessesUsing Copulas

DIMITRIS KARLIS AND XANTHI PEDELI

Department of Statistics, Athens University of Economicsand Business, Athens, Greece

Multivariate count time series data occur in many different disciplines. The classof INteger-valued AutoRegressive (INAR) processes has the great advantage toconsider explicitly both the discreteness and autocorrelation characterizing this typeof data. Moreover, extensions of the simple INAR(1) model to the multi-dimensionalspace make it possible to model more than one series simultaneously. However,existing models do not offer great flexibility for dependence modelling, allowing onlyfor positive correlation. In this work, we consider a bivariate INAR(1) (BINAR(1))process where cross-correlation is introduced through the use of copulas for thespecification of the joint distribution of the innovations. We mainly emphasize onthe parametric case that arises under the assumption of Poisson marginals. Othermarginal distributions are also considered. A short application on a bivariatefinancial count series illustrates the model.

Keywords BINAR; Count data; Frank copula; Negative correlation.

Mathematics Subject Classification 62M10; 62H12.

1. Introduction

Non negative integer-valued time series occur in many different sciences, usuallyin the form of counts of events at consecutive points in time. Moreover, there areseveral circumstances where the collected data are counts observed in different timepoints, while the counts at each time point are correlated. Examples of multivariatecount time series can be found in epidemiology, where the number of incidencesof certain diseases with similar clinical signs or a common causal path are counteddaily, and in finance, where the number of orders of different stocks observed intime can provide interesting information about the behavior of the market.

Count data need special treatment since normal approximations failsubstantially, especially when low counts are observed or the count series includesa lot of zeros. Hence, a wide variety of models appropriate for treating such type ofdata have been discussed in the literature (Grunwaid et al., 2000). One of the most

Received January 29, 2011; Accepted November 27, 2012Address correspondence to Dimitris Karlis, Department of Statistics, Athens University

of Economics and Business, Athens, Greece; E-mail: [email protected]

723

Dow

nloa

ded

by [

Uni

vers

ity o

f C

alga

ry]

at 0

8:23

30

Apr

il 20

13

Page 3: Flexible Bivariate INAR(1) Processes Using Copulas

724 Karlis and Pedeli

prominent classes of models for count series consists of the so-called INteger-valuedAutoRegressive (INAR) processes, introduced by McKenzie (1985) and Al-Osh andAlzaid (1987). In recent years, several extensions and generalizations of the simplestformulation, i.e., the first-order integer-valued autoregressive (INAR(1)) model,have appeared in the literature (Jung and Tremayne, 2006; McKenzie, 2003).

Extensions of the INAR(1) process to the multi-dimensional space have beendiscussed by Franke and Rao (1995), Latour (1997), Brännäs and Nordström (2000),Quoreshi (2006), and Kachour (2009). Also, Pedeli and Karlis (2011) recentlyintroduced a bivariate INAR(1) (BINAR(1)) model, appropriate to model bivariatecount time series. The main focus in this article is on models with bivariate Poissonand bivariate negative binomial innovations appropriate for equidispersed andoverdispersed data, respectively. An important limitation of such specifications isthat they allow only for positive correlation between the two series.

Selection of a distribution for the innovations with negative correlation can alsoproduce negative correlation between the two series. Such models are available inthe literature, as for example, the bivariate Poisson-lognormal model of Aitchisonand Ho (1989) (see also Chib and Winkelmann, 2001) or the finite mixturemodel developed in Karlis and Meligkotsidou (2007). Such models suffer from thedifficulty to generalize to other families of marginal distributions and also the rangeof dependence structure offered is limited (see Nikoloulopoulos and Karlis, 2008).

A recently popular and fashionable alternative is the construction of modelsbased on copulas. While the literature on continuous models created via copulasabounds, the literature on discrete models created via copulas is less developed (see,e.g., Nikoloulopoulos and Karlis, 2009; Kolev and Paiva, 2009, and the referencetherein). Copula-based models can be used in order to define flexible bivariatediscrete distributions which can serve as the distribution of the innovations in thebivariate INAR model. Such a construction allows for certain useful properties likenegative correlation and/or overdispersion. Also, the marginals need not to be inthe same family.

In this work, the idea of copulas is adapted to the integer-valued autoregressiveframework. In the present setting, we use copulas to create flexible bivariatedistributions for the distribution of the innovations making it possible toaccommodate both positive and negative correlation. As is shown, the marginal timeseries of the generated BINAR(1) process are simple Poisson INAR(1) models whenthe marginals are Poisson.

The remainder of this article proceeds as follows. Section 2 presents brieflythe BINAR(1) process. In Sec. 3, we discuss the idea of copula-based models andtheir adaptation to the present setting. The performance of maximum likelihoodestimators is assessed through a short simulation experiment in Sec. 4. An empiricalapplication on financial data series illustrates the model in Sec. 5. Finally, Sec. 6concludes.

2. The BINAR(1) Process

In this section the BINAR(1) process introduced by Pedeli and Karlis (2011) isconsidered. We start by specifying the model, we then provide its basic statisticalproperties and conditional likelihood, and briefly discuss its peculiarities andinterpret its structure.

Dow

nloa

ded

by [

Uni

vers

ity o

f C

alga

ry]

at 0

8:23

30

Apr

il 20

13

Page 4: Flexible Bivariate INAR(1) Processes Using Copulas

Flexible Bivariate INAR Processes 725

2.1. The model

Definition 2.1. A non negative integer-valued bivariate time series �Xt�t∈� is said tobe a BINAR(1) process if

Xt = A � Xt−1 + Rt =[�1 00 �2

]�[X1�t−1

X2�t−1

]+[R1t

R2t

]� t ∈ � (2.1)

for some 2× 2 diagonal matrix A with independent elements ��jj�j=1�2 (fornotational simplicity we use instead ��j�j=1�2� and a non negative integer-valuedrandom 2-vector Rt.

In the above specification, A� is a matricial operation which acts as the usualmatrix multiplication keeping in the same time the properties of the binomialthinning operation (see, e.g., Silva and Oliveira, 2004; Weiß, 2008). Thus, the jthelement of the bivariate series, j = 1� 2, is given by

Xjt = �j � Xj�t−1 + Rjt� (2.2)

where � � X =∑Xi=1 Yi = Y , with Yi being a sequence of i.i.d. Bernoulli random

variables such that P�Yi = 1� = � = 1− P�Yi = 0� and � ∈ �0� 1� (Steutel and vanHarn, 1979). The � � X operator represents binomial thinning of X.

The non negative integer-valued random process �Xt�t∈� is the unique strictlystationary solution of (2.1), if the largest eigenvalue of the matrix A is less than 1,i.e., max��1� �2� < 1, and E�Rt� < �.

In the BINAR(1) setting, the role of the innovations Rt is twofold. Firstly,they serve as the vehicle for the introduction of dependence between the two series�X1t� X2t� that comprise the BINAR(1) model. Secondly, it is the choice of theform of the innovations distribution that leads to the specification of the underlyingmarginal distributions.

2.2. Properties

The unconditional first- and second-order moments of the BINAR(1) process canbe derived under the assumptions of independence between the thinning operationsand of �Rjt� being an i.i.d. sequence with finite mean j and variance 2

j = �jj ,�j > 0� j = 1� 2. Moreover, assuming some sort of dependence between R1t and R2t,it can be shown that

Cov�X1t� R2t� = Cov�R1t� R2t�� (2.3)

That is, the covariance between the innovations of the two series at time t, totallydetermines the covariance between the current value of the one process and theinnovations of the other process at the same point in time t, irrespective of theunderlying joint distribution of �R1t� R2t�.

Proposition 2.1 summarizes the unconditional first- and second-order momentsof the BINAR(1) process. Analytical proofs can be found in Pedeli and Karlis (2011).

Proposition 2.1. Properties of the BINAR(1) process:

1. E�Xjt� = Xj= j

1−�j, j = 1� 2;

Dow

nloa

ded

by [

Uni

vers

ity o

f C

alga

ry]

at 0

8:23

30

Apr

il 20

13

Page 5: Flexible Bivariate INAR(1) Processes Using Copulas

726 Karlis and Pedeli

2. Var�Xjt� = 2Xj

= ��j+�j �j

1−�2j, j = 1� 2;

3. Cov�Xjt� Xj�t+h� = �Xj�h� = �hj

2Xj, j = 1� 2� h = 1� 2� � � � ;

4. Corr�Xjt� Xj�t+h� = �Xj�h� = �hj , j = 1� 2� h = 1� 2� � � � ;

5. Cov�X1�t+h� X2t� = �h1�1−�1�2�

Cov�R1t� R2t�� h = 0� 1� � � � .

Note that the mean, variance, and autocovariance functions can take onlypositive values, since j ,

2j , and �j are all positive. Depending on whether �j > 1,

�j ∈ �0� 1�, or �j = 1, the variance may be larger than the mean (overdispersion),smaller than the mean (underdispersion), or equal to the mean (equidispersion)respectively.

2.3. The Conditional Likelihood

The definition of the BINAR(1) model given in (2.1) is helpful for derivingthe conditional likelihood and hence yields conditional ML estimates. Also theconditional distributions are needed for prediction purposes.

The conditional density for the BINAR(1) model is given by the convolution oftwo binomials and the joint distribution of the innovations. Let

f1�x1� =(X1�t−1

x1

)�x11 �1− �1�

X1�t−1−x1 (2.4)

and

f2�x2� =(X2�t−1

x2

)�x22 �1− �2�

X2�t−1−x2� (2.5)

be the two binomials related to the thinning (recall that thinning is appliedindependently and hence the joint distribution is merely the product) and thebivariate distribution of the innovations, namely f3�k� s� = P�R1t = k� R2t = s�.Thus, the conditional density becomes

f�xt � xt−1� �� =∑k

∑s

f1�x1t − k�f2�x2t − s�f3�k� s�� (2.6)

where � is the vector of unknown parameters. The conditional likelihood functionis then given by

L�� � x� =T∏t=1

f�xt � xt−1� �� (2.7)

for some initial value x0 and hence maximization provides the ML estimates.Numerical maximization is straightforward with standard statistical packages.

The above general formula implies that having available the joint probabilitymass function of the innovation distribution one can compute the conditionallikelihood and derive estimates easily.

Also, (2.6) is the one-step ahead conditional distribution which can be used forprediction. Since the summations involved are finite, hence, one can compute thepredictive distributions relatively easily (see Pedeli and Karlis, 2011).

Dow

nloa

ded

by [

Uni

vers

ity o

f C

alga

ry]

at 0

8:23

30

Apr

il 20

13

Page 6: Flexible Bivariate INAR(1) Processes Using Copulas

Flexible Bivariate INAR Processes 727

3. Copula-Based Models

Copulas (see Nelsen, 2006) have found a remarkable large number of applications infinance, hydrology, biostatistics, etc., since they allow the derivation and applicationof flexible multivariate models with given marginal distributions. The key ideais that the marginal properties can be separated from the association propertiesleading thus to a wealth of potential models. For the case of discrete data, copula-based modeling is less developed (see Genest and Nešlehová, 2007; Nikoloulopoulosand Karlis, 2009). Also, some of the desirable properties of copulas are not validwhen dealing with count data, as for example dependence properties are nowdependent on the marginal properties.

Our interest is restricted to the bivariate case, so we will focus on bivariatecopulas. To make the article self-contained we give a definition.

Definition 3.1 (Nelsen, 2006). A bivariate copula is a function C from �0� 1�2 to�0� 1� with the following properties.

1. For every �u� v� ∈ �0� 1�

C�u� 0� = 0 = C�0� v� and C�u� 1� = u�C�1� v� = v�

2. For every �u1� u2� v1� v2� ∈ �0� 1� such that u1 ≤ u2 and v1 ≤ v2,

C�u2� v2�− C�u2� v1�− C�u1� v2�+ C�u1� v1� ≥ 0�

If F�x��G�y� are the cumulative distribution functions (cdf’s) of the univariaterandom variables X and Y , then C�F�x��G�y�� is a bivariate distribution for �X� Y�with marginal distributions F and G, respectively. Conversely, if H is a bivariate cdfwith univariate marginal cdf’s F�G, then, according to Sklar (1959)’s theorem thereexists a bivariate copula C such that for all �X� Y�, H�x� y� = C�F�x��G�y��. If F�Gare continuous, then C is unique, otherwise, C is uniquely determined on range F ×range G. This lack of uniqueness is not a problem in practical applications as itimplies that there may exist two copulas with identical properties.

Actually, copulas provide the joint cumulative function. In order to derivethe joint density (for continuous data) or the joint probability function (for discretedata) we need to take the derivatives or the finite differences of the copula. In thebivariate case we have that for discrete data, the probability mass function (pmf) isobtained by finite differences of the cdf through its copula representation (Genestand Nešlehová, 2007), namely

h�x� y� �1� �2� �� = C�F�x� �1��G�y� �2�� ��− C�F�x − 1� �1��G�y� �2�� ��

− C�F�x� �1��G�y− 1� �2�� ��+C�F�x− 1� �1��G�y− 1� �2�� ���

(3.1)

where F�·� and G�·� are the marginal cumulative functions, �1 and �2 are theparameters associated with the marginal distributions, respectively, and � is theparameter(s) of the copula.

Example 3.1. Consider that F�·� and G�·� are cumulative functions from Poissondistributions with parameters 1 and 2, respectively. Then using (3.1) we obtain

Dow

nloa

ded

by [

Uni

vers

ity o

f C

alga

ry]

at 0

8:23

30

Apr

il 20

13

Page 7: Flexible Bivariate INAR(1) Processes Using Copulas

728 Karlis and Pedeli

Figure 1. The probability mass function when the marginal distributions are Poisson withparameters 1 = 2, 2 = 2�5 and varying copula parameter � for the Frank copula. Thevalues were 0.5, −0.5, 2, and −2. The darker the color, the higher the probability. Onecan easily see that for negative values we can have negative correlation while we still havePoisson marginal distributions.

a bivariate distribution with Poisson marginals, while the correlation propertiesdepend on the choice of the copula. Figure 1 shows the joint pmf for Poissonmarginals with means 1 = 2 and 2 = 2�5, respectively, and the usage of a Frankcopula with varying parameters. Frank copula allows for both positive and negativedependence and hence we can create a BINAR model with negative correlationbetween the two time series.

The literature on copulas contains a huge number of different copulas, eventhough only few of them have been used in applications. In this article, we use twocopulas albeit there are many other copulas that one may use. We selected the twomore common ones that allow for negative correlation, i.e., the Frank copula andthe normal copula, since in this article we focus on negative correlation not providedby other models.

The Frank copula has cdf given by

C��u� v� = −1�

(1+ �exp�−u��− 1��exp�−v��− 1�

�exp�−��− 1�

)� (3.2)

where 0 ≤ u� v ≤ 1, � ∈ �−����− �0�. In fact, � is the copula parameter, beingnegative implies negative correlation. We also use in this article the normal copula

Dow

nloa

ded

by [

Uni

vers

ity o

f C

alga

ry]

at 0

8:23

30

Apr

il 20

13

Page 8: Flexible Bivariate INAR(1) Processes Using Copulas

Flexible Bivariate INAR Processes 729

Figure 2. Realization of a BINAR(1) model when the innovation distribution has Poissonmarginals connected through a Frank copula with negative parameter.

defined as

C��u� v� = ��

(�−1�u���−1�v�

)�

where � is the N�0� 1� cdf, �−1 is the functional inverse of � and �� is the bivariatestandard normal cdf with correlation �.

A very exciting property when working with copulas is that the marginaldistributions can be selected separately. Hence, we may derive a bivariate discretedistribution with one marginal Poisson and one negative binomial, hence quiteflexible choices can be made.

Figure 2 shows simulated data when Poisson marginals and a Frank copulawith negative correlation is used as the distribution of the innovations. Fitting sucha model to real data is easy through numerical maximization of the conditionallikelihood, where (3.1) is used. Interestingly, the marginal time series are simplePoisson INAR(1) models.

Hence, we propose the use of innovation distributions which are defined viacopulas. This provides a pool of potential models that can have suitable propertiesfor both the correlation structure between the two series as well as the marginaldistributions. The cost is that typically distributions defined with copulas are hardto work with. For example, while Pedeli and Karlis (2011) provide the probabilitygenerating function (pgf) of the marginal distribution, the form of the pgf for abivariate discrete distribution is usually hard to derive (see, for example, some fewresults in Piperigou, 2009).

Consider a BINAR(1) model that has innovations with Poisson marginals anda copula say C�·� ·� to account for their correlation. The two marginal processes will

Dow

nloa

ded

by [

Uni

vers

ity o

f C

alga

ry]

at 0

8:23

30

Apr

il 20

13

Page 9: Flexible Bivariate INAR(1) Processes Using Copulas

730 Karlis and Pedeli

be simple Poisson INAR(1) processes due to the Poisson marginals but they will becorrelated, though not necessarily via the copula C�·� ·�. This interesting propertyimplies that one can create models with known marginal properties by consideringsimple univariate processes and a joint innovation distribution through copulas.

Finally, one can derive numerically the stationary distribution relatively easyby considering the definition of the process. Consider that X0 = R0 for some jointprobability function PR for the innovations. The unconditional distribution of X1

can be derived via the definition as the convolution of two random vectors, namelyY and R where

P�Y� =∑X0

P�X1 �X0�P�X0�

while the distribution of R is merely PR as assumed. Calculations while cumbersomeare relatively simple and working forward one can derive the unconditionaldistribution of Xt since after a few (typically) steps the distribution stops changing(or changes negligibly), and hence the stationary distribution can be at least wellapproximated. We have tested this and typically after 10–15 steps the stationarydistribution is available with very good accuracy. Interestingly working this waywe have seen that if the innovations distribution is a bivariate Poisson definedthrough a Frank copula the stationary distribution has Poisson marginals but notrelated through a Frank copula. It remains an interesting problem to check if thereis a copula that has the property that the stationary distribution is in the samefamily (apart from the trivial one about the copula inducing the bivariate Poissondistribution in Pedeli and Karlis, 2011).

Assessing the goodness of fit of the introduced model requires assessment of theadequacy of each random component. Thus, the classical definition of residuals asdifferences between the observed and fitted values is rather an inadequate diagnostictool. Based on the definition of Freeland and McCabe (2004), Pedeli and Karlis(2011) distinguished between a set of residuals for the survival process r

�j�1t = �j �

Xj�t−1 − �jXj�t−1 and another for the innovation component r�j�2t = Rjt − j , j = 1� 2,of each one of the two series �X1t� X2t� that comprise the BINAR(1) model. Thisdefinition can also be adapted to the present setting by replacing the unobservablequantities �j � Xj�t−1 and Rjt with their conditional expectations given the observedvalues of Xjt and Xj�t−1.

The one-step ahead predictive distribution of �X1�T+h� X2�T+h� given �x1T � x2T � isgiven by

P�X1�T+1 = x1� X2�T+1 = x2 � x1T � x2T �

=min�x1�x1T �∑

k=0

min�x2�x2T �∑s=0

(x1T

x1 − k

)��1�

x1−k�1− �1�x1T−x1+k

×(

x2Tx2 − s

)��2�

x2−s�1− �2�x2T−x2+sf

(R1�T = k� R2�T = s � x1T � x2T

)with means

E�xj�T+1 � x1T � x2T � = �jxjT + E�Rjt�� j = 1� 2� (3.3)

Dow

nloa

ded

by [

Uni

vers

ity o

f C

alga

ry]

at 0

8:23

30

Apr

il 20

13

Page 10: Flexible Bivariate INAR(1) Processes Using Copulas

Flexible Bivariate INAR Processes 731

and variances,

Var�xj�T+1 � x1T � x2T � = �j�1− �j�xjT + Var�Rjt�+ E�Rjt�� j = 1� 2�

(see also Pedeli and Karlis, 2011, for a detailed proof).

4. Simulations

As already noted in Sec. 2.3, the evaluation of the conditional likelihood andderivation of ML estimates are quite straightforward and easy for the BINAR(1)model. In this section we aim to assess the small sample properties of the MLestimators of the BINAR(1) model when copulas are used for the specification ofthe joint distribution of the innovations. For this purpose, we conducted a seriesof simulation experiments assuming that the innovation distribution has Poissonmarginals connected through a Frank copula. We considered two alternativesample sizes, i.e., n = 200 and n = 500 and in each experiment we conducted200 replications. Initial parameter values were chosen so as to obtain somerepresentative configurations. More specifically, the parameters �1 and �2 wereallowed to alternate between 0.3 and 0.7. For 1 and 2 we considered three pairs ofinitial values, i.e., �1� 2� = ��1� 1�� �1� 3�� �3� 3��, while � was fixed at either −1 or 1.The alternative combinations of the design parameters ��1� �2� 1� 2� �� resulted ina total number of 24 simulation experiments per sample size. For the optimizationof the likelihood function we used the optim function in R adopting suitableparameter transformations. Assessment of the behavior of the ML estimators wasbased on their biases and corresponding mean square errors (MSE).

Table 1 includes results for nine representative configurations. This is only apart of our simulation results but similar conclusions hold also for the full set ofexperiments. All estimators perform well and exhibit very low biases and MSE’s. Asexpected, increasing sample size to n = 500 reduces all biases and MSE’s.

The performance of the ML estimators is also depicted in Fig. 3. This is justone representative case, i.e., ��1� �2� 1� 2� �� = �0�3� 0�7� 1� 3�−1�. Apparently, allestimators perform well in terms of location with median estimates that are veryclose to the real parameter values. Moreover, both the interquartile ranges and theoverall range of the produced estimates are very narrow, indicating low dispersion.

Closing this section we would like to mention some numerical issues. Numericalmaximization was easy in all the cases via standard functions in R. No particularproblems were encountered and for all the cases we tested we verified that the globalmaximum as found. Very efficient staring values can be found using univariateINAR(1) models, while the staring value for the copula parameter can be found bymatching the copula parameter to the sample Kendall’s tau value.

5. Application

The main part of the microstructure literature has been devoted to the theoreticaland empirical modeling of the behavior of bid/ask spreads (Zhang et al., 2008).However, the cross-sectional and time series relationship between spread andvolume has also been of substantial interest in many studies (Easley and O’Hara,1992; Harris and Raviv, 1993; Lee et al., 1993). Due to its relationship with spreads,transaction costs and also market price variability (Campbell et al., 1993; Chan and

Dow

nloa

ded

by [

Uni

vers

ity o

f C

alga

ry]

at 0

8:23

30

Apr

il 20

13

Page 11: Flexible Bivariate INAR(1) Processes Using Copulas

732 Karlis and Pedeli

Table 1Biases and MSE’s of the ML estimators ��̂1� �̂2� ̂1� ̂2� �̂�

T of a BINAR(1)model when the innovation distribution has Poisson marginals connected

through a Frank copula. Results for different configurations of the parametersat n = 200 and n = 500

n = 200 n = 500

(�1, �2, 1, 2, �) Bias MSE Bias MSE

(0.3, 0.7, 1, 1, −1) �̂1 −0�015 0.005 0�000 0.002�̂2 −0�003 0.001 −0�000 0.000̂1 0�018 0.014 −0�004 0.005̂2 0�010 0.014 0�008 0.006�̂ −0�065 0.551 −0�012 0.255

(0.3, 0.7, 1, 1, 1) �̂1 −0�004 0.005 −0�004 0.002�̂2 −0�008 0.001 −0�003 0.000̂1 −0�010 0.012 0�009 0.005̂2 0�014 0.015 0�007 0.006�̂ 0�029 0.508 −0�008 0.179

(0.7, 0.7, 1, 1, −1) �̂1 0�000 0.001 0�002 0.000�̂2 0�002 0.001 −0�002 0.000̂1 −0�003 0.012 −0�004 0.005̂2 −0�008 0.014 0�000 0.006�̂ −0�022 0.642 −0�048 0.270

(0.7, 0.7, 1, 1, 1) �̂1 −0�002 0.001 −0�000 0.000�̂2 −0�004 0.001 −0�002 0.000̂1 0�004 0.014 −0�000 0.005̂2 0�001 0.013 0�004 0.005�̂ −0�012 0.689 0�052 0.275

(0.3, 0.7, 1, 3, −1) �̂1 0�004 0.005 −0�000 0.002�̂2 −0�002 0.001 −0�004 0.000̂1 −0�006 0.012 −0�003 0.006̂2 0�005 0.100 0�041 0.051�̂ −0�050 0.587 −0�013 0.201

(0.3, 0.7, 1, 3, 1) �̂1 0�000 0.005 0�005 0.001�̂2 0�001 0.001 0�000 0.000̂1 0�008 0.011 −0�005 0.004̂2 0�000 0.101 −0�005 0.040�̂ 0�030 0.548 −0�038 0.181

(0.7, 0.7, 1, 3, −1) �̂1 −0�003 0.001 −0�002 0.000�̂2 −0�002 0.001 0�001 0.000̂1 0�010 0.014 0�001 0.005̂2 −0�002 0.110 −0�002 0.046�̂ −0�002 0.768 −0�007 0.248

(continued)

Dow

nloa

ded

by [

Uni

vers

ity o

f C

alga

ry]

at 0

8:23

30

Apr

il 20

13

Page 12: Flexible Bivariate INAR(1) Processes Using Copulas

Flexible Bivariate INAR Processes 733

Table 1(Continued)

n = 200 n = 500

(�1, �2, 1, 2, �) Bias MSE Bias MSE

(0.7, 0.7, 1, 3, 1) �̂1 −0�003 0.001 −0�001 0.000�̂2 −0�001 0.001 −0�000 0.000̂1 0�004 0.013 0�000 0.006̂2 0�004 0.094 −0�003 0.039�̂ −0�022 0.614 0�129 0.273

(0.3, 0.7, 3, 3, −1) �̂1 −0�004 0.003 0�002 0.001�̂2 −0�002 0.001 0�001 0.000̂1 0�017 0.068 0�000 0.036̂2 0�025 0.098 −0�027 0.040�̂ −0�111 0.553 0�061 0.204

(0.3, 0.7, 3, 3, 1) �̂1 −0�012 0.004 −0�003 0.002�̂2 0�000 0.001 −0�002 0.000̂1 0�048 0.085 0�015 0.032̂2 0�003 0.106 0�027 0.042�̂ 0�057 0.402 0�012 0.224

(0.7, 0.7, 3, 3, −1) �̂1 −0�006 0.001 −0�001 0.000�̂2 0�001 0.001 −0�002 0.000̂1 0�041 0.097 0�005 0.046̂2 −0�000 0.095 0�013 0.042�̂ −0�053 0.561 −0�019 0.270

(0.7, 0.7, 3, 3, 1) �̂1 −0�002 0.001 −0�000 0.000�̂2 −0�004 0.001 −0�001 0.000̂1 −0�011 0.110 0�000 0.042̂2 0�031 0.102 0�010 0.044�̂ 0�134 0.672 −0�031 0.256

Fong, 2000; Gallant et al., 1992), volume can also be thought as a good indicatorof operational efficiency.

On the other hand, the implicit assumption of “symmetry” which is oftenimposed in the microstructure literature is in contrast to the compositionof markets which contain both buyers and sellers. More specifically, undercertain circumstances and trading environments, buyer-induced and seller-inducedcomponents have not a common impact on price setting (see, e.g., Zhang et al.,2008). Hence, this asymmetry is expected to also affect the volumes, i.e., bid and askquotes do not necessarily move up or down simultaneously after an information-related shock. This implies a potential for negatively correlated bid and ask quotesof a specific stock, which is most frequently observed during a short time period as,e.g., within a day.

In the aforementioned setting, we study intraday bid (X1) and ask (X2) numberof quotes of a specific stock, namely “Merck,” during the period of August 1995.

Dow

nloa

ded

by [

Uni

vers

ity o

f C

alga

ry]

at 0

8:23

30

Apr

il 20

13

Page 13: Flexible Bivariate INAR(1) Processes Using Copulas

734 Karlis and Pedeli

Figure 3. Boxplots with results from the simulation experiment with initial parametervalues ��1� �2� 1� 2� �� = �0�3� 0�7� 1� 3�−1�.

The data have been taken from the “The Trade and Quote” (TAQ) database, acollection of intraday trades and quotes of all securities listed on the New YorkStock Exchange (NYSE), American Stock Exchange, Nasdaq National MarketSystem, and SmallCap issues. TAQ provides historical tick-by-tick data of all stockslisted on NYSE back to 1993. The evolution of the quoted bids and asks has beenrecorded in 5-min intervals between 9:30 a.m. and 4:05 p.m., resulting in a bivariatetime series of 1817 observations.

Table 2 includes descriptive statistics for the two quote series. As can beseen, approximately the same number of ask and bid quotes have been recordedduring the study period. Both series are overdispersed with significant first-orderautocorrelations which decrease as passing through higher lags (see also Fig. 4). Thelarge tails and the presence of dependence on the data are also portraited in Fig. 5.

Dow

nloa

ded

by [

Uni

vers

ity o

f C

alga

ry]

at 0

8:23

30

Apr

il 20

13

Page 14: Flexible Bivariate INAR(1) Processes Using Copulas

Flexible Bivariate INAR Processes 735

Table 2Descriptive statistics for the ask and bid quotes series

of the Merck stock (August 1995)

Ask quotes Bid quotes

Mean 2�49 2�55Median 2�00 2�00St. Deviation 2�73 2�68Dispersion 3�00 2�82Maximum 18�00 16�00Minimum 0�00 0�00Autocorr. �lag = 1� 0�21 0�25Cross-corr. −0�17

The bid and ask quote series are negatively correlated �cross-correlation = −0�17�.The existence of both overdispersion and negative correlation implies a need formore flexible models able to account for all of the data peculiarities.

Thus, the data are modeled in the BINAR(1) setting by assuming copula basedinnovation distributions. We selected two copulas allowing for negative correlation:the Frank and normal copula. As marginals, we worked with Poisson and negativebinomial distributions to account for the overdispersion of the data.

We first fit a model with Poisson marginals and a Frank copula, as the onedescribed in Example 3.1. This model effectively accounts for autocorrelation and

Figure 4. Time series plots and acf plots for the intraday bids and ask quotes of the Merckstock (August 1995). (color figure available online.)

Dow

nloa

ded

by [

Uni

vers

ity o

f C

alga

ry]

at 0

8:23

30

Apr

il 20

13

Page 15: Flexible Bivariate INAR(1) Processes Using Copulas

736 Karlis and Pedeli

Figure 5. Histogram of the intraday bids and ask quotes of the Merck stock (August 1995).

negative cross-correlation but ignores the significant overdispersion present in thedata. Replacement of the assumption of Poisson marginals with one of negativebinomial marginals provides us with a model able to account for this additional(but also common) characteristic of our data. We have used the mean valueparametrization of the negative binomial, namely

P�Xj = k� = ��k+ �−1j �

���−1j �k!

(�−1j

�−1j + j

)�−1j(

j

�−1j + j

)k

� k = 0� 1� � � � �

where j > 0 is the mean and �j is the overdispersion parameter. For this modelwe also replace the Frank copula with the normal copula. Normal copula, beinga bivariate cdf, is more demanding since it involves a bivariate integral in eachcalculation of probabilities. Finally, we also fit another model with a bivariatePoisson lognormal (BPLN) innovation distribution, introduced by Aitchison andHo (1989). This is one of the very few bivariate discrete distributions allowing fornegative correlation. So, we consider it as a candidate. One limitation of the modelis that it assumes Poisson-lognormal marginal distribution, which due to its shapeis very restrictive. The parameter estimates and corresponding standard errors forall models are summarized in Table 3.

Although the model with negative binomial marginals estimates two additionalparameters, namely �1 and �2, the parameters estimates ��̂1� �̂2� ̂1� ̂2� �̂� do notdiffer markedly from the corresponding estimates provided by the model withPoisson marginals. As expected, the choice of the marginal distributions mainlyaffects the standard errors of the produced estimates. Thus, choosing a nonappropriate distribution can lead to incorrect inference since the standard errorsare underestimated. Also, comparison of the log-likelihoods and corresponding AIC

Dow

nloa

ded

by [

Uni

vers

ity o

f C

alga

ry]

at 0

8:23

30

Apr

il 20

13

Page 16: Flexible Bivariate INAR(1) Processes Using Copulas

Flexible Bivariate INAR Processes 737

Table 3Maximum likelihood estimates from fitting BINAR(1) models with negativecorrelation to the ask and bid quotes series of the Merck stock (August 1995)

Poisson marginals Neg.Bin marginals

Frank Frank Normal Biv Poisson-Lognormal

Estimate SE Estimate SE Estimate SE Estimate SE

�̂1 0�148 0.012 0�141 0.016 0�138 0.016 0�137 0.016�̂2 0�172 0.013 0�164 0.016 0�166 0.016 0�154 0.016̂1 2�095 0.044 2�144 0.076 2�147 0.076 ̂1 0�283 0.047̂2 2�081 0.044 2�137 0.075 2�118 0.073 ̂1 0�322 0.046�̂1 1�230 0.080 1�244 0.082 ̂2

1 1�029 0.038�̂2 1�156 0.079 1�143 0.077 ̂2

2 0�983 0.037�̂ −0�838 0.115 −1�207 0.057 −0�221 0.026 �̂ −0�329 0.039Log-Lik −8862.28 −7518.17 −7517.02 −7611.275AIC 17734.56 15050.34 15048.05 15236.55

values reveals the improvement in fit that is achieved by assuming negative binomialmarginals. Models defined through copulas outperform the BPLN model. Bothcopulas provide rather similar fit.

Using the method described in the previous section we obtained the stationarydistribution which can be seen in Fig. 6. As expected, it has a mode in �0� 0� cell andnegative correlation. One-step ahead predictive distributions can be also calculated.Figure 7 depicts a series of such predictive distibutions for various combinations forcurrent values of the two variables.

Figure 6. The stationary distribution for the selected model.

Dow

nloa

ded

by [

Uni

vers

ity o

f C

alga

ry]

at 0

8:23

30

Apr

il 20

13

Page 17: Flexible Bivariate INAR(1) Processes Using Copulas

738 Karlis and Pedeli

Figure 7. Predictive distributions for one-step ahead predictions based on the model withnegative binomial marginals and normal copula.

6. Concluding Remarks

In this article, we focus on the specification of a BINAR(1) process that canaccommodate both positive and negative correlation between the time series. Thisis achieved by specifying the innovation distribution in terms of appropriatebivariate copulas. Such a construction can create flexible bivariate discretedistributions, allowing for certain useful properties like negative correlation and/oroverdispersion. We emphasize that one may select different marginal distributionsand thus create very flexible models. Such an approach is equivalent to specifyingthe two marginal INAR(1) processes and then linking them together via copulas.Even though an explicit representation of the model is not easy, parameterestimation is feasible and simple through maximum likelihood.

Models based on copulas can be generalized in more dimensions but such ageneralization is not simple and straightforward. Specifically, in order to derive thejpmf we need excessive finite differences (in fact for a d-dimensional jpmf we need2d terms). Since excessive (finite) differentiation is needed, copulas defined as cdf,i.e., as multivariate integrals, are neither applicable nor useful. Hence, we need toswitch to multivariate copulas with closed form cdf’s. While such copulas exist theycan have limited correlation range and in general special care is needed (see alsoNikoloulopoulos and Karlis, 2009, for a thorough discussion).

Finally, one can use covariate information by considering covariates on themean of the innovations. This is rather simple to do when working with copulas as itimplies that the marginal is defined via a Poisson or a negative binomial regressionmodel respectively. Considering covariates the series are no longer stationary.

Dow

nloa

ded

by [

Uni

vers

ity o

f C

alga

ry]

at 0

8:23

30

Apr

il 20

13

Page 18: Flexible Bivariate INAR(1) Processes Using Copulas

Flexible Bivariate INAR Processes 739

Acknowledgments

The authors kindly acknowledge financial support from the “Basic ResearchFunding Program 2009–2010” of Athens University of Economics and Business. Theauthors also thank the two referees for helpful comments to the earlier version.

References

Aitchison, J., Ho, C. (1989). The multivariate Poisson-log normal distribution. Biometrika75:621–629.

Al-Osh, M., Alzaid, A. (1987). First-order integer-valued autoregressive process. J. Time Ser.Anal. 8:261–275.

Brännäs, K., Nordström, J. (2000). A bivariate integer valued allocation model forguest nights in hotels and cottages. Umeå Economic Studies, 547, Departrnent ofEconomics, Umeå University.

Campbell, J., Grossman, S., Wang, J. (1993). Trading volume and serial correlation in stockreturns. Quart. J. Econ. 108:905–934.

Chan, K., Fong, W. (2000). Trade size, order imbalance, and the volatility-volume relation.J. Finan. Econ. 57:247–273.

Chib, S., Winkelmann, R. (2001). Markov chain Monte Carlo analysis of correlated countdata. J. Busi. Econ. Statist. 19:428–435.

Easley, D., O’Hara, M. (1992). Time and the process of security price adjustment. J. Finan.47:577–606.

Franke, J., Rao, T. (1995). Multivariate first order integer-valued autoregressions. TechnicalReport, Mathematics Department, UMIST.

Freeland, R., McCabe, B. (2004). Analysis of low count time series data by Poissonautoregression. J. Time Ser. Anal. 25(5):701–722.

Gallant, A., Rossi, P., Tauchen, G. (1992). Stock prices and volume. Rev. Finan. Stud.5:199–242.

Genest, C., Nešlehová, J. (2007). A primer on copulas for count data. ASTIN Bull.37:475–515.

Grunwald, G., Hydman, R., Tedesco, L., Tweedie, R. (2000). Non-Gaussian conditionallinear AR(1) models. Austral. NZ J. Statist. 42:479–495.

Harris, M., Raviv, M. (1993). Difference of opinion make a horse race. Rev. Finan. Stud.6:473–506.

Jung, R., Tremayne, A. (2006). Binomial thinning models for integer time series. Statist.Model. 6:81–96.

Kachour, M. (2009). The first-order rounded integer-valued Vectorial autoregressive(RINVAR(l)) process. Preprint, Université de Rennes 1.

Karlis, D., Meligkotsidou, L. (2007). Finite multivariate Poisson mixtures with apphcations.J. Statist. Plann. Infer. 137:1942–1960.

Kolev, N., Paiva, D. (2009). Copula-based regression models: a survey. J. Statist. Plann. Infer.139:3847–3856.

Latour, A. (1997). The multivariate GINAR(p) process. Adv. Appl. Probab. 29:228–248.Lee, C., Mucklow, B., Ready, M. (1993). Spreads, depths and the impact of earnings

information: An intraday analysis. Rev. Finan. Stud. 6:345–374.McKenzie, E. (1985). Some simple models for discrete variate lime series. Water Resour. Bull.

21:645–650.McKenzie, E. (2003). Discrete variate time series. In: Rap, C., Shanbhag, D., eds. Stochastic

Processes: Modelling and Simulation, Handbook of Statistics. Vol. 21, Amsterdam,North-Holland: Elsevier Science B.V., pp. 573–606.

Nelsen, R. (2006). An Introduction to Covulas. 2nd ed. New York: Springer-Verlag.

Dow

nloa

ded

by [

Uni

vers

ity o

f C

alga

ry]

at 0

8:23

30

Apr

il 20

13

Page 19: Flexible Bivariate INAR(1) Processes Using Copulas

740 Karlis and Pedeli

Nikoloulopoulos, A., Karlis, D. (2008). On modelling count data: a comparison of somewell known discrete distributions. J. Statist. Computat. Simul. 78:437–457.

Nikoloulopoulos, A., Karlis, D. (2009). Finite normal mixture copulas for multivariatediscrete data modeling. J. Siatist. Plann. Infer. 139:3878–3890.

Pedeli, X., Karlis, D. (2011). A bivariate INAR(l) process with application. Statist. Model.Int. J. 11:325–349.

Piperigou, V. (2009). Discrete distributions in the extended fgm family: the p.g.f. approach.J. Statist. Plann. Infer. 139:3891–3899.

Quoreshi, A. (2006). Bivariate time series modeling of financial count data. Commun. Statist.Theor. Meth. 35:1343–1358.

Silva, M. D., Oliveira, V. (2004). Difference equations for the higher-order moments andcumulants of the INAR(l) model. J. Time Ser. Anal. 25:317–333.

Sklar, M. (1959). Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist.Univ. Paris 8:229–231.

Steutel, F., van Harn, K. (1979). Discrete analogues of self-decomposability and stability.Ann. Probab. 7:893–899.

Weiß, C. (2008). Thinning operations for modeling time series of counts-a survey. AStA Adv.Statist. Anal. 92:319–341.

Zhang, M., Rusell, J., Tsay, R. (2008). Determinations of bid and ask quotes andimplications for the cost of trading. J. Empir. Finan. 15:656–678.

Dow

nloa

ded

by [

Uni

vers

ity o

f C

alga

ry]

at 0

8:23

30

Apr

il 20

13