
A nonparametric Bayesian approach to copula estimation

Abstract

We propose a novel Dirichlet-based Pólya tree (D-P tree) prior on the copula and a nonparametric Bayesian inference procedure based on the D-P tree. Through theoretical results and simulations, we show that the flexibility of the D-P tree prior ensures its consistency in copula estimation, allowing it to capture more subtle and complex copula structures than earlier nonparametric Bayesian models, such as a Gaussian copula mixture, under model misspecification. Further, the continuity of the imposed D-P tree prior leads to a more favorable smoothing effect in copula estimation than classic frequentist methods, especially with small sets of observations. We also apply our method to the copula structure prediction between the S&P 500 index and the IBM stock prices during the 2007-08 financial crisis, finding that the D-P tree-based methods enjoy strong robustness and flexibility over classic methods under such irregular market behaviors.

1 INTRODUCTION

The copula, as the “link” between a multivariate distribution and its marginals, has attracted growing interest in statistical research since Sklar (1959). By Sklar's Theorem, a copula characterizes the dependence structure between the marginal components. The copula therefore plays a central role in multivariate studies and has gained increasing popularity in fields such as risk analysis, insurance modeling, and hydrologic engineering (Nelsen 2007; Wu et al. 2014).

The estimation of copulas has been well studied in parametric and semi-parametric settings, but little work exists on nonparametric Bayesian inference. In this article, we propose a novel multi-partition Dirichlet-based Pólya tree (D-P tree) prior on the copula. Our D-P tree prior relaxes the binary partition constraint of earlier Pólya-tree-like priors but still preserves the favorable properties of the Pólya tree, including conjugacy and absolute continuity. Based on such a D-P tree prior, we provide a nonparametric Bayesian approach to copula estimation, whose consistency is validated through theoretical analysis. By simulated comparisons, we demonstrate that the D-P tree enjoys far greater flexibility in capturing complex and subtle copula structures than other nonparametric alternatives. We illustrate our new method by focusing on copula structure prediction between the S&P 500 daily index and the IBM daily stock prices during the 2007-08 financial crisis, and find that D-P tree-based methods are rather robust and adaptive to irregular market behavior.

Earlier parametric or semi-parametric methods often model the copula function within a particular parametric copula family and estimate the parameters by maximum likelihood (ML); for the marginals, either parametric or nonparametric estimates are usually adopted (Joe 1997; Jaworski et al. 2010; Chen and Huang 2007; Oakes 1982, 1986; Genest et al. 1995). However, these parametric or semi-parametric methods risk severe bias when the model is misspecified and thus lack the flexibility to estimate more complex and subtle copula structures accurately. In addition, the copula itself is invariant under strictly increasing transformations of the marginals (Schweizer and Wolff 1981), so, under no further parametric assumptions, the rank statistics of the data preserve all the information required for the estimation. In light of these features, nonparametric methods seem more natural and coherent for copula estimation.

Most recent studies on nonparametric copula estimation focus on empirical methods (Jaworski et al. 2010; Deheuvels 1979) or kernel-based methods (Scaillet et al. 2007; Behnen et al. 1985; Gijbels and Mielniczuk 1990; Schuster 1985; Hominal and Deheuvels 1979; Devroye and Györfi 1985; Gasser and Müller 1979; John 1984; Müller 1991; Chen and Huang 2007). Current nonparametric Bayesian methods focus mainly on infinite mixtures of elliptical copula families, such as the Gaussian or the skew-normal (Wu et al. 2014). Yet such models still have limitations: a heavy computational burden, as they are implemented through MCMC, and inconsistency when the model is misspecified, the infinite Gaussian copula mixture being one instance. These limitations motivate us to explore priors with conjugacy and greater generality.

Note that here we focus mainly on the bivariate copula case to illustrate our method. Also, to concentrate on the estimation of the copula structure itself, we assume that the marginals are known or can be accurately estimated. Such an assumption is reasonable in that (1) we usually have more information (either parametric or nonparametric) on the marginals of the data, and (2) multivariate data are far denser when viewed marginally, providing higher resolution for accurate marginal estimation.

The article is organized as follows. In Section 2, we introduce the proposed D-P tree prior and the procedure for copula inference. In Section 3, we elaborate on the properties of the D-P tree. Section 4 provides a simulation-based evaluation of our method in comparison with other common copula estimation methods. In Section 5, we apply our method to the analysis of a bivariate stock-index copula structure. Section 6 concludes the article.

2 OUR APPROACH: DIRICHLET-BASED PÓLYA TREE

2.1 The Dirichlet-based Pólya tree (D-P tree)

One natural way to extend the Pólya tree to the copula space is to adopt the more flexible Dirichlet distribution for the measure variables (Zε) in place of the much more constrained beta distribution of the classic Pólya tree (PT). We first give the Dirichlet-based Pólya tree a general definition:

Definition 1. Let Ω be a separable measurable space and Π = {Bε} one of its measurable tree partitions. A random probability measure P is said to have a Dirichlet-based Pólya tree distribution, or D-P tree prior, with parameters (Π, A), written P ∼ DPT(Π, A), if there exist non-negative numbers A = {αε} and random vectors Z = {Zε} such that the following hold:

• all the random vectors in Z are independent;

• for every m = 1, 2, . . . and every sequence ε = ε1ε2 . . . εm, Zε = (Zε0, . . . , Zεkε) ∼ Dirichlet(αε0, . . . , αεkε), where Bε = ∪_{i=0}^{kε} Bεi and Bε0, . . . , Bεkε are the subpartitions of Bε;

• for every ε = ε1ε2 . . . εm, P(Bε1ε2...εm) = ∏_{j=1}^{m} Zε1ε2...εj.

The D-P tree prior still falls into the general class of tail-free processes, as the random vectors governing the measure are independent across partition levels. Rather than being constrained to binary partitions and beta distributions, however, the D-P tree adopts a more flexible partition structure with correspondingly Dirichlet-distributed measure variables, while preserving properties similar to those of the classic Pólya tree prior.

Adapting the D-P tree prior to bivariate copula estimation, we restrict the D-P tree to Ω = I = [0, 1] × [0, 1], with the quaternary dyadic partition Π = {Bε0, Bε1, Bε2, Bε3}, hyper-parameters A = {αε0, αε1, αε2, αε3}, and random vectors (Zε0, Zε1, Zε2, Zε3) ∼ Dirichlet(αε0, αε1, αε2, αε3), as illustrated in Figure 1. From now on, without further specification, we focus only on the D-P tree prior with this quaternary partition parametrization, though all results can be generalized.

Figure 1: The quaternary partition (left) on the support [0, 1]² of a bivariate copula and the parametrization of the Dirichlet-based Pólya tree (D-P tree) prior (right).
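To make the quaternary construction concrete, the following sketch draws one random copula measure from a D-P tree prior with the canonical hyper-parameters αε1...εm = m². It is an illustrative Python implementation, not the authors' code; the function name and the array representation of the level-M cells are our own choices.

```python
import numpy as np

def sample_dp_tree_prior(M=6, rng=None):
    """Draw one random copula (as level-M cell probabilities and a
    piecewise-constant density) from a D-P tree prior with quaternary
    dyadic partitions and canonical weights alpha = m**2 at level m.
    Minimal sketch under the assumptions stated in the text."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.ones((1, 1))                      # level-0 "cell" carries mass 1
    for m in range(1, M + 1):
        alpha = float(m ** 2)
        new = np.empty((2 ** m, 2 ** m))
        for i in range(probs.shape[0]):
            for j in range(probs.shape[1]):
                z = rng.dirichlet([alpha] * 4)   # (Z_e0, Z_e1, Z_e2, Z_e3)
                new[2 * i,     2 * j]     = probs[i, j] * z[0]
                new[2 * i,     2 * j + 1] = probs[i, j] * z[1]
                new[2 * i + 1, 2 * j]     = probs[i, j] * z[2]
                new[2 * i + 1, 2 * j + 1] = probs[i, j] * z[3]
        probs = new
    density = probs * (4 ** M)                   # mass / Lebesgue measure of a cell
    return probs, density
```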

2.2 Conjugacy and posterior updating

The D-P tree prior preserves the conjugacy of the original Pólya tree: with P ∼ DPT(Π, A) and an observation Y|P ∼ P, the posterior P|Y can be updated in closed form.

Proposition 1 (Conjugacy). Let P be a measure on I = [0, 1] × [0, 1] and Y|P ∼ P an observation. Suppose P follows a D-P tree prior, P ∼ DPT(Π, A), with the quaternary partition Π = {Bε}, Dirichlet-distributed random vectors Z = {Zε}, and hyper-parameters A = {αε0, αε1, αε2, αε3}. Then the posterior satisfies P|Y ∼ DPT(Π, A|Y), where, for i = 0, 1, 2, 3,

αεi|Y = αεi + 1 if Y ∈ Bεi, and αεi|Y = αεi otherwise.

Proof: p(Z|Y) ∝ p(Y|Z) p(Z) ∝ (∏_{j=1}^{∞} Zε1...εj) ∏_ε Zε^{αε} ∝ ∏_ε Zε^{αε + I(Y ∈ Bε)}, where ε1ε2 . . . indexes the sequence of partition sets containing Y.

For N i.i.d. observations Y = (Y1, Y2, . . . , YN), the posterior update is equally straightforward: at each partition level, the hyper-parameter αε associated with the set Bε is incremented by the number of observations falling into that set, nε = ∑_{i=1}^{N} I(Yi ∈ Bε). Simply put, αε|Y = αε + nε.
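As a small illustration of this update, the sketch below tabulates the cell counts nε at every partition level from pseudo-observations in [0, 1]²; adding m² to the level-m counts then gives the posterior hyper-parameters. The function name and interface are hypothetical, not from the paper.

```python
import numpy as np

def dp_tree_posterior_counts(U, V, M=10):
    """Cell counts n_eps for levels m = 1..M of the quaternary partition.
    The posterior hyper-parameter of cell B_eps at level m is m**2 + counts[m][cell].
    Illustrative sketch; U, V are arrays of pseudo-observations in [0, 1]."""
    counts = {}
    for m in range(1, M + 1):
        k = 2 ** m
        iu = np.minimum((np.asarray(U) * k).astype(int), k - 1)  # row index of the level-m cell
        iv = np.minimum((np.asarray(V) * k).astype(int), k - 1)  # column index
        counts[m] = np.zeros((k, k), dtype=int)
        np.add.at(counts[m], (iu, iv), 1)                        # increment n_eps per observation
    return counts
```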


2.3 Copula estimation by the D-P tree prior

For copula estimation, suppose we have N i.i.d. observations Y = (Y1, Y2, . . . , YN) from an unknown copula distribution C, i.e., Y1, Y2, . . . , YN i.i.d. ∼ C. We assume that C follows a D-P tree prior, C ∼ DPT(Π, A), where Π is the quaternary partition of the unit square [0, 1] × [0, 1] and A = {αε : αε1...εm = m²}. By Proposition 1, the posterior is C|Y ∼ DPT(Π, A|Y), where A|Y = {αε : αε1...εm = m² + nε}.

The D-P tree posterior on the copula therefore strongly resembles a histogram of the observations, regularized by the imposed prior. We show later that this choice of hyper-parameters, P ∼ DPT(Π, A = {αε : αε1...εm = m²}), generates absolutely continuous measures centered on the uniform distribution, so the posterior can be viewed as a shrunken version of the histogram.

In practice, we approximate the infinite-level D-P tree prior by its M-level approximation PM:

Definition 2. For a probability measure P ∼ DPT(Π, A), with the notation of Definition 1, its M-level approximation PM is defined, for any measurable set B ⊆ Bε1ε2...εM, by

PM(B) = (∏_{j=1}^{M} Zε1ε2...εj) · μ(B) / μ(Bε1ε2...εM),

where μ is the uniform measure on Ω.
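A minimal sketch of how the M-level approximation can be used in practice: for tail-free priors such as the D-P tree, the Zε remain independent across levels a posteriori, so the posterior mean of each level-M cell mass is the product of the Dirichlet posterior means along its path, yielding a piecewise-constant posterior mean density. The helper below assumes the count arrays from the sketch in Section 2.2 and is illustrative only.

```python
import numpy as np

def dp_tree_posterior_mean_density(counts, M):
    """Posterior mean copula density on the level-M grid, combining the canonical
    prior weights alpha = m**2 with the cell counts from dp_tree_posterior_counts.
    Uses E(Z_e | Y) = (alpha_e + n_e) / sum over the 4 sibling cells. Sketch only."""
    mass = np.ones((1, 1))
    for m in range(1, M + 1):
        k = 2 ** m
        post = float(m ** 2) + counts[m].astype(float)   # alpha_eps + n_eps, shape (k, k)
        # normalise each 2x2 block of siblings so the four weights sum to one
        block_sum = post.reshape(k // 2, 2, k // 2, 2).sum(axis=(1, 3))
        z_mean = post / np.repeat(np.repeat(block_sum, 2, axis=0), 2, axis=1)
        mass = np.repeat(np.repeat(mass, 2, axis=0), 2, axis=1) * z_mean
    return mass * (4 ** M)                               # piecewise-constant density on [0,1]^2
```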

3 PROPERTIES OF THE D-P TREE

3.1 Continuity of D-P tree prior

Here we show that the D-P tree prior retains the ability to generate absolutely continuous probability measures under certain constraints on the hyper-parameters A.

Proposition 2 (Absolute continuity). A D-P tree prior on I = [0, 1] × [0, 1] with the quaternary partition Π = {Bε}, Dirichlet-distributed random vectors Z = {Zε}, and hyper-parameters A = {αε0, αε1, αε2, αε3} generates an absolutely continuous probability measure on I with probability one when the level-m hyper-parameters grow as αε1...εm ∝ m^{1+δ}, δ > 0.

Further, with Y = (Y1, Y2, . . . , YN)|P i.i.d. ∼ P and P ∼ DPT(Π, A), the posterior DPT(Π, A|Y) also generates an absolutely continuous probability measure with probability one.

These results follow from Theorem 1.121 and Lemma 1.124 of Schervish (1995). Hence, as noted in Section 2.3, the canonical hyper-parameter choice αε1...εm = m² indeed yields a D-P tree prior producing absolutely continuous random probability measures, which justifies the smoothing effect of the D-P tree prior in copula estimation.

3.2 Consistency of D-P tree posterior

Suppose we have N i.i.d. observations Y = (Y1, . . . , YN) generated from the true copula distribution C. For copula estimation, we assume Yi|C i.i.d. ∼ C, with a D-P tree prior C ∼ DPT(Π, A). Let PM be the M-level approximation of C and take A to be canonical, i.e., the level-m hyper-parameter αε1...εm = m².

For the approximated posterior PM|Y, we have point-wise convergence to the target copula distribution on any measurable set in the unit square:

Proposition 3 (Point-wise convergence). For any measurable set B ⊂ I = [0, 1] × [0, 1], if N ∝ M^{3+η} with η > 0, then E(PM(B)|Y) − C(B) → 0 and var(PM(B)|Y) = O(M/N); therefore PM(B)|Y →p C(B).

If we impose smoothness constraints on the target distribution, similar convergence results hold uniformly over I, and consistency of the posterior follows.

Proposition 4 (Consistency). If C ∈ C¹([0, 1] × [0, 1]), then for measurable B ⊂ I,

sup_B |E(PM(B)|Y) − C(B)| = max{ O(M / (√N γ(M))), O(M³ / (N γ(M))) },   sup_B var(PM(B)|Y) = O(M / (N γ(M))),

where γ(M) ∼ min{C(BM) : C(BM) > 0} over the level-M partition sets BM. Further, with N ∝ 2^{10M} M^{2+η}, η > 0, for every δ > 0, P(dTV(PM, C) ≥ δ | Y) → 0 as M → ∞. Here dTV denotes the total variation distance between probability measures.

Specifically, we refine the order of convergence for several classic copula distributions; in practice, these rates may serve as general guidance for choosing the partition level M given the sample size N.

Proposition 5. Order requirements for uniform convergence to specific target copulas:

1. For a copula density bounded away from zero, c ≥ ξ > 0, we have γ(M) ≥ 2^{−2M} ξ, and thus N ∝ M^{2+η} 2^{4M};

2. For a bivariate Gaussian copula with correlation ρ, γ(M) ≥ Φ²(√(1 − |ρ|) Φ^{−1}(2^{−M})) √((1 − |ρ|)/(1 + |ρ|)), and thus N ∝ M^{2+η} 2^{4M}.

Such convergence properties ensure the consistency of estimation based on the D-P tree prior, giving it an advantage over family-based estimation methods under model misspecification.
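As a rough illustration of how these rates translate into a choice of partition level, the snippet below finds the largest M with M^(2+η)·2^(4M) ≤ N up to a constant. Note that the rates in Propositions 4-5 are sufficient conditions and are far more conservative than the levels M = 8-10 actually used in the simulations below; the constant c and the value of η here are illustrative choices, not values from the paper.

```python
def partition_level(N, eta=0.1, c=1.0):
    """Largest level M with c * M**(2 + eta) * 2**(4 * M) <= N, reading the
    rate N ~ M^(2+eta) 2^(4M) of Proposition 5 as rough guidance for M.
    Illustrative only; the bound is very conservative in practice."""
    M = 1
    while c * (M + 1) ** (2 + eta) * 2 ** (4 * (M + 1)) <= N:
        M += 1
    return M

# e.g. partition_level(10_000) returns a small M; deep trees need very large N
```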


4 SIMULATION EXPERIMENTS

4.1 Evaluation: common copulas

To evaluate the performance of our copula estimation procedure, we conduct simulation studies based on common copulas under various parameter settings; among these, the Gaussian, Student's t, and Gumbel copulas are symmetric, while the skew-normal copula is asymmetric.

Each simulation proceeds as follows: we first draw N i.i.d. samples Y from the true copula C; we then follow the procedure of Section 2.3 for posterior inference on C; once the posterior DPT(Π, A|Y) is obtained, we draw 10,000 posterior predictive samples from PM|Y for the scatterplots in Figure 2. The plots come in pairs, the left panel showing i.i.d. draws from the true copula and the right panel i.i.d. predictive draws from the D-P tree posterior. In most cases our proposed D-P tree prior works well, and the difference between the predictive density and the true copula is mild. Unless otherwise specified, all simulations use approximation level M = 10.
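For concreteness, the following sketch reproduces the flavor of this simulation for a bivariate Gaussian copula target: draw N pseudo-observations, update the D-P tree posterior, and generate predictive-style draws from the M-level posterior. It reuses the hypothetical helpers sketched in Section 2 and approximates posterior predictive sampling by sampling from the posterior mean density, which is sufficient for scatterplot comparisons.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
rho, N, M = 0.5, 10_000, 10

# N draws from a bivariate Gaussian copula with correlation rho (the target)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=N)
U, V = norm.cdf(z[:, 0]), norm.cdf(z[:, 1])

# D-P tree posterior update and posterior mean density (helpers from Section 2)
counts = dp_tree_posterior_counts(U, V, M=M)
density = dp_tree_posterior_mean_density(counts, M=M)

# predictive-style draws: pick a level-M cell by its posterior mean mass,
# then a uniform point inside that cell
cell_probs = (density / 4 ** M).ravel()
cell_probs /= cell_probs.sum()                      # guard against rounding
idx = rng.choice(cell_probs.size, size=10_000, p=cell_probs)
iu, iv = np.unravel_index(idx, (2 ** M, 2 ** M))
u_new = (iu + rng.random(idx.size)) / 2 ** M
v_new = (iv + rng.random(idx.size)) / 2 ** M
```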

4.2 Comparison with existing methods

We compare our method with several existing non-parametric methods for copula estimation.

4.2.1 Comparison with non-parametric Bayesian methods

We first compare our method with the infinite Gaussian mixture copula model (Wu et al. 2014). For the copula distribution C, this model places the prior C ∼ ∑_{i=1}^{∞} wi Cg(ρi), where Cg denotes the bivariate Gaussian copula, with weights wi i.i.d. ∼ U[0, 1] and correlations ρi i.i.d. ∼ U[−1, 1]. Such a model is the most common among existing nonparametric Bayesian methods, which focus on mixture models built from a specific copula family.

Here we use non-symmetric skew-normal copulas as the data-generating copulas. The simulations are carried out with sample sizes ranging from N = 1,000 to N = 100,000, and the K-L divergence of each estimate from the true copula distribution is computed by Monte Carlo for both methods. We report in Table 1 the cases where the skew-normal copula is highly non-symmetric (α = (100, −100)), so that the Gaussian mixture model is heavily misspecified. The D-P tree shows a gradually increasing advantage as the sample size grows. The inconsistency of the Gaussian mixture model is apparent, as its K-L divergence from the data-generating model remains flat (0.17, 0.16) as the sample size increases, while the converging trend of the D-P tree posterior is evident.

Figure 2: Scatterplots of i.i.d. draws from the true copula distribution (left) vs. the D-P tree posterior (right), for Gaussian (ρ = 0.5; ρ = 0.9), Student's t (ρ = 0.5, ν = 1; ρ = 0.9, ν = 4), Gumbel (a = 2; a = 4), and skew-normal (ρ = 0.5, α = (2, 0); ρ = 0.9, α = (−10, 50); ρ = 0.5, α = (100, −100); ρ = 0.9, α = (100, −100)) copulas; sample size N = 10,000, partition level M = 10.

                       N = 1,000       N = 10,000      N = 100,000
ρ      α               D-P    GM       D-P    GM       D-P    GM
0.50   (100, −100)     0.26   0.17     0.14   0.17     0.07   0.17
0.90   (100, −100)     0.46   0.16     0.20   0.16     0.08   0.16

Table 1: Comparison of the K-L divergence between the D-P tree (D-P) and the Gaussian mixture (GM) model for highly non-symmetric skew-normal target copulas.

4.2.2 Comparison with non-parametric frequentist methods

We compare our D-P tree with three classic nonparametric estimators in the frequentist setting. Suppose Yi = (Ui, Vi) i.i.d. ∼ C:

• The empirical estimator: Ĉemp(u, v) = (1/N) ∑_{i=1}^{N} I(Ui ≤ u) I(Vi ≤ v).

• The histogram estimator: Ĉhist(Bε) = nε/N, where Bε is a partition set at the highest level.

• The independent Gaussian kernel estimator (Jaworski et al. 2010):

Ĉker(u, v) = (1/N) ∑_{i=1}^{N} Φ((Φ⁻¹(u) − Φ⁻¹(Ui))/h) Φ((Φ⁻¹(v) − Φ⁻¹(Vi))/h),   (1)

where we take h = N^{−1/5}, consistent with Silverman's rule of thumb for the choice of bandwidth.

• The D-P tree posterior mean estimator: for a fair comparison, we use the mean distribution of the D-P tree posterior as the Bayesian estimator, i.e., ĈD-P = E(C|Y).
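For reference, a direct transcription of the kernel estimator in Eq. (1) might look as follows (a sketch; valid for u, v in the open interval (0, 1), since Φ⁻¹ diverges at the endpoints):

```python
import numpy as np
from scipy.stats import norm

def kernel_copula_cdf(u, v, U, V, h=None):
    """Independent Gaussian kernel estimator of C(u, v) as in Eq. (1), with the
    rule-of-thumb bandwidth h = N**(-1/5). Minimal sketch; U, V are the data."""
    U, V = np.asarray(U), np.asarray(V)
    N = len(U)
    h = N ** (-1 / 5) if h is None else h
    a = norm.cdf((norm.ppf(u) - norm.ppf(U)) / h)
    b = norm.cdf((norm.ppf(v) - norm.ppf(V)) / h)
    return float(np.mean(a * b))
```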

We define several measures of distance between an estimator and the target distribution. For density estimation, besides the K-L divergence, we also include the commonly adopted MISE (mean integrated squared error), based on the averaged L²-norm between the estimated density and the truth:

MISE(ĉ) = E[ ∫∫_{[0,1]×[0,1]} (ĉ(u, v) − c(u, v))² du dv ],

where c is the target copula density and ĉ is its estimator.

For distances between distribution functions, we extend the MISE to MISE_C:

MISE_C(Ĉ) = E[ ∫∫_{[0,1]×[0,1]} (Ĉ(u, v) − C(u, v))² du dv ],

where Ĉ is the estimated copula function and C the true copula.

We also use a distance measure specifically targeting grid-based estimation methods, the MSE_g:

MSE_g(Ĉ) = E[ (1/2^{2M}) ∑_{i,j=1}^{2^M} (Ĉ(Bij) − C(Bij))² ],

where the Bij are the partition sets on [0, 1] × [0, 1] and M is the maximum partition level. All the expectations in the measures above are taken over the data samples.
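The sketch below shows how such measures can be approximated numerically: a single replicate gives an integrated (or grid-averaged) squared error, and averaging over repeatedly simulated data sets yields the Monte Carlo estimate of the outer expectation. Function names and the Monte Carlo evaluation grid are our own illustrative choices.

```python
import numpy as np

def ise_density(c_hat, c_true, n_points=40_000, rng=None):
    """Monte Carlo integrated squared error between two copula densities on
    [0, 1]^2 for one replicate; averaging over replicates approximates MISE."""
    rng = np.random.default_rng() if rng is None else rng
    u, v = rng.random(n_points), rng.random(n_points)
    return np.mean((c_hat(u, v) - c_true(u, v)) ** 2)

def mse_grid(C_hat_cells, C_true_cells):
    """MSE_g for one replicate: average squared error of the estimated cell
    probabilities C(B_ij) over the 2^M x 2^M grid of partition sets."""
    return np.mean((np.asarray(C_hat_cells) - np.asarray(C_true_cells)) ** 2)
```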

The simulations are carried out with sample sizes varying from N = 10 to N = 10,000 to give a good view of the convergence trend. We again focus mainly on heavily non-symmetric skew-normal copulas (α = (−50, 10) and (100, −100)). For each parameter setting, we first draw N i.i.d. samples from the true copula distribution and obtain copula estimates from the three frequentist methods and the D-P tree posterior mean estimator; we then repeat this process 50 times to obtain Monte Carlo approximations of the measures defined above. Note that the empirical copula estimate is discrete, so the density distance measures do not apply to it; for the histogram estimator, the K-L divergence is not applicable because of the discrepancy between the supports of the target and estimated distributions. To keep the computation efficient, we report results at approximation level M = 8 and, for comparability, use the same maximum partition level for the histogram estimator. We report mainly the results under the parameter setting ρ = 0.5, α = (100, −100) in Table 2, which are representative of our conclusions.

          K-L                                   √MISE
N         D-P Tree  Empirical  Kernel  Hist.    D-P Tree  Empirical  Kernel  Hist.
10        0.528     NA         0.528   Inf      1.365     NA         2.190   71.788
20        0.473     NA         0.428   Inf      1.163     NA         2.657   56.726
50        0.386     NA         0.314   Inf      1.050     NA         1.177   36.757
100       0.349     NA         0.261   Inf      1.159     NA         1.347   25.723
500       0.222     NA         0.166   Inf      1.072     NA         1.398   11.665
1,000     0.184     NA         0.136   Inf      0.894     NA         0.703   8.078
5,000     0.112     NA         0.090   Inf      0.703     NA         0.516   3.601
10,000    0.089     NA         0.076   Inf      0.701     NA         0.769   2.600

          √MISE_C                                √MSE_g
N         D-P Tree  Empirical  Kernel  Hist.    D-P Tree  Empirical  Kernel  Hist.
10        0.072     0.118      0.091   0.117    0.054     0.321      0.054   0.321
20        0.065     0.082      0.068   0.082    0.054     0.230      0.055   0.230
50        0.044     0.057      0.050   0.057    0.054     0.151      0.054   0.151
100       0.037     0.041      0.038   0.041    0.054     0.113      0.054   0.113
500       0.018     0.017      0.017   0.017    0.053     0.070      0.053   0.070
1,000     0.013     0.012      0.013   0.012    0.053     0.062      0.053   0.062
5,000     0.007     0.006      0.007   0.006    0.053     0.055      0.053   0.055
10,000    0.005     0.004      0.005   0.004    0.053     0.054      0.053   0.054

Table 2: Comparison of various distance measures between the D-P tree posterior mean estimator and the frequentist estimators for the skew-normal copula with ρ = 0.5, α = (100, −100).


In general, the D-P tree posterior mean estimator performs competitively against all three frequentist nonparametric methods, and it does so consistently across the various measures. Notably, the D-P tree posterior estimate is advantageous with small sets of observations, showing the strong smoothing effect induced by the D-P tree prior.

Both the D-P tree and the kernel estimator show drastic advantages in copula density estimation over the empirical and histogram methods, since the empirical copula yields no density estimator and the histogram estimator gives severely poor density approximations owing to the discrepancy in support. Although both methods benefit from a smoothing effect in density estimation, under the MISE measure the D-P tree dominates the kernel method across almost all sample sizes while giving comparable figures under the K-L divergence.

As for copula distribution estimation, the D-P tree shows a strong advantage over the other methods in both measures at smaller sample sizes, which may be attributed to the favorable continuity of the D-P tree prior. As the sample size increases, the shrinkage induced by the prior slows the convergence of the posterior, and the empirical and histogram estimators catch up. Still, up to N = 10,000 the D-P tree remains close to the empirical and histogram methods and consistently dominates the kernel method.

5 REAL DATA APPLICATION

For the real data analysis, we apply our method to the S&P 500 daily index and the IBM daily stock prices over the past two decades (January 1, 1994 to December 31, 2014) and aim to estimate their dependence structure with a copula model. We adopt a rolling prediction scheme to evaluate the performance of our method, as described in detail below.

5.1 Rolling prediction

To mimic a practical prediction scenario, we evaluate the predictive power of our method under a time-rolling prediction scheme: we predict the future copula structure within a certain window of time based on the most recent observations.

Let the joint daily prices of the two series be (y_i^1, y_i^2), i = 1, . . . , T, where T = 5,288, and define the log returns r_i^j = log y_i^j − log y_{i−1}^j, i = 2, . . . , T, j = 1, 2. Marginally, we fit the commonly adopted GARCH(1,1) model: r_i^j = σ_i^j ε_i^j, (σ_i^j)² = α_0^j + α_1^j (σ_{i−1}^j)² + β_1^j (ε_{i−1}^j)², where the innovations ε_i^j are independent with E(ε_i^j) = 0 and var(ε_i^j) = 1. Further, we assume the distribution of the innovations is time-invariant and place the copula model on their joint distribution: F(ε_i^1, ε_i^2) = C(F^1(ε_i^1), F^2(ε_i^2)), where (ε_i^1, ε_i^2) i.i.d. ∼ F, and F^1 and F^2 are the marginal distributions.

Specifically, we set a training length T_tr, a testing length T_te, a rolling estimation window of length t_e, and a prediction window of length t_p. First, we use the daily price series {y_t^1 : t = 1, . . . , T_tr} and {y_t^2 : t = 1, . . . , T_tr} as the training set for the marginal GARCH model fitting. Consistent with common practical prediction scenarios, we fix the fitted GARCH models and obtain the fitted innovations on the training set, (ε̂_t^1, ε̂_t^2), t = 1, . . . , T_tr, and the predicted innovations on the test set, t = T_tr + 1, . . . , T_tr + T_te. We then conduct the rolling prediction of the copula structure based on these estimates. At each rolling step, we apply the proposed D-P tree-based method, with both the canonical non-informative prior and a historic-data-induced prior, to the most recent t_e fitted/predicted innovations and estimate the copula structure over the next t_p days. Here we implicitly assume that the innovations are i.i.d. within the combined estimation and prediction window of length t_e + t_p, which is reasonable because the copula structure is usually stable over a period of this length. We repeat the rolling prediction T_te/t_p times until the whole testing period (T_tr + 1 to T_tr + T_te) is covered.
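A schematic version of this rolling loop, scoring each prediction window by the posterior-mean log density of the D-P tree fitted to the preceding estimation window, is sketched below. It assumes the innovations have already been pushed through their marginal distributions to pseudo-observations in [0, 1]², reuses the hypothetical helpers from Section 2, and omits the GARCH fitting step and the down-weighted historic prior.

```python
import numpy as np

def rolling_dp_tree_loglik(U, V, t_e=100, t_p=50, M=8):
    """Rolling copula prediction on pseudo-observations (U, V): at each step,
    fit the D-P tree posterior to the last t_e points and score the next t_p
    points by the posterior-mean log density. Illustrative sketch only."""
    logliks = []
    start = t_e
    while start + t_p <= len(U):
        tr = slice(start - t_e, start)          # estimation window
        te = slice(start, start + t_p)          # prediction window
        counts = dp_tree_posterior_counts(U[tr], V[tr], M=M)
        dens = dp_tree_posterior_mean_density(counts, M=M)
        k = 2 ** M
        iu = np.minimum((U[te] * k).astype(int), k - 1)
        iv = np.minimum((V[te] * k).astype(int), k - 1)
        logliks.append(np.log(dens[iu, iv]).mean())
        start += t_p
    return float(np.mean(logliks))              # average predictive log-likelihood
```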

We focus on the period covering the 2007-08 financial crisis (i.e., the testing set spans July 2007 to July 2009) to highlight the flexibility and robustness of nonparametric methods over traditional parametric models. We set T_tr = 500 and T_te = 500, vary t_e ∈ {10, 20, 50, 100, 250} and t_p ∈ {1, 50}, and report both the average log-likelihood (1/T_te) ∑_{t=1}^{T_te} log ĉ_t (equivalent to the negative K-L divergence plus a constant) and the square root of the average MISE_C = (1/T_te) ∑_{t=1}^{T_te} MISE_C(Ĉ_t) as measures of prediction accuracy (Table 3). For the historic-data-induced D-P tree prior, we adopt the posterior of a canonical D-P tree prior updated with the historical data preceding the estimation window (i = 1, . . . , T_tr − t_e), with each observation down-weighted by a factor of 0.1. We also carry out the same prediction scheme with four other methods for comparison.

Generally, both D-P tree-based methods show strong advantages over the other methods in log-likelihood in almost all settings, and in √MISE_C under the longer prediction window t_p = 50 (where the distribution-based measure √MISE_C is more meaningful) and larger estimation windows t_e ≥ 50. Such results confirm the robustness and adaptiveness of the D-P tree-based methods to irregular market behavior when classic parametric models are badly misspecified. Further, by incorporating historic data into the prior, the D-PTw method enjoys a strong boost in prediction accuracy and dominates the other methods in most scenarios. Admittedly, the D-PTw uses more data for inference than the other methods in the comparison. Yet this is exactly a showcase of the strength of Bayesian methods, where historic or empirical information can readily be incorporated into the prior.

             Average log-likelihood                           √MISE_C
t_e   t_p    D-PT    D-PTw   Emp.  Kernel  Gauss.  t         D-PT   D-PTw  Emp.   Kernel  Gauss.  t
10    1      -0.002  0.133   NA    -0.052  0.094   0.086     0.312  0.300  0.338  0.305   0.300   0.301
20    1      0.046   0.135   NA    0.030   0.141   0.139     0.310  0.301  0.328  0.305   0.299   0.300
50    1      0.096   0.141   NA    0.044   0.141   0.143     0.310  0.304  0.322  0.306   0.299   0.299
100   1      0.155   0.176   NA    0.096   0.154   0.160     0.309  0.306  0.318  0.306   0.298   0.299
250   1      0.173   0.178   NA    0.105   0.153   0.158     0.306  0.306  0.312  0.304   0.298   0.298
10    50     0.023   0.138   NA    -0.075  -0.340  -0.128    0.082  0.062  0.123  0.099   0.078   0.074
20    50     0.051   0.137   NA    0.003   0.009   0.028     0.082  0.064  0.102  0.091   0.071   0.071
50    50     0.113   0.156   NA    0.058   0.106   0.113     0.066  0.060  0.070  0.068   0.066   0.066
100   50     0.155   0.173   NA    0.067   0.137   0.139     0.060  0.058  0.063  0.062   0.063   0.063
250   50     0.175   0.181   NA    0.108   0.150   0.158     0.057  0.057  0.058  0.058   0.061   0.061

Table 3: Comparison of prediction performance in the average log-likelihood (higher is better) and √MISE_C (lower is better) across methods: the D-P tree posterior mean with the canonical prior (D-PT), the D-P tree with the historic-data-induced prior (D-PTw), the empirical copula (Emp.), the kernel estimator (Kernel), the Gaussian copula (Gauss.), and the Student's t copula (t) models.

6 CONCLUSION

The proposed Dirichlet-based Pólya tree (D-P tree) prior preserves the conjugacy, continuity, and convergence properties of the classic Pólya tree, providing a foundation for nonparametric copula estimation in the Bayesian framework. Compared with other Bayesian copula estimation methods, the D-P tree prior exhibits strength in robustness and consistency, overcoming the inconsistency of family-based mixture models under misspecification. In comparison with nonparametric methods in the frequentist setting, the D-P tree posterior mean estimator performs competitively and stably across various distance measures. Notably, with small sample sizes the D-P tree copula estimator is advantageous in estimation accuracy, which suggests its potential in higher-dimensional settings where observations are heavily diluted.

REFERENCES

Behnen, K., Hušková, M., and Neuhaus, G. (1985), “Rank Estimators of Scores for Testing Independence,” Statistics & Risk Modeling, 3, 239–262.

Chen, S. X. and Huang, T.-M. (2007), “Nonparametric Estimation of Copula Functions for Dependence Modelling,” Canadian Journal of Statistics, 35, 265–282.

Deheuvels, P. (1979), “La Fonction de Dépendance Empirique et Ses Propriétés. Un Test Non Paramétrique d'Indépendance,” Académie Royale de Belgique. Bulletin de la Classe des Sciences, 6e Série, 65, 274–292.

Devroye, L. and Györfi, L. (1985), Nonparametric Density Estimation: The L1 View, vol. 119 of Wiley Series in Probability and Statistics, New York, NY: Wiley.

Gasser, T. and Müller, H.-G. (1979), “Kernel Estimation of Regression Functions,” in Smoothing Techniques for Curve Estimation, Heidelberg, Germany: Springer, pp. 23–68.

Genest, C., Ghoudi, K., and Rivest, L.-P. (1995), “A Semiparametric Estimation Procedure of Dependence Parameters in Multivariate Families of Distributions,” Biometrika, 82, 543–552.

Gijbels, I. and Mielniczuk, J. (1990), “Estimating the Density of a Copula Function,” Communications in Statistics – Theory and Methods, 19, 445–464.

Hominal, P. and Deheuvels, P. (1979), “Estimation Non Paramétrique de la Densité Compte Tenu d'Informations sur le Support,” Revue de Statistique Appliquée, 27, 47–68.

Jaworski, P., Durante, F., Härdle, W. K., and Rychlik, T. (2010), Copula Theory and Its Applications, Heidelberg, Germany: Springer.

Joe, H. (1997), Multivariate Models and Multivariate Dependence Concepts, Boca Raton, FL: CRC Press.

John, R. (1984), “Boundary Modification for Kernel Regression,” Communications in Statistics – Theory and Methods, 13, 893–900.

Müller, H.-G. (1991), “Smooth Optimum Kernel Estimators Near Endpoints,” Biometrika, 78, 521–530.

Nelsen, R. B. (2007), An Introduction to Copulas, New York, NY: Springer.

Oakes, D. (1982), “A Model for Association in Bivariate Survival Data,” Journal of the Royal Statistical Society, Series B (Methodological), 44, 414–422.

Oakes, D. (1986), “Semiparametric Inference in a Model for Association in Bivariate Survival Data,” Biometrika, 73, 353–361.

Scaillet, O., Charpentier, A., and Fermanian, J.-D. (2007), “The Estimation of Copulas: Theory and Practice,” in Copulas: From Theory to Applications in Finance, pp. 35–62.

Schervish, M. J. (1995), Theory of Statistics, New York, NY: Springer.

Schuster, E. F. (1985), “Incorporating Support Constraints into Nonparametric Estimators of Densities,” Communications in Statistics – Theory and Methods, 14, 1123–1136.

Schweizer, B. and Wolff, E. F. (1981), “On Nonparametric Measures of Dependence for Random Variables,” The Annals of Statistics, 9, 879–885.

Sklar, A. (1959), “Fonctions de Répartition à n Dimensions et Leurs Marges,” Publications de l'Institut de Statistique de l'Université de Paris, 8, 229–231.

Wu, J., Wang, X., and Walker, S. G. (2014), “Bayesian Nonparametric Inference for a Multivariate Copula Function,” Methodology and Computing in Applied Probability, 16, 747–763.