Copula information criterion for model selection with two-stage maximum likelihood estimation

Vinnie Ko∗, Nils Lid Hjort

Department of Mathematics, University of Oslo

PB 1053, Blindern, NO-0316 Oslo, Norway

February 2018

Abstract

In parametric copula setups, where both the margins and copula have parametric forms, two-stage maximum likelihood estimation, often referred to as inference functions for margins, is used as an attractive alternative to the full maximum likelihood estimation strategy. Exploiting basic results derived earlier by the present authors, we develop a copula information criterion (CIC) for model selection. The CIC is defined as $\mathrm{CIC} = 2\ell_n(\hat\eta) - 2\hat p^*_\eta$, where $\ell_n(\hat\eta)$ is the maximized log-likelihood under the two-stage maximum likelihood estimation scheme, with $\eta$ the full parameter vector for the candidate model in question, and $\hat p^*_\eta$ is a certain penalization factor. In a nutshell, CIC aims for the model that minimizes the Kullback–Leibler divergence from the real data generating mechanism. CIC does not assume that the chosen parametric model captures this true model, unlike what is assumed for AIC. In this sense CIC is analogous to the Takeuchi Information Criterion (TIC), which is defined for the full maximum likelihood. If we make an additional assumption that a candidate model is correctly specified, then CIC for that model simplifies to AIC. Further, since both CIC and TIC are estimating the same part of the Kullback–Leibler divergence, they are compatible, in the sense that they can be used to compare the performance of full maximum likelihood and two-stage maximum likelihood for a given model. Additionally, we show that CIC can easily be extended to the conditional copula setup where covariates are parametrically linked to the copula model.

As a numerical illustration, we perform a simulation and find that CIC outperforms AIC in terms of prediction performance from the selected models. However, as sample size grows, the difference between CIC and AIC becomes minimal because the log-likelihood part outgrows the bias correction part. Further, we learn from the simulation that $\hat p^*_\eta$, the bias correction term of CIC, has a strong positive relationship with the prediction performance of the model. So, a model with bad prediction performance is being penalized more by CIC.

Keywords: copula, Akaike information criterion, copula information criterion, model robust, two-stage maximum likelihood, inference functions for margins

∗Corresponding author. E-mail addresses: vinniebk@math.uio.no (V. Ko), nils@math.uio.no (N.L. Hjort)


1 Introduction and copula models

One of the main practical issues in copula modeling is model selection. In the full parametric setup, where both the copula and the margins are assumed to have a parametric form, one often has multiple candidates for both the copula and the margins. As the dimension of the model increases, the list of possible combinations of margins and copula grows rapidly. Hence, there is a need for a model selection criterion that can evaluate each model systematically according to a certain philosophy or set of criteria and assign a score to each model. In the end, one would choose the model with the best score.

Throughout this paper, we consider the full parametric setup. In this setup, one can simultaneously estimate all parameters of the model (i.e. both copula parameters and margin parameters) by using maximum likelihood (ML) estimation. In this ML estimation framework, one can for instance use $\mathrm{AIC}_{\mathrm{ML}}$ (Akaike, 1974) or TIC (also known as model-robust $\mathrm{AIC}_{\mathrm{ML}}$) (Takeuchi, 1976) as the model selection criterion and select the model with the best score. (Note that we denote the AIC under ML estimation as $\mathrm{AIC}_{\mathrm{ML}}$ to distinguish it from the two-stage ML based $\mathrm{AIC}_{\mathrm{2ML}}$, which we will derive in Section 2.3.) However, when the dimension of the copula model gets high, the number of parameters increases quickly and ML estimation is not always feasible in terms of speed and numerical stability. Two-stage maximum likelihood (two-stage ML) estimation, also often referred to as inference functions for margins (IFM), is a popular alternative estimation strategy designed to overcome these drawbacks of ML estimation. In stage 1 of two-stage ML estimation, the parameter vector of each marginal distribution is estimated separately by ML. In stage 2, the estimates from stage 1 are plugged into the log-likelihood of the model. Then, the parameters of the copula, which are now the only unknown parameters, are estimated by using ML estimation again. One of the advantages of this multi-stage approach is that it is computationally much faster than estimating all parameters simultaneously, because it does not have to search for the global maximum in a high-dimensional space. A drawback of the two-stage ML estimation method, however, is that we cannot use the classical results based on ML estimation, which include model selection criteria such as TIC and BIC.
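The two stages described above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the paper: a bivariate model with exponential margins and a Gaussian copula, a hand-written golden-section search for stage 2 (which presumes the copula log-likelihood is unimodal in the correlation), and the hypothetical names `simulate`, `golden_max` and `two_stage_ml`.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def simulate(n, rate1, rate2, rho, seed=0):
    """Draw n pairs with exponential margins and a Gaussian copula (illustrative model)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Inverse exponential CDF applied to u = Phi(z); 1 - u is computed as Phi(-z).
        y1 = -math.log(STD_NORMAL.cdf(-z1)) / rate1
        y2 = -math.log(STD_NORMAL.cdf(-z2)) / rate2
        data.append((y1, y2))
    return data

def golden_max(f, lo, hi, tol=1e-6):
    """Golden-section search for the maximizer of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:        # maximizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0

def two_stage_ml(data):
    n = len(data)
    # Stage 1: fit each exponential margin separately; the ML rate is 1 / sample mean.
    rate1 = n / sum(y1 for y1, _ in data)
    rate2 = n / sum(y2 for _, y2 in data)
    # Plug the stage-1 estimates into the marginal CDFs, then map to normal scores.
    eps = 1e-12
    scores = []
    for y1, y2 in data:
        u1 = min(max(-math.expm1(-rate1 * y1), eps), 1.0 - eps)
        u2 = min(max(-math.expm1(-rate2 * y2), eps), 1.0 - eps)
        scores.append((STD_NORMAL.inv_cdf(u1), STD_NORMAL.inv_cdf(u2)))
    sxx = sum(a * a + b * b for a, b in scores)
    sxy = sum(a * b for a, b in scores)

    def copula_loglik(rho):
        det = 1.0 - rho ** 2
        return -0.5 * n * math.log(det) - (rho ** 2 * sxx - 2.0 * rho * sxy) / (2.0 * det)

    # Stage 2: the copula parameter is the only unknown left.
    rho = golden_max(copula_loglik, -0.99, 0.99)
    margin_loglik = sum(math.log(rate1) - rate1 * y1 + math.log(rate2) - rate2 * y2
                        for y1, y2 in data)
    return {"rate1": rate1, "rate2": rate2, "rho": rho,
            "loglik": margin_loglik + copula_loglik(rho)}

fit = two_stage_ml(simulate(2000, rate1=1.0, rate2=2.0, rho=0.6, seed=1))
print(fit)
```

The `loglik` entry is the full log-likelihood evaluated at the two-stage estimates. Stage 2 here is a one-dimensional search, whereas a joint ML fit of the same model would have to optimize over all three parameters at once.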

In practice, various goodness-of-fit tests are often used as substitutes to choose the best model (Genest & Favre, 2007). Another often used model selection strategy for two-stage ML is to first evaluate the candidates for each marginal distribution with $\mathrm{AIC}_{\mathrm{ML}}$ and then choose the best distribution for each margin. Once the margins are chosen, one fits different copulae and evaluates the copula part with $\mathrm{AIC}_{\mathrm{ML}}$. However, this piecewise model evaluation cannot evaluate the model as a whole.
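The margin step of this piecewise strategy might look as follows. The sketch rests on assumptions that are not from the paper: two candidate margins (exponential and log-normal) with closed-form ML fits, AIC written with the "larger is better" sign convention to match the form of CIC, and the hypothetical helper names `fit_exponential`, `fit_lognormal`, `aic` and `best_margin`.

```python
import math
import random

def fit_exponential(ys):
    """Closed-form ML fit; returns (maximized log-likelihood, number of parameters)."""
    rate = len(ys) / sum(ys)
    loglik = sum(math.log(rate) - rate * y for y in ys)
    return loglik, 1

def fit_lognormal(ys):
    """Closed-form ML fit; returns (maximized log-likelihood, number of parameters)."""
    logs = [math.log(y) for y in ys]
    n = len(logs)
    mu = sum(logs) / n
    sigma2 = sum((v - mu) ** 2 for v in logs) / n  # ML estimate divides by n
    loglik = sum(-v - 0.5 * math.log(2.0 * math.pi * sigma2)
                 - (v - mu) ** 2 / (2.0 * sigma2) for v in logs)
    return loglik, 2

def aic(loglik, n_params):
    # Assumed sign convention: 2 * loglik - 2 * (number of parameters), larger is better.
    return 2.0 * loglik - 2.0 * n_params

def best_margin(ys, candidates):
    """Score every candidate margin by AIC and return the winner with all scores."""
    scores = {name: aic(*fit(ys)) for name, fit in candidates.items()}
    return max(scores, key=scores.get), scores

rng = random.Random(0)
sample = [rng.lognormvariate(0.0, 0.5) for _ in range(500)]
choice, scores = best_margin(sample, {"exponential": fit_exponential,
                                      "lognormal": fit_lognormal})
print(choice, scores)
```

Each margin is scored in isolation here, and the copula would be scored separately afterwards, so no single number describes the combined model.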

In this paper, we develop the copula information criterion (CIC) for two-stage ML estimation, which has the form

$$\mathrm{CIC} = 2\ell_n(\hat\eta) - 2\hat p^*_\eta.$$

Here $\ell_n(\hat\eta)$ is the maximized log-likelihood with the two-stage ML estimation method, in terms of the full parameter vector $\eta$ of the model in question, and $\hat p^*_\eta$ is a suitable penalization factor, worked out in Section 2.2. The main advantage of CIC is that it can evaluate a parametric copula with parametric margins as a whole. CIC is also a model-robust model selection criterion, which means that it does not assume that the candidate model contains the true model. As the shared name suggests, our CIC is analogous to the CIC of Grønneberg & Hjort (2014), which is designed for copulae estimated with pseudo maximum likelihood (PML). In the PML framework, margins are estimated empirically, while two-stage ML assumes parametric forms for the margins.


Our technical setting, identical to Ko & Hjort (2018), is as follows. Let $(Y_1, \dots, Y_d)^{\mathrm T}$ be a $d$-variate continuous stochastic variable originating from a joint density $g(y_1, \dots, y_d)$ and let $y_i = (y_{i,1}, \dots, y_{i,d})^{\mathrm T}$, for $i = 1, \dots, n$, be independent observations of this variable. The true joint distribution $g$ is typically unknown. Let $f(y_1, \dots, y_d, \eta)$ be our parametric approximation of $g$, with the parameter vector $\eta$ belonging to some connected subset of the appropriate Euclidean space. In addition, $G$ and $F(\cdot, \eta)$ indicate cumulative distribution functions corresponding to $g$ and $f(\cdot, \eta)$, respectively. Here $G_j(y_j)$ and $F_j(y_j, \alpha_j)$ indicate the $j$-th marginal distribution functions corresponding to $G$ and $F(\cdot, \eta)$, respectively, with $\alpha_j$ as the parameter vector belonging to margin component $j$.

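According to Sklar's theorem (Sklar, 1959), there always exists a copula $C(u_1, \dots, u_d, \theta)$ that satisfies

$$F(y_1, \dots, y_d, \eta) = C(F_1(y_1, \alpha_1), \dots, F_d(y_d, \alpha_d), \theta),$$

where the full parameter vector $\eta$ is now blocked as

$$\eta = (\alpha^{\mathrm T}, \theta^{\mathrm T})^{\mathrm T} = (\alpha_1^{\mathrm T}, \dots, \alpha_d^{\mathrm T}, \theta^{\mathrm T})^{\mathrm T}.$$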
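As a small illustration of this construction, the sketch below builds a bivariate joint distribution function from a copula and two margins and checks that each margin is recovered when the other argument is sent to infinity. The Clayton copula, the exponential margins, and the function names are assumptions chosen for the example, not choices made in the paper.

```python
import math

def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v; theta) for theta > 0 (illustrative copula family)."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)

def exp_cdf(y, rate):
    """Exponential margin F_j(y; alpha_j), with alpha_j the rate."""
    return -math.expm1(-rate * y) if y > 0.0 else 0.0

def joint_cdf(y1, y2, alpha1, alpha2, theta):
    """F(y1, y2; eta) = C(F_1(y1; alpha1), F_2(y2; alpha2); theta)."""
    return clayton_cdf(exp_cdf(y1, alpha1), exp_cdf(y2, alpha2), theta)

alpha1, alpha2, theta = 1.0, 2.0, 1.5  # eta = (alpha1, alpha2, theta)
far = 1e6                              # numerically plays the role of +infinity

margin1_from_joint = joint_cdf(0.7, far, alpha1, alpha2, theta)
margin2_from_joint = joint_cdf(far, 0.3, alpha1, alpha2, theta)
joint_value = joint_cdf(0.7, 0.3, alpha1, alpha2, theta)
print(margin1_from_joint, exp_cdf(0.7, alpha1))
print(margin2_from_joint, exp_cdf(0.3, alpha2))
print(joint_value)
```

The parameter blocks stay separate in the code exactly as in $\eta$: the margin parameters only enter through `exp_cdf`, and the copula parameter only enters through `clayton_cdf`.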

Under the regularity conditions of Ko & Hjort (2018), $C(\cdot, \theta)$ can be differentiated, giving

$$f(y_1, \dots, y_d, \eta) = c(F_1(y_1, \alpha_1), \dots, F_d(y_d, \alpha_d), \theta) \prod_{j=1}^{d} f_j(y_j, \alpha_j),$$

where $c(u_1, \dots, u_d, \theta) = \partial^d C(u_1, \dots, u_d, \theta)/\partial u_1 \cdots \partial u_d$ and $f_j(y_j, \alpha_j) = \partial F_j(y_j, \alpha_j)/\partial y_j$. For further details of copula modeling, see Joe (1997) and Nelsen (2006). Analogously, the true density $g$ can also be decomposed into marginal densities and the copula density

$$g(y_1, \dots, y_d) = c_0(G_1(y_1), \dots, G_d(y_d)) \prod_{j=1}^{d} g_j(y_j),$$

with $c_0(\cdot)$ the true copula density.

The remainder of this paper is structured as follows. In Section 2.1, we briefly explain the Kullback–Leibler divergence and its relationship to TIC and $\mathrm{AIC}_{\mathrm{ML}}$. In Section 2.2, we derive and define our copula information criterion. In Section 2.3, we prove that the $\mathrm{AIC}_{\mathrm{2ML}}$ formula holds under two-stage ML estimation. In Section 2.4, we summarize the relationship between TIC, CIC, $\mathrm{AIC}_{\mathrm{ML}}$ and $\mathrm{AIC}_{\mathrm{2ML}}$. In Section 2.5, we illustrate what CIC looks like in the two-dimensional setting and show how CIC can easily be extended to the conditional copula setting. In Section 3, we study the numerical behavior of these model selection criteria. In Section 4, we offer a few concluding remarks and suggestions for future research.
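The decomposition of a joint density into a copula density times the marginal densities, as set out in this section, can be checked numerically in a case where the answer is known. The sketch below assumes a Gaussian copula with normal margins, for which the product must equal the bivariate normal density. The function names and parameter values are illustrative only.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def gaussian_copula_density(u1, u2, rho):
    """Density c(u1, u2; rho) of the bivariate Gaussian copula."""
    x1, x2 = STD_NORMAL.inv_cdf(u1), STD_NORMAL.inv_cdf(u2)
    det = 1.0 - rho ** 2
    exponent = -(rho ** 2 * (x1 ** 2 + x2 ** 2) - 2.0 * rho * x1 * x2) / (2.0 * det)
    return math.exp(exponent) / math.sqrt(det)

def joint_density(y1, y2, margin1, margin2, rho):
    """f(y1, y2) = c(F_1(y1), F_2(y2); rho) * f_1(y1) * f_2(y2)."""
    c = gaussian_copula_density(margin1.cdf(y1), margin2.cdf(y2), rho)
    return c * margin1.pdf(y1) * margin2.pdf(y2)

def bivariate_normal_density(y1, y2, mu1, s1, mu2, s2, rho):
    """Closed-form bivariate normal density, used only as the reference value."""
    z1, z2 = (y1 - mu1) / s1, (y2 - mu2) / s2
    det = 1.0 - rho ** 2
    q = (z1 ** 2 - 2.0 * rho * z1 * z2 + z2 ** 2) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * s1 * s2 * math.sqrt(det))

margin1, margin2, rho = NormalDist(0.0, 1.0), NormalDist(1.0, 2.0), 0.4
via_copula = joint_density(0.3, 1.2, margin1, margin2, rho)
direct = bivariate_normal_density(0.3, 1.2, 0.0, 1.0, 1.0, 2.0, rho)
print(via_copula, direct)
```

Swapping `margin1` or `margin2` for another distribution with `cdf` and `pdf` methods changes the margins without touching the dependence structure, which is why the number of candidate combinations of margins and copula grows so quickly.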


2 The copula information criterion for two-stage maximum likelihood estimation

2.1 Kullback–Leibler divergence

The Kullback–Leibler (KL) divergence from g to f measures how the density f diverges from g (Kullback

& Leibler, 1951) and is defined as
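For a true density $g$ and a candidate density $f(\cdot, \eta)$ on $\mathbb{R}^d$, the standard definition, written in notation assumed to match the rest of the paper, is

$$\mathrm{KL}(g, f(\cdot, \eta)) = \int g(y) \log \frac{g(y)}{f(y, \eta)} \, \mathrm{d}y = \mathrm{E}_g[\log g(Y)] - \mathrm{E}_g[\log f(Y, \eta)].$$

Only the second term depends on the candidate model, so choosing the model with the smallest divergence amounts to choosing the one with the largest $\mathrm{E}_g[\log f(Y, \eta)]$.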
