Copula information criterion for model selection
with two-stage maximum likelihood estimation
Vinnie Ko∗, Nils Lid Hjort
Department of Mathematics, University of Oslo
PB 1053, Blindern, NO-0316 Oslo, Norway
In parametric copula setups, where both the margins and copula have parametric forms, two-stage
maximum likelihood estimation, often referred to as inference functions for margins, is used as an attractive
alternative to the full maximum likelihood estimation strategy. Exploiting basic results derived
earlier by the present authors, we develop a copula information criterion (CIC) for model selection. The
CIC is defined as CIC = $2\ell_n(\hat\eta) - 2\hat p^*_\eta$, where $\ell_n(\hat\eta)$ is the maximized log-likelihood under the two-stage maximum likelihood estimation scheme, with η the full parameter vector for the candidate model in
question, and $\hat p^*_\eta$ is a certain penalization factor. In a nutshell, CIC aims for the model that minimizes the Kullback–Leibler divergence from the real data generating mechanism. CIC does not assume that
the chosen parametric model captures this true model, unlike what is assumed for AIC. In this sense
CIC is analogous to the Takeuchi Information Criterion (TIC), which is defined for the full maximum
likelihood. If we make an additional assumption that a candidate model is correctly specified, then CIC
for that model simplifies to AIC. Further, since both CIC and TIC are estimating the same part of the
Kullback–Leibler divergence, they are compatible, in the sense that they can be used to compare the
performance of full maximum likelihood and two-stage maximum likelihood for a given model. Additionally,
we show that CIC can easily be extended to the conditional copula setup where covariates are
parametrically linked to the copula model.
As a numerical illustration, we perform a simulation and find that CIC outperforms AIC in terms of
prediction performance from the selected models. However, as sample size grows, the difference between
CIC and AIC becomes minimal because the log-likelihood part outgrows the bias correction part. Further,
we learn from the simulation that $\hat p^*_\eta$, the bias correction term of CIC, has a strong positive relationship with the prediction performance of the model. So, a model with bad prediction performance is
penalized more by CIC.
Keywords: copula, Akaike information criterion, copula information criterion, model robust, two-stage
maximum likelihood, inference functions for margins
∗Corresponding author. E-mail addresses: email@example.com (V. Ko), firstname.lastname@example.org (N.L. Hjort)
1 Introduction and copula models
One of the main practical issues in copula modeling is model selection. In the full parametric setup, where
both the copula and margins are assumed to have a parametric form, one often has multiple candidates for
both the copula and margins. As the dimension of the model increases, the list of possible combinations of
margins and copula grows rapidly. Hence, there is a need for a model selection criterion that can evaluate
each model systematically according to a certain philosophy or criterion and assign a score to each model. In
the end, one would choose the model with the best score.
Throughout this paper, we consider the full parametric setup. In this setup, one can simultaneously
estimate all parameters of the model (i.e. both copula parameters and margin parameters) by using maximum
likelihood (ML) estimation. In this ML estimation framework, one can for instance use AICML (Akaike, 1974)
or TIC (also known as model-robust AICML) (Takeuchi, 1976) as model selection criterion and select the
model with the best score. (Note that we denote the AIC under ML estimation as AICML to distinguish it
from the two-stage ML based AIC2ML, which we will derive in Section 2.3.) However, when the dimension
of the copula model gets high, the number of parameters increases quickly and the ML estimation is not
always feasible in terms of speed and numerical stability. Two-stage maximum likelihood (two-stage ML)
estimation, also often referred to as inference functions for margins (IFM), is a popular alternative estimation
strategy that is designed to overcome these drawbacks of the ML estimation. In stage 1 of the two-stage ML
estimation, the parameter vectors of each marginal distribution are estimated separately by ML. In stage
2, the estimates from stage 1 are plugged into the log-likelihood of the model. Then, the parameters of the
copula, which are now the only unknown parameters, are estimated by using ML estimation again. One of
the advantages of this multi-stage approach is that it is computationally much faster than estimating all
parameters simultaneously, because it does not have to search for the global maximum in a high-dimensional
space. A drawback of the two-stage ML estimation method, however, is that we cannot use the classical
results based on ML estimation, which include model selection criteria such as TIC and BIC.
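To make the two stages concrete, the following minimal Python sketch fits a bivariate model by two-stage ML. It is an illustration under assumptions that are not part of this paper: both margins are normal (so stage 1 has closed-form ML estimates), the copula is Gaussian with a single correlation parameter, and stage 2 uses a simple golden-section search in place of a general-purpose optimizer. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()

def fit_normal_margin(ys):
    """Stage 1: closed-form ML estimates (mean, sd) of one normal margin."""
    mu = fmean(ys)
    sigma = math.sqrt(fmean([(y - mu) ** 2 for y in ys]))
    return mu, sigma

def to_normal_scores(ys, mu, sigma):
    """Map data to u = F_j(y, alpha_j), then to z = Phi^{-1}(u)."""
    margin = NormalDist(mu, sigma)
    eps = 1e-12  # keep u strictly inside (0, 1) so inv_cdf is defined
    return [STD_NORMAL.inv_cdf(min(max(margin.cdf(y), eps), 1.0 - eps)) for y in ys]

def copula_loglik(rho, z1, z2):
    """Log-likelihood of the bivariate Gaussian copula at the normal scores."""
    s = 1.0 - rho * rho
    return sum(-0.5 * math.log(s)
               - (rho * rho * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * s)
               for a, b in zip(z1, z2))

def golden_section_max(func, lo, hi, tol=1e-7):
    """Maximize a function on [lo, hi]; assumes the objective is unimodal."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        left = hi - ratio * (hi - lo)
        right = lo + ratio * (hi - lo)
        if func(left) > func(right):
            hi = right
        else:
            lo = left
    return 0.5 * (lo + hi)

def two_stage_ml(y1, y2):
    """Two-stage ML: margins first, then the copula with margins plugged in."""
    # Stage 1: each margin is estimated separately.
    mu1, s1 = fit_normal_margin(y1)
    mu2, s2 = fit_normal_margin(y2)
    # Stage 2: only the copula parameter is unknown.
    z1 = to_normal_scores(y1, mu1, s1)
    z2 = to_normal_scores(y2, mu2, s2)
    rho = golden_section_max(lambda r: copula_loglik(r, z1, z2), -0.99, 0.99)
    # Full log-likelihood at the two-stage estimates: copula part + margins.
    m1, m2 = NormalDist(mu1, s1), NormalDist(mu2, s2)
    margin_ll = (sum(math.log(m1.pdf(y)) for y in y1)
                 + sum(math.log(m2.pdf(y)) for y in y2))
    loglik = copula_loglik(rho, z1, z2) + margin_ll
    return {"alpha": ((mu1, s1), (mu2, s2)), "theta": rho, "loglik": loglik}

# Simulated data (assumed true values: rho = 0.6, margins N(2, 1.5) and N(-1, 0.5)).
rng = random.Random(1)
true_rho = 0.6
y1, y2 = [], []
for _ in range(2000):
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    y1.append(2.0 + 1.5 * e1)
    y2.append(-1.0 + 0.5 * (true_rho * e1 + math.sqrt(1.0 - true_rho ** 2) * e2))

fit = two_stage_ml(y1, y2)
print(fit)
```

The returned `loglik` is the log-likelihood of the whole model evaluated at the two-stage estimates, that is, the copula part plus the marginal parts.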
In practice, various goodness-of-fit tests are often used as substitutes to choose the best
model (Genest & Favre, 2007). Another commonly used model selection strategy for the two-stage ML is to
first evaluate the candidates for each marginal distribution with AICML and then choose the best
distribution for each margin. Once the margins are chosen, one fits different copulae and evaluates the copula
part with AICML. However, this piecewise model evaluation cannot evaluate the model as a whole.
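The margin step of this piecewise strategy can be sketched as follows. The candidate families (exponential and log-normal), the simulated data and the helper names are assumptions chosen only for illustration, and AIC is written as 2 x log-likelihood minus 2 x number of parameters, assuming the same "larger is better" sign convention as the CIC formula.

```python
import math
import random
from statistics import fmean

def fit_exponential(ys):
    """ML fit of an exponential margin; returns (max log-likelihood, #params)."""
    rate = 1.0 / fmean(ys)
    loglik = sum(math.log(rate) - rate * y for y in ys)
    return loglik, 1

def fit_lognormal(ys):
    """ML fit of a log-normal margin; returns (max log-likelihood, #params)."""
    logs = [math.log(y) for y in ys]
    mu = fmean(logs)
    var = fmean([(v - mu) ** 2 for v in logs])
    loglik = sum(-v - 0.5 * math.log(2.0 * math.pi * var)
                 - (v - mu) ** 2 / (2.0 * var) for v in logs)
    return loglik, 2

def aic(loglik, n_params):
    """AIC with the 'larger is better' convention (an assumption here)."""
    return 2.0 * loglik - 2.0 * n_params

def select_margin(ys, candidates):
    """Score every candidate margin and return the best name with all scores."""
    scores = {name: aic(*fit(ys)) for name, fit in candidates.items()}
    return max(scores, key=scores.get), scores

# Simulated positive data for one margin (assumed log-normal truth).
rng = random.Random(0)
sample = [rng.lognormvariate(0.0, 0.5) for _ in range(500)]

candidates = {"exponential": fit_exponential, "lognormal": fit_lognormal}
best, scores = select_margin(sample, candidates)
print(best, scores)
```

Such a score only ranks the candidates for a single margin. It does not measure how well the combination of margins and copula fits, which is the limitation of the piecewise approach.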
In this paper, we develop the copula information criterion (CIC) for two-stage ML estimation, which has the form
$$\mathrm{CIC} = 2\ell_n(\hat\eta) - 2\hat p^*_\eta.$$
Here $\ell_n(\hat\eta)$ is the maximized log-likelihood with the two-stage ML estimation method, in terms of the full parameter vector η of the model in question, and $\hat p^*_\eta$ is a suitable penalization factor, worked out in Section 2.2. The main advantage of CIC is that it can evaluate a parametric copula with parametric margins as a whole.
CIC is also a model-robust model selection criterion, which means that it does not assume that the candidate
model contains the true model. As the shared name suggests, our CIC is analogous to the CIC
of Grønneberg & Hjort (2014), which is designed for copulae estimated with pseudo maximum likelihood
(PML). In the PML framework, margins are estimated empirically, while two-stage ML assumes parametric
forms for the margins.
Our technical setting, identical to Ko & Hjort (2018), is as follows. Let $(Y_1, \ldots, Y_d)^{\rm T}$ be a $d$-variate continuous stochastic variable originating from a joint density $g(y_1, \ldots, y_d)$ and let $y_i = (y_{i,1}, \ldots, y_{i,d})^{\rm T}$, for $i = 1, \ldots, n$, be independent observations of this variable. The true joint distribution $g$ is typically
unknown. Let $f(y_1, \ldots, y_d, \eta)$ be our parametric approximation of $g$, with the parameter vector $\eta$ belonging to some connected subset of the appropriate Euclidean space. In addition, $G$ and $F(\cdot, \eta)$ indicate cumulative distribution functions corresponding to $g$ and $f(\cdot, \eta)$, respectively. Here $G_j(y_j)$ and $F_j(y_j, \alpha_j)$ indicate the $j$-th marginal distribution functions corresponding to $G$ and $F(\cdot, \eta)$ respectively, with $\alpha_j$ as the parameter vector belonging to margin component $j$.
According to Sklar’s theorem (Sklar, 1959), there always exists a copula $C(u_1, \ldots, u_d, \theta)$ that satisfies
$$F(y_1, \ldots, y_d, \eta) = C(F_1(y_1, \alpha_1), \ldots, F_d(y_d, \alpha_d), \theta),$$
where the full parameter vector $\eta$ is now blocked as
$$\eta = (\alpha^{\rm T}, \theta^{\rm T})^{\rm T} = (\alpha_1^{\rm T}, \ldots, \alpha_d^{\rm T}, \theta^{\rm T})^{\rm T}.$$
Assuming the regularity conditions from Ko & Hjort (2018), $C(\cdot, \theta)$ can be differentiated, giving
$$f(y_1, \ldots, y_d, \eta) = c(F_1(y_1, \alpha_1), \ldots, F_d(y_d, \alpha_d), \theta) \prod_{j=1}^d f_j(y_j, \alpha_j),$$
where $c(u_1, \ldots, u_d, \theta) = \partial^d C(u_1, \ldots, u_d, \theta)/\partial u_1 \cdots \partial u_d$ and $f_j(y_j, \alpha_j) = \partial F_j(y_j, \alpha_j)/\partial y_j$. For further details of
copula modeling, see Joe (1997) and Nelsen (2006). Analogously, the true density $g$ can also be decomposed
into marginal densities and the copula density,
$$g(y_1, \ldots, y_d) = c_0(G_1(y_1), \ldots, G_d(y_d)) \prod_{j=1}^d g_j(y_j),$$
with $c_0(\cdot)$ the true copula density and $g_j$ the $j$-th marginal density of $g$.
The remainder of this paper is structured as follows. In Section 2.1, we briefly explain Kullback–Leibler
divergence and its relationship to TIC and AICML. In Section 2.2, we derive and define our copula information
criterion. In Section 2.3, we prove that the AIC2ML formula holds under the two-stage ML estimation. In
Section 2.4, we summarize the relationship between TIC, CIC, AICML and AIC2ML. In Section 2.5, we
illustrate what CIC looks like in the two-dimensional setting and show how CIC easily can be extended
to the conditional copula setting. In Section 3, we study the numerical behavior of those model selection
criteria. In our final Section 4, we offer a few concluding remarks and suggestions for future research.
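The decomposition of the joint density into a copula density times the marginal densities, used throughout this section, can be checked numerically in a case where the joint density is known in closed form. The sketch below assumes a Gaussian copula with normal margins, for which the product should reproduce the bivariate normal density. The parameter values and function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def gaussian_copula_density(u1, u2, rho):
    """Density c(u1, u2, rho) of the bivariate Gaussian copula."""
    x, y = STD_NORMAL.inv_cdf(u1), STD_NORMAL.inv_cdf(u2)
    s = 1.0 - rho * rho
    return math.exp(-(rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * s)) / math.sqrt(s)

def joint_density_via_copula(y1, y2, margins, rho):
    """f(y1, y2) = c(F1(y1), F2(y2), rho) * f1(y1) * f2(y2)."""
    m1, m2 = margins
    c = gaussian_copula_density(m1.cdf(y1), m2.cdf(y2), rho)
    return c * m1.pdf(y1) * m2.pdf(y2)

def bivariate_normal_density(y1, y2, margins, rho):
    """Closed-form bivariate normal density, used as the reference value."""
    m1, m2 = margins
    a = (y1 - m1.mean) / m1.stdev
    b = (y2 - m2.mean) / m2.stdev
    s = 1.0 - rho * rho
    q = (a * a - 2.0 * rho * a * b + b * b) / s
    return math.exp(-0.5 * q) / (2.0 * math.pi * m1.stdev * m2.stdev * math.sqrt(s))

# Assumed example values for the margins and the copula parameter.
margins = (NormalDist(2.0, 1.5), NormalDist(-1.0, 0.5))
rho = 0.6
via_copula = joint_density_via_copula(2.5, -0.8, margins, rho)
direct = bivariate_normal_density(2.5, -0.8, margins, rho)
print(via_copula, direct)
```

Replacing the margins or the copula density with other families gives other members of the same model class, which is what makes the list of candidate models grow quickly.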
2 The copula information criterion for two-stage maximum likelihood estimation
2.1 Kullback–Leibler divergence
The Kullback–Leibler (KL) divergence from g to f measures how the density f diverges from g (Kullback
& Leibler, 1951) and is defined as