
    Copula information criterion for model selection with two-stage maximum likelihood estimation

    Vinnie Ko∗, Nils Lid Hjort

    Department of Mathematics, University of Oslo

    PB 1053, Blindern, NO-0316 Oslo, Norway

    February 2018


    In parametric copula setups, where both the margins and the copula have parametric forms, two-stage maximum likelihood estimation, often referred to as inference functions for margins, is used as an attractive alternative to the full maximum likelihood estimation strategy. Exploiting basic results derived earlier by the present authors, we develop a copula information criterion (CIC) for model selection. The CIC is defined as $\mathrm{CIC} = 2\ell_n(\hat\eta) - 2\hat p^*_\eta$, where $\ell_n(\hat\eta)$ is the maximized log-likelihood under the two-stage maximum likelihood estimation scheme, with $\eta$ the full parameter vector for the candidate model in question, and $\hat p^*_\eta$ is a certain penalization factor. In a nutshell, CIC aims for the model that minimizes the Kullback–Leibler divergence from the real data-generating mechanism. Unlike AIC, CIC does not assume that the chosen parametric model captures this true model. In this sense CIC is analogous to the Takeuchi Information Criterion (TIC), which is defined for full maximum likelihood estimation. If we additionally assume that a candidate model is correctly specified, then CIC for that model simplifies to AIC. Further, since CIC and TIC estimate the same part of the Kullback–Leibler divergence, they are compatible, in the sense that they can be used to compare the performance of full maximum likelihood and two-stage maximum likelihood for a given model. Additionally, we show that CIC can easily be extended to the conditional copula setup, where covariates are parametrically linked to the copula model.

    As a numerical illustration, we perform a simulation study and find that CIC outperforms AIC in terms of the prediction performance of the selected models. However, as the sample size grows, the difference between CIC and AIC becomes negligible, because the log-likelihood part outgrows the bias-correction part. Further, the simulation shows that $\hat p^*_\eta$, the bias-correction term of CIC, is strongly related to the prediction performance of the model: a model with poor prediction performance is penalized more by CIC.

    Keywords: copula, Akaike information criterion, copula information criterion, model robust, two-stage maximum likelihood, inference functions for margins

    ∗Corresponding author. E-mail addresses: (V. Ko), (N.L. Hjort)


    1 Introduction and copula models

    One of the main practical issues in copula modeling is model selection. In the full parametric setup, where both the copula and the margins are assumed to have a parametric form, one often has multiple candidates for both the copula and the margins. As the dimension of the model increases, the list of possible combinations of margins and copulae grows rapidly. Hence, there is a need for a model selection criterion that evaluates each model systematically according to a certain philosophy and assigns a score to each model. In the end, one chooses the model with the best score.

    Throughout this paper, we consider the full parametric setup. In this setup, one can estimate all parameters of the model (i.e. both copula parameters and margin parameters) simultaneously by maximum likelihood (ML) estimation. In this ML estimation framework, one can for instance use AICML (Akaike, 1974) or TIC (also known as model-robust AICML) (Takeuchi, 1976) as the model selection criterion and select the model with the best score. (Note that we denote the AIC under ML estimation as AICML to distinguish it from the two-stage ML based AIC2ML, which we derive in Section 2.3.) However, when the dimension of the copula model gets high, the number of parameters increases quickly and ML estimation is not always feasible in terms of speed and numerical stability. Two-stage maximum likelihood (two-stage ML) estimation, also often referred to as inference functions for margins (IFM), is a popular alternative estimation strategy designed to overcome these drawbacks of ML estimation. In stage 1 of the two-stage ML estimation, the parameter vector of each marginal distribution is estimated separately by ML. In stage 2, the estimates from stage 1 are plugged into the log-likelihood of the model, and the copula parameters, which are now the only unknown parameters, are estimated by ML. One of the advantages of this multi-stage approach is that it is computationally much faster than estimating all parameters simultaneously, because it does not have to search for the global maximum in a high-dimensional space. A drawback of two-stage ML estimation, however, is that we cannot use the classical results based on ML estimation, which include model selection criteria such as TIC and BIC.
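    The two stages described above can be sketched in Python. This is a minimal illustration under assumptions not made in the text: a bivariate model with normal margins and a Gaussian copula (so $\theta$ is a single correlation $\rho$), fitted with SciPy. The helpers `two_stage_ml` and `gaussian_copula_logdensity` are hypothetical names, not part of any library or of the authors' code.

```python
import numpy as np
from scipy import optimize, stats


def gaussian_copula_logdensity(u1, u2, rho):
    """Log-density log c(u1, u2, rho) of the bivariate Gaussian copula."""
    z1, z2 = stats.norm.ppf(u1), stats.norm.ppf(u2)
    r2 = 1.0 - rho**2
    return -0.5 * np.log(r2) - (rho**2 * (z1**2 + z2**2) - 2.0 * rho * z1 * z2) / (2.0 * r2)


def two_stage_ml(y):
    """Two-stage ML (IFM) for normal margins + Gaussian copula; y has shape (n, 2)."""
    # Stage 1: estimate each margin's parameters separately by ML.
    # For the normal family, norm.fit returns the closed-form MLE (mean, sd with ddof=0).
    alphas = [stats.norm.fit(y[:, j]) for j in range(2)]
    # Plug the stage-1 estimates in: u_ij = F_j(y_ij, alpha_j_hat), kept away from 0 and 1.
    u = np.column_stack([stats.norm.cdf(y[:, j], *alphas[j]) for j in range(2)])
    u = np.clip(u, 1e-12, 1.0 - 1e-12)

    # Stage 2: with the margins fixed, only the copula term depends on rho.
    def neg_copula_loglik(rho):
        return -np.sum(gaussian_copula_logdensity(u[:, 0], u[:, 1], rho))

    res = optimize.minimize_scalar(neg_copula_loglik, bounds=(-0.99, 0.99), method="bounded")
    # Two-stage log-likelihood: marginal log-densities plus the maximized copula part.
    marginal_ll = sum(np.sum(stats.norm.logpdf(y[:, j], *alphas[j])) for j in range(2))
    return alphas, res.x, marginal_ll - res.fun


# Usage on simulated data with correlation 0.6 (illustrative values).
rng = np.random.default_rng(0)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=2000)
y = np.column_stack([2.0 + 0.5 * z[:, 0], -1.0 + 3.0 * z[:, 1]])
alphas, rho_hat, loglik = two_stage_ml(y)
print(alphas, rho_hat, loglik)
```

    Once the margins are fixed, the copula term is the only part of the log-likelihood that depends on $\rho$, so stage 2 here is a one-dimensional search; for richer copula families it would still be a search over $\theta$ alone rather than over the full $\eta$.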

    In practice, various goodness-of-fit tests are often used as substitutes to choose the best model (Genest & Favre, 2007). Another commonly used model selection strategy for two-stage ML is to first evaluate the candidates for each marginal distribution with AICML and choose the best distribution for each margin accordingly. Once the margins are chosen, one fits different copulae and evaluates the copula part with AICML. However, this piecewise model evaluation cannot evaluate the model as a whole.
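    The margin step of this piecewise strategy can be sketched as follows. The candidate families (SciPy's `norm`, `logistic` and `expon`) and the helper `select_margin` are illustrative assumptions, not taken from the text. AIC is written here in the "larger is better" form $2\ell - 2k$, the same sign convention as the CIC formula used in this paper.

```python
import numpy as np
from scipy import stats

# Candidate families for a single margin (an illustrative, hypothetical choice).
CANDIDATES = {"norm": stats.norm, "logistic": stats.logistic, "expon": stats.expon}


def aic(loglik, n_params):
    """AIC written as 2*loglik - 2*n_params, so that larger values are better."""
    return 2.0 * loglik - 2.0 * n_params


def select_margin(x, candidates=CANDIDATES):
    """Fit every candidate family to the sample x by ML; return the best name and all scores."""
    scores = {}
    for name, dist in candidates.items():
        params = dist.fit(x)                      # ML estimates of (shape..., loc, scale)
        loglik = np.sum(dist.logpdf(x, *params))  # maximized marginal log-likelihood
        scores[name] = aic(loglik, len(params))
    return max(scores, key=scores.get), scores


# Usage: a margin that is in fact exponential (simulated, illustrative).
rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=1000)
best, scores = select_margin(x)
print(best, scores)
```

    The copula step would be scored the same way once the margins are fixed; each step only compares candidates for one component, which is exactly why the overall model is never scored as a whole.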

    In this paper, we develop the copula information criterion (CIC) for two-stage ML estimation, which has the form

    $$\mathrm{CIC} = 2\ell_n(\hat\eta) - 2\hat p^*_\eta.$$

    Here $\ell_n(\hat\eta)$ is the maximized log-likelihood under the two-stage ML estimation method, in terms of the full parameter vector $\eta$ of the model in question, and $\hat p^*_\eta$ is a suitable penalization factor, worked out in Section 2.2. The main advantage of CIC is that it can evaluate a parametric copula with parametric margins as a whole. CIC is also a model-robust model selection criterion, which means that it does not assume that the candidate model contains the true model. As the shared name suggests, our CIC is analogous to the CIC of Grønneberg & Hjort (2014), which is designed for copulae estimated with pseudo maximum likelihood (PML). In the PML framework, margins are estimated empirically, while two-stage ML assumes parametric forms for the margins.
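    Given the maximized two-stage log-likelihood and an estimated penalty for each candidate, applying CIC is plain arithmetic: compute $2\ell_n(\hat\eta) - 2\hat p^*_\eta$ and prefer the largest value. The sketch below assumes $\hat p^*_\eta$ has already been computed (its formula is derived in Section 2.2 and is not reproduced here); the helper names, candidate labels and numbers are made up purely to show the arithmetic.

```python
def cic(loglik_2ml, p_star):
    """CIC = 2 * l_n(eta_hat) - 2 * p*_eta_hat; larger values are preferred."""
    return 2.0 * loglik_2ml - 2.0 * p_star


def select_by_cic(candidates):
    """candidates: name -> (maximized two-stage log-likelihood, estimated penalty p*)."""
    scores = {name: cic(ll, p) for name, (ll, p) in candidates.items()}
    return max(scores, key=scores.get), scores


# Made-up inputs, only to show the arithmetic (not results from the paper).
candidates = {"model_A": (-1520.4, 6.8), "model_B": (-1518.9, 9.1)}
best, scores = select_by_cic(candidates)
print(best, scores)  # model_A: -3054.4, model_B: -3056.0 (up to float rounding)
```

    Here the candidate with the higher log-likelihood loses because its penalty is larger. For a candidate assumed to be correctly specified, CIC reduces to AIC, which in this sketch amounts to passing the parameter count $\dim(\eta)$ as `p_star`.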


    Our technical setting, identical to that of Ko & Hjort (2018), is as follows. Let $(Y_1, \dots, Y_d)^T$ be a d-variate continuous random variable with joint density $g(y_1, \dots, y_d)$, and let $y_i = (y_{i,1}, \dots, y_{i,d})^T$, for $i = 1, \dots, n$, be independent observations of this variable. The true joint density $g$ is typically unknown. Let $f(y_1, \dots, y_d, \eta)$ be our parametric approximation of $g$, with parameter vector $\eta$ belonging to some connected subset of the appropriate Euclidean space. In addition, $G$ and $F(\cdot, \eta)$ denote the cumulative distribution functions corresponding to $g$ and $f(\cdot, \eta)$, respectively. Here $G_j(y_j)$ and $F_j(y_j, \alpha_j)$ denote the j-th marginal distribution functions corresponding to $G$ and $F(\cdot, \eta)$, respectively, with $\alpha_j$ the parameter vector belonging to margin component $j$.

    According to Sklar's theorem (Sklar, 1959), there always exists a copula $C(u_1, \dots, u_d, \theta)$ that satisfies

    $$F(y_1, \dots, y_d, \eta) = C\big(F_1(y_1, \alpha_1), \dots, F_d(y_d, \alpha_d), \theta\big),$$

    where the full parameter vector $\eta$ is now blocked as

    $$\eta = (\alpha^T, \theta^T)^T = (\alpha_1^T, \dots, \alpha_d^T, \theta^T)^T.$$
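    Sklar's representation, and the density factorization obtained from it by differentiation in the next step, can be checked numerically in a case where the joint law is known in closed form. The sketch assumes normal margins and a bivariate Gaussian copula, for which $F(\cdot, \eta)$ is exactly bivariate normal; the parameter values and helper names are hypothetical, chosen only for the check.

```python
import numpy as np
from scipy import stats

# Illustrative parameter values (hypothetical): normal margins and a Gaussian copula.
alpha1, alpha2 = (1.0, 2.0), (-0.5, 0.7)   # (mean, sd) of margin 1 and margin 2
rho = 0.4                                  # copula parameter theta
y1, y2 = 1.5, -0.2                         # evaluation point


def gaussian_copula_cdf(u1, u2, rho):
    """C(u1, u2, theta) = Phi_2(Phi^{-1}(u1), Phi^{-1}(u2); rho)."""
    z = [stats.norm.ppf(u1), stats.norm.ppf(u2)]
    return stats.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]]).cdf(z)


def gaussian_copula_density(u1, u2, rho):
    """c(u1, u2, theta): the mixed second derivative of C."""
    z1, z2 = stats.norm.ppf(u1), stats.norm.ppf(u2)
    r2 = 1.0 - rho**2
    return np.exp(-(rho**2 * (z1**2 + z2**2) - 2.0 * rho * z1 * z2) / (2.0 * r2)) / np.sqrt(r2)


# Margins F_j(y_j, alpha_j) and f_j(y_j, alpha_j).
u1, u2 = stats.norm.cdf(y1, *alpha1), stats.norm.cdf(y2, *alpha2)
f1, f2 = stats.norm.pdf(y1, *alpha1), stats.norm.pdf(y2, *alpha2)

# Sklar: F(y1, y2, eta) = C(F_1, F_2, theta); density: f = c(F_1, F_2, theta) * f_1 * f_2.
F_sklar = gaussian_copula_cdf(u1, u2, rho)
f_sklar = gaussian_copula_density(u1, u2, rho) * f1 * f2

# With normal margins and a Gaussian copula the joint law is bivariate normal,
# so both quantities can be compared with the direct bivariate-normal formulas.
s1, s2 = alpha1[1], alpha2[1]
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
joint = stats.multivariate_normal(mean=[alpha1[0], alpha2[0]], cov=cov)
F_direct, f_direct = joint.cdf([y1, y2]), joint.pdf([y1, y2])
print(F_sklar, F_direct)   # agree up to the numerical error of the 2-D normal CDF
print(f_sklar, f_direct)   # agree to floating-point precision
```

    The CDF comparison is only approximate because SciPy evaluates the bivariate normal CDF numerically, whereas the density identity holds exactly up to rounding.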

    Under the regularity conditions of Ko & Hjort (2018), $C(\cdot, \theta)$ can be differentiated, and the joint density factorizes as

    $$f(y_1, \dots, y_d, \eta) = c\big(F_1(y_1, \alpha_1), \dots, F_d(y_d, \alpha_d), \theta\big) \prod_{j=1}^{d} f_j(y_j, \alpha_j),$$

    where $c(u_1, \dots, u_d, \theta) = \partial^d C(u_1, \dots, u_d, \theta)/\partial u_1 \cdots \partial u_d$ and $f_j(y_j, \alpha_j) = \partial F_j(y_j, \alpha_j)/\partial y_j$. For further details of copula modeling, see Joe (1997) and Nelsen (2006). Analogously, the true density $g$ can be decomposed into its marginal densities and the true copula density:

    $$g(y_1, \dots, y_d) = c_0\big(G_1(y_1), \dots, G_d(y_d)\big) \prod_{j=1}^{d} g_j(y_j),$$

    with $c_0(\cdot)$ the true copula density and $g_j(y_j) = \partial G_j(y_j)/\partial y_j$ the true marginal densities.

    The further structure of this paper is as follows. In Section 2.1, we briefly explain the Kullback–Leibler divergence and its relationship to TIC and AICML. In Section 2.2, we derive and define our copula information criterion. In Section 2.3, we prove that the AIC2ML formula holds under two-stage ML estimation. In Section 2.4, we summarize the relationship between TIC, CIC, AICML and AIC2ML. In Section 2.5, we illustrate what CIC looks like in the two-dimensional setting and show how CIC can easily be extended to the conditional copula setting. In Section 3, we study the numerical behavior of these model selection criteria. In our final Section 4, we offer a few concluding remarks and suggestions for future research.


    2 The copula information criterion for two-stage maximum likelihood estimation

    2.1 Kullback–Leibler divergence

    The Kullback–Leibler (KL) divergence from $g$ to $f$ measures how the density $f$ diverges from $g$ (Kullback & Leibler, 1951) and is defined as