
The Journal of Operational Risk (27–43) Volume 4/Number 3, Fall 2009

Bayesian analysis of extreme operational losses

Chyng-Lan Liang
Algorithmics (UK) Limited, Eldon House, 2 Eldon Street, London EC2M 7LS, UK; email: [email protected]

    Bayesian techniques offer an alternative to parameter estimation methods,

    such as maximum likelihood estimation, for extreme value models. These

    techniques treat the parameters to be estimated as random variables,

    instead of some fixed, possibly unknown, constants. We investigate, with

    simulated examples, how Bayesian analysis can be used to estimate the

    parameters of extreme value models, for the case where we have no prior

    knowledge at all and the case where we have prior knowledge in the form

    of expert opinion. In addition, Bayesian analysis provides a framework for

    the incorporation of information from external data into a loss model based

    on internal data; this is again illustrated using simulation.

    1 INTRODUCTION

    Maximum likelihood estimation (MLE) techniques are not the only way to draw

    inferences from the likelihood function; Bayesian inference offers an alternative

    methodology, as well as viewpoint. There is some debate concerning the viability

    of these methods, which we will only briefly touch upon. We will be concentrating

    instead on how these methods might be applied in practice.

    The Bayesian techniques we will consider here have wide applicability, but we are

    particularly interested in how these techniques may give us greater insight into the

behavior of loss processes at extreme levels. There has been a great deal of interest in the use of Bayesian methods in operational risk; recent papers on the subject include

those by Shevchenko and Wüthrich (2006) and Peters and Sisson (2006). We will not be restricting ourselves to conjugate priors (see Shevchenko and Wüthrich (2006) for

    classes of frequency and severity models admitting conjugate forms) and will instead

    be concentrating on the fitting of extreme losses, using the distributions suggested by

    extreme value theory.

    In Section 2, we introduce the concepts behind Bayesian analysis; in Section 3

    we briefly describe how simulation-based techniques, and in particular Markov

chain Monte Carlo (MCMC) techniques, can help us overcome the difficulty of computation in Bayesian analysis. In Section 4, we consider parameter estimation

    for the extreme value model for annual maxima, through a small simulation study.

    For simulated data, we compare the Bayesian estimates for the model parameters

    when we have no prior information (Section 4.1) with those obtained when we have


    2009 Incisive Media. Copying or distributing in print or electronic

    forms without written permission of Incisive Media is prohibited.


    prior expert opinion (Section 4.2). In the latter case, there is no obvious methodology

    for transforming expert opinions into prior distributions for the parameters. We will

    be following the method for eliciting prior information proposed by Coles and Tawn

    (1996). We consider using external data for prior specification in Section 5. Typically

    there are concerns about the incorporation of external data into the modeling

process: questions about scaling, applicability and loss severity thresholds. We

    illustrate a possible use of Bayesian analysis with an example showing where

    the external data can provide us with information concerning the shape of the

    distribution.

    2 BAYESIAN INFERENCE

Suppose we have data x = (x1, . . . , xn), constituting independent identically distributed (iid) realizations of a random variable, X, whose density belongs to a parametric family parametrized by θ. The likelihood for θ may then be given by P(x | θ) = ∏_{i=1}^{n} P(xi | θ), since the observations xi, i = 1, . . . , n, are independent.

In the classical framework, θ is a constant: an unknown constant to be estimated, in many cases. In the Bayesian setting, θ is itself a random variable, with an a priori distribution, which reflects uncertainty concerning the parameter value prior to observing any data. This distribution is called the prior distribution.

Bayes' theorem states that:

P(θ | x) = P(θ)P(x | θ) / ∫ P(y)P(x | y) dy    (1)

Bayes' theorem is an immediate consequence of the axioms of probability; it is its use in statistical analysis that is controversial and revolutionary. The theorem allows us to convert some prior set of beliefs concerning the unknown θ, as represented by the prior distribution, P(θ), into a posterior distribution, P(θ | x), which incorporates information provided by the data x. In addition, since we get a complete distribution, not just an estimate, the accuracy of our estimate can be summarized, for example, by the variance of the posterior distribution; we do not need to resort to asymptotic

theory, as we do for MLE. Proponents of this approach argue that the prior

    distribution allows us to supplement data with other sources of relevant information;

    opponents contend that conclusions become subjective, with the subjective choice

    of priors.

3 METROPOLIS-HASTINGS ALGORITHM

Computation of the integral in the denominator in Equation (1) can cause problems, especially for high-dimensional θ. This problem has been overcome by simulation-based techniques. The idea behind simulation-based techniques is to simulate a

    sample from the posterior distribution and use this simulated sample to obtain

    estimates of the moments and properties of the posterior distribution.


Under fairly general conditions, a Markov chain eventually reaches an equilibrium distribution, π. The Metropolis-Hastings algorithm aims to construct a Markov chain that has the posterior distribution as its equilibrium distribution. The algorithm is given below.

Set an initial value θ1.

Specify an arbitrary probability rule q(θi+1 | θi) for iterative simulation of successive values. The q function here is known as the transition kernel of the chain. This generates a first-order Markov chain, as the stochastic properties of θi+1 are independent of θ1, . . . , θi−1, given θi. In order for the sequence θ1, θ2, . . . to have the equilibrium distribution given by Equation (1), we add an acceptance/rejection step.

For iteration i, use the probability rule q(· | θi) to generate a candidate value, y, for θi+1. Let:

αi = min{1, [P(y)P(x | y)q(θi | y)] / [P(θi)P(x | θi)q(y | θi)]}    (2)

We set:

θi+1 = y with probability αi, or θi+1 = θi with probability 1 − αi    (3)

Under simple regularity assumptions, the generated sequence is a Markov chain whose equilibrium distribution is the distribution given by Equation (1). For a large enough value of m, the sequence θm, θm+1, . . . can be used in a similar way to a sequence of independent values to estimate properties of the posterior distribution, such as the posterior mean. This holds regardless of the choice of the transition kernel, q, though this choice will affect the settling-in period and the dependence in the sequence.
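The steps above can be sketched directly in code. The following is a minimal illustration (not from the original paper): a random-walk Metropolis-Hastings sampler, for which the symmetric transition kernel makes the q terms in Equation (2) cancel. The toy posterior, data values and proposal standard deviation are all illustrative assumptions.

```python
import math
import random

def metropolis_hastings(log_post, theta0, prop_sd, n_iter, seed=0):
    """Random-walk Metropolis-Hastings: returns a chain whose equilibrium
    distribution is proportional to exp(log_post(theta))."""
    rng = random.Random(seed)
    theta = theta0
    lp = log_post(theta)
    chain = [theta]
    for _ in range(n_iter):
        cand = theta + rng.gauss(0.0, prop_sd)   # symmetric kernel: q terms cancel
        lp_cand = log_post(cand)
        # accept with probability min(1, posterior ratio), on the log scale
        if math.log(rng.random()) < lp_cand - lp:
            theta, lp = cand, lp_cand
        chain.append(theta)
    return chain

# Toy posterior: N(0, 10^2) prior on theta, N(theta, 1) likelihood for each datum
data = [1.2, 0.8, 1.1, 0.9, 1.0]

def log_post(theta):
    prior = -theta * theta / (2.0 * 10.0 ** 2)
    like = -sum((x - theta) ** 2 for x in data) / 2.0
    return prior + like

chain = metropolis_hastings(log_post, theta0=0.0, prop_sd=0.5, n_iter=5000)
burned = chain[1000:]                        # discard the settling-in period
post_mean = sum(burned) / len(burned)        # estimate of the posterior mean
```

With this near-flat prior, the posterior mean estimated from the retained draws sits close to the sample mean of the data, which is the behavior the comparison with MLE in Section 4 relies on.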

    3.1 Estimation of extreme quantiles

    The objective of extreme value analysis is usually an estimate of the probability

    of future events reaching extreme levels. Extreme quantiles are therefore often of

primary interest. We will use the 99.5% quantile as a comparison point: it is often

    convenient to have one point of comparison, as opposed to comparing the estimates

    for each distribution parameter.

Suppose we again have our iid observations x = (x1, x2, . . . , xn) for X with probability density function (pdf) P(x | θ). Prediction can be handled within a Bayesian setting. Let x̃ denote a future observation of X. The predictive density of x̃ given our previous observations x = (x1, x2, . . . , xn) is given by:

P(x̃ | x1, x2, . . . , xn) = ∫ P(x̃ | θ)P(θ | x1, x2, . . . , xn) dθ    (4)

    Research Paper www.thejournalofoperationalrisk.com


Suppose the posterior distribution has been estimated by simulation. If θ1, . . . , θs denotes a sample from a random variable with distribution P(θ | x), then we have, in the Bayesian framework:

P(X < c | x1, x2, . . . , xn) = ∫ P(X < c | θ)P(θ | x1, x2, . . . , xn) dθ ≈ (1/s) ∑_{i=1}^{s} P(X < c | θi)    (5)

The predictive distribution, given by Equation (5), allows for both parameter uncertainty and randomness in future observations. Equation (5) may be used to estimate the extreme quantiles of interest, by determining the c giving the desired probability.
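As an illustration of how Equation (5) might be used in practice, the following Python sketch averages the conditional cdf over a posterior sample and inverts the resulting predictive cdf by bisection. A toy normal model stands in for the extreme value model, and the function names and sample sizes are our own assumptions.

```python
import math
import random

def predictive_cdf(c, posterior_sample, cdf):
    """(1/s) * sum_i cdf(c | theta_i): the Monte Carlo form of Equation (5)."""
    return sum(cdf(c, theta) for theta in posterior_sample) / len(posterior_sample)

def predictive_quantile(p, posterior_sample, cdf, lo, hi, tol=1e-6):
    """Invert the predictive cdf by bisection to find the p-quantile."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if predictive_cdf(mid, posterior_sample, cdf) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy model: X | theta ~ N(theta, 1), with a pretend posterior sample for theta
def normal_cdf(c, theta):
    return 0.5 * (1.0 + math.erf((c - theta) / math.sqrt(2.0)))

rng = random.Random(1)
sample = [rng.gauss(0.0, 0.1) for _ in range(2000)]   # stand-in posterior draws
q995 = predictive_quantile(0.995, sample, normal_cdf, lo=-10.0, hi=10.0)
```

Because the average is over posterior draws, the resulting quantile reflects parameter uncertainty as well as the randomness of the future observation, which is why it is slightly larger than the quantile at the posterior mean alone.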

    4 ESTIMATION FOR EXTREME VALUE MODEL

Let Y1, Y2, . . . be a sequence of iid random variables. Let Xn = max{Y1, . . . , Yn}. From work by Fisher and Tippett (1928), we have that if two sequences of real numbers an > 0 and bn exist such that:

lim_{n→∞} P((Xn − bn)/an ≤ x) = F(x)    (6)

then, if F is a non-degenerate distribution function, it can be given by:

F(x) = exp{−[1 − k(x − μ)/σ]^{1/k}}   if k ≠ 0
F(x) = exp{−exp[−(x − μ)/σ]}          if k = 0    (7)

Here σ > 0; for k < 0, x > μ + σ/k and, for k > 0, x < μ + σ/k. k is the shape parameter, μ is the location parameter and σ is the scale parameter. This is the generalized extreme value (GEV) distribution.
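The cdf in Equation (7), and its inverse (which can be used to simulate GEV data by feeding in uniform draws), translate directly into code. This Python sketch (ours, not the paper's) uses the same (k, σ, μ) parameterization:

```python
import math

def gev_cdf(x, k, sigma, mu):
    """GEV cdf, Equation (7): k is the shape, sigma > 0 the scale, mu the location."""
    if k == 0.0:
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 - k * (x - mu) / sigma
    if t <= 0.0:
        # outside the support: above mu + sigma/k when k > 0, below it when k < 0
        return 1.0 if k > 0 else 0.0
    return math.exp(-t ** (1.0 / k))

def gev_quantile(p, k, sigma, mu):
    """Inverse of the GEV cdf for 0 < p < 1, obtained by solving F(q) = p."""
    if k == 0.0:
        return mu - sigma * math.log(-math.log(p))
    return mu + sigma * (1.0 - (-math.log(p)) ** k) / k
```

Applying gev_quantile to uniform random numbers is one simple way to generate the simulated annual maxima used in the study below.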

    Suppose we have data (x1, x2, . . . , xn), where xi is the annual maximum loss

    for the year indexed by i. We will illustrate the estimation of distribution parameters

    using Bayesian inference through a small simulation study, and we will simulate the

    maximum losses from the GEV distribution.

One thousand iid observations were simulated from GEV(k = 0.2, σ = 5, μ = 5). Figure 1 shows the histogram and empirical cumulative distribution function

    (cdf) for our simulated data.

In the following sections, we consider the case where we lack prior information (Section 4.1) and the case where we have some (Section 4.2). We use Bayesian techniques

    to estimate the parameters and extreme quantiles for our GEV distribution in

    these two cases; the resulting estimates will be compared with those obtained by

    classical MLE.


FIGURE 1 Histogram and empirical cdf of data generated from GEV(0.2, 5, 5), with n = 1,000. The cdf and pdf of the GEV distribution fitted by MLE are also shown.

    4.1 Non-informative prior

    In the situation where we have no prior information, we will still need to specify a

    prior distribution. It is common practice to use either uniform priors or priors with

    very high variance, reflecting the absence of any genuine prior information. Such

    priors are referred to as non-informative priors.

We reparameterize by setting ψ = log σ, as an easier way to respect the positivity of σ. The MCMC realizations for ψ may be transformed back to realizations of σ by taking the exponential.

We assume independence and the following prior pdf:

P(μ, ψ, k) = fμ(μ)fψ(ψ)fk(k)    (8)

where fμ(·), fψ(·) and fk(·) are normal density functions with mean zero and variances vμ, vψ and vk respectively. The variances, vμ, vψ and vk, should be large for a near-flat prior; the choice of normality is arbitrary here. We chose vμ = vψ = vk = 10^4.

We adopt an algorithm that is a slight variant of the one described in Section 3. The generation of a candidate value and its subsequent acceptance/rejection are carried out sequentially for each of the parameters, μ, ψ and k, where q is replaced in turn by transition densities qμ, qψ and qk, each being a function of its own argument only.

We settled on normal q's, with mean zero and variances wμ, wψ and wk. We note that, unlike our choice for the prior distribution, our choice for the transition density, q, will not affect the model; it will affect only the efficiency of the algorithm. We chose wμ = 0.01, wψ = 0.0025 and wk = 0.001. This choice was made fairly arbitrarily; no attempt has been made here to tailor either the transition density or the MCMC algorithm to improve its efficiency or the properties of the chain.

FIGURE 2 MCMC realizations of GEV parameters for non-informative prior: top panel, k; middle panel, σ; bottom panel, μ. The horizontal line marks the actual parameter value; our data was simulated from GEV(0.2, 5, 5).
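The single-site updating scheme just described can be sketched end to end as follows. This is illustrative Python, not the paper's code: the data are re-simulated here, the sample size and proposal scales are chosen for convenience, and a numerical guard rejects candidates whose likelihood would overflow.

```python
import math
import random

rng = random.Random(42)

def gev_quantile(p, k, sigma, mu):
    # invert the cdf of Equation (7) to simulate GEV observations
    return mu + sigma * (1.0 - (-math.log(p)) ** k) / k

def gev_loglik(data, k, sigma, mu):
    """GEV log-likelihood; -inf when a point falls outside the support."""
    if k == 0.0:
        return float("-inf")            # measure-zero case, simply rejected
    ll = 0.0
    for x in data:
        t = 1.0 - k * (x - mu) / sigma
        if t <= 0.0:
            return float("-inf")
        u = math.log(t) / k             # t ** (1/k) computed as exp(u)
        if u > 700.0:                   # would overflow exp(); reject candidate
            return float("-inf")
        ll += -math.log(sigma) + (1.0 / k - 1.0) * math.log(t) - math.exp(u)
    return ll

# simulated annual maxima from GEV(k = 0.2, sigma = 5, mu = 5)
data = [gev_quantile(rng.random(), 0.2, 5.0, 5.0) for _ in range(100)]

def log_post(mu, psi, k):
    # near-flat independent N(0, 10^4) priors on mu, psi = log(sigma) and k
    prior = -(mu * mu + psi * psi + k * k) / (2.0 * 1e4)
    return prior + gev_loglik(data, k, math.exp(psi), mu)

# single-site Metropolis updates for mu, psi and k in turn
mu, psi, k = 0.0, 2.0, 0.05            # initial values, as in the text
lp = log_post(mu, psi, k)
chain = []
for _ in range(5000):
    for idx, step in ((0, 0.3), (1, 0.1), (2, 0.05)):
        cand = [mu, psi, k]
        cand[idx] += rng.gauss(0.0, step)
        lp_cand = log_post(*cand)
        if lp_cand - lp > math.log(rng.random()):
            mu, psi, k = cand
            lp = lp_cand
    chain.append((mu, math.exp(psi), k))   # record sigma on its own scale

burned = chain[2000:]                      # drop the settling-in period
mu_hat = sum(c[0] for c in burned) / len(burned)
sigma_hat = sum(c[1] for c in burned) / len(burned)
k_hat = sum(c[2] for c in burned) / len(burned)
```

With a run of this length the posterior means land near the generating values, up to the sampling variability of a small data set.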

Figure 2 shows the values generated by 1,000 iterations of the chain, with initial values of k1 = 0, μ1 = 0, ψ1 = 2. In Figure 2 (scale parameter), we have transformed the scale parameter back to the σ scale by setting σi = exp(ψi) for each of the simulated ψi values.

The settling-in period seems to take around 200 iterations in this example. If we delete the first 200 simulated values, the remaining 800 can be used to determine

    properties of the posterior distribution.

    The sample means and standard deviation of the 800 simulated values, for each

    of the model parameters, are shown in Table 1, together with the MLE and standard


TABLE 1 Estimates (to three significant figures) for the GEV model parameters, k, σ, μ.

Parameter            k              σ             μ             99.5% quantile
Bayesian inference   0.194 (0.027)  5.19 (0.145)  4.89 (0.175)  53.2
MLE                  0.200 (0.024)  5.15 (0.145)  4.85 (0.183)  53.3
Actual value         0.2            5             5             52.1

Results shown to three significant figures, derived by MLE and Bayesian analysis, with non-informative priors (standard deviations/errors in parentheses), and the resulting estimate for the 99.5% quantile.

    errors. The results are very similar, which is reassuring given how uninformative the

    prior specification is.

We are also interested in estimating extreme quantiles. Equation (5) was used, together with the bisection method, to calculate the 99.5% quantile for our annual

    maximum. Discarding the first 200 of 1,000 MCMC realizations, we have s = 800

    and an estimate of 53.2 (to three significant figures) for the 99.5% quantile. The

    MLE is 53.3, and the 99.5% quantile for the distribution from which the data was

    generated is 52.1 (to three significant figures).

When we have no prior information concerning the extreme value model parameters, Bayesian inference, with non-informative priors, may provide us with another

    method to estimate the parameter values. There are no ranges in the parameter space

    where the method breaks down, unlike with MLE, and estimates for the variability

    of the estimates are also a side product of the methodology.

    4.2 Prior expert opinion

    We will next consider the situation where we do have prior information or beliefs

    concerning the parameters, and show how we may use data to update our beliefs.

    In this section, we will model the same data as in Section 4.1, but this time we will

    assume that we have experts on whose opinion to base our prior specification.

    It is unlikely that experts will be able to express their prior beliefs concerning

    extremal behavior directly in terms of the model parameters. Even if they were able

    to come up with prior marginal distributions for the parameters, coming up with

    the joint prior specification would still remain problematic. In particular, increasing

    the scale parameter or decreasing the shape parameter leads to a longer-tailed

    distribution, so dependence between these two parameters is expected.

    Coles and Tawn (1996) advocate asking experts about the quantiles of some

    extreme values. In particular, we may, for instance, ask experts for their estimates

of the median and 90% quantile for particular quantiles of the annual maximum loss. Since we have three model parameters, we require expert opinion for three quantiles, which we will denote by q1, q2 and q3, where, for i = 1, 2, 3:

qi = μ + σ(1 − [−log pi]^k)/k    (9)


for some (large) probabilities p1 < p2 < p3. Since, by definition, q1 < q2 < q3 and we need to respect the ordering, Coles and Tawn (1996) advocate working instead with the differences q̃1 = q1, q̃2 = q2 − q1, q̃3 = q3 − q2, with the assumption that:

q̃i ∼ gamma(αi, βi),  i = 1, 2, 3    (10)

The estimates for the median and 90% quantile for each of q̃1, q̃2 and q̃3, provided by our experts, will allow us to calculate the gamma parameters, αi and βi for i = 1, 2, 3, by solving two simultaneous equations in two unknowns. Assuming independence, we then have:

P(q1, q2, q3) = [β1^{α1} q1^{α1−1} exp(−β1q1)/Γ(α1)] × [β2^{α2} (q2 − q1)^{α2−1} exp(−β2[q2 − q1])/Γ(α2)] × [β3^{α3} (q3 − q2)^{α3−1} exp(−β3[q3 − q2])/Γ(α3)]    (11)

We now have our prior distribution in terms of q1, q2, q3, and we would like it in terms of our GEV model parameters, k, σ and μ. The determinant of the Jacobian matrix, J, is:

det(J) = (σ/k²){[−log p2]^k log(−log p2)[(−log p1)^k − (−log p3)^k]
    − [−log p1]^k log(−log p1)[(−log p2)^k − (−log p3)^k]
    − [−log p3]^k log(−log p3)[(−log p1)^k − (−log p2)^k]}    (12)

Substituting Equation (9) into Equation (11) and multiplying by the absolute value of the determinant of J gives us our prior joint distribution for k, σ and μ. Figure 4 shows the marginal prior distributions for the GEV model parameters, k, σ and μ, for gamma priors with (α1, α2, α3) = (35, 15, 50) and (β1, β2, β3) = (0.9, 0.5, 0.8).
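The step of converting an elicited median and 90% point into gamma parameters (the "two simultaneous equations in two unknowns" above) has no closed form, but is straightforward numerically: the ratio of the two quantiles pins down α alone, after which β is a rescaling. The following self-contained Python sketch (all function names our own) exploits this; the round-trip check uses the gamma(35, 0.9) component from our prior as an illustration.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-14 and n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def gamma_quantile(p, a, rate=1.0):
    """p-quantile of Gamma(a, rate), by bisection on P(a, x)."""
    lo, hi = 0.0, (a + 1.0) / rate
    while reg_lower_gamma(a, hi * rate) < p:   # grow bracket until it covers p
        hi *= 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(a, mid * rate) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma_from_quantiles(median, q90):
    """Solve for (alpha, beta) so Gamma(alpha, beta) has the stated median
    and 90% quantile; the ratio q90/median depends on alpha alone."""
    target = q90 / median
    lo, hi = 1e-2, 1e4                      # bracket for alpha
    for _ in range(80):
        a = math.sqrt(lo * hi)              # bisect on the log scale
        ratio = gamma_quantile(0.9, a) / gamma_quantile(0.5, a)
        if ratio > target:                  # the ratio shrinks as alpha grows
            lo = a
        else:
            hi = a
    a = math.sqrt(lo * hi)
    return a, gamma_quantile(0.5, a) / median

# round trip through one prior component, gamma(35, 0.9)
m, q = gamma_quantile(0.5, 35.0, 0.9), gamma_quantile(0.9, 35.0, 0.9)
alpha, beta = gamma_from_quantiles(m, q)
```

The same routine applied to each elicited pair yields the (αi, βi) needed in Equation (10).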

    We next configure our MCMC algorithm, noting that again no attempt has

    been made to guarantee that the generated chain has desirable properties, such

    as a short settling-in period or low correlation, opting instead for simplicity. Our

analysis is based on a Gibbs sampler, successively updating the individual parameters conditional on the current values of the other parameters, with a Metropolis

    acceptance/rejection step.

We sequentially generate the candidate values for each of the model parameters in turn and determine the acceptance/rejection probability, with the other parameters fixed at their last accepted values. We again use normal transition kernels.

    The first 1,000 realizations of our MCMC simulation, for each of the three

    parameters, are shown in Figure 3. Initial values for the parameters are again taken to

be k1 = 0, μ1 = 0, ψ1 = 2. The settling-in period seems to take around 200 iterations.


FIGURE 3 First 1,000 MCMC realizations of GEV parameters for gamma priors: top panel, k; middle panel, σ; bottom panel, μ. The horizontal line marks the actual parameter value; our data was simulated from GEV(0.2, 5, 5).

    Discarding the first 200 observations, the distribution of the remaining observations

    may be used to approximate the posterior distribution of our model parameters.

Figure 4 shows the marginal prior and posterior pdfs for k, σ and μ. Since we wished to plot the posterior distributions with a higher degree of

    smoothness, 3,000 MCMC realizations (with the first 200 observations discarded)

    were generated.

As might be expected, given data, our uncertainty concerning the model parameter values decreases, and the posterior variance is smaller than the prior variance.

Comparisons between prior and posterior estimates, and between those obtained through Bayesian and classical inference, are shown in Table 2. Our prior beliefs have affected the resulting estimates: Bayesian inference is subjective.


FIGURE 4 Marginal prior and posterior distributions for k, σ and μ.

TABLE 2 Comparison of the prior, posterior and maximum likelihood estimates (to two significant figures) for the GEV model parameters.

Parameter      k             σ           μ           99.5% quantile
Prior          0.33          4.8         24
Posterior      0.25 (0.020)  5.3 (0.16)  4.9 (0.19)  62.5
MLE            0.20 (0.024)  5.1 (0.15)  4.8 (0.18)  53.3
Actual value   0.2           5           5           52.1

Simulation means are shown for the posterior; the mean for the prior was found with a discrete approximation. Standard errors/standard deviations are given in parentheses for the maximum likelihood and posterior estimates respectively. The estimates for the 99.5% quantile are also shown, together with the actual value.

Equation (5), together with the bisection method, was again used to calculate the 99.5% quantile for our annual maximum. Discarding the first 200 of 3,000 MCMC realizations, we have s = 2,800 and an estimate of 62.5 (to three significant figures) for the 99.5% quantile. Our prior beliefs have, in this instance, raised our estimate for the extreme quantile. We note that this methodology might be particularly useful where the experts believe that conditions have changed since the loss data was collected and wish to add their own input to the estimation process. Our example results may reflect expert belief that the losses to be suffered in the future will be larger than the collected loss data might indicate: incorporating their prior beliefs into our estimation process has resulted in a higher estimate for the 99.5% quantile.

    Coles and Tawn (1996) were able to elicit prior information of the kind we have

    illustrated here, from their hydrology expert. It remains to be seen whether experts

    in other fields, including those for operational losses, will be able to do the same.


    5 EXTERNAL DATA

    Data sparseness is a particular problem when data is heavy tailed. With only a

    limited amount of data, it is very hard to determine the properties of the tail of the

distribution. Banks are increasingly attempting to supplement their loss databases with external sources of loss data.

    These efforts to catalogue the operational loss experience of the industry typically

    fall into two categories: databases that use public sources, such as newspapers and

    press releases, and consortium databases, based on losses contributed by member

    banks, which are then pooled and shared. For both types of databases, the larger

    losses are usually more likely to be catalogued, and the threshold above which losses

    are reported may or may not be known.

We need to be careful when utilizing external data. The industry is beginning to realize that one cannot just take a tail event from another bank and look at it

    as a signal to what could happen internally. Rather, a better approach might be to

    consider tail events occurring only within peer firms that are of a similar size and

    operate in a similar business environment, with a comparable scope of business

    activities. Attempts to scale data have also been made, although no consensus on

    the best practice for this has yet been reached.

    Bayesian methods may be used to incorporate the use of external data into the

    creation of the model for internal losses. Depending on the source and applicability

of the external data, it may be possible for external data to provide us with prior information concerning our internal model parameters. This would be especially

    desirable in the situation where there is little internal data, resulting in a bad fit

    when using the internal data alone. There are various ways in which information

    from external data can be used to come up with a prior distribution.

    We may only have, or have confidence in, external data above a high (known)

    threshold. In this case, a peak over threshold (POT) approach (Embrechts et al

    (2003)) is suggested. Alternatively, we may not know the threshold for the data (or

it may not be of a fixed value); this may be the case for external databases derived from public sources. Here we may prefer to use the largest loss suffered in each year,

    and we could consider fitting a GEV distribution to the annual maximum losses, as

    we did in Section 4.

    5.1 Losses over a threshold

    In this section, we will illustrate a possible Bayesian approach with a simple example

    where we have external data over a known threshold. Suppose we have only a small

number of observations (internal data) from the distribution we wish to fit. With only a small number of observations from a heavy-tailed distribution, a good fit based

    on the internal data alone is less likely. Suppose that, in addition, we have a large

    quantity of observations (external data) independently sampled from the tail of a

    distribution, which we believe has the same shape parameter.


    We will use external data to provide us with information on the shape parameter.

The main assumption being made is that the internal and external data reflect

    observations from distributions that have the same shape parameter. We will assume

    non-informative priors for the location and scale parameters.

A random variable X is said to have a generalized Pareto distribution (GPD) with shape parameter k, location parameter μ and scale parameter σ, if the distribution function is given by:

F(x | k, σ, μ) = 1 − [1 − k(x − μ)/σ]^{1/k}   if k ≠ 0
F(x | k, σ, μ) = 1 − exp[−(x − μ)/σ]          if k = 0    (13)

where σ > 0; for k ≤ 0, x − μ > 0 and, for k > 0, 0 < x − μ < σ/k. The extreme value index, a function of the power of the tail decay, is −k. The GPD reduces to the Pareto distribution when k < 0.

    Pickands (1975) proved that the distribution of exceedances above a threshold

    will tend to the GPD, as the threshold tends to infinity, provided the underlying

    distribution function belongs to the domain of attraction of the GEV distribution.

    The class of distributions belonging to this domain is fairly large, making the theorem

    quite widely applicable.

We will consider a couple of ways of using external data to determine a prior distribution for the shape parameter:

    Method 1. We may fit the GPD to the external data, in order to determine

    the MLE of the shape parameter for the internal model. We can take the prior

    distribution to be normal, with mean given by the MLE of the shape parameter

    and standard deviation given by the estimated standard errors in our estimation.

    Method 2. Estimators of the extreme value index could be used to provide

    information on the shape parameter. Such estimators abound in the literature;

they include those proposed by Pickands (1975), Hill (1975) and Huisman et al (2001). The estimates for the extreme value index over a range of threshold

    values may, for example, be used to generate the prior distribution for the shape

    parameter.
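As one possible realization of method 2 (illustrative, not the paper's code), the Hill estimator is simple enough to sketch directly. The Pareto test data, seed and choice of m values below are assumptions of the example:

```python
import math
import random

def hill_estimator(sample, m):
    """Hill (1975) estimator of the extreme value index, based on the m
    largest order statistics; appropriate for Pareto-type (heavy) tails."""
    xs = sorted(sample, reverse=True)
    threshold = xs[m]                  # the (m + 1)-th largest observation
    return sum(math.log(xs[i] / threshold) for i in range(m)) / m

# sanity check on exact Pareto data: P(X > x) = x^(-2), extreme value index 1/2
rng = random.Random(7)
pareto = [rng.random() ** -0.5 for _ in range(20000)]
estimates = {m: hill_estimator(pareto, m) for m in (200, 500, 1000)}
```

Plotting such estimates over a range of m (equivalently, over a range of thresholds) is exactly the kind of output that could then be smoothed into a non-parametric prior for the shape parameter.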

    5.2 Simulation study

    A small simulation study was conducted to illustrate the use of external data in a

Bayesian approach to parameter estimation. For Y ∼ Gamma(α, β), we will say that X = exp(Y + μ) is log gamma, with parameters α, β and μ. β is the shape parameter for the log gamma distribution, and 1/β is the extreme value index.

    Our internal data consisted of 100 iid observations simulated from the

log gamma(α = 1.2, β = 2, μ = 7) distribution. We fitted the log gamma distribution

    to this data by MLE. Figure 5 (see page 39) shows the histogram and empirical cdf


FIGURE 5 Histogram of our internal data and the pdf of the fitted log gamma distribution (left). Empirical cdf for internal data and the fitted cdf (right). Both panels are plotted against log10(x).

    of the internal data, together with the fitted pdf and cdf. There was no statistically significant evidence, from the Kolmogorov–Smirnov goodness-of-fit test, that the data did not come from the fitted distribution. The MLEs were α = 1.11, δ = 1.70, μ = 7.00 (to three significant figures).

    We then generated 10,000 iid observations from the tail (above 50,000) of the log gamma(1.8, 2, 7.5) distribution; this was our external data. We have assumed that the shape parameter, ξ = 1/δ, is the same for the models generating the internal and external losses.

    For method 1, we fitted the GPD to the exceedances over 50,000. The resulting MLE for the extreme value index was found to be 0.527, with an estimated standard error of 0.0154 (to three significant figures). We assumed that 1/δ had a normal prior distribution with mean 0.527 and standard deviation 0.0154, N(0.527, 0.0154²), with α and μ having non-informative prior distributions, N(0, 10⁴).
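    A sketch of how the Method 1 prior could be assembled, assuming Python; the grid search below stands in for a proper numerical optimizer, and the closing line uses the standard asymptotic result se(ξ̂) ≈ (1 + ξ̂)/√n for the GPD shape, broadly consistent with the 0.0154 reported above. All names are illustrative:

```python
import math
import random

def gpd_neg_loglik(xi, sigma, exc):
    """Negative log-likelihood of the GPD (xi > 0) for exceedances exc."""
    n = len(exc)
    return n * math.log(sigma) + (1.0 + 1.0 / xi) * sum(
        math.log(1.0 + xi * y / sigma) for y in exc
    )

# Simulate exceedances from a GPD(xi=0.5, sigma=1) by inversion of the survival
# function: y = sigma * ((1 - U)**(-xi) - 1) / xi.
random.seed(7)
exc = [((1.0 - random.random()) ** -0.5 - 1.0) / 0.5 for _ in range(2_000)]

# Crude grid-search MLE (a real implementation would use a numerical optimizer).
grid = [(x / 100.0, s / 100.0) for x in range(20, 101, 2) for s in range(50, 201, 5)]
xi_hat, sigma_hat = min(grid, key=lambda p: gpd_neg_loglik(p[0], p[1], exc))

# Normal prior for the shape parameter: mean = MLE, sd = asymptotic standard error.
prior_mean = xi_hat
prior_sd = (1.0 + xi_hat) / math.sqrt(len(exc))
```

    The pair (prior_mean, prior_sd) then plays the role of the N(0.527, 0.0154²) prior used above.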

    For method 2, the results from the estimation of the extreme value index, using the estimators proposed by Pickands (1975), Hill (1975) and Huisman et al (2001), are shown in Figure 6. Taking the Huisman estimates, for example, we can form a non-parametric prior distribution for the extreme value index. α and μ will again be assumed to have non-informative prior distributions, N(0, 10⁴).

    Normal transition kernels were used, with initial parameter values of α₁ = 1, 1/δ₁ = 0.7, μ₁ = 1. Two thousand realizations of the MCMC simulation for methods 1 and 2 are shown in Figure 7. Discarding the first 500 MCMC realizations gives us the estimates for the log gamma parameters; the results are summarized in Table 3.
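    The sampler itself is not spelled out in the paper; below is a generic random-walk Metropolis sketch with a normal transition kernel, applied to a toy one-parameter posterior (normal likelihood, normal prior) rather than the log gamma model, so the target and all names are illustrative only:

```python
import math
import random

def metropolis(log_post, x0, step, n_iter, rng=random):
    """Random-walk Metropolis with a normal transition kernel."""
    chain, x, lp = [], x0, log_post(x0)
    for _ in range(n_iter):
        prop = x + rng.gauss(0.0, step)     # symmetric normal proposal
        lp_prop = log_post(prop)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = prop, lp_prop
        chain.append(x)
    return chain

# Toy target: posterior of a normal mean (known unit variance) with N(0, 10^2) prior.
random.seed(0)
data = [random.gauss(1.5, 1.0) for _ in range(50)]

def log_post(theta):
    log_prior = -theta ** 2 / (2.0 * 10.0 ** 2)
    log_lik = -sum((y - theta) ** 2 for y in data) / 2.0
    return log_prior + log_lik

chain = metropolis(log_post, x0=0.0, step=0.5, n_iter=5_000)
burned = chain[500:]   # discard burn-in, as in the study above
```

    Using a conjugate toy target means the chain's mean and spread can be checked against the analytic posterior, which is a useful sanity check before moving to the log gamma likelihood.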

    Research Paper www.thejournalofoperationalrisk.com


    FIGURE 6 Estimates for the extreme value index derived from the Pickands, Hill and Huisman estimators, using external data above varying thresholds. The unbroken line shows the actual parameter value, and the dashed line shows the MLE from fitting internal data (left). The histogram of the Huisman estimates for thresholds resulting in 30–1,000 losses above the threshold (right).

    TABLE 3 Comparison of the Bayesian estimates (to three significant figures) for the log gamma model parameters, for priors obtained using methods 1 and 2, with the estimates obtained through MLE.

                       α       δ       μ      99.5% quantile
    MLE               1.11    1.70    7.00    29,200
    Prior method 1    1.26    1.89    6.99    24,800
    Prior method 2    1.28    1.91    6.99    24,600
    Actual value      1.2     2       7       19,600

    The estimates for the 99.5% quantile are 24,800 and 24,600 (to three significant figures) for methods 1 and 2, respectively. The estimate from maximum likelihood fitting on the internal data alone is 29,200; the actual 99.5% quantile for the log gamma(1.2, 2, 7) distribution is 19,600 (to three significant figures).
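    Under the log gamma convention read above (X = exp(μ + Y), Y ∼ Gamma(α, rate δ); our reading of the parameterization), such quantiles can be checked by simulating the fitted model and reading off the empirical 99.5% quantile. The sketch below is illustrative and agrees with the Table 3 comparison up to Monte Carlo and rounding error:

```python
import math
import random

def log_gamma_quantile_mc(alpha, delta, mu, p, n=200_000, rng=random):
    """Monte Carlo p-quantile of X = exp(mu + Y), Y ~ Gamma(shape=alpha, rate=delta)."""
    ys = sorted(rng.gammavariate(alpha, 1.0 / delta) for _ in range(n))
    return math.exp(mu + ys[int(p * n)])

random.seed(3)
q_actual = log_gamma_quantile_mc(1.2, 2.0, 7.0, 0.995)    # actual model
q_mle = log_gamma_quantile_mc(1.11, 1.70, 7.00, 0.995)    # fitted (MLE) model
```

    The gap between the two values mirrors the overestimation of the 99.5% quantile by the internal-data MLE fit reported above.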

    In this example, external data has helped us to fine-tune our model for internal losses. However, it should be noted that, in our example, the external data that we simulated actually had relevance for our internal model (external and internal shared


    FIGURE 7 MCMC realizations of the log gamma parameters for prior information from external data with method 1 (left) and method 2 (right): top panels, α; middle panels, δ; bottom panels, μ. The unbroken line shows the actual parameter value; the dashed line shows the MLE from fitting internal data.


    the same shape parameter). We have also assumed that we do not have a lot of internal losses, but that we do have a large quantity of external losses in the tail of the distribution.

    The model assumptions made here were known to be correct in the simulation

    world. If the assumptions are not correct, our estimates will be biased by our

    preconceptions or prior beliefs.

    6 CONCLUSION

    Bayesian analysis provides another way of viewing the problem of parameter

    estimation. Most importantly, it allows for the incorporation of prior opinion into the

    estimation, which some proponents view as its greatest strength (and some opponents

    view as its greatest weakness).

    In this paper, we have considered how Bayesian inference may be used in the fitting of extreme value distributions to extreme losses. The prior specification

    may be based on expert opinion, external data sources or a combination of the

    two. In particular, Bayesian methods provide a framework for the incorporation of

    information derived from external and internal sources. Currently, the combination of data from various sources in operational risk has mostly involved pooling the data or creating a mixture model. For the former methodology, we would have to consider whether, and how, to scale the data; for the latter, we would need to decide on the mixture coefficient. Bayesian analysis provides another option, though choices still have to be made by the modeler.

    The simulation study conducted here demonstrates how prior information may

    be used to affect the fitted distribution and that, where the prior specification does

    provide relevant information about the distribution of the losses, the estimation

    of the model parameters can be improved. The development of simulation-based

    techniques, in particular MCMC, has overcome the difficulty of computing the posterior and has made Bayesian techniques very popular in many areas of application.

    REFERENCES

    Coles, S. G., and Tawn, J. A. (1996). A Bayesian analysis of extreme rainfall data. Applied Statistics 45(4), 463–478.

    Embrechts, P., Klüppelberg, C., and Mikosch, T. (2003). Modelling Extremal Events for Insurance and Finance. Springer-Verlag, New York.

    Fisher, R. A., and Tippett, L. H. C. (1928). Limiting forms of the frequency distribution of the largest or smallest member of a sample. Proceedings of the Cambridge Philosophical Society 24, 180–190.

    Hill, B. M. (1975). A simple general approach to inference about the tail of a distribution. Annals of Statistics 3, 1163–1174.

    Huisman, R., Koedijk, K. G., Kool, C. J. M., and Palm, F. (2001). Tail-index estimates in small samples. Journal of Business and Economic Statistics 19(1), 208–216.


    Peters, G. W., and Sisson, S. A. (2006). Bayesian inference, Monte Carlo sampling and operational risk. The Journal of Operational Risk 1(3), 27–50.

    Pickands, J. (1975). Statistical inference using extreme order statistics. Annals of Statistics 3, 119–131.

    Shevchenko, P., and Wüthrich, M. (2006). The structural modelling of operational risk via Bayesian inference: combining loss data with expert opinions. The Journal of Operational Risk 1(3), 3–26.
