
The Journal of Operational Risk (27–43) Volume 4/Number 3, Fall 2009

Bayesian analysis of extreme operational losses

Chyng-Lan Liang
Algorithmics (UK) Limited, Eldon House, 2 Eldon Street, London EC2M 7LS, UK; email: [email protected]

    Bayesian techniques offer an alternative to parameter estimation methods,

    such as maximum likelihood estimation, for extreme value models. These

    techniques treat the parameters to be estimated as random variables,

    instead of some fixed, possibly unknown, constants. We investigate, with

    simulated examples, how Bayesian analysis can be used to estimate the

    parameters of extreme value models, for the case where we have no prior

    knowledge at all and the case where we have prior knowledge in the form

    of expert opinion. In addition, Bayesian analysis provides a framework for

    the incorporation of information from external data into a loss model based

    on internal data; this is again illustrated using simulation.

    1 INTRODUCTION

    Maximum likelihood estimation (MLE) techniques are not the only way to draw

    inferences from the likelihood function; Bayesian inference offers an alternative

    methodology, as well as viewpoint. There is some debate concerning the viability

    of these methods, which we will only briefly touch upon. We will be concentrating

    instead on how these methods might be applied in practice.

    The Bayesian techniques we will consider here have wide applicability, but we are

    particularly interested in how these techniques may give us greater insight into the

behavior of loss processes at extreme levels. There has been a great deal of interest in the use of Bayesian methods in operational risk; recent papers on the subject include

those by Shevchenko and Wüthrich (2006) and Peters and Sisson (2006). We will not be restricting ourselves to conjugate priors (see Shevchenko and Wüthrich (2006) for

    classes of frequency and severity models admitting conjugate forms) and will instead

    be concentrating on the fitting of extreme losses, using the distributions suggested by

    extreme value theory.

    In Section 2, we introduce the concepts behind Bayesian analysis; in Section 3

    we briefly describe how simulation-based techniques, and in particular Markov

chain Monte Carlo (MCMC) techniques, can help us overcome the difficulty of computation in Bayesian analysis. In Section 4, we consider parameter estimation

    for the extreme value model for annual maxima, through a small simulation study.

    For simulated data, we compare the Bayesian estimates for the model parameters

    when we have no prior information (Section 4.1) with those obtained when we have


    2009 Incisive Media. Copying or distributing in print or electronic

    forms without written permission of Incisive Media is prohibited.


    prior expert opinion (Section 4.2). In the latter case, there is no obvious methodology

    for transforming expert opinions into prior distributions for the parameters. We will

    be following the method for eliciting prior information proposed by Coles and Tawn

    (1996). We consider using external data for prior specification in Section 5. Typically

    there are concerns about the incorporation of external data into the modeling

process: questions about scaling, applicability and loss severity thresholds. We

    illustrate a possible use of Bayesian analysis with an example showing where

    the external data can provide us with information concerning the shape of the

    distribution.

    2 BAYESIAN INFERENCE

Suppose we have data x = (x1, . . . , xn), constituting independent identically distributed (iid) realizations of a random variable, X, whose density belongs to a parametric family parametrized by θ. The likelihood for θ may then be given by P(x | θ) = ∏_{i=1}^{n} P(xi | θ), since the observations xi, i = 1, . . . , n, are independent.

In the classical framework, θ is a constant: an unknown constant to be estimated, in many cases. In the Bayesian setting, θ is itself a random variable, with an a priori distribution, which reflects uncertainty concerning the parameter value prior to observing any data. This distribution is called the prior distribution.

Bayes' theorem states that:

P(θ | x) = P(θ)P(x | θ) / ∫ P(y)P(x | y) dy    (1)

Bayes' theorem is an immediate consequence of the axioms of probability; it is its use in statistical analysis that is controversial and revolutionary. The theorem allows us to convert some prior set of beliefs concerning the unknown θ, as represented by the prior distribution, P(θ), into a posterior distribution, P(θ | x), which incorporates information provided by the data x. In addition, since we get a complete distribution, not just an estimate, the accuracy of our estimate can be summarized, for example, by the variance of the posterior distribution; we do not need to resort to asymptotic

theory, as we do for MLE. Proponents of this approach argue that the prior

    distribution allows us to supplement data with other sources of relevant information;

    opponents contend that conclusions become subjective, with the subjective choice

    of priors.

3 METROPOLIS-HASTINGS ALGORITHM

Computation of the integral in the denominator in Equation (1) can cause problems, especially for high-dimensional θ. This problem has been overcome by simulation-based techniques. The idea behind simulation-based techniques is to simulate a

    sample from the posterior distribution and use this simulated sample to obtain

    estimates of the moments and properties of the posterior distribution.


Under fairly general conditions, a Markov chain eventually reaches an equilibrium distribution, π. The Metropolis-Hastings algorithm aims to construct a Markov chain that has the posterior distribution as its equilibrium distribution. The algorithm is given below.

Set an initial value θ1.

Specify an arbitrary probability rule q(θi+1 | θi) for iterative simulation of successive values. The q function here is known as the transition kernel of the chain. This generates a first-order Markov chain, as the stochastic properties of θi+1 are independent of θ1, . . . , θi−1, given θi. In order for the sequence θ1, θ2, . . . to have the equilibrium distribution given by Equation (1), we add an acceptance/rejection step.

For iteration i, use the probability rule q(· | θi) to generate a candidate value, y, for θi+1. Let:

αi = min{1, [P(y)P(x | y)q(θi | y)] / [P(θi)P(x | θi)q(y | θi)]}    (2)

We set:

θi+1 = y with probability αi, or θi+1 = θi with probability 1 − αi    (3)

Under simple regularity assumptions, the generated sequence is a Markov chain whose equilibrium distribution is the distribution given by Equation (1). For a large enough value of m, the sequence θm, θm+1, . . . can be used in a similar way to a sequence of independent values to estimate properties of the posterior distribution, such as the posterior mean. This holds regardless of the choice of the transition kernel, q, though this choice will affect the settling-in period and the dependence in the sequence.
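The steps above can be sketched directly in code. The following is a minimal illustration (not from the original paper): a random-walk Metropolis-Hastings sampler, for which the symmetric transition kernel makes the q terms in Equation (2) cancel. The toy posterior, data values and proposal standard deviation are all illustrative assumptions.

```python
import math
import random

def metropolis_hastings(log_post, theta0, prop_sd, n_iter, seed=0):
    """Random-walk Metropolis-Hastings: returns a chain whose equilibrium
    distribution is proportional to exp(log_post(theta))."""
    rng = random.Random(seed)
    theta = theta0
    lp = log_post(theta)
    chain = [theta]
    for _ in range(n_iter):
        cand = theta + rng.gauss(0.0, prop_sd)   # symmetric kernel: q terms cancel
        lp_cand = log_post(cand)
        # accept with probability min(1, posterior ratio), on the log scale
        if math.log(rng.random()) < lp_cand - lp:
            theta, lp = cand, lp_cand
        chain.append(theta)
    return chain

# Toy posterior: N(0, 10^2) prior on theta, N(theta, 1) likelihood for each datum
data = [1.2, 0.8, 1.1, 0.9, 1.0]

def log_post(theta):
    prior = -theta * theta / (2.0 * 10.0 ** 2)
    like = -sum((x - theta) ** 2 for x in data) / 2.0
    return prior + like

chain = metropolis_hastings(log_post, theta0=0.0, prop_sd=0.5, n_iter=5000)
burned = chain[1000:]                        # discard the settling-in period
post_mean = sum(burned) / len(burned)        # estimate of the posterior mean
```

With this near-flat prior, the posterior mean estimated from the retained draws sits close to the sample mean of the data, which is the behavior the comparison with MLE in Section 4 relies on.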

    3.1 Estimation of extreme quantiles

    The objective of extreme value analysis is usually an estimate of the probability

    of future events reaching extreme levels. Extreme quantiles are therefore often of

primary interest. We will use the 99.5% quantile as a comparison point: it is often

    convenient to have one point of comparison, as opposed to comparing the estimates

    for each distribution parameter.

Suppose we again have our iid observations x = (x1, x2, . . . , xn) for X with probability density function (pdf) P(x | θ). Prediction can be handled within a Bayesian setting. Let x̃ denote a future observation of X. The predictive density of x̃ given our previous observations x = (x1, x2, . . . , xn) is given by:

P(x̃ | x1, x2, . . . , xn) = ∫ P(x̃ | θ)P(θ | x1, x2, . . . , xn) dθ    (4)

    Research Paper www.thejournalofoperationalrisk.com


Suppose the posterior distribution has been estimated by simulation. If θ1, . . . , θs denotes a sample from a random variable with distribution P(θ | x), then we have, in the Bayesian framework:

P(X < c | x1, x2, . . . , xn) = ∫ P(X < c | θ)P(θ | x1, x2, . . . , xn) dθ ≈ (1/s) ∑_{i=1}^{s} P(X < c | θi)    (5)

The predictive distribution, given by Equation (5), allows for both parameter uncertainty and randomness in future observations. Equation (5) may be used to estimate the extreme quantiles of interest, by determining the c giving the desired probability.
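As an illustration of how Equation (5) might be used in practice, the following Python sketch averages the conditional cdf over a posterior sample and inverts the resulting predictive cdf by bisection. A toy normal model stands in for the extreme value model, and the function names and sample sizes are our own assumptions.

```python
import math
import random

def predictive_cdf(c, posterior_sample, cdf):
    """(1/s) * sum_i cdf(c | theta_i): the Monte Carlo form of Equation (5)."""
    return sum(cdf(c, theta) for theta in posterior_sample) / len(posterior_sample)

def predictive_quantile(p, posterior_sample, cdf, lo, hi, tol=1e-6):
    """Invert the predictive cdf by bisection to find the p-quantile."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if predictive_cdf(mid, posterior_sample, cdf) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy model: X | theta ~ N(theta, 1), with a pretend posterior sample for theta
def normal_cdf(c, theta):
    return 0.5 * (1.0 + math.erf((c - theta) / math.sqrt(2.0)))

rng = random.Random(1)
sample = [rng.gauss(0.0, 0.1) for _ in range(2000)]   # stand-in posterior draws
q995 = predictive_quantile(0.995, sample, normal_cdf, lo=-10.0, hi=10.0)
```

Because the average is over posterior draws, the resulting quantile reflects parameter uncertainty as well as the randomness of the future observation, which is why it is slightly larger than the quantile at the posterior mean alone.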

    4 ESTIMATION FOR EXTREME VALUE MODEL

Let Y1, Y2, . . . be a sequence of iid random variables. Let Xn = max{Y1, . . . , Yn}. From work by Fisher and Tippett (1928), we have that if two sequences of real numbers an > 0 and bn exist such that:

lim_{n→∞} P((Xn − bn)/an ≤ x) = F(x)    (6)

then, if F is a non-degenerate distribution function, it can be given by:

F(x) = exp{−[1 − k(x − μ)/σ]^{1/k}}   if k ≠ 0
F(x) = exp{−exp[−(x − μ)/σ]}          if k = 0    (7)

Here σ > 0; for k < 0, x > μ + σ/k and, for k > 0, x < μ + σ/k. k is the shape parameter, μ is the location parameter and σ is the scale parameter. This is the generalized extreme value (GEV) distribution.
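The cdf in Equation (7), and its inverse (which can be used to simulate GEV data by feeding in uniform draws), translate directly into code. This Python sketch (ours, not the paper's) uses the same (k, σ, μ) parameterization:

```python
import math

def gev_cdf(x, k, sigma, mu):
    """GEV cdf, Equation (7): k is the shape, sigma > 0 the scale, mu the location."""
    if k == 0.0:
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 - k * (x - mu) / sigma
    if t <= 0.0:
        # outside the support: above mu + sigma/k when k > 0, below it when k < 0
        return 1.0 if k > 0 else 0.0
    return math.exp(-t ** (1.0 / k))

def gev_quantile(p, k, sigma, mu):
    """Inverse of the GEV cdf for 0 < p < 1, obtained by solving F(q) = p."""
    if k == 0.0:
        return mu - sigma * math.log(-math.log(p))
    return mu + sigma * (1.0 - (-math.log(p)) ** k) / k
```

Applying gev_quantile to uniform random numbers is one simple way to generate the simulated annual maxima used in the study below.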

    Suppose we have data (x1, x2, . . . , xn), where xi is the annual maximum loss

    for the year indexed by i. We will illustrate the estimation of distribution parameters

    using Bayesian inference through a small simulation study, and we will simulate the

    maximum losses from the GEV distribution.

One thousand iid observations were simulated from GEV(k = 0.2, σ = 5, μ = 5). Figure 1 shows the histogram and empirical cumulative distribution function

    (cdf) for our simulated data.

In the following sections, we consider the case where we lack prior information (Section 4.1) and the case where we have some (Section 4.2). We use Bayesian techniques

    to estimate the parameters and extreme quantiles for our GEV distribution in

    these two cases; the resulting estimates will be compared with those obtained by

    classical MLE.


FIGURE 1 Histogram and empirical cdf of data generated from GEV(0.2, 5, 5), with n = 1,000. The cdf and pdf of the GEV distribution fitted by MLE are also shown.

    4.1 Non-informative prior

    In the situation where we have no prior information, we will still need to specify a

    prior distribution. It is common practice to use either uniform priors or priors with

    very high variance, reflecting the absence of any genuine prior information. Such

    priors are referred to as non-informative priors.

We reparameterize by setting ψ = log σ, as an easier way to respect the positivity of σ. The MCMC realizations for ψ may be transformed back to realizations of σ by taking the exponential.

We assume independence and the following prior pdf:

P(μ, ψ, k) = fμ(μ)fψ(ψ)fk(k)    (8)

where fμ(·), fψ(·) and fk(·) are normal density functions with mean zero and variances vμ, vψ and vk respectively. The variances, vμ, vψ and vk, should be large for a near-flat prior; the choice of normality is arbitrary here. We chose vμ = vψ = vk = 10^4.

We adopt an algorithm that is a slight variant of the one described in Section 3. The generation of a candidate value and its subsequent acceptance/rejection are carried out sequentially for each of the parameters, μ, ψ and k, where q is replaced in turn by transition densities qμ, qψ and qk, each being a function of its own argument only.

We settled on normal q's, with mean zero and variances wμ, wψ and wk. We note that, unlike our choice for the prior distribution, our choice for the transition density, q, will not affect the model; it will affect only the efficiency of the algorithm. We chose wμ = 0.01, wψ = 0.0025 and wk = 0.001. This choice was made fairly arbitrarily; no attempt has been made here to tailor either the transition density or the MCMC algorithm to improve its efficiency or the properties of the chain.

FIGURE 2 MCMC realizations of GEV parameters for non-informative prior: top panel, k; middle panel, σ; bottom panel, μ. The horizontal line marks the actual parameter value; our data was simulated from GEV(0.2, 5, 5).
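The single-site updating scheme just described can be sketched end to end as follows. This is illustrative Python, not the paper's code: the data are re-simulated here, the sample size and proposal scales are chosen for convenience, and a numerical guard rejects candidates whose likelihood would overflow.

```python
import math
import random

rng = random.Random(42)

def gev_quantile(p, k, sigma, mu):
    # invert the cdf of Equation (7) to simulate GEV observations
    return mu + sigma * (1.0 - (-math.log(p)) ** k) / k

def gev_loglik(data, k, sigma, mu):
    """GEV log-likelihood; -inf when a point falls outside the support."""
    if k == 0.0:
        return float("-inf")            # measure-zero case, simply rejected
    ll = 0.0
    for x in data:
        t = 1.0 - k * (x - mu) / sigma
        if t <= 0.0:
            return float("-inf")
        u = math.log(t) / k             # t ** (1/k) computed as exp(u)
        if u > 700.0:                   # would overflow exp(); reject candidate
            return float("-inf")
        ll += -math.log(sigma) + (1.0 / k - 1.0) * math.log(t) - math.exp(u)
    return ll

# simulated annual maxima from GEV(k = 0.2, sigma = 5, mu = 5)
data = [gev_quantile(rng.random(), 0.2, 5.0, 5.0) for _ in range(100)]

def log_post(mu, psi, k):
    # near-flat independent N(0, 10^4) priors on mu, psi = log(sigma) and k
    prior = -(mu * mu + psi * psi + k * k) / (2.0 * 1e4)
    return prior + gev_loglik(data, k, math.exp(psi), mu)

# single-site Metropolis updates for mu, psi and k in turn
mu, psi, k = 0.0, 2.0, 0.05            # initial values, as in the text
lp = log_post(mu, psi, k)
chain = []
for _ in range(5000):
    for idx, step in ((0, 0.3), (1, 0.1), (2, 0.05)):
        cand = [mu, psi, k]
        cand[idx] += rng.gauss(0.0, step)
        lp_cand = log_post(*cand)
        if lp_cand - lp > math.log(rng.random()):
            mu, psi, k = cand
            lp = lp_cand
    chain.append((mu, math.exp(psi), k))   # record sigma on its own scale

burned = chain[2000:]                      # drop the settling-in period
mu_hat = sum(c[0] for c in burned) / len(burned)
sigma_hat = sum(c[1] for c in burned) / len(burned)
k_hat = sum(c[2] for c in burned) / len(burned)
```

With a run of this length the posterior means land near the generating values, up to the sampling variability of a small data set.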

Figure 2 shows the values generated by 1,000 iterations of the chain, with initial values of k1 = 0, μ1 = 0, ψ1 = 2. In Figure 2 (scale parameter), we have transformed the scale parameter back to the σ scale by setting σi = exp(ψi) for each of the simulated ψi values.

The settling-in period seems to take around 200 iterations in this example. If we delete the first 200 simulated values, the remaining 800 can be used to determine

    properties of the posterior distribution.

    The sample means and standard deviation of the 800 simulated values, for each

    of the model parameters, are shown in Table 1, together with the MLE and standard


TABLE 1 Estimates (to three significant figures) for the GEV model parameters, k, σ, μ.

Parameter            k              σ             μ             99.5% quantile
Bayesian inference   0.194 (0.027)  5.19 (0.145)  4.89 (0.175)  53.2
MLE                  0.200 (0.024)  5.15 (0.145)  4.85 (0.183)  53.3
Actual value         0.2            5             5             52.1

Results shown to three significant figures, derived by MLE and Bayesian analysis, with non-informative priors (standard deviations/errors in parentheses), and the resulting estimate for the 99.5% quantile.

    errors. The results are very similar, which is reassuring given how uninformative the

    prior specification is.

We are also interested in estimating extreme quantiles. Equation (5) was used, together with the bisection method, to calculate the 99.5% quantile for our annual

    maximum. Discarding the first 200 of 1,000 MCMC realizations, we have s = 800

    and an estimate of 53.2 (to three significant figures) for the 99.5% quantile. The

    MLE is 53.3, and the 99.5% quantile for the distribution from which the data was

    generated is 52.1 (to three significant figures).

When we have no prior information concerning the extreme value model parameters, Bayesian inference, with non-informative priors, may provide us with another

    method to estimate the parameter values. There are no ranges in the parameter space

    where the method breaks down, unlike with MLE, and estimates for the variability

    of the estimates are also a side product of the methodology.

    4.2 Prior expert opinion

    We will next consider the situation where we do have prior information or beliefs

    concerning the parameters, and show how we may use data to update our beliefs.

    In this section, we will model the same data as in Section 4.1, but this time we will

    assume that we have experts on whose opinion to base our prior specification.

    It is unlikely that experts will be able to express their prior beliefs concerning

    extremal behavior directly in terms of the model parameters. Even if they were able

    to come up with prior marginal distributions for the parameters, coming up with

    the joint prior specification would still remain problematic. In particular, increasing

    the scale parameter or decreasing the shape parameter leads to a longer-tailed

    distribution, so dependence between these two parameters is expected.

    Coles and Tawn (1996) advocate asking experts about the quantiles of some

    extreme values. In particular, we may, for instance, ask experts for their estimates

of the median and 90% quantile for particular quantiles of the annual maximum loss. Since we have three model parameters, we require expert opinion for three quantiles, which we will denote by q1, q2 and q3, where, for i = 1, 2, 3:

qi = μ + σ(1 − [−log pi]^k)/k    (9)


for some (large) probabilities p1 < p2 < p3. Since, by definition, q1 < q2 < q3 and we need to respect the ordering, Coles and Tawn (1996) advocate working instead with the differences q̃1 = q1, q̃2 = q2 − q1, q̃3 = q3 − q2, with the assumption that:

q̃i ∼ gamma(αi, βi),  i = 1, 2, 3    (10)

The estimates for the median and 90% quantile for each of q̃1, q̃2 and q̃3, provided by our experts, will allow us to calculate the gamma parameters, αi and βi for i = 1, 2, 3, by solving two simultaneous equations in two unknowns. Assuming independence, we then have:

P(q1, q2, q3) = [β1^{α1} q1^{α1−1} exp(−β1q1)/Γ(α1)] × [β2^{α2} (q2 − q1)^{α2−1} exp(−β2[q2 − q1])/Γ(α2)] × [β3^{α3} (q3 − q2)^{α3−1} exp(−β3[q3 − q2])/Γ(α3)]    (11)

We now have our prior distribution in terms of q1, q2, q3, and we would like it in terms of our GEV model parameters, k, σ and μ. The determinant of the Jacobian matrix, J, is:

det(J) = (σ/k²){[−log p2]^k log(−log p2)[(−log p1)^k − (−log p3)^k]
    − [−log p1]^k log(−log p1)[(−log p2)^k − (−log p3)^k]
    − [−log p3]^k log(−log p3)[(−log p1)^k − (−log p2)^k]}    (12)

Substituting Equation (9) into Equation (11) and multiplying by the absolute value of the determinant of J gives us our prior joint distribution for k, σ and μ. Figure 4 shows the marginal prior distributions for the GEV model parameters, k, σ and μ, for gamma priors with (α1, α2, α3) = (35, 15, 50) and (β1, β2, β3) = (0.9, 0.5, 0.8).
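The step of converting an elicited median and 90% point into gamma parameters (the "two simultaneous equations in two unknowns" above) has no closed form, but is straightforward numerically: the ratio of the two quantiles pins down α alone, after which β is a rescaling. The following self-contained Python sketch (all function names our own) exploits this; the round-trip check uses the gamma(35, 0.9) component from our prior as an illustration.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-14 and n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def gamma_quantile(p, a, rate=1.0):
    """p-quantile of Gamma(a, rate), by bisection on P(a, x)."""
    lo, hi = 0.0, (a + 1.0) / rate
    while reg_lower_gamma(a, hi * rate) < p:   # grow bracket until it covers p
        hi *= 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(a, mid * rate) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma_from_quantiles(median, q90):
    """Solve for (alpha, beta) so Gamma(alpha, beta) has the stated median
    and 90% quantile; the ratio q90/median depends on alpha alone."""
    target = q90 / median
    lo, hi = 1e-2, 1e4                      # bracket for alpha
    for _ in range(80):
        a = math.sqrt(lo * hi)              # bisect on the log scale
        ratio = gamma_quantile(0.9, a) / gamma_quantile(0.5, a)
        if ratio > target:                  # the ratio shrinks as alpha grows
            lo = a
        else:
            hi = a
    a = math.sqrt(lo * hi)
    return a, gamma_quantile(0.5, a) / median

# round trip through one prior component, gamma(35, 0.9)
m, q = gamma_quantile(0.5, 35.0, 0.9), gamma_quantile(0.9, 35.0, 0.9)
alpha, beta = gamma_from_quantiles(m, q)
```

The same routine applied to each elicited pair yields the (αi, βi) needed in Equation (10).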

    We next configure our MCMC algorithm, noting that again no attempt has

    been made to guarantee that the generated chain has desirable properties, such

    as a short settling-in period or low correlation, opting instead for simplicity. Our

analysis is based on a Gibbs sampler, successively updating the individual parameters conditional on the current values of the other parameters, with a Metropolis

    acceptance/rejection step.

We sequentially generate the candidate values for each of the model parameters in turn and determine the acceptance/rejection probability, with the other parameters fixed at their last accepted values. We again use normal transition kernels.

    The first 1,000 realizations of our MCMC simulation, for each of the three

    parameters, are shown in Figure 3. Initial values for the parameters are again taken to

be k1 = 0, μ1 = 0, ψ1 = 2. The settling-in period seems to take around 200 iterations.


FIGURE 3 First 1,000 MCMC realizations of GEV parameters for gamma priors: top panel, k; middle panel, σ; bottom panel, μ. The horizontal line marks the actual parameter value; our data was simulated from GEV(0.2, 5, 5).

    Discarding the first 200 observations, the distribution of the remaining observations

    may be used to approximate the posterior distribution of our model parameters.

Figure 4 shows the marginal prior and posterior pdfs for k, σ and μ. Since we wished to plot the posterior distributions with a higher degree of

    smoothness, 3,000 MCMC realizations (with the first 200 observations discarded)

    were generated.

As might be expected, given data, our uncertainty concerning the model parameter values decreases, and the posterior variance is smaller than the prior variance.

Comparisons between prior and posterior estimates, and between those obtained through Bayesian and classical inference, are shown in Table 2. Our prior beliefs have affected the resulting estimates: Bayesian inference is subjective.


FIGURE 4 Marginal prior and posterior distributions for k, σ and μ.

TABLE 2 Comparison of the prior, posterior and maximum likelihood estimates (to two significant figures) for the GEV model parameters.

Parameter      k             σ           μ           99.5% quantile
Prior          0.33          4.8         24
Posterior      0.25 (0.020)  5.3 (0.16)  4.9 (0.19)  62.5
MLE            0.20 (0.024)  5.1 (0.15)  4.8 (0.18)  53.3
Actual value   0.2           5           5           52.1

Simulation means are shown for the posterior; the mean for the prior was found with a discrete approximation. Standard errors/standard deviations are given in parentheses for the maximum likelihood and posterior estimates respectively. The estimates for the 99.5% quantile are also shown, together with the actual value.

Equation (5), together with the bisection method, was again used to calculate the 99.5% quantile for our annual maximum. Discarding the first 200 of 3,000 MCMC realizations, we have s = 2,800 and an estimate of 62.5 (to three significant figures) for the 99.5% quantile. Our prior beliefs have, in this instance, raised our estimate for the extreme quantile. We note that this methodology might be particularly useful where the experts believe that conditions have changed since the loss data was collected and wish to add their own input to the estimation process. Our example results may reflect expert belief that the losses to be suffered in the future will be larger than the collected loss data might indicate: incorporating their prior beliefs into our estimation process has resulted in a higher estimate for the 99.5% quantile.

    Coles and Tawn (1996) were able to elicit prior information of the kind we have

    illustrated here, from their hydrology expert. It remains to be seen whether experts

    in other fields, including those for operational losses, will be able to do the same.


    5 EXTERNAL DATA

    Data sparseness is a particular problem when data is heavy tailed. With only a

    limited amount of data, it is very hard to determine the properties of the tail of the

distribution. Banks are increasingly attempting to supplement their loss databases with external sources of loss data.

    These efforts to catalogue the operational loss experience of the industry typically

    fall into two categories: databases that use public sources, such as newspapers and

    press releases, and consortium databases, based on losses contributed by member

    banks, which are then pooled and shared. For both types of databases, the larger

    losses are usually more likely to be catalogued, and the threshold above which losses

    are reported may or may not be known.

We need to be careful when utilizing external data. The industry is beginning to realize that one cannot just take a tail event from another bank and look at it

    as a signal to what could happen internally. Rather, a better approach might be to

    consider tail events occurring only within peer firms that are of a similar size and

    operate in a similar business environment, with a comparable scope of business

    activities. Attempts to scale data have also been made, although no consensus on

    the best practice for this has yet been reached.

    Bayesian methods may be used to incorporate the use of external data into the

    creation of the model for internal losses. Depending on the source and applicability

of the external data, it may be possible for external data to provide us with prior information concerning our internal model parameters. This would be especially

    desirable in the situation where there is little internal data, resulting in a bad fit

    when using the internal data alone. There are various ways in which information

    from external data can be used to come up with a prior distribution.

    We may only have, or have confidence in, external data above a high (known)

    threshold. In this case, a peak over threshold (POT) approach (Embrechts et al

    (2003)) is suggested. Alternatively, we may not know the threshold for the data (or

it may not be of a fixed value); this may be the case for external databases derived from public sources. Here we may prefer to use the largest loss suffered in each year,

    and we could consider fitting a GEV distribution to the annual maximum losses, as

    we did in Section 4.

    5.1 Losses over a threshold

    In this section, we will illustrate a possible Bayesian approach with a simple example

    where we have external data over a known threshold. Suppose we have only a small

number of observations (internal data) from the distribution we wish to fit. With only a small number of observations from a heavy-tailed distribution, a good fit based

    on the internal data alone is less likely. Suppose that, in addition, we have a large

    quantity of observations (external data) independently sampled from the tail of a

    distribution, which we believe has the same shape parameter.


    We will use external data to provide us with information on the shape parameter.

The main assumption being made is that the internal and external data reflect

    observations from distributions that have the same shape parameter. We will assume

    non-informative priors for the location and scale parameters.

A random variable X is said to have a generalized Pareto distribution (GPD) with shape parameter k, location parameter μ and scale parameter σ, if the distribution function is given by:

F(x | k, σ, μ) = 1 − [1 − k(x − μ)/σ]^{1/k}   if k ≠ 0
F(x | k, σ, μ) = 1 − exp[−(x − μ)/σ]          if k = 0    (13)

where σ > 0; for k ≤ 0, x − μ > 0 and, for k > 0, 0 < x − μ < σ/k. The extreme value index, a function of the power of the tail decay, is −k. The GPD reduces to the Pareto distribution when k < 0.

    Pickands (1975) proved that the distribution of exceedances above a threshold

    will tend to the GPD, as the threshold tends to infinity, provided the underlying

    distribution function belongs to the domain of attraction of the GEV distribution.

    The class of distributions belonging to this domain is fairly large, making the theorem

    quite widely applicable.

We will consider a couple of ways of using external data to determine a prior distribution for the shape parameter:

    Method 1. We may fit the GPD to the external data, in order to determine

    the MLE of the shape parameter for the internal model. We can take the prior

    distribution to be normal, with mean given by the MLE of the shape parameter

    and standard deviation given by the estimated standard errors in our estimation.

    Method 2. Estimators of the extreme value index could be used to provide

    information on the shape parameter. Such estimators abound in the literature;

they include those proposed by Pickands (1975), Hill (1975) and Huisman et al (2001). The estimates for the extreme value index over a range of threshold

    values may, for example, be used to generate the prior distribution for the shape

    parameter.
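As one possible realization of method 2 (illustrative, not the paper's code), the Hill estimator is simple enough to sketch directly. The Pareto test data, seed and choice of m values below are assumptions of the example:

```python
import math
import random

def hill_estimator(sample, m):
    """Hill (1975) estimator of the extreme value index, based on the m
    largest order statistics; appropriate for Pareto-type (heavy) tails."""
    xs = sorted(sample, reverse=True)
    threshold = xs[m]                  # the (m + 1)-th largest observation
    return sum(math.log(xs[i] / threshold) for i in range(m)) / m

# sanity check on exact Pareto data: P(X > x) = x^(-2), extreme value index 1/2
rng = random.Random(7)
pareto = [rng.random() ** -0.5 for _ in range(20000)]
estimates = {m: hill_estimator(pareto, m) for m in (200, 500, 1000)}
```

Plotting such estimates over a range of m (equivalently, over a range of thresholds) is exactly the kind of output that could then be smoothed into a non-parametric prior for the shape parameter.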

    5.2 Simulation study

    A small simulation study was conducted to illustrate the use of external data in a

Bayesian approach to parameter estimation. For Y ∼ Gamma(α, β), we will say that X = exp(Y + μ) is log gamma, with parameters α, β and μ. β is the shape parameter for the log gamma distribution, and 1/β is the extreme value index.

    Our internal data consisted of 100 iid observations simulated from the

log gamma(α = 1.2, β = 2, μ = 7) distribution. We fitted the log gamma distribution

    to this data by MLE. Figure 5 (see page 39) shows the histogram and empirical cdf


FIGURE 5 Histogram of our internal data and the pdf of the fitted log gamma distribution (left). Empirical cdf for internal data and the fitted cdf (right). Both panels are plotted against log10(x).

    of the internal data, together with the fitted pdf and cdf. There was no statistically significant evidence, from the Kolmogorov–Smirnov goodness-of-fit test, that the data did not come from the fitted distribution. The MLEs were α = 1.11, δ = 1.70, μ = 7.00 (to three significant figures).

    We then generated 10,000 iid observations from the tail (above 50,000) of the log gamma(1.8, 2, 7.5) distribution; this was our external data. We have assumed that the shape parameter, ξ = 1/δ, is the same for the models generating the internal and external losses.

    For method 1, we fitted the GPD to the exceedances over 50,000. The resulting MLE for the extreme value index was found to be 0.527, with an estimated standard error of 0.0154 (to three significant figures). We assumed that 1/δ had a normal prior distribution with mean 0.527 and standard deviation 0.0154, N(0.527, 0.0154²), with α and μ having non-informative prior distributions, N(0, 10⁴).
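    A sketch of how the Method 1 prior could be assembled, assuming Python; the grid search below stands in for a proper numerical optimizer, and the closing line uses the standard asymptotic result se(ξ̂) ≈ (1 + ξ̂)/√n for the GPD shape, broadly consistent with the 0.0154 reported above. All names are illustrative:

```python
import math
import random

def gpd_neg_loglik(xi, sigma, exc):
    """Negative log-likelihood of the GPD (xi > 0) for exceedances exc."""
    n = len(exc)
    return n * math.log(sigma) + (1.0 + 1.0 / xi) * sum(
        math.log(1.0 + xi * y / sigma) for y in exc
    )

# Simulate exceedances from a GPD(xi=0.5, sigma=1) by inversion of the survival
# function: y = sigma * ((1 - U)**(-xi) - 1) / xi.
random.seed(7)
exc = [((1.0 - random.random()) ** -0.5 - 1.0) / 0.5 for _ in range(2_000)]

# Crude grid-search MLE (a real implementation would use a numerical optimizer).
grid = [(x / 100.0, s / 100.0) for x in range(20, 101, 2) for s in range(50, 201, 5)]
xi_hat, sigma_hat = min(grid, key=lambda p: gpd_neg_loglik(p[0], p[1], exc))

# Normal prior for the shape parameter: mean = MLE, sd = asymptotic standard error.
prior_mean = xi_hat
prior_sd = (1.0 + xi_hat) / math.sqrt(len(exc))
```

    The pair (prior_mean, prior_sd) then plays the role of the N(0.527, 0.0154²) prior used above.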

    For method 2, the results from the estimation of the extreme value index, using the estimators proposed by Pickands (1975), Hill (1975) and Huisman et al (2001), are shown in Figure 6. Taking the Huisman estimates, for example, we can form a non-parametric prior distribution for the extreme value index. α and μ will again be assumed to have non-informative prior distributions, N(0, 10⁴).

    Normal transition kernels were used, with initial parameter values of α₁ = 1, 1/δ₁ = 0.7, μ₁ = 1. Two thousand realizations of the MCMC simulation for methods 1 and 2 are shown in Figure 7. Discarding the first 500 MCMC realizations gives us the estimates for the log gamma parameters; the results are summarized in Table 3.
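    The sampler itself is not spelled out in the paper; below is a generic random-walk Metropolis sketch with a normal transition kernel, applied to a toy one-parameter posterior (normal likelihood, normal prior) rather than the log gamma model, so the target and all names are illustrative only:

```python
import math
import random

def metropolis(log_post, x0, step, n_iter, rng=random):
    """Random-walk Metropolis with a normal transition kernel."""
    chain, x, lp = [], x0, log_post(x0)
    for _ in range(n_iter):
        prop = x + rng.gauss(0.0, step)     # symmetric normal proposal
        lp_prop = log_post(prop)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = prop, lp_prop
        chain.append(x)
    return chain

# Toy target: posterior of a normal mean (known unit variance) with N(0, 10^2) prior.
random.seed(0)
data = [random.gauss(1.5, 1.0) for _ in range(50)]

def log_post(theta):
    log_prior = -theta ** 2 / (2.0 * 10.0 ** 2)
    log_lik = -sum((y - theta) ** 2 for y in data) / 2.0
    return log_prior + log_lik

chain = metropolis(log_post, x0=0.0, step=0.5, n_iter=5_000)
burned = chain[500:]   # discard burn-in, as in the study above
```

    Using a conjugate toy target means the chain's mean and spread can be checked against the analytic posterior, which is a useful sanity check before moving to the log gamma likelihood.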

    Research Paper www.thejournalofoperationalrisk.com


    FIGURE 6 Estimates for the extreme value index derived from the Pickands, Hill and Huisman estimators, using external data above varying thresholds. The unbroken line shows the actual parameter value, and the dashed line shows the MLE from fitting internal data (left). The histogram of the Huisman estimates for thresholds resulting in 30–1,000 losses above the threshold (right).

    TABLE 3 Comparison of the Bayesian estimates (to three significant figures) for the log gamma model parameters, for priors obtained using methods 1 and 2, with the estimates obtained through MLE.

                       α       δ       μ      99.5% quantile
    MLE               1.11    1.70    7.00    29,200
    Prior method 1    1.26    1.89    6.99    24,800
    Prior method 2    1.28    1.91    6.99    24,600
    Actual value      1.2     2       7       19,600

    The estimates for the 99.5% quantile are 24,800 and 24,600 (to three significant figures) for methods 1 and 2, respectively. The estimate from maximum likelihood fitting on the internal data alone is 29,200; the actual 99.5% quantile for the log gamma(1.2, 2, 7) distribution is 19,600 (to three significant figures).
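    Under the log gamma convention read above (X = exp(μ + Y), Y ∼ Gamma(α, rate δ); our reading of the parameterization), such quantiles can be checked by simulating the fitted model and reading off the empirical 99.5% quantile. The sketch below is illustrative and agrees with the Table 3 comparison up to Monte Carlo and rounding error:

```python
import math
import random

def log_gamma_quantile_mc(alpha, delta, mu, p, n=200_000, rng=random):
    """Monte Carlo p-quantile of X = exp(mu + Y), Y ~ Gamma(shape=alpha, rate=delta)."""
    ys = sorted(rng.gammavariate(alpha, 1.0 / delta) for _ in range(n))
    return math.exp(mu + ys[int(p * n)])

random.seed(3)
q_actual = log_gamma_quantile_mc(1.2, 2.0, 7.0, 0.995)    # actual model
q_mle = log_gamma_quantile_mc(1.11, 1.70, 7.00, 0.995)    # fitted (MLE) model
```

    The gap between the two values mirrors the overestimation of the 99.5% quantile by the internal-data MLE fit reported above.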

    In this example, external data has helped us to fine-tune our model for internal losses. However, it should be noted that, in our example, the external data that we simulated actually had relevance for our internal model (external and internal shared


    FIGURE 7 MCMC realizations of the log gamma parameters for prior information from external data with method 1 (left) and method 2 (right): top panels, α; middle panels, δ; bottom panels, μ. The unbroken line shows the actual parameter value; the dashed line shows the MLE from fitting internal data.


    the same shape parameter). We have also assumed that we do not have a lot of internal losses, but that we do have a large quantity of external losses in the tail of the distribution.

    The model assumptions made here were known to be correct in the simulation

    world. If the assumptions are not correct, our estimates will be biased by our

    preconceptions or prior beliefs.

    6 CONCLUSION

    Bayesian analysis provides another way of viewing the problem of parameter

    estimation. Most importantly, it allows for the incorporation of prior opinion into the

    estimation, which some proponents view as its greatest strength (and some opponents

    view as its greatest weakness).

    In this paper, we have considered how Bayesian inference may be used in the fitting of extreme value distributions to extreme losses. The prior specification

    may be based on expert opinion, external data sources or a combination of the

    two. In particular, Bayesian methods provide a framework for the incorporation of

    information derived from external and internal sources. Currently, the combination of data from various sources in operational risk has mostly involved pooling the data or creating a mixture model. For the former methodology, we would have to consider whether, and how, to scale the data; for the latter, we would need to decide on the mixture coefficient. Bayesian analysis provides another option, though choices still have to be made by the modeler.

    The simulation study conducted here demonstrates how prior information may

    be used to affect the fitted distribution and that, where the prior specification does

    provide relevant information about the distribution of the losses, the estimation

    of the model parameters can be improved. The development of simulation-based

    techniques, in particular MCMC, has overcome the difficulty of computing the posterior and has made Bayesian techniques very popular in many areas of application.

    REFERENCES

    Coles, S. G., and Tawn, J. A. (1996). A Bayesian analysis of extreme rainfall data. Applied Statistics 45(4), 463–478.

    Embrechts, P., Klüppelberg, C., and Mikosch, T. (2003). Modelling Extremal Events for Insurance and Finance. Springer-Verlag, New York.

    Fisher, R. A., and Tippett, L. H. C. (1928). Limiting forms of the frequency distribution of the largest or smallest member of a sample. Proceedings of the Cambridge Philosophical Society 24, 180–190.

    Hill, B. M. (1975). A simple general approach to inference about the tail of a distribution. Annals of Statistics 3, 1163–1174.

    Huisman, R., Koedijk, K. G., Kool, C. J. M., and Palm, F. (2001). Tail-index estimates in small samples. Journal of Business and Economic Statistics 19(1), 208–216.


    Peters, G. W., and Sisson, S. A. (2006). Bayesian inference, Monte Carlo sampling and operational risk. The Journal of Operational Risk 1(3), 27–50.

    Pickands, J. (1975). Statistical inference using extreme order statistics. Annals of Statistics 3, 119–131.

    Shevchenko, P., and Wüthrich, M. (2006). The structural modelling of operational risk via Bayesian inference: combining loss data with expert opinions. The Journal of Operational Risk 1(3), 3–26.
