Universidad Complutense de Madrid Máster en Ingeniería Matemática Curso 2007-2008 Modelling Week Second Edition June 16 – June 24, 2008


Universidad Complutense de Madrid

Máster en Ingeniería Matemática, Curso 2007-2008

Modelling Week Second Edition

June 16 – June 24, 2008


Credit Scoring Modelling for Retail Banking Sector

Problem raised by Accenture.

Coordinators:

Ignacio Villanueva (UCM).

Estela Luna (Accenture).


Team members:

Elena Bartolozzi (Università di Firenze)
Matthew Cornford (University of Oxford)
Leticia García-Ergüín (UCM)
Cristina Pascual Deocón (UCM)
Oscar Iván Pascual (UCM)
Francisco Javier Plaza (UCM)


Index
Introduction
Methodology and Data
Univariate and Multivariate Analysis
Model Creation
Validation
Calibration

Introduction

Our problem concerns whom a bank should lend its money to.

When a client applies for a loan, the bank would like to be sure that the client will pay back the full amount of the loan.

We need effective models that allow us to predict whether a client will pay back the loan.

What we have is historical data for several variables. We fit a model to this historical data so that we can estimate a probability of default.

Methodology and Data

Our data are provided by Accenture and include details of completed loan agreements.

The variables included are:
Age
Income
Wealth
Marital Status
Length as a Client
Amount of Loan
Maturity
Default


Sample Selection

We split the sample into two parts:
The modelling sample
The validation sample


Modelling Sample
A random sample is selected from the data.
The modelling sample is about 2/3 of the original data.
This sample is used to create the model.

Validation Sample
The remaining data are used to validate the model.
We test how many defaults the model predicted and which of those clients really did default.
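The split described above can be sketched in a few lines of Python. This is only an illustration: the slides do not say how the random selection was done, so the function name `split_sample`, the fixed seed and the stand-in data are assumptions.

```python
import random

def split_sample(records, modelling_fraction=2 / 3, seed=0):
    """Randomly split records into a modelling sample and a validation sample."""
    rng = random.Random(seed)        # fixed seed so the split is reproducible
    shuffled = list(records)         # copy, so the caller's data is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * modelling_fraction)
    return shuffled[:cut], shuffled[cut:]

loans = list(range(900))             # stand-in for 900 loan records (hypothetical)
modelling, validation = split_sample(loans)
print(len(modelling), len(validation))
```

Changing the seed gives a different split, which is one way to repeat the whole exercise on several random samples.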

Univariate and Multivariate Analysis

We have a dependent variable, which is default, and some independent variables (age, income, …).

First of all, we do univariate analysis.
For each variable, we calculate some statistics such as the mean, standard deviation and skewness.
We plot some histograms.
This information can be used as a first check before applying the model. It would be better if the data were homogeneous.
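As a rough illustration of these univariate statistics, here is a minimal Python sketch. The team used SAS, whose skewness estimator may differ from the simple moment-based one used here; the function name `describe` and the sample ages are hypothetical.

```python
import statistics

def describe(values):
    """Return mean, sample standard deviation and moment-based skewness."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)                    # sample std (divides by n - 1)
    m2 = sum((v - mean) ** 2 for v in values) / n     # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n     # third central moment
    skewness = m3 / m2 ** 1.5                         # simple moment estimator
    return {"n": n, "mean": mean, "std": std, "skewness": skewness}

# Hypothetical ages; the value 120 looks like a transcription mistake.
ages = [23, 35, 41, 29, 52, 38, 47, 31, 120]
summary = describe(ages)
print(summary)
```

A single extreme value such as the 120 above pulls the skewness strongly positive, which is the kind of signal that prompts a closer look at the raw data.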


Univariate Analysis

We used SAS software to generate these statistics (output.htm).

[Figure: SAS Graph v9 output]

This kind of analysis is very useful for detecting outliers or transcription mistakes.


Multivariate Analysis

Correlations


Chi-squared test

We try to determine which of the variables are explanatory variables, i.e. which variables default depends on. We use the chi-squared test for that:

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

where O_i and E_i are the observed and expected counts in cell i.

To begin with, we must discretize the continuous variables using percentiles.

After running the chi-squared test, we look at the p-value.
If p-value < 0.05, we reject independence.
If p-value > 0.05, we do not reject independence.
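The procedure above (discretize by percentiles, then test independence from default) can be sketched as follows. This is an illustration under assumptions, not the team's SAS code: the helper names, the choice of four bins and the toy data are hypothetical, and instead of computing a p-value the sketch compares the statistic with the standard 5% critical value, which is equivalent to checking p-value < 0.05.

```python
from collections import Counter

# 5% critical values of the chi-squared distribution, by degrees of freedom.
CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def quantile_bins(values, n_bins):
    """Map each value to a bin index 0..n_bins-1 using empirical percentiles."""
    ranked = sorted(values)
    cuts = [ranked[len(ranked) * b // n_bins] for b in range(1, n_bins)]
    return [sum(v >= c for c in cuts) for v in values]

def chi_squared(bins, defaults):
    """Chi-squared statistic and degrees of freedom for bin vs. default flag."""
    observed = Counter(zip(bins, defaults))
    row_totals = Counter(bins)
    col_totals = Counter(defaults)
    n = len(bins)
    stat = 0.0
    for r in row_totals:
        for c in col_totals:
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Toy data (hypothetical): every client under 30 defaults, nobody else does.
ages = list(range(20, 60))
defaults = [1 if age < 30 else 0 for age in ages]

stat, dof = chi_squared(quantile_bins(ages, 4), defaults)
reject_independence = stat > CRITICAL_5PCT[dof]
print(stat, dof, reject_independence)
```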


Model Creation

According to the results of the univariate and multivariate analysis, the variables we include in our model are:
Age
Income
Wealth
Marital Status
Maturity


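We fit a logit model using PROC LOGISTIC in SAS and glmfit in MATLAB, obtaining the same results.

With x = (x_1, …, x_k) the client's variables and π(x) the probability of default given x:

$$\pi(x) = \frac{\exp\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_j\right)}{1 + \exp\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_j\right)}$$

$$\frac{\pi(x)}{1 - \pi(x)} = \exp\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_j\right)$$

$$\text{Logit}(\pi(x)) = \log\frac{\pi(x)}{1 - \pi(x)} = \beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k$$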


Variable          Coefficient
Intercept         -1.85136
Age               -0.02678
Income             0.10025
Wealth            -0.01761
Marital Status     0.79651
Maturity           0.00892

There will be some differences between runs because the sample is randomized.


So, our model is as follows:

$$P(\text{Default} \mid x) = \frac{1}{1 + e^{-\eta}}$$

where

$$\eta = -1.85 - 0.026\,\text{Age} + 0.1\,\text{Inc} - 0.017\,\text{Wlth} + 0.79\,\text{Marit} + 0.0089\,\text{Matur}$$
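To make the fitted model concrete, the sketch below evaluates it for a single client in Python. The coefficients are the ones reported on the slides; the units of each variable and the coding of marital status are not given in the source, so the example client is purely hypothetical.

```python
import math

# Coefficients as reported on the slides.
COEFFICIENTS = {
    "Intercept": -1.85136,
    "Age": -0.02678,
    "Income": 0.10025,
    "Wealth": -0.01761,
    "Marital Status": 0.79651,
    "Maturity": 0.00892,
}

def probability_of_default(client):
    """Logistic model: P(Default | x) = 1 / (1 + exp(-eta))."""
    eta = COEFFICIENTS["Intercept"] + sum(
        COEFFICIENTS[name] * value for name, value in client.items()
    )
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical client; units and the 0/1 marital-status coding are assumptions.
client = {"Age": 40, "Income": 3.0, "Wealth": 10.0, "Marital Status": 1, "Maturity": 24}
print(probability_of_default(client))
```

With these signs, a higher age or wealth lowers the estimated probability of default, while a higher income, marital-status value or maturity raises it.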

Validation

[Table: Model statistics]

Powerstat is a measure of the discriminatory power of the model.

The data are sorted from worst to best according to the probability of default calculated with our model.

A perfect model would place all the defaults at the beginning.

We plot accumulated defaults against accumulated observations.

Powerstat compares the area between our model's curve and the random model's curve with the area between the perfect model's curve and the random model's curve.
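The construction above can be sketched in Python as a ratio of areas. This is one common way to compute such a statistic, written under assumptions: the function names are hypothetical, areas use the trapezoid rule, the random model is taken as the diagonal (area 0.5), and ties in the predicted probabilities are left in their original order.

```python
def cap_area(flags_in_order):
    """Area under cumulative default share vs. cumulative observation share."""
    n = len(flags_in_order)
    total = sum(flags_in_order)
    area = 0.0
    captured = 0
    for flag in flags_in_order:
        previous = captured
        captured += flag
        area += (previous + captured) / (2 * total) / n   # trapezoid of width 1/n
    return area

def powerstat(pds, defaults):
    """Ratio of (model - random) area to (perfect - random) area."""
    ranked = sorted(zip(pds, defaults), key=lambda pair: -pair[0])  # worst first
    by_model = [flag for _, flag in ranked]
    perfect = sorted(defaults, reverse=True)    # all defaults at the beginning
    random_area = 0.5                           # diagonal line
    return (cap_area(by_model) - random_area) / (cap_area(perfect) - random_area)

# Hypothetical predicted probabilities and observed default flags.
pds = [0.9, 0.7, 0.6, 0.3, 0.2, 0.1]
defaults = [1, 0, 1, 0, 0, 0]
print(powerstat(pds, defaults))
```

A value of 1 means the model ranks every defaulter ahead of every non-defaulter, and a value near 0 means it does no better than a random ordering.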

[Figure: Powerstat (Gini Index)]

Validation

Once the probability of default for each client is found, the question is how to choose the cut-off level that classifies whether a client will default or not.

We use the validation data to predict with our model how many observations will default, and compare the predictions with which of them really did default.

Repeating the process with several random samples, the probability has very low deviation and is around 0.77.

Calibration

The expected loss (EL) is defined as:

EL = PD * EAD * LGD

PD is the probability of default, defined as the default probability calibrated to a one-year horizon.
EAD is the exposure at default.
LGD is the loss given default, i.e. the fraction of the exposure that is lost if the client defaults.
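As a worked illustration with hypothetical figures (not taken from the Accenture data), a loan with a one-year PD of 2%, an EAD of 10,000 and an LGD of 45% would give:

```latex
EL = PD \times EAD \times LGD = 0.02 \times 10\,000 \times 0.45 = 90
```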


Scoring allows us to rank clients by their risk of default.
However, these probabilities do not take into account when the default happens. This is the reason for calibration.
We want to obtain the average yearly probability of default.
We need a sample of clients observed over yearly periods.
The model is applied and the sample is sorted by score.
We obtain an observed default rate as a function of the score:

[Equation: Rate as a function of score, with parameters A, B and C]

Minimizing the least-squares error with the MATLAB function fminsearch, we obtain the values:

A = 0.0004
B = 3.7410
C = 2.7870

The Credit Scoring Model was solved quickly and didn’t cause too much difficulty.

We asked Accenture to bring another, related problem.

We now introduce the Problem of Capital Allocation.


The Problem of Capital Allocation

Index
The Problem of Capital Allocation
Implementation
Conclusions


In this problem a lender has a fixed amount of money to lend, EAD, to be distributed between n blocks of similar customers.

How should the money be distributed between the blocks to maximize the profit?

Each block has associated with it an interest rate ρi, an a priori probability of default PDi, a loss given default LGDi and a number of customers Ni.

If the customers in each block are independent of one another, then we can easily compute the probability of k defaults.

But the customers are correlated via the economy. We can use a Gaussian copula to introduce a default random variable for each customer:

$$Z_i = \sum_{j=1}^{m} a_{ij} Y_j + r_i w_i$$

$$Z_i < \Phi^{-1}(PD_i) \;\Rightarrow\; \text{Default}$$

Here the Y_j are the factors describing the state of the economy, w_i is the customer-specific term, and Φ is the standard normal distribution function.

Then, for a particular state of the economy, the independent probability of default for each customer is:

$$p_i = \Phi\left(\frac{\Phi^{-1}(PD_i) - \sum_{j=1}^{m} a_{ij} Y_j}{r_i}\right)$$
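The conditional probability above can be evaluated with Python's standard library. This is a sketch only: the function name, the single-factor example and its loading are hypothetical, and setting r to the square root of one minus the squared loading (so that Z has unit variance) is a common convention that the slides do not state.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()   # standard normal: mean 0, standard deviation 1

def conditional_pd(pd, loadings, factors, r):
    """Default probability given the economy factors Y:
    Phi((Phi^-1(PD) - sum_j a_j * Y_j) / r)."""
    threshold = STD_NORMAL.inv_cdf(pd)
    systematic = sum(a * y for a, y in zip(loadings, factors))
    return STD_NORMAL.cdf((threshold - systematic) / r)

# Hypothetical single-factor example with a priori PD of 5%.
a = 0.5
r = math.sqrt(1 - a ** 2)    # assumption: chosen so that Z has unit variance
for y in (-2.0, 0.0, 2.0):   # bad, neutral and good states of the economy
    print(y, conditional_pd(0.05, [a], [y], r))
```

A negative factor value (a bad economy) raises every customer's conditional probability of default at once, which is how the copula introduces correlation between customers.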

We use the binomial distribution:

$$P(k \text{ defaults}) = \binom{N_i}{k} p_i^{k} (1 - p_i)^{N_i - k}$$

When N_i is big enough (on the order of 10^3), we can approximate this binomial with a normal random variable D_i:

$$D_i \sim N\big(N_i p_i,\; N_i p_i (1 - p_i)\big)$$
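A quick numerical check of this approximation, sketched in Python with hypothetical values N = 1000 and p = 0.05. The half-unit continuity correction in the normal tail is a standard refinement added here, not something the slides mention.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """Exact probability of k defaults among n independent customers."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def normal_approximation(n, p):
    """Normal variable with the same mean and variance as the binomial."""
    return NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

n, p = 1000, 0.05                      # hypothetical block size and default probability
approx = normal_approximation(n, p)

exact_tail = sum(binomial_pmf(k, n, p) for k in range(0, 61))   # P(D <= 60)
approx_tail = approx.cdf(60.5)                                   # continuity correction
print(exact_tail, approx_tail)
```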

We define the loss distribution as:

$$L = \sum_{i=1}^{n} \frac{\alpha_i\, EAD}{N_i} \big( LGD_i D_i - \rho_i (N_i - D_i) \big)$$

where α_i is the fraction of EAD lent to block i.

As L is a sum of independent normal random variables,

$$L \sim N(\mu_L, \sigma_L^2)$$

To measure risk we use Value at Risk (VaR) with a 99% confidence level. So the problem becomes:

$$\begin{aligned} \text{Minimise} \quad & f(\alpha) = \mu_L \\ \text{Subject to:} \quad & \sum_{i=1}^{n} \alpha_i = 1 \\ & -2.3262\,\sigma_L + \mu_L = VaR_{99} \end{aligned}$$

where VaR99 is the fixed level of risk the lender is willing to take.
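For a fixed choice of the α's, the mean and variance of L follow directly from the normal approximation of each D_i. The Python sketch below computes them for hypothetical blocks. It assumes a loss with the sign convention that defaults add to L and interest earned on non-defaulting customers subtracts from it; the 99% quantile of a normal variable then lies about 2.3263 standard deviations from its mean, on the side determined by that convention. All block parameters are made up for illustration.

```python
import math
from statistics import NormalDist

def loss_moments(alphas, ead, blocks):
    """Mean and variance of L, with each D_i ~ N(N_i p_i, N_i p_i (1 - p_i))."""
    mu = 0.0
    var = 0.0
    for alpha, block in zip(alphas, blocks):
        n, p = block["N"], block["p"]
        lgd, rho = block["LGD"], block["rho"]
        per_customer = alpha * ead / n        # amount lent to each customer
        slope = per_customer * (lgd + rho)    # change in L per extra default
        mu += slope * n * p - per_customer * rho * n
        var += slope ** 2 * n * p * (1 - p)
    return mu, var

# Three hypothetical blocks; p is the default probability for one state of the economy.
blocks = [
    {"N": 1000, "p": 0.02, "LGD": 0.4, "rho": 0.05},
    {"N": 2000, "p": 0.05, "LGD": 0.5, "rho": 0.08},
    {"N": 1500, "p": 0.10, "LGD": 0.6, "rho": 0.12},
]
alphas = [0.5, 0.3, 0.2]                      # fractions of EAD, summing to 1
mu, var = loss_moments(alphas, 1_000_000.0, blocks)
quantile_99 = NormalDist(mu, math.sqrt(var)).inv_cdf(0.99)
print(mu, math.sqrt(var), quantile_99)
```

An optimiser would then search over the α's, subject to their summing to 1, for the allocation with the smallest mean at a fixed risk level.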

Implementation

We start with 3 blocks to make the problem easier.

We have to find the α’s that minimise the expected loss.

We have two approaches to solve this problem.

First we fix the α’s and find the VaR99 and expected loss for each set of α’s (black dots).

Then we find the α’s that minimise the expected loss for any fixed VaR99 (red dots) using the MATLAB function fmincon.

As we can see, the two approaches agree very well, with differences on the order of 10^(−4).

Here we have the results for 5 blocks, which took considerably longer to compute than with 3 blocks.


Conclusions

The analytical method outperformed the simulation of w_i, as expected.

To optimise for more than 3 blocks, the choice of optimiser needs to be investigated further.

Another interesting question is the relationship between the efficient border and the interest rates charged for each block.

Questions?