
Transcript of TransactionBasedAnalytics2010

Page 1: TransactionBasedAnalytics2010

TRANSACTIONS BASED ANALYTICS

Vijay Desai, SAS Institute

Presented at Kelley School of Business, Bloomington, IN

Nov. 8, 2010

Page 2: TransactionBasedAnalytics2010

AGENDA

Transactions landscape
  Transactions data
  Problems to tackle

Transactions analytics
  Types of techniques
  Performance measurement
  Target definition: fraud, credit risk, attrition

Deploying the solution
  Using the scores in production
  Monitoring the production system

Page 3: TransactionBasedAnalytics2010

TRANSACTIONS LANDSCAPE

Transactions Data

Problems to Tackle

Page 4: TransactionBasedAnalytics2010

TRANSACTIONS DATA

Credit/Debit cards
  Authorizations
  Payments
  Statements

Non-monetary data
  Bureau data
  Demographic data
  Campaign data
  Clickstream data

Wire transfers

Financial transactions

Page 5: TransactionBasedAnalytics2010

PROBLEMS TO TACKLE

First party fraud

Second party fraud

Third party fraud

Credit risk, bankruptcy

Product offers, pricing

Money laundering

Financial trading violations

Bio-terrorism

Intrusion detection


Page 6: TransactionBasedAnalytics2010

PREDICTION VERSUS DETECTION

Detection examples
  Credit card fraud
  Tax under-filing
  Bio-terror attack

Prediction examples
  Charge-off, serious delinquency
  Cross-sell, up-sell propensity

???
  Attrition
  Fraud rings
  Network intrusion

[Timeline diagram: prediction occurs before the event, detection after it.]

Page 7: TransactionBasedAnalytics2010

TARGET DEFINITION AND AVAILABILITY

[Chart placing problems by how readily a target can be defined and observed: Attrition, Credit Risk, Credit card fraud, Tax under-filers, Network intrusion, Bio-terror attack.]

Page 8: TransactionBasedAnalytics2010

FIRST PARTY FRAUD

Committed on own account

Victimless fraud

Examples
  Fictitious identities
  Check kiting
  Bust-out fraud
  Tax under-filing

Page 9: TransactionBasedAnalytics2010

SECOND PARTY FRAUD

Committed by someone known to or close to the genuine account holder

Examples
  Employee abuse of corporate cards
  Relatives abusing cards/data of children, siblings, parents
  Caregivers abusing cards/data of senior citizens

Page 10: TransactionBasedAnalytics2010

THIRD PARTY FRAUD

Committed on someone else's account

Impersonation of a genuine identity

Examples
  Identity theft
  Lost/stolen cards/accounts
  Stolen data/account information
  Online fraud with infected PCs

Page 11: TransactionBasedAnalytics2010

FRAUD TYPES: DEBIT CARD EXAMPLE


Source: First Data Resources

Page 12: TransactionBasedAnalytics2010

GLOBAL CARD FRAUD


Page 13: TransactionBasedAnalytics2010

US CARD FRAUD LOSSES


Source: Kansas City Federal Reserve

Page 14: TransactionBasedAnalytics2010

CARD FRAUD LOSSES FOR SELECT COUNTRIES


Source: Kansas City Federal Reserve

Page 15: TransactionBasedAnalytics2010

CREDIT RISK

Existing accounts
  Serious delinquency
  Bankruptcy
  Charge-off

New accounts
  Delinquency in first six months
  Bankruptcy in first six months
  Charge-off in first six months

Page 16: TransactionBasedAnalytics2010

CREDIT LIMIT AND BALANCES


Page 17: TransactionBasedAnalytics2010

DELINQUENCY STATUS


Page 18: TransactionBasedAnalytics2010

ATTRITION/CHURN RISK

Closed/Cancelled account

Loss of revenue due to a sharp and lasting reduction in balance and activity

Page 19: TransactionBasedAnalytics2010

NUMBER OF ACCOUNTS


Page 20: TransactionBasedAnalytics2010

OPENED AND CLOSED ACCOUNTS


Page 21: TransactionBasedAnalytics2010

TRANSACTIONS ANALYTICS

Techniques

Performance measurement

Target definitions

Page 22: TransactionBasedAnalytics2010

TYPES OF TECHNIQUES

Rules

Supervised learning models
  Regression, decision trees, neural networks, SVM

Unsupervised learning models
  Clustering, PCA, neural networks

Semi-supervised learning models

Association rules/Market basket analysis

Optimization

Page 23: TransactionBasedAnalytics2010

PREDICTION/DETECTION TECHNIQUES

[Diagram: three network schematics, labeled Un-supervised, Semi-supervised, and Supervised, each drawn with input, feature/hidden, and output layers.]

Page 24: TransactionBasedAnalytics2010

TYPICAL RULE BASED SYSTEM

Pros
  Easy to understand
  Can be a batch or automated system
  Effective in catching the obvious cases

Cons
  Too many false positives
  Likely to miss many risky cases
  Does not provide priority for investigation
  Difficult to maintain

Page 25: TransactionBasedAnalytics2010

RULES FOR MEASURING SUCCESS

[Spectrum: from all "goods" and "bads" unknown to all "goods" and "bads" known]

Page 26: TransactionBasedAnalytics2010

PERFORMANCE MEASURES

How good is the score at separating the two classes of goods and bads?
  Information value
  Kolmogorov–Smirnov statistic
  Lift curve
  ROC curve
  Gini coefficient
  Somers' D concordance statistic

How good is the score as a probability forecast?
  Binomial and Normal tests
  Hosmer-Lemeshow test

How good are the score and cut-offs in business decisions?
  Error rates
  Swap set analysis

Page 27: TransactionBasedAnalytics2010

INFORMATION VALUE

Divide the score into i bands. With g_i the share of all goods and b_i the share of all bads falling in band i, the information value is IV = Σ_i (g_i - b_i) ln(g_i / b_i).
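A minimal sketch of that calculation in Python, assuming pandas is available; the function and column names are illustrative, not from the slides, and bands with no goods or no bads would need smoothing in practice.

    import numpy as np
    import pandas as pd

    def information_value(score, bad, bands=10):
        # Band the score, then compare each band's share of goods and bads.
        df = pd.DataFrame({"score": score, "bad": bad})
        df["band"] = pd.qcut(df["score"], q=bands, duplicates="drop")
        counts = df.groupby("band", observed=True)["bad"].agg(total="count", bads="sum")
        n_bad = df["bad"].sum()
        n_good = len(df) - n_bad
        g = (counts["total"] - counts["bads"]) / n_good  # g_i: share of all goods
        b = counts["bads"] / n_bad                       # b_i: share of all bads
        return float(((g - b) * np.log(g / b)).sum())    # IV = sum (g_i - b_i) ln(g_i/b_i)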

Page 28: TransactionBasedAnalytics2010

PERFORMANCE DEFINITIONS

Let F(s|G) and F(s|B) be the distribution functions of the scores s of the goods G and the bads B in a scorecard

Page 29: TransactionBasedAnalytics2010

KOLMOGOROV-SMIRNOV STATISTIC

Kolmogorov-Smirnov statistic (KS): the maximum separation between the score distributions of the goods and the bads, KS = max over s of |F(s|G) - F(s|B)|
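A minimal sketch, assuming SciPy and a 0/1 bad flag (array names are illustrative): ks_2samp computes exactly this largest gap between the two empirical distribution functions.

    import numpy as np
    from scipy.stats import ks_2samp

    def ks_statistic(scores, bad):
        scores, bad = np.asarray(scores), np.asarray(bad)
        # Largest vertical gap between the empirical F(s|G) and F(s|B).
        return float(ks_2samp(scores[bad == 0], scores[bad == 1]).statistic)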

Page 30: TransactionBasedAnalytics2010

LIFT CURVE

Plots percentage of bads rejected versus percentage rejected

Ideal score given by ABC, where B represents the population bad rate

Random score represented by AC

Accuracy ratio AR = 2 (area of curve above diagonal) / area of ABC

Page 31: TransactionBasedAnalytics2010

ROC CURVE

ABC represents the ideal score

The diagonal represents a random score

Gini coefficient (GC) measures twice the ratio of the area between the curve and the diagonal to the area of ABC
  GC = 1 corresponds to a perfect score
  GC = 0 represents a random score

Somers' D concordance (SD): if a "good" and a "bad" are chosen at random, the good will have a lower score/probability of being bad than the bad

AUROC is the area under the ROC curve; GC = 2 AUROC - 1 = SD
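A minimal sketch of the GC = 2 AUROC - 1 identity, assuming scikit-learn and the convention that a higher score means a higher probability of being bad (names are illustrative):

    from sklearn.metrics import roc_auc_score

    def gini_coefficient(bad, scores):
        # Area under the ROC curve, then the Gini / Somers' D transformation.
        auroc = roc_auc_score(bad, scores)
        return 2.0 * auroc - 1.0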

Page 32: TransactionBasedAnalytics2010

BINOMIAL TEST

Checks whether the predicted bad rate in a given bin i is correct or underestimated

Let there be k bads among the n_i observations of bin i, and let p_i be the probability of a borrower in that band being good

The predicted bad rate in bin i is correct if the number of bads k in bin i is less than or equal to the critical value: the largest bad count consistent, at the chosen confidence level, with a Binomial(n_i, 1 - p_i) distribution

Page 33: TransactionBasedAnalytics2010

NORMAL TEST

Approximation of the Binomial test

The predicted bad rate in bin i is correct if the number of bads k in bin i is less than n_i (1 - p_i) + Φ⁻¹(1 - α) √(n_i p_i (1 - p_i)), where Φ⁻¹ is the inverse of the cumulative normal distribution
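A minimal sketch of both critical values, assuming SciPy; n_i, p_good, and alpha are illustrative names for the symbols lost from the slides.

    import math
    from scipy.stats import binom, norm

    def binomial_critical_bads(n_i, p_good, alpha=0.05):
        # Exact test: 1 - alpha quantile of the Binomial(n_i, 1 - p_good) bad count.
        return int(binom.ppf(1.0 - alpha, n_i, 1.0 - p_good))

    def normal_critical_bads(n_i, p_good, alpha=0.05):
        # Normal approximation: mean bad count plus z times its standard deviation.
        mean = n_i * (1.0 - p_good)
        sd = math.sqrt(n_i * p_good * (1.0 - p_good))
        return mean + norm.ppf(1.0 - alpha) * sd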

Page 34: TransactionBasedAnalytics2010

HOSMER-LEMESHOW TEST

Assesses whether observed bad rates match expected bad rates in each of ten bins

A chi-square test statistic: with n_i observations, o_i observed bads, and predicted bad rate p_i in bin i,

HL = Σ_i (o_i - n_i p_i)² / (n_i p_i (1 - p_i))
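A minimal sketch, assuming SciPy and the usual bins - 2 degrees of freedom; array names are illustrative.

    import numpy as np
    from scipy.stats import chi2

    def hosmer_lemeshow(n, observed_bads, predicted_bad_rate, df=8):
        n, o, p = map(np.asarray, (n, observed_bads, predicted_bad_rate))
        hl = float((((o - n * p) ** 2) / (n * p * (1.0 - p))).sum())
        return hl, float(chi2.sf(hl, df))  # statistic and p-value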

Page 35: TransactionBasedAnalytics2010

SIMPLE SAS EXAMPLE-I


Page 36: TransactionBasedAnalytics2010

SIMPLE SAS EXAMPLE-II


Page 37: TransactionBasedAnalytics2010

ERROR RATES

Account False Positive Ratio (AFPR): The ratio of good to bad accounts for a given cut-off score

A ratio of 10:1 would indicate that out of 11 accounts, 1 is bad, 10 are good

Account Detection Rate (ADR): The ratio of bad accounts to the total number of bad accounts for the period at a given cut-off score.

If there are 100 bad accounts in the time period and 30 of them are at or above the cut-off score at some time during the period, the ADR is 30%

Value Detection Rate (VDR): Percentage of dollars saved on detected bad accounts for a given cut-off score

Assuming that the total losses on all accounts are $1,000,000 and that $600,000 of these are saved by the system, the VDR would, consequently, be equal to 60%
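A minimal sketch of AFPR and ADR at a cut-off, assuming NumPy and a 0/1 bad flag (names illustrative); VDR is a dollar ratio, shown as a comment.

    import numpy as np

    def afpr_adr(scores, bad, cutoff):
        scores, bad = np.asarray(scores), np.asarray(bad) == 1
        flagged = scores >= cutoff
        goods_flagged = int((flagged & ~bad).sum())
        bads_flagged = int((flagged & bad).sum())
        afpr = goods_flagged / max(bads_flagged, 1)  # goods per bad above cut-off
        adr = bads_flagged / max(int(bad.sum()), 1)  # share of all bads caught
        return afpr, adr

    # VDR is computed on dollars, not counts:
    # vdr = saved_dollars / total_losses  # e.g., 600,000 / 1,000,000 = 60%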


Page 38: TransactionBasedAnalytics2010

SWAP SET ANALYSIS

Used to compare two competing scores

Choose top x% accounts using score1 and score2

Eliminate the common bads and goods

Compare the two data sets to identify bads caught by score1 but not score2 and vice versa

Score1 is better than score2 if it has a higher bad rate in the swap set
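A minimal sketch of that comparison, assuming pandas and a DataFrame with score1, score2, and a 0/1 bad column (all names illustrative):

    import pandas as pd

    def swap_set_bad_rates(df, x_pct=0.05):
        n = int(len(df) * x_pct)                   # top x% by each score
        top1 = set(df.nlargest(n, "score1").index)
        top2 = set(df.nlargest(n, "score2").index)
        only1, only2 = top1 - top2, top2 - top1    # drop the common accounts
        rate1 = df.loc[list(only1), "bad"].mean()  # bad rate where only score1 flags
        rate2 = df.loc[list(only2), "bad"].mean()  # bad rate where only score2 flags
        return rate1, rate2                        # higher swap-set bad rate wins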


Page 39: TransactionBasedAnalytics2010

TARGET DEFINITION: CARD FRAUD

[Timeline: Pre-fraud, then Fraud window, then Post block]

Pre-fraud: all transactions are legitimate

Fraud window: opens at the date/time of the first fraudulent transaction. Fraud activity has not been detected or confirmed yet. The approved fraudulent transactions during this window are the fraud losses. Legitimate transactions can also exist in this period. (For a fraud case with no loss, there is no fraud window.)

Post block: after the block date/time, all transactions are declined/blocked

Page 40: TransactionBasedAnalytics2010

TARGET DEFINITION: CREDIT RISK

Bad: account becomes, at any time during the outcome window:
  3+ cycles delinquent
  Bankrupt
  Charged-off

Indeterminate accounts:
  Maximum of 2 cycles delinquent in the outcome window
  Fraud or Transfer status in the outcome window
  Inactive accounts

Indeterminate accounts will be excluded from off-sample validation and off-time validation

Other accounts are Good

Page 41: TransactionBasedAnalytics2010

TARGET DEFINITION: ATTRITION RISK

Account closure: many banks/vendors use this to define "Bad" accounts

Silent attrition: many banks/vendors use this to define "Bad" accounts
  Silent attrition is defined as a sharp and lasting drop in the economic value (balance and activity) of accounts that were valuable in prior periods
  Many banks/vendors refine this definition to exclude accounts that have other reasons for a change in economic value

Many banks/vendors use both to define "Bad" accounts

All other non-fraudulent active current accounts are classified as "Good" accounts

Page 42: TransactionBasedAnalytics2010

DEPLOYING THE SOLUTION

Using the scores in production

Monitoring the production system

Page 43: TransactionBasedAnalytics2010

SCORE USES

Typical use of scores is in strategies to manage decisions concerning:
  Whether to approve/decline authorizations
  Whether to approve/decline over-limit requests
  Actions to make delinquent accounts current
  Increase/decrease credit limits
  Whether to reissue credit cards
  Collections-related actions

Credit risk score is the most frequently used score for the above strategies. Some banks also use attrition, revenue, and profit scores

Scores also used in other strategies such as retention, balance transfer, balance build, convenience checks, and cross-sell/up-sell optimization

Fraud scores are used for approve/decline/refer decisions

Page 44: TransactionBasedAnalytics2010

BENEFITS FROM REAL TIME SCORING


Page 45: TransactionBasedAnalytics2010

WHY DO BOTH RULES AND SCORING?

Rules allow the input of client-specific intellectual property and operational constraints

Rules allow tracking and adjustments for new or short-term risk patterns

Models pick up non-obvious risk patterns and behaviors

Output from advanced models is easy to translate into probability and log-odds scores

Scores can be used very easily to rank-order entities

The combination of rules and scores provides a better detection rate and better-quality referrals

Business implication: with the same amount of resources,
  Catching more risk activity
  Catching it earlier
  A faster way to a good ROI

Page 46: TransactionBasedAnalytics2010

AUTHORIZATION STRATEGY EXAMPLE


Page 47: TransactionBasedAnalytics2010

OVERLIMIT STRATEGY EXAMPLE


Page 48: TransactionBasedAnalytics2010

RETAIL/CHECK STRATEGY EXAMPLE


Page 49: TransactionBasedAnalytics2010

CREDIT LIMIT STRATEGY

Risk Score | Credit Limit Utilization | Delinquency Status | Credit Line Inc.
Low        | (any)                    | (any)              | 0
Medium     | Low                      | Clean              | 500
Medium     | Low                      | Dirty              | 0
Medium     | High                     | Clean              | 1,000
Medium     | High                     | Dirty              | 500
High       | Low                      | Clean              | 1,500
High       | Low                      | Dirty              | 1,000
High       | High                     | Clean              | 5,000
High       | High                     | Dirty              | 2,500

Implemented in the form of decision trees/strategies

Champion/Challenger framework for improving strategies over time
  Randomly assign accounts to champion or challenger strategy
  Measure performance over time
  Takes six to twelve months to evaluate each challenger strategy
  Only a very small number of potential champion strategies can be tested at a given time
  Difficult to analyze why a particular challenger strategy worked
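A minimal sketch of this strategy table as a lookup-based decision tree, under the reading of the flattened slide table reconstructed above (all values reflect that reading, not a confirmed original layout):

    INCREASE = {
        ("Medium", "Low", "Clean"): 500,   ("Medium", "Low", "Dirty"): 0,
        ("Medium", "High", "Clean"): 1000, ("Medium", "High", "Dirty"): 500,
        ("High", "Low", "Clean"): 1500,    ("High", "Low", "Dirty"): 1000,
        ("High", "High", "Clean"): 5000,   ("High", "High", "Dirty"): 2500,
    }

    def credit_line_increase(risk, utilization, delinquency):
        if risk == "Low":
            return 0  # low risk score: no increase, regardless of the other splits
        return INCREASE[(risk, utilization, delinquency)]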

Page 50: TransactionBasedAnalytics2010

EXPANDING BEYOND THE "COMFORT ZONE"

Credit line increases by cell (Risk Score / Credit Limit Utilization / Delinquency Status):

Strategy     | Low | Med-Lo-Clean | Med-Lo-Dirty | Med-Hi-Clean | Med-Hi-Dirty | Hi-Lo-Clean | Hi-Lo-Dirty | Hi-Hi-Clean | Hi-Hi-Dirty
Champion     | 0   | 500          | 0            | 1000         | 500          | 1500        | 1000        | 5000        | 2500
Test Group 1 | 0   | 0            | 0            | 500          | 0            | 500         | 0           | 2500        | 1000
Test Group 2 | 0   | 0            | 0            | 500          | 0            | 1500        | 0           | 3000        | 1500
Test Group 3 | 0   | 0            | 0            | 1500         | 0            | 2000        | 1500        | 4000        | 2000
Test Group 4 | 500 | 1000         | 500          | 2500         | 1000         | 3000        | 2000        | 7000        | 3000
Test Group 5 | 500 | 1500         | 1000         | 3000         | 1500         | 4000        | 2500        | 8000        | 4000
Test Group 6 | 500 | 2000         | 1500         | 4000         | 2500         | 5000        | 3000        | 9000        | 5000

Page 51: TransactionBasedAnalytics2010

NON-LINEAR PROGRAMMING EXAMPLE (A)

Credit limit increases are a continuous variable

Randomly choose a small number of accounts for optimization

Use Lagrangian relaxation techniques

Adding more constraints can make the solution more difficult

Map the optimal solution to a decision tree to score all accounts

Deploying the decision tree in lieu of the solution can result in a significant loss of the benefit of the whole effort

Page 52: TransactionBasedAnalytics2010

LINEAR PROGRAMMING EXAMPLE (B)

Only discrete credit limit increases allowed

A subset of LP problems has integer solutions most of the time

Account-level optimization possible

Solve the relaxed LP problem and check feasibility for the remaining constraints

No need to map the optimal solution to a score
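A minimal sketch of such a relaxed LP, assuming SciPy's linprog: each account chooses among discrete line increases to maximize expected profit under a total-increase budget. All numbers, and the budget itself, are illustrative; where the relaxation comes back fractional, feasibility is checked and the assignment rounded, as the slide suggests.

    import numpy as np
    from scipy.optimize import linprog

    profit = np.array([[0.0, 12.0, 18.0],   # expected profit per option, account 1
                       [0.0,  8.0,  9.0]])  # account 2
    increase = np.array([0.0, 500.0, 1000.0])  # discrete increase options
    n_acct, n_opt = profit.shape

    c = -profit.ravel()  # linprog minimizes, so negate profit
    # Each account selects exactly one option: its option weights sum to 1.
    a_eq = np.kron(np.eye(n_acct), np.ones(n_opt))
    b_eq = np.ones(n_acct)
    # Budget: total granted increase at most $1,200.
    a_ub = np.tile(increase, n_acct)[None, :]
    b_ub = [1200.0]

    res = linprog(c, A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=b_eq, bounds=(0, 1))
    print(res.x.reshape(n_acct, n_opt))  # relaxed option weights per account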

Page 53: TransactionBasedAnalytics2010

MONITORING THE SYSTEM

Monitoring the scoring system
  Stability index of score
  Stability index of input fields
  Remedies for score deterioration

Monitoring the portfolio
  Population stability report
  Characteristic analysis report
  Final score report
  Delinquency distribution report
  Roll rates
  Vintage analysis
  Reports by portfolio segments, risky segments

Page 54: TransactionBasedAnalytics2010

CHARACTERISTIC ANALYSIS REPORT

Stability index

Characteristic reports


Page 55: TransactionBasedAnalytics2010

REMEDIES FOR SCORE DETERIORATION

Score shelf life depends upon the problem
  Fraud scores have a shorter shelf life because fraudsters constantly change techniques
  Credit scores have a longer shelf life because the underlying causes do not change much over time

Remedies
  Recalibrate the score: least expensive, easiest to implement
    A table mapping the old score to a new score
  Retrain the model: more expensive, straightforward to implement
    Keep the same variables, simply change the weights/coefficients
  Rebuild the model: most expensive, needs the full implementation cycle
    New models with new variables and new weights/coefficients

Page 56: TransactionBasedAnalytics2010

QUARTERLY REPORTS

Population stability report
  Measures change in score distribution over time (see the PSI sketch after this list)

Characteristic analysis report
  Measures changes in individual input fields over time

Final score report
  Measures how closely the score is used in production
  E.g., show number of accepts and rejects by application score band

Delinquency distribution report
  Measures the portfolio quality by different score ranges
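The stability indexes behind the first two reports are commonly computed as a population stability index (PSI); a minimal sketch, assuming the standard PSI formula (the slides name the index but not the formula):

    import numpy as np

    def psi(expected_pct, actual_pct):
        # expected_pct: score-band shares at development; actual_pct: current shares.
        e, a = np.asarray(expected_pct, float), np.asarray(actual_pct, float)
        return float(((a - e) * np.log(a / e)).sum())

    # Common rule of thumb: < 0.10 stable, 0.10-0.25 watch, > 0.25 investigate.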


Page 57: TransactionBasedAnalytics2010

QUARTERLY REPORT EXAMPLE

Monitor change in population


Page 58: TransactionBasedAnalytics2010

NET FLOW RATE REPORT

Month  | Total     | Active    | 0 Days    | 30 Days | 0 to 30 | 60 Days | 30 to 60 | 90 Days | 60 to 90 | 120 Days | 90 to 120 | Charge-off | 120 to Charge-off
Jan-02 | 5,000,000 | 3,223,095 | 2,708,576 | 138,010 |         | 62,592  |          | 20,993  |          | 15,504   |           | 20,304     |
Feb-02 | 4,953,109 | 3,042,517 | 2,572,243 | 135,248 | 4.99%   | 53,557  | 38.81%   | 22,461  | 35.88%   | 20,993   | 100.00%   | 15,504     | 100.00%
Mar-02 | 4,904,891 | 3,113,894 | 2,540,610 | 149,907 | 5.83%   | 50,032  | 36.99%   | 20,013  | 37.37%   | 20,384   | 90.75%    | 10,391     | 49.50%
Apr-02 | 5,053,111 | 2,871,802 | 2,372,516 | 156,405 | 6.16%   | 32,108  | 21.42%   | 15,676  | 31.33%   | 12,809   | 64.00%    | 16,991     | 83.35%
May-02 | 4,757,579 | 3,499,756 | 3,020,579 | 107,666 | 4.54%   | 49,620  | 31.73%   | 30,997  | 96.54%   | 15,676   | 100.00%   | 12,029     | 93.91%
Jun-02 | 4,797,435 | 2,705,767 | 2,319,788 | 159,521 | 5.28%   | 35,672  | 33.13%   | 23,269  | 46.89%   | 10,495   | 33.86%    | 12,967     | 82.72%
Jul-02 | 4,893,318 | 3,413,728 | 2,916,158 | 146,442 | 6.31%   | 49,193  | 30.84%   | 21,039  | 58.98%   | 16,096   | 69.17%    | 10,495     | 100.00%
Aug-02 | 4,873,484 | 2,995,243 | 2,565,883 | 91,843  | 3.15%   | 48,012  | 32.79%   | 26,098  | 53.05%   | 21,039   | 100.00%   | 15,735     | 97.76%
Sep-02 | 4,782,782 | 3,474,030 | 2,804,788 | 173,177 | 6.75%   | 44,291  | 48.22%   | 33,136  | 69.02%   | 21,253   | 81.44%    | 14,616     | 69.47%
Oct-02 | 4,988,121 | 3,365,931 | 2,999,460 | 118,388 | 4.22%   | 38,906  | 22.47%   | 23,146  | 52.26%   | 15,841   | 47.81%    | 14,074     | 66.22%
Nov-02 | 5,239,903 | 2,991,770 | 2,584,154 | 152,951 | 5.10%   | 46,657  | 39.41%   | 17,197  | 44.20%   | 14,658   | 63.33%    | 15,841     | 100.00%
Dec-02 | 4,943,682 | 3,204,539 | 2,734,118 | 141,276 | 5.47%   | 48,221  | 31.53%   | 23,593  | 50.57%   | 12,658   | 73.61%    | 14,658     | 100.00%

4.99% of current accounts in Jan ’02 become 30 days delinquent in Feb ‘02

3,223,095 active accounts in Jan '02 roll into 12,967 charge-offs, an annualized charge-off rate of 4.8%
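A minimal sketch of how one of the flow rates in the table is derived (Python, illustrative variable names):

    jan_current = 2_708_576   # Jan '02, 0 Days column
    feb_30_days = 135_248     # Feb '02, 30 Days column
    print(f"{feb_30_days / jan_current:.2%}")  # 4.99%, the Feb 0-to-30 net flow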

Page 59: TransactionBasedAnalytics2010

VINTAGE CURVE REPORT

[Chart: vintage curves of Cumulative % Losses (y-axis, -0.5 to 3.5) against Months on Books (x-axis, 0 to 40) for Cohorts #1, #2, and #3.]

Page 60: TransactionBasedAnalytics2010

Q&A
