Predictive Credit Risk Scoring using SAS Enterprise Miner
CREDIT SCORING USING SAS ENTERPRISE MINER
AMAL SHANKER
DESHBANDHU PACHAURI
LENDING CLUB: INTRODUCTION
• Founded in 2007
• An online financial community
• Bringing together creditworthy borrowers and savvy investors
LATEST COMPANY STATISTICS
• Loans funded to date: $2,595,182,275
• Loans funded last month: $203,355,750
• Interest paid to investors since inception: $229,080,795
Image from https://www.lendingclub.com/public/how-peer-lending-works.action
AIM
• To build predictive decision models in SAS Enterprise Miner that best indicate creditworthiness.
• To compare a regression model with a decision tree model and select the one that predicts more accurately.
DATA
• Data on 42,539 customers
• 2007 to 2011
• 59 variables
VARIABLE SELECTION
• TARGET VARIABLE = Bad Flag (given)
• The remaining 58 were INPUT VARIABLES
• 58 was too many to work with directly.
• Reducing this number to the best 4 or 5 was the first target.
• METHODOLOGY USED:
• INTUITIVE METHODS
• VARIABLE CATEGORIZATION
• SAS FUNCTIONS
• INTUITIVE METHODS
All the variables were checked for preliminary data inconsistencies.
VARIABLES DISCARDED
• VARIABLES WITH “SIGNIFICANTLY HIGH” MISSING VALUES: Months since last delinquency (26,929), Months since last record (38,887)
• VARIABLE WITH DIFFICULTY IN ROLE ASSIGNMENT: Employment length
• NON-USEFUL VARIABLES: State, Member ID, etc.
• VARIABLES WITH SAME VALUES FOR ALL ROWS OR ALMOST ALL ROWS: Tax liens, Charge off within 12 months
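The missing-value counts can be read against the 42,539 rows in the data set. A minimal sketch of that screening step in Python, assuming the numbers in parentheses are missing-value counts; the variable names and the 50% cutoff are hypothetical choices for illustration, not values stated in the slides:

```python
TOTAL_ROWS = 42539  # customers in the data set

# Missing-value counts quoted for the two discarded variables.
# The snake_case names are illustrative, not the original column names.
missing_counts = {
    "months_since_last_delinquency": 26929,
    "months_since_last_record": 38887,
}

CUTOFF = 0.5  # hypothetical threshold for "significantly high"


def missing_fraction(missing, total=TOTAL_ROWS):
    """Share of rows in which the variable has no value."""
    return missing / total


fractions = {name: missing_fraction(n) for name, n in missing_counts.items()}
discarded = [name for name, frac in fractions.items() if frac > CUTOFF]

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} missing")
```

Under this reading, roughly 63% and 91% of the rows lack a value for these two variables, which is consistent with discarding them.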
VARIABLE SELECTION (Contd.)
VARIABLES CATEGORIZATION
• PRE-APPROVAL VARIABLES: FICO range high, Annual income, etc.
• DERIVATIVE VARIABLES: Credit grade, Credit sub-grade, Interest rate
• POST-APPROVAL VARIABLES: Principal paid, Interest paid
FINAL OUTCOME
• 16 pre-approval variables
VARIABLE SELECTION (Contd.)
SAS FUNCTIONS
The 16 pre-approval variables were then assessed using the following SAS functions:
• STATEXPLORE: to check the worth of the variables
• VARIABLE CLUSTERING: to identify and group variables with a high degree of correlation
[Diagram: INPUT VARIABLES, STATEXPLORE, VARIABLE CLUSTERING]
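The idea of grouping highly correlated variables can be illustrated outside SAS. The sketch below is not the algorithm used by the Variable Clustering function; it swaps in a much simpler technique (a threshold on pairwise Pearson correlation, with linked variables merged into one group). The toy data, the variable names, and the 0.8 threshold are all hypothetical:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def group_correlated(columns, threshold=0.8):
    """Merge variables whose absolute correlation reaches the threshold."""
    names = list(columns)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the representative of the group.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


# Hypothetical toy data: the two FICO bounds move together, income does not.
toy = {
    "fico_range_high": [664, 699, 734, 769, 804],
    "fico_range_low": [660, 695, 730, 765, 800],
    "annual_inc": [40000, 90000, 35000, 60000, 52000],
}
clusters = group_correlated(toy)
print(clusters)
```

Keeping one representative per group is then one way to shrink the variable list while losing little information.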
STATEXPLORE: WORTH ANALYSIS
WORTH OF VARIABLES ANALYZED
1. SUBGRADE
2. FICO RANGE HIGH
3. FICO RANGE LOW
4. PURPOSE
5. REVOLVING UTILITY
6. PUBLIC RECORD BANKRUPTCIES
7. ANNUAL INCOME
8. PUBLIC RECORDS
9. OPEN ACCOUNT
10. TOTAL ACCOUNT
11. DTI
12. REVOLVING BALANCE
13. LOAN AMOUNT
14. HOME OWNERSHIP
15. DELINQUENCY IN 2 YEARS
16. DELINQUENCY AMOUNT
VARIABLE CLUSTERING: CLUSTER ANALYSIS 1
CLUSTER 0: DELINQUENCY AMOUNT
CLUSTER 1: FICO RANGE HIGH, FICO RANGE LOW, REVOLVING UTILITY BALANCE
CLUSTER 2: OPEN ACCOUNTS, TOTAL ACCOUNTS, DTI
CLUSTER 3: PUBLIC RECORDS BANKRUPTCY, PUBLIC RECORDS
CLUSTER 4: ANNUAL INCOME, LOAN AMOUNT, REVOLVING BALANCE
CLUSTER 5: DELINQUENCY SINCE 2 YEARS
PURPOSE AND HOME OWNERSHIP TOO!!!
VARIABLE CLUSTERING: CLUSTER ANALYSIS 2
CLUSTER 1: FICO RANGE HIGH, DELINQUENCY SINCE 2 YEARS, PUBLIC RECORDS BANKRUPTCY
CLUSTER 2: ANNUAL INCOME, OPEN ACCOUNTS
PURPOSE AND HOME OWNERSHIP STILL UNDER CONSIDERATION!!!
SUMMARY: VARIABLE SELECTION
• BEGINNING: 58
• PRE-APPROVAL VARIABLES: 16
• WORTH ANALYSIS: 15
• CLUSTER ANALYSIS 1: 7
• CLUSTER ANALYSIS 2: 4
FINAL 4 INPUT VARIABLES
• FICO RANGE HIGH
• PURPOSE
• ANNUAL INCOME
• HOME OWNERSHIP
SAS DIAGRAM
WORKSPACE DIAGRAM
[Diagram nodes: LOAN V3 (INPUT VARIABLES), IMPUTE, DATA PARTITION, DECISION TREE, REGRESSION, MODEL COMPARISON]
DECISION TREE ANALYSIS
MODEL PROFITABILITY CALCULATIONS
         Customers   Amount per customer   Total
Good     6373        1500                  9,559,500.00
Bad      565         10000                 5,650,000.00
Total    6938                              3,909,500.00 (Good minus Bad)
EARNINGS PER CUSTOMER: 563.49
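The arithmetic behind the decision tree profitability figures can be reproduced in a few lines. This is a sketch that assumes 1500 is the amount earned on each good customer and 10000 the amount lost on each bad customer; the function and parameter names are illustrative:

```python
def net_earnings(good, bad, gain_per_good=1500, loss_per_bad=10000):
    """Return (net earnings, earnings per customer) for an approved group."""
    net = good * gain_per_good - bad * loss_per_bad
    return net, net / (good + bad)


# Counts quoted for the decision tree model.
net, per_customer = net_earnings(good=6373, bad=565)
print(f"net = {net:,.2f}, per customer = {per_customer:.2f}")
# prints: net = 3,909,500.00, per customer = 563.49
```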
CUMULATIVE LIFT
[Figure: cumulative lift chart]
DECISION TREE OUTPUT
VARIABLE IMPORTANCE
OBS  NAME                LABEL                NRULES  IMPORTANCE  VIMPORTANCE  RATIO
1    IMP_fico_rangehigh  Imp: fico_rangehigh  1       1           1            1
2    IMP_annual_inc      Imp: annual_inc      1       0.2694      0.2016       0.7484
3    IMP_purpose         Imp: purpose         1       0.193       0            0
4    IMP_homeownership   Imp: home_ownership  1       0.1075      0.1139       1.0601
REGRESSION ANALYSIS
Depth  Gain     Lift     Cumulative  %         Cumulative  Number of     Mean Posterior  PRODUCT
                         Lift        Response  % Response  Observations  Probability
5      124.595  2.24595  2.24595     27.9671   27.9671     851           0.28826         245.3093
10     106.193  1.87791  2.06193     23.3843   25.6757     851           0.2125          180.8375
15     88.735   1.53819  1.88735     19.1539   23.5018     851           0.19381         164.9323
20     79.298   1.50988  1.79298     18.8014   22.3267     851           0.17844         151.8524
25     69.107   1.2834   1.69107     15.9812   21.0576     851           0.16588         141.1639
30     57.751   1.00973  1.57751     12.5734   19.6436     851           0.15506         131.9561
35     50.204   1.04871  1.50204     13.0588   18.7038     850           0.14529         123.4965
40     45.111   1.09466  1.45111     13.631    18.0696     851           0.13596         115.702
45     39.263   0.9248   1.39263     11.5159   17.3413     851           0.12699         108.0685
50     33.734   0.83987  1.33734     10.4583   16.653      851           0.11909         101.3456
55     31.099   1.04748  1.31099     13.0435   16.3248     851           0.11126         94.68226
60     27.33    0.85874  1.2733      10.6933   15.8555     851           0.10387         88.39337
65     23.85    0.821    1.2385      10.2233   15.4222     851           0.09661         82.21511
70     19.867   0.68025  1.19867     8.4706    14.9261     850           0.08954         76.109
75     17.286   0.81156  1.17286     10.1058   14.6047     851           0.0824          70.1224
80     12.746   0.44667  1.12746     5.5621    14.0395     851           0.0746          63.4846
85     9.481    0.5725   1.09481     7.1289    13.6329     851           0.06686         56.89786
90     6.754    0.60395  1.06754     7.5206    13.2933     851           0.05864         49.90264
95     3.42     0.43409  1.0342      5.4054    12.8781     851           0.04906         41.75006
100    0        0.34957  1           4.3529    12.4523     850           0.03406         28.951
Totals: 11911 observations, PRODUCT 1101.121. These totals match the sums over depths 35 to 100; PRODUCT is the number of observations times the mean posterior probability.
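The Cumulative Lift and Gain columns appear to follow from the Cumulative % Response column and the overall response rate at depth 100. A minimal sketch of those relationships, inferred from the numbers in the table rather than taken from SAS documentation; the function names are illustrative:

```python
BASELINE_RESPONSE = 12.4523  # cumulative % response at depth 100


def cumulative_lift(cum_pct_response, baseline=BASELINE_RESPONSE):
    """Cumulative % response relative to the overall response rate."""
    return cum_pct_response / baseline


def gain(cum_lift):
    """Percentage improvement over the baseline implied by the lift."""
    return (cum_lift - 1.0) * 100.0


# Cumulative % response for a few depths taken from the table.
sample = {5: 27.9671, 10: 25.6757, 50: 16.653}
for depth, cum_resp in sample.items():
    lift = cumulative_lift(cum_resp)
    print(depth, round(lift, 3), round(gain(lift), 1))
```

For depth 5 this gives a cumulative lift of about 2.246 and a gain of about 124.6, in line with the first row.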
MODEL PROFITABILITY CALCULATIONS
         Customers   Amount per customer   Total
Good     10809.88    1500                  16214819
Bad      1101.121    10000                 11011210
Total    11911                             5203609 (Good minus Bad)
EARNINGS PER CUSTOMER: 436.8742
CUMULATIVE LIFT
[Figure: cumulative lift chart]
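The fractional Good and Bad counts for the regression model can be recovered from the lift table. The sketch below assumes, based only on the arithmetic, that customers in depths 5 to 30 are rejected and the rest approved, and that the expected number of bad customers is the sum of observations times mean posterior probability over the approved depths. The cutoff variable and the names are illustrative:

```python
# (depth, number of observations, mean posterior probability) from the lift table
LIFT_ROWS = [
    (5, 851, 0.28826), (10, 851, 0.2125), (15, 851, 0.19381),
    (20, 851, 0.17844), (25, 851, 0.16588), (30, 851, 0.15506),
    (35, 850, 0.14529), (40, 851, 0.13596), (45, 851, 0.12699),
    (50, 851, 0.11909), (55, 851, 0.11126), (60, 851, 0.10387),
    (65, 851, 0.09661), (70, 850, 0.08954), (75, 851, 0.0824),
    (80, 851, 0.0746), (85, 851, 0.06686), (90, 851, 0.05864),
    (95, 851, 0.04906), (100, 850, 0.03406),
]

REJECT_UP_TO_DEPTH = 30  # assumption inferred from the totals, not stated in the slides

approved = [(n, p) for depth, n, p in LIFT_ROWS if depth > REJECT_UP_TO_DEPTH]
n_approved = sum(n for n, _ in approved)
expected_bad = sum(n * p for n, p in approved)
expected_good = n_approved - expected_bad

print(n_approved, round(expected_bad, 3), round(expected_good, 2))
```

With this cutoff the approved group has 11911 customers and about 1101.12 expected bad customers, which matches the Total and Bad rows of the regression profitability figures.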
MODEL COMPARISON
Event Classification
Model Selection based on Valid: Misclassification Rate (_VMISC_)
Model  Model          Data               False     True      False     True
Node   Description    Role      Target   Negative  Negative  Positive  Positive
Tree2  Decision Tree  TRAIN     Bad_Flag 3176      22345     0         0
Tree2  Decision Tree  VALIDATE  Bad_Flag 2119      14898     0         0
Reg2   Regression     TRAIN     Bad_Flag 3176      22344     1         0
Reg2   Regression     VALIDATE  Bad_Flag 2119      14898     .         0
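The misclassification rate used for model selection can be derived from the four counts in each row. A minimal sketch using the standard definition (wrong predictions divided by all predictions); only rows with no missing cell are used, and the function name is illustrative:

```python
def misclassification_rate(fn, tn, fp, tp):
    """Share of observations whose predicted class differs from the actual class."""
    return (fn + fp) / (fn + tn + fp + tp)


# Decision tree on the validation data and regression on the training data.
tree_valid = misclassification_rate(fn=2119, tn=14898, fp=0, tp=0)
reg_train = misclassification_rate(fn=3176, tn=22344, fp=1, tp=0)

print(f"Tree2 VALIDATE: {tree_valid:.4%}")
print(f"Reg2 TRAIN:     {reg_train:.4%}")
```

Both models predict almost no events at the default cutoff, so the misclassification rate is close to the share of bad loans in the data (about 12.45% on the validation set).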
MODEL COMPARISON
REGRESSION VS DECISION TREE MODEL
REGRESSION ANALYSIS
Good     10809.88    1500     16214819
Bad      1101.121    10000    11011210
Total    11911                5203609
EARNINGS PER CUSTOMER: 436.87
DECISION TREE ANALYSIS
Good     6373        1500     9,559,500.00
Bad      565         10000    5,650,000.00
Total    6938                 3,909,500.00
EARNINGS PER CUSTOMER: 563.49
CONCLUSION
• The decision tree model is the better indicator of creditworthiness.
• The most significant factor to consider is the credit score.
• Regression analysis shows higher total earnings (5,203,609 vs 3,909,500).
• Decision tree analysis shows higher earnings per customer (563.49 vs 436.87).