CAS Predictive Modeling Seminar Evaluating Predictive Models
![Page 1: CAS Predictive Modeling Seminar Evaluating Predictive Models](https://reader036.fdocuments.net/reader036/viewer/2022082611/56812c55550346895d90de1e/html5/thumbnails/1.jpg)
CAS Predictive Modeling Seminar
Evaluating Predictive Models
Glenn Meyers
ISO Innovative Analytics
October 5, 2006
![Page 2: CAS Predictive Modeling Seminar Evaluating Predictive Models](https://reader036.fdocuments.net/reader036/viewer/2022082611/56812c55550346895d90de1e/html5/thumbnails/2.jpg)
Choosing Models
• Predicting losses for individual insurance policies involves:
  – Millions of policy records
  – Hundreds (or thousands) of variables
• There are a number of models that provide good predictions
  – GLM, GAM, CART, MARS, Neural Nets, etc.
• Business objectives influence choice of model
![Page 3: CAS Predictive Modeling Seminar Evaluating Predictive Models](https://reader036.fdocuments.net/reader036/viewer/2022082611/56812c55550346895d90de1e/html5/thumbnails/3.jpg)
The Modeling Process
• Modeling process involves dimension reduction techniques
  – Clustering, Principal Components, Factor Analysis
  – Building submodels and using predicted values as input into a higher level model
• The modeling cycle
  1. Build model with training data
  2. Evaluate model with test data
  3. Identify improvements in models and data
  4. Go back to Step 1
![Page 4: CAS Predictive Modeling Seminar Evaluating Predictive Models](https://reader036.fdocuments.net/reader036/viewer/2022082611/56812c55550346895d90de1e/html5/thumbnails/4.jpg)
Hidden Parameters
• Classic model building methods correct for the number of parameters using “degrees of freedom.”
• The model exploration process “eats up degrees of freedom” in ways that cannot be captured by formal model adjustments.
• In essence the “test” data gets merged into the “training” data.
![Page 5: CAS Predictive Modeling Seminar Evaluating Predictive Models](https://reader036.fdocuments.net/reader036/viewer/2022082611/56812c55550346895d90de1e/html5/thumbnails/5.jpg)
What Is Significant?
• Statistical packages will often identify improvements that are “statistically significant” but not “practically significant.”
• This talk is about determining when a model identifies “practically significant” improvements.
• Illustrate how to do this on a real example.
![Page 6: CAS Predictive Modeling Seminar Evaluating Predictive Models](https://reader036.fdocuments.net/reader036/viewer/2022082611/56812c55550346895d90de1e/html5/thumbnails/6.jpg)
The Example
A Personal Auto Model Under Development
Preliminary Results
• Input – Address of insured vehicle
• Output – Address Specific Loss Cost
  – 30 year old, single car with no SDIP points
  – 500 deductible or 25/50/25 policy limits
  – Symbol 8, model year 2006
  – etc.
• Model derived from over 1,200 variables reflecting weather, traffic, demographic, topographical and economic conditions.
![Page 7: CAS Predictive Modeling Seminar Evaluating Predictive Models](https://reader036.fdocuments.net/reader036/viewer/2022082611/56812c55550346895d90de1e/html5/thumbnails/7.jpg)
Difference Between Address Specific and ISO Territory Loss Cost
![Page 8: CAS Predictive Modeling Seminar Evaluating Predictive Models](https://reader036.fdocuments.net/reader036/viewer/2022082611/56812c55550346895d90de1e/html5/thumbnails/8.jpg)
Differences Abound
Some Questions to Ask
• Can the model output be used to improve insurer underwriting results?
• Are the results statistically significant?
Define the Expected Loss Index (ELI):

$$\text{ELI} = \frac{\text{Address Specific Loss Cost}}{\text{ISO Territory Loss Cost}}$$
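The ratio above can be sketched in a few lines of Python. The function name and the dollar amounts are hypothetical, chosen only to illustrate the definition; they do not come from the talk.

```python
def expected_loss_index(address_loss_cost, territory_loss_cost):
    """ELI = address-specific loss cost / ISO territory loss cost."""
    if territory_loss_cost <= 0:
        raise ValueError("territory loss cost must be positive")
    return address_loss_cost / territory_loss_cost


# Hypothetical figures: an address priced below its territory average.
eli = expected_loss_index(450.0, 500.0)
print(f"ELI = {eli:.2f}")
```

An ELI below 1 means the address-specific model expects lower losses than the territory rate implies.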
![Page 9: CAS Predictive Modeling Seminar Evaluating Predictive Models](https://reader036.fdocuments.net/reader036/viewer/2022082611/56812c55550346895d90de1e/html5/thumbnails/9.jpg)
Use Expected Loss Index for Risk Selection
| Expected Loss Index | Loss Ratio % |
|---|---|
| Less than 75% | 69.7 |
| Between 75 and 100% | 85.8 |
| Between 100 and 125% | 109.7 |
| Greater than 125% | 159.5 |
Denominator = Full ISO Loss Cost
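A table like this one could be produced along the following lines. This is a minimal sketch: the record layout `(eli, loss, full_loss_cost)`, the toy records, and the treatment of band boundaries (lower bound inclusive) are all assumptions, not details from the talk.

```python
from collections import defaultdict

# Upper bounds of the first three ELI bands; anything above falls in the last band.
BANDS = [(0.75, "Less than 75%"), (1.00, "75% to 100%"), (1.25, "100% to 125%")]


def band_label(eli):
    """Return the ELI band for one record (boundary handling is an assumption)."""
    for upper, label in BANDS:
        if eli < upper:
            return label
    return "Greater than 125%"


def loss_ratio_by_band(records):
    """records: iterable of (eli, loss, full_loss_cost) tuples."""
    losses = defaultdict(float)
    costs = defaultdict(float)
    for eli, loss, full_loss_cost in records:
        label = band_label(eli)
        losses[label] += loss
        costs[label] += full_loss_cost  # denominator = full ISO loss cost
    return {label: losses[label] / costs[label] for label in costs}


# Toy records, invented for illustration only.
records = [
    (0.60, 0.0, 400.0),
    (0.70, 500.0, 600.0),
    (0.90, 800.0, 1000.0),
    (1.10, 1200.0, 1000.0),
    (1.40, 900.0, 500.0),
]
for label, ratio in loss_ratio_by_band(records).items():
    print(f"{label:20s} {100 * ratio:6.1f}")
```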
![Page 10: CAS Predictive Modeling Seminar Evaluating Predictive Models](https://reader036.fdocuments.net/reader036/viewer/2022082611/56812c55550346895d90de1e/html5/thumbnails/10.jpg)
Propose a Standard Way of Evaluating Lift – The Gini Index
• Originally proposed by Corrado Gini in 1912
• Most often used to measure income and/or wealth inequality
  – Search for “Gini” in wikipedia.org
• In insurance underwriting, we want to evaluate systematic methods of finding “loss” inequality.
![Page 11: CAS Predictive Modeling Seminar Evaluating Predictive Models](https://reader036.fdocuments.net/reader036/viewer/2022082611/56812c55550346895d90de1e/html5/thumbnails/11.jpg)
Gini Index
• Look at set of policy records below cutoff point, ELI < 1.
• This set of records accounts for 59% of total ISO (full) loss cost.
• This set of records accounts for 48% of total loss.
• 1 − 48/59 → 19% reduction in loss ratio.
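The arithmetic behind the last bullet, spelled out. Here $L$ (total loss) and $P$ (total ISO full loss cost) are symbols introduced only for this illustration. The records below the cutoff carry 48% of $L$ and 59% of $P$, so their loss ratio relative to the whole book is:

$$\frac{0.48\,L}{0.59\,P} \approx 0.814\,\frac{L}{P}, \qquad 1 - \frac{48}{59} \approx 0.186 \approx 19\%.$$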
![Page 12: CAS Predictive Modeling Seminar Evaluating Predictive Models](https://reader036.fdocuments.net/reader036/viewer/2022082611/56812c55550346895d90de1e/html5/thumbnails/12.jpg)
Gini Index
• Do this calculation for other cutoff points.
• The results make up what we call the Lorenz curve.
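One way to sweep the cutoff across all records is sketched below. It assumes each record is a hypothetical `(eli, loss, full_loss_cost)` tuple and that the curve plots cumulative share of ISO full loss cost (x) against cumulative share of loss (y), which matches the 59% / 48% example; the toy data is invented.

```python
def lorenz_curve(records):
    """Return Lorenz curve points (x, y), one per cutoff, ordered by ELI.

    x = share of total full loss cost at or below the cutoff
    y = share of total loss at or below the cutoff
    """
    ordered = sorted(records, key=lambda r: r[0])  # lowest ELI first
    total_loss = sum(r[1] for r in ordered)
    total_cost = sum(r[2] for r in ordered)
    points = [(0.0, 0.0)]
    cum_loss = 0.0
    cum_cost = 0.0
    for _, loss, cost in ordered:
        cum_loss += loss
        cum_cost += cost
        points.append((cum_cost / total_cost, cum_loss / total_loss))
    return points


# Toy records, invented for illustration only.
records = [
    (1.40, 900.0, 500.0),
    (0.60, 0.0, 400.0),
    (0.90, 800.0, 1000.0),
    (0.70, 500.0, 600.0),
    (1.10, 1200.0, 1000.0),
]
for x, y in lorenz_curve(records):
    print(f"cost share {x:.3f}  loss share {y:.3f}")
```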
![Page 13: CAS Predictive Modeling Seminar Evaluating Predictive Models](https://reader036.fdocuments.net/reader036/viewer/2022082611/56812c55550346895d90de1e/html5/thumbnails/13.jpg)
Gini Index
• If ELI is random, the Lorenz curve will be on the diagonal line.
• The Gini index is the percentage of the area under the “random” line that is above the Lorenz curve.
• Higher Gini means better predictive model.
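The area definition above translates into a short calculation. This sketch assumes the Lorenz curve is given as a list of `(x, y)` points from (0, 0) to (1, 1) and uses the trapezoid rule, which is one reasonable choice rather than necessarily the method used in the talk.

```python
def gini_from_lorenz(points):
    """Gini index = (area between diagonal and Lorenz curve) / (area under diagonal).

    The area under the diagonal is 0.5, so this equals 1 - 2 * (area under curve).
    """
    area_under_curve = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area_under_curve += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid rule
    return 1.0 - 2.0 * area_under_curve


# A curve lying on the diagonal (random ELI) versus one bowed below it.
diagonal = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
bowed = [(0.0, 0.0), (0.5, 0.25), (1.0, 1.0)]
print(gini_from_lorenz(diagonal))
print(gini_from_lorenz(bowed))
```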
![Page 14: CAS Predictive Modeling Seminar Evaluating Predictive Models](https://reader036.fdocuments.net/reader036/viewer/2022082611/56812c55550346895d90de1e/html5/thumbnails/14.jpg)
A Gini Index Thought Experiment
• If we had the ability to predict who will have losses, what would the Gini index be?
• It would be 100% if only one risk had all the losses
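The thought experiment can be tried numerically. The sketch below makes simplifying assumptions that are not in the talk: every risk has the same loss cost, every risk with a loss has the same loss amount, and the predictor ranks all loss-free risks first. The function name is hypothetical.

```python
def perfect_foresight_gini(n_risks, n_with_loss):
    """Gini index when the ranking puts every loss-free risk ahead of every loss."""
    n_loss_free = n_risks - n_with_loss
    points = [(0.0, 0.0)]
    for i in range(1, n_risks + 1):
        losses_so_far = max(0, i - n_loss_free)
        points.append((i / n_risks, losses_so_far / n_with_loss))
    area_under_curve = sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    return 1.0 - 2.0 * area_under_curve


# One risk with all the losses: the index approaches 100% as the book grows.
for n in (10, 1000, 100000):
    print(n, round(perfect_foresight_gini(n, 1), 5))

# Half the risks with losses: even perfect foresight gives a much lower index.
print(round(perfect_foresight_gini(1000, 500), 5))
```

Under these assumptions the index works out to one minus the fraction of risks with losses, so the attainable maximum depends on how concentrated the losses are.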
![Page 15: CAS Predictive Modeling Seminar Evaluating Predictive Models](https://reader036.fdocuments.net/reader036/viewer/2022082611/56812c55550346895d90de1e/html5/thumbnails/15.jpg)
Bodily Injury
![Page 16: CAS Predictive Modeling Seminar Evaluating Predictive Models](https://reader036.fdocuments.net/reader036/viewer/2022082611/56812c55550346895d90de1e/html5/thumbnails/16.jpg)
Property Damage
![Page 17: CAS Predictive Modeling Seminar Evaluating Predictive Models](https://reader036.fdocuments.net/reader036/viewer/2022082611/56812c55550346895d90de1e/html5/thumbnails/17.jpg)
Collision
![Page 18: CAS Predictive Modeling Seminar Evaluating Predictive Models](https://reader036.fdocuments.net/reader036/viewer/2022082611/56812c55550346895d90de1e/html5/thumbnails/18.jpg)
Statistical Significance
• How much random fluctuation is in the Gini index calculation?
• Use bootstrapping to evaluate
  – Take a random sample of records, with replacement.
  – Calculate Gini index for the sample.
  – Repeat 250 times.
• Plot a histogram of the results.
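The bootstrap steps above can be sketched as follows. The synthetic data generator, the record layout `(eli, loss, full_loss_cost)`, the seeds, and the rough percentile summary are all assumptions for illustration; only the resample-with-replacement loop and the 250 repetitions come from the slide.

```python
import random


def gini_index(records):
    """Gini index from (eli, loss, full_loss_cost) records, via the trapezoid rule."""
    ordered = sorted(records, key=lambda r: r[0])
    total_loss = sum(r[1] for r in ordered)
    total_cost = sum(r[2] for r in ordered)
    if total_loss <= 0 or total_cost <= 0:
        raise ValueError("sample needs positive total loss and loss cost")
    area = x0 = y0 = cum_loss = cum_cost = 0.0
    for _, loss, cost in ordered:
        cum_loss += loss
        cum_cost += cost
        x1, y1 = cum_cost / total_cost, cum_loss / total_loss
        area += (x1 - x0) * (y0 + y1) / 2.0
        x0, y0 = x1, y1
    return 1.0 - 2.0 * area


def bootstrap_gini(records, n_reps=250, seed=0):
    """Resample records with replacement and recompute the Gini index each time."""
    rng = random.Random(seed)
    n = len(records)
    return [gini_index(rng.choices(records, k=n)) for _ in range(n_reps)]


def make_synthetic_records(n=2000, seed=42):
    """Invented data: claim probability rises with ELI; equal loss cost per record."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        eli = rng.uniform(0.5, 1.5)
        loss = 1000.0 if rng.random() < 0.10 * eli else 0.0
        records.append((eli, loss, 100.0))
    return records


records = make_synthetic_records()
samples = sorted(bootstrap_gini(records))
print("mean Gini:", sum(samples) / len(samples))
print("rough 95% range:", samples[6], "to", samples[243])
```

The sorted `samples` list is what would feed the histogram; a plotting library could be used for that step.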
![Page 19: CAS Predictive Modeling Seminar Evaluating Predictive Models](https://reader036.fdocuments.net/reader036/viewer/2022082611/56812c55550346895d90de1e/html5/thumbnails/19.jpg)
Bootstrap Results
![Page 20: CAS Predictive Modeling Seminar Evaluating Predictive Models](https://reader036.fdocuments.net/reader036/viewer/2022082611/56812c55550346895d90de1e/html5/thumbnails/20.jpg)
Summary
• Standard tests of statistical significance are suspect.
  – Informal model selection process
  – Statistical/Practical significance
• Propose Gini index as a test of practical significance.
• Divide data into three samples
  1. Training – Used to fit models
  2. Test – Used to evaluate fits
  3. Holdout – “Final” evaluation
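A minimal sketch of the three-way split follows. The 60/20/20 proportions, the fixed seed, and the function name are assumptions; the talk does not state how the samples were sized or drawn.

```python
import random


def three_way_split(records, train_frac=0.6, test_frac=0.2, seed=0):
    """Shuffle once, then cut into training, test, and holdout samples."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]                  # used to fit models
    test = shuffled[n_train:n_train + n_test]   # used to evaluate fits, repeatedly
    holdout = shuffled[n_train + n_test:]       # touched once, for the final evaluation
    return train, test, holdout


train, test, holdout = three_way_split(range(100))
print(len(train), len(test), len(holdout))
```

Keeping the holdout sample out of the modeling cycle is what protects the final evaluation from the "hidden parameters" problem described earlier.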