Predictive Modeling CAS Reinsurance Seminar May 7, 2007
Louise Francis, FCAS, MAAA
Francis Analytics and Actuarial Data Mining, Inc.
Actuarial Data Mining Services
Francis Analytics
www.data-mines.com
Why Predictive Modeling?
• Better use of data than traditional methods
• Advanced methods for dealing with messy data now available
Data Mining Goes Prime Time
Becoming A Popular Tool In All Industries
Real Life Insurance Application – The “Boris Gang”
Predictive Modeling Family
Predictive Modeling
• Classical Linear Models
• GLMs
• Data Mining
Data Quality: A Data Mining Problem
• Actuary reviewing a database
A Problem: Nonlinear Functions
An Insurance Nonlinear Function: Provider Bill vs. Probability of Independent Medical Exam
[Figure: probability of IME (y-axis "Value Prob IME", ticks 0.30 to 0.90) plotted against Provider 2 Bill (x-axis, binned from 0 to 11,368)]
Classical Statistics: Regression
• Estimation of parameters: Fit line that minimizes deviation between actual and fitted values
[Figure: Workers Comp Severity Trend. Severity ($0 to $10,000) by Year (1990 to 2004); series: Severity, Fitted Y]
$\min \sum_i (Y_i - \hat{Y}_i)^2$
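The least-squares criterion above can be sketched in a few lines of Python. This is a minimal illustration, not the fit shown on the slide: the severity values are hypothetical, and `fit_line` and `sse` are names introduced here.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sse(xs, ys, intercept, slope):
    """Sum of squared deviations between actual and fitted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical severities by year (NOT the values behind the slide's chart).
years = [1990, 1992, 1994, 1996, 1998, 2000]
severity = [3100.0, 3400.0, 4100.0, 4400.0, 5100.0, 5400.0]

a, b = fit_line(years, severity)
print("fitted annual change in severity:", round(b, 2))
print("SSE at the fitted line:", round(sse(years, severity, a, b), 2))
```

Any other intercept or slope gives a larger `sse` on the same data, which is what "fit line that minimizes deviation" means here.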
Generalized Linear Models
Common Links for GLMs
The identity link: $h(Y) = Y$
The log link: $h(Y) = \ln(Y)$
The inverse link: $h(Y) = \frac{1}{Y}$
The logit link: $h(Y) = \ln\left(\frac{Y}{1 - Y}\right)$
The probit link: $h(Y) = \Phi^{-1}(Y)$, where $\Phi$ denotes the normal CDF
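As a rough sketch, the five common links can be written as plain Python functions. The function names are introduced here for illustration, and the probit link is taken in its standard form as the inverse of the standard normal CDF.

```python
import math
from statistics import NormalDist


def identity_link(y):
    return y


def log_link(y):
    return math.log(y)


def inverse_link(y):
    return 1.0 / y


def logit_link(y):
    # Log odds; defined for 0 < y < 1.
    return math.log(y / (1.0 - y))


def probit_link(y):
    # Inverse of the standard normal CDF; defined for 0 < y < 1.
    return NormalDist().inv_cdf(y)


for p in (0.1, 0.5, 0.9):
    print(p, round(logit_link(p), 4), round(probit_link(p), 4))
```

Both the logit and the probit map a probability in (0, 1) onto the whole real line, which is why they suit yes/no targets such as fraud/no fraud.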
Major Kinds of Data Mining
• Supervised learning
– Most common situation
– A dependent variable
• Frequency
• Loss ratio
• Fraud/no fraud
– Some methods
• Regression
• CART
• Some neural networks
• Unsupervised learning
– No dependent variable
– Group like records together
• A group of claims with similar characteristics might be more likely to be fraudulent
• Ex: Territory assignment, Text Mining
– Some methods
• Association rules
• K-means clustering
• Kohonen neural networks
Desirable Features of a Data Mining Method
• Any nonlinear relationship can be approximated
• A method that works when the form of the nonlinearity is unknown
• The effect of interactions can be easily determined and incorporated into the model
• The method generalizes well on out-of-sample data
The Fraud Surrogates used as Dependent Variables
• Independent Medical Exam (IME) requested
• Special Investigation Unit (SIU) referral
– (IME successful)
– (SIU successful)
• Data: Detailed Auto Injury Claim Database for Massachusetts
• Accident Years (1995-1997)
Predictor Variables
• Claim file variables
– Provider bill, Provider type
– Injury
• Derived from claim file variables
– Attorneys per zip code
– Docs per zip code
• Using external data
– Average household income
– Households per zip
Different Kinds of Decision Trees
• Single Trees (CART, CHAID)
• Ensemble Trees, a more recent development (TREENET, RANDOM FOREST)
– A composite or weighted average of many trees (perhaps 100 or more)
Non Tree Methods
• MARS – Multivariate Adaptive Regression Splines
• Neural Networks
• Naïve Bayes (Baseline)
• Logistic Regression (Baseline)
Classification and Regression Trees (CART)
• Tree Splits are binary
• If the dependent variable is numeric, the split is based on R² or on the sum (or mean) of squared errors
– For any variable, choose the two-way split of the data that reduces the MSE the most
– Do this for all independent variables
– Choose the variable that reduces the squared errors the most
• When the dependent variable is categorical, other goodness-of-fit measures (Gini index, deviance) are used
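The split search described above can be sketched as follows. This is a simplified illustration for a numeric dependent variable, not the CART software's implementation; the data and the names `best_split` and `best_variable` are hypothetical.

```python
def sse(values):
    """Sum of squared deviations of the values around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (cutpoint, total SSE) for the binary split x < cutpoint with the lowest SSE."""
    best_cut, best_sse = None, sse(y)
    distinct = sorted(set(x))
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2  # candidate cutpoint halfway between adjacent values
        left = [yi for xi, yi in zip(x, y) if xi < cut]
        right = [yi for xi, yi in zip(x, y) if xi >= cut]
        total = sse(left) + sse(right)
        if total < best_sse:
            best_cut, best_sse = cut, total
    return best_cut, best_sse


def best_variable(columns, y):
    """Try every independent variable and keep the one whose best split has the lowest SSE."""
    results = {name: best_split(values, y) for name, values in columns.items()}
    name = min(results, key=lambda k: results[k][1])
    return name, results[name]


# Hypothetical claims (NOT the Massachusetts data): bill amount, claimant age, paid loss.
columns = {
    "bill": [100, 200, 300, 6000, 7000, 8000],
    "age": [30, 45, 50, 35, 40, 55],
}
paid = [10.0, 12.0, 11.0, 60.0, 58.0, 62.0]

print(best_variable(columns, paid))
```

Applying the same search again inside each of the two resulting groups is what grows the tree.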
CART – Example of 1st split on Provider 2 Bill, With Paid as Dependent
• For the entire database, total squared deviation of paid losses around the predicted value (i.e., the mean) is $4.95 \times 10^{13}$. The SSE declines to $4.66 \times 10^{13}$ after the data are partitioned using $5,021 as the cutpoint.
• Any other partition of the provider bill produces a larger SSE than $4.66 \times 10^{13}$. For instance, if a cutpoint of $10,000 is selected, the SSE is $4.76 \times 10^{13}$.
1st Split
• All Data: Mean = 11,224
– Bill < 5,021: Mean = 10,770
– Bill >= 5,021: Mean = 59,250
[Figure: tree with successive splits on mp2.bill at 544.5, 3.5, 4055.5, 1443.5, 16659 and 5151.5; terminal node values 0.02254, 0.04817, 0.07767, 0.08832, 0.11480, 0.13330 and 0.06980]
Continue splitting to get more homogeneous groups at terminal nodes
Ensemble Trees: Fit More Than One Tree
• Fit a series of trees
• Each tree added improves the fit of the model
• Average or Sum the results of the fits
• There are many methods to fit the trees and prevent overfitting
– Boosting: Iminer Ensemble and Treenet
– Bagging: Random Forest
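To make "fit a series of trees and sum the results" concrete, here is a minimal boosting sketch that uses one-split trees (stumps) and squared error. It is only an illustration of the general idea, not the algorithm inside Treenet or Iminer Ensemble; the data, the learning rate and all function names are assumptions.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression tree (a stump) to the residuals by least squares."""
    best = None
    cuts = sorted(set(x))
    for lo, hi in zip(cuts, cuts[1:]):
        cut = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi < cut]
        right = [r for xi, r in zip(x, residuals) if xi >= cut]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, cut, left_mean, right_mean)
    return best[1:]  # (cut, left_mean, right_mean)


def predict_stump(stump, xi):
    cut, left_mean, right_mean = stump
    return left_mean if xi < cut else right_mean


def boost(x, y, n_trees=50, learning_rate=0.1):
    """Each new stump is fit to what the current ensemble still gets wrong."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrinking each tree's contribution is one common guard against overfitting.
        preds = [pi + learning_rate * predict_stump(stump, xi)
                 for xi, pi in zip(x, preds)]
    return {"base": base, "rate": learning_rate, "stumps": stumps}


def predict(model, xi):
    """Sum the (shrunken) fits of all trees on top of the overall mean."""
    return model["base"] + model["rate"] * sum(
        predict_stump(s, xi) for s in model["stumps"])


def training_sse(model, x, y):
    return sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: provider bill and an IME-requested indicator.
bill = [0, 100, 500, 1000, 2000, 5000, 10000, 20000]
ime = [0, 0, 0, 1, 0, 1, 1, 1]

for n in (0, 1, 50):
    print(n, "trees, training SSE:", round(training_sse(boost(bill, ime, n_trees=n), bill, ime), 4))
```

Bagging differs in that each tree is fit independently to a resampled copy of the data and the trees are averaged, rather than each tree being fit to the previous trees' residuals.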
Treenet Prediction of IME Requested
[Figure: Treenet predicted probability of IME (y-axis "Value Prob IME", ticks 0.30 to 0.90) plotted against Provider 2 Bill (x-axis, binned from 0 to 11,368)]
Three Layer Neural Network
[Diagram: Input Layer (Input Data) → Hidden Layer (Process Data) → Output Layer (Predicted Value)]
Neural Networks
• Also minimizes squared deviation between fitted and actual values
• Can be viewed as a non-parametric, non-linear regression
Hidden Layer of Neural Network (Input Transfer Function)
[Figure: Logistic Function for Various Values of w1. Output (0.0 to 1.0) against X (-1.2 to 0.8), one curve each for w1 = -10, -5, -1, 1, 5, 10]
The Activation Function (Transfer Function)
• The sigmoid logistic function
$f(Y) = \frac{1}{1 + e^{-Y}}$
$Y = w_0 + w_1 X_1 + w_2 X_2 + \cdots + w_n X_n$
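A small sketch of a single hidden node may help: it forms the weighted sum Y and passes it through the logistic function. The names `logistic` and `hidden_node`, and the input value 0.3, are chosen here for illustration; the w1 values are the ones named in the logistic-function chart.

```python
import math


def logistic(y):
    """Sigmoid logistic activation: maps any real Y into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))


def hidden_node(inputs, weights, w0=0.0):
    """Y = w0 + w1*X1 + ... + wn*Xn, followed by the logistic activation."""
    y = w0 + sum(w * x for w, x in zip(weights, inputs))
    return logistic(y)


# Effect of the weight w1 on a single input X = 0.3.
for w1 in (-10, -5, -1, 1, 5, 10):
    print("w1 =", w1, "output =", round(hidden_node([0.3], [w1]), 4))
```

Larger absolute weights make the curve steeper around X = 0, and a negative weight flips its direction, which is the pattern the chart of w1 values shows.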
Neural Network: Provider 2 Bill vs. IME Requested
[Figure: Fitted Neural Net Prediction (y-axis, 0.04 to 0.14) plotted against Provider 2 Bill (x-axis, 0 to 100,000)]
MARS: Provider 2 Bill vs. IME Requested
[Figure: MARS Predicted IME (y-axis, 0.05 to 0.13) plotted against Provider 2 Bill (x-axis, 0 to 4,000)]
How MARS Fits Nonlinear Function
• MARS fits a piecewise regression
– BF1 = max(0, X - 1,401.00)
– BF2 = max(0, 1,401.00 - X)
– BF3 = max(0, X - 70.00)
– Y = 0.336 + 0.145626E-03 * BF1 - 0.199072E-03 * BF2 - 0.145947E-03 * BF3
– BF1, BF2, BF3 are basis functions
• MARS uses statistical optimization to find the best basis function(s)
• A basis function is similar to a dummy variable in regression: it acts like a combination of a dummy indicator and a linear independent variable
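The fitted function above can be evaluated directly. The sketch below uses the knots and coefficients exactly as listed on the slide; the helper names `hinge` and `mars_predict` are introduced here.

```python
def hinge(value):
    """Basis function building block: max(0, value)."""
    return max(0.0, value)


def mars_predict(x):
    """Evaluate the slide's piecewise-linear fit at provider bill x."""
    bf1 = hinge(x - 1401.0)
    bf2 = hinge(1401.0 - x)
    bf3 = hinge(x - 70.0)
    return 0.336 + 0.145626e-3 * bf1 - 0.199072e-3 * bf2 - 0.145947e-3 * bf3


for bill in (0.0, 70.0, 1401.0, 4000.0):
    print(bill, round(mars_predict(bill), 4))
```

Each basis function is zero on one side of its knot (the "dummy" part) and linear on the other, so the slope of the fit changes at 70 and at 1,401.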
Baseline Method: Naïve Bayes Classifier
• Naive Bayes assumes feature (predictor variables) independence conditional on each category
• Probability that an observation X will have a specific set of values for the independent variables is the product of the conditional probabilities of observing each of the values given target category cj, j=1 to m (m typically 2)
$P(X_1, X_2, \ldots, X_n \mid c_j) = \prod_i P(X_i = x_i \mid c_j)$
where $x_1, x_2, \ldots, x_n$ are specific values for the independent variables
Naïve Bayes Formula
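$P(C = c_j \mid X_1, X_2, \ldots, X_n) = \frac{P(C = c_j, X_1, X_2, \ldots, X_n)}{P(X_1, X_2, \ldots, X_n)}$ (Bayes Rule)
$P(C = c_j \mid X_1, X_2, \ldots, X_n) = \frac{P(C = c_j) \prod_i P(X_i \mid c_j)}{P(X_1, X_2, \ldots, X_n)}$
The denominator $P(X_1, X_2, \ldots, X_n)$ is a constant: it is the same for every category $c_j$.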
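A minimal sketch of the formula for categorical predictors follows. The claim records, feature values and function names are hypothetical, and no smoothing is applied, so a value never seen with a category gets probability zero.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels):
    """Count category frequencies and, per category, the frequency of each feature value."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # key: (feature index, category)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts


def posterior(model, row):
    """P(C = c_j | X_1..X_n) for every category c_j."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(C = c_j)
        for i, value in enumerate(row):
            score *= value_counts[(i, c)][value] / n_c  # P(X_i = x_i | c_j)
        scores[c] = score
    norm = sum(scores.values())  # the constant denominator
    return {c: s / norm for c, s in scores.items()}


# Hypothetical claims: (provider type, injury) with an IME-requested label.
rows = [("chiro", "strain"), ("chiro", "strain"), ("md", "fracture"),
        ("md", "fracture"), ("md", "strain"), ("chiro", "fracture")]
labels = ["yes", "yes", "yes", "no", "no", "no"]

model = fit_naive_bayes(rows, labels)
print(posterior(model, ("chiro", "strain")))
```

Because the denominator is the same for every category, it is enough to compute the numerators and rescale them to sum to one, as `posterior` does.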
Advantages/Disadvantages
• Computationally efficient
• Under many circumstances has performed well
• Assumption of conditional independence often does not hold
• Can’t be used directly for numeric variables in this form (they must first be grouped into categories)
Naïve Bayes Predicted IME vs. Provider 2 Bill
[Figure: Mean Probability IME (y-axis, 0.06 to 0.14) plotted against binned Provider 2 Bill (x-axis)]
True/False Positives and True/False Negatives
(Type I and Type II Errors)
The “Confusion” Matrix
• Choose a “cut point” in the model score.
• Claims > cut point, classify “yes”.
Sample Confusion Matrix: Sensitivity and Specificity
                True Class
Prediction      No       Yes      Row Total
No              800      200      1,000
Yes             200      400      600
Column Total    1,000    600
Measure         Correct  Total    Percent Correct
Sensitivity     400      600      67%
Specificity     800      1,000    80%
Sensitivity is the share of true “yes” claims classified “yes”; specificity is the share of true “no” claims classified “no”.
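The two measures can be computed from the four cells of the matrix. The sketch below uses the standard definitions, with “yes” as the positive class: sensitivity is the share of true “yes” claims classified “yes”, and specificity is the share of true “no” claims classified “no”. The function names and the four example scores are assumptions; the counts 400, 200, 800 and 200 are the ones in the sample matrix.

```python
def confusion_counts(scores, actual, cut_point):
    """Classify "yes" when score > cut_point; return (tp, fn, tn, fp)."""
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, actual):
        predicted_yes = score > cut_point
        if truth == 1 and predicted_yes:
            tp += 1
        elif truth == 1:
            fn += 1
        elif predicted_yes:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def sensitivity_specificity(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)  # true "yes" correctly classified
    specificity = tn / (tn + fp)  # true "no" correctly classified
    return sensitivity, specificity


# Counts from the sample matrix: 400 true positives, 200 false negatives,
# 800 true negatives, 200 false positives.
sens, spec = sensitivity_specificity(tp=400, fn=200, tn=800, fp=200)
print(round(sens, 2), round(spec, 2))

# Tiny hypothetical scored sample to show the cut point in action.
print(confusion_counts([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 1], cut_point=0.5))
```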
ROC Curves and Area Under the ROC Curve
• Want good performance on both sensitivity and specificity
• Sensitivity and specificity depend on the cut points chosen
– Choose a series of different cut points, and compute sensitivity and specificity for each of them
– Graph results
• Plot sensitivity vs. 1-specificity
• Compute an overall measure of “lift”, or area under the curve
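The procedure above can be sketched as follows: sweep the cut point over the observed scores, record (1-specificity, sensitivity) at each one, and integrate with the trapezoid rule. The scores and outcomes are hypothetical, the function names are introduced here, and a claim is classified “yes” when its score is at or above the cut point so that every observed score serves as a cut.

```python
def roc_points(scores, actual):
    """Return (1 - specificity, sensitivity) pairs, one per cut point, starting at (0, 0)."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, a in zip(scores, actual) if s >= cut and a == 1)
        fp = sum(1 for s, a in zip(scores, actual) if s >= cut and a == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoid rule over consecutive ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical model scores and true outcomes (1 = IME requested).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
actual = [1, 1, 0, 1, 0, 1, 0, 0]

points = roc_points(scores, actual)
print(area_under_curve(points))  # 0.8125
```

An area of 0.5 corresponds to a model no better than random ordering, and 1.0 to a model that ranks every “yes” above every “no”.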
TREENET ROC Curve – IME
AUROC (area under the ROC curve) = 0.701
Ranking of Methods/Software – IME Requested
Method/Software          AUROC    Lower Bound  Upper Bound
Random Forest            0.7030   0.6954       0.7107
Treenet                  0.7010   0.6935       0.7085
MARS                     0.6974   0.6897       0.7051
SPLUS Neural             0.6961   0.6885       0.7038
S-PLUS Tree              0.6881   0.6802       0.6961
Logistic                 0.6771   0.6695       0.6848
Naïve Bayes              0.6763   0.6685       0.6841
SPSS Exhaustive CHAID    0.6730   0.6660       0.6820
CART Tree                0.6694   0.6613       0.6775
Iminer Neural            0.6681   0.6604       0.6759
Iminer Ensemble          0.6491   0.6408       0.6573
Iminer Tree              0.6286   0.6199       0.6372
Some Software Packages That Can be Used
• Excel
• Access
• Free Software
– R
– Web based software
• S-Plus (similar to commercial version of R)
• SPSS
• CART/MARS
• Data Mining suites – (SAS Enterprise Miner/SPSS Clementine)
References
• Derrig, R., Francis, L., “Distinguishing the Forest from the Trees: A Comparison of Tree Based Data Mining Methods”, CAS Winter Forum, March 2006, www.casact.org
• Derrig, R., Francis, L., “A Comparison of Methods for Predicting Fraud”, Risk Theory Seminar, April 2006
• Francis, L., “Taming Text: An Introduction to Text Mining”, CAS Winter Forum, March 2006, www.casact.org
• Francis, L.A., “Neural Networks Demystified”, Casualty Actuarial Society Forum, Winter 2001, pp. 254-319
• Francis, L.A., “Martian Chronicles: Is MARS Better than Neural Networks?”, Casualty Actuarial Society Forum, Winter 2003, pp. 253-320
• Dhar, V., “Seven Methods for Transforming Corporate Data into Business Intelligence”, Prentice Hall, 1997
• The web site www.data-mines.com has some tutorials and presentations