Introduction to Predictive Modeling
Prepared by Louise Francis
Francis Analytics and Actuarial Data Mining, Inc.
June 20, 2005
Modeling 101
Objectives
• Gentle introduction to classical statistical models
• Introduction to some more advanced models
• Illustrate some simple applications
• Show examples in commonly available software (see Excel files that accompany slides)
• Discuss practical modeling issues
• Which model(s) to use?
Predictive Modeling Family
Predictive Modeling
• Classical Linear Models
• GLMs
• Data Mining
Why Predictive Modeling?
• Better use of data than traditional methods
• Advanced methods for dealing with messy data now available
Major Kinds of Modeling
• Supervised learning
  • Most common situation
  • A dependent variable
    • Frequency
    • Loss ratio
    • Fraud/no fraud
  • Some methods
    • Regression
    • CART
    • Some neural networks
• Unsupervised learning
  • No dependent variable
  • Group like records together
    • A group of claims with similar characteristics might be more likely to be fraudulent
  • Some methods
    • Association rules
    • K-means clustering
    • Kohonen neural networks
Kinds of Applications
• Classification
• Prediction
Common Example: Fit Severity Trend
• Example data and analyses in Trend Projection Pred Modl.xls
A Brief Introduction to Regression for Prediction
• One of the most common statistical methods fits a line to data
• Model: Y = a + bx + error
• Error assumed to be Normal
[Figure: Workers Comp Severity Trend. Severity and Fitted Y ($0 to $10,000) by Year, 1990 to 2004.]
A Brief Introduction to Regression
• Fits line that minimizes squared deviation between actual and fitted values
• $\min \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2$

$$\beta = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}, \qquad a = \bar{Y} - \beta \bar{X}$$
Simple Formula for Fitting Line
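$$\beta = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}, \qquad a = \bar{Y} - \beta \bar{X}$$

| (1) Accident Year | (2) Severity Y ($) | (3) Fitted Y ($) | (4) X | (5) $Y - \bar{Y}$ ($) | (6) $X - \bar{X}$ | (7) $(Y - \bar{Y})(X - \bar{X})$ | (8) $(X - \bar{X})^2$ |
|---|---|---|---|---|---|---|---|
| 1991 | 5,410 | 3,715 | 1 | 75 | -6 | (448.91) | 36 |
| 1992 | 4,868 | 3,985 | 2 | (467) | -5 | 2,336.63 | 25 |
| 1993 | 4,393 | 4,255 | 3 | (943) | -4 | 3,770.32 | 16 |
| 1994 | 4,191 | 4,525 | 4 | (1,145) | -3 | 3,434.25 | 9 |
| 1995 | 3,892 | 4,796 | 5 | (1,443) | -2 | 2,886.80 | 4 |
| 1996 | 3,494 | 5,066 | 6 | (1,842) | -1 | 1,841.86 | 1 |
| 1997 | 4,529 | 5,336 | 7 | (806) | 0 | - | 0 |
| 1998 | 4,977 | 5,606 | 8 | (358) | 1 | (358.10) | 1 |
| 1999 | 5,453 | 5,876 | 9 | 117 | 2 | 234.45 | 4 |
| 2000 | 5,727 | 6,146 | 10 | 391 | 3 | 1,174.07 | 9 |
| 2001 | 6,687 | 6,416 | 11 | 1,351 | 4 | 5,405.06 | 16 |
| 2002 | 6,885 | 6,686 | 12 | 1,550 | 5 | 7,747.87 | 25 |
| 2003 | 8,855 | 6,956 | 13 | 3,520 | 6 | 21,119.29 | 36 |
| Sum/Average | 5,336 | | 7 | | | 49,143.61 | 182.00 |

$\beta = 49{,}143.61 / 182.00 = 270.02$

$a = \bar{Y} - \beta \bar{X} = 3{,}445.41$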
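The hand calculation in the table can be checked with a few lines of code. The sketch below is only an illustration, assuming the whole-dollar severities shown on the slide; the function name `fit_line` is hypothetical and not part of the accompanying Excel files.

```python
# Severity by accident year, 1991-2003, copied from column (2) of the table.
severity = [5410, 4868, 4393, 4191, 3892, 3494, 4529,
            4977, 5453, 5727, 6687, 6885, 8855]
x = list(range(1, len(severity) + 1))  # X = 1..13, as in column (4)


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of cross-products, the total of column (7).
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    # Denominator: sum of squared X deviations, the total of column (8).
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


a, b = fit_line(x, severity)
print(f"slope = {b:.2f}, intercept = {a:.2f}")
```

Because the slide shows severities rounded to whole dollars, the intercept computed this way can differ slightly from the 3,445.41 on the slide.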
Excel Does Regression
• Install the Analysis ToolPak add-in that comes with Excel
• Click Tools, Data Analysis, Regression
Key Statistic: Residual
The Residual plays a key role in regression evaluation and diagnostics
$$\text{Residual}_i = Y_i - \hat{Y}_i, \qquad \hat{Y}_i \text{ is the model estimate for } Y_i$$

$$\text{SSR} = \text{Sum Squared Residual} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
Goodness of Fit Statistics
• R²: (SS Regression / SS Total)
  • Percentage of variance explained
• F statistic: (MS Regression / MS Residual)
  • Significance of regression
• t statistics: use the standard error of a coefficient to determine whether it is significant
  • Significance of coefficients
  • It is customary to drop a variable if its coefficient is not significant
• Note: SS = sum of squares; MS = mean square (SS divided by its degrees of freedom)
Output of Excel Regression Procedure
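SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.72 |
| R Square | 0.52 |
| Adjusted R Square | 0.48 |
| Standard Error | 1052.73 |
| Observations | 13.00 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 13269748.70 | 13269748.70 | 11.97 | 0.01 |
| Residual | 11 | 12190626.36 | 1108238.76 | | |
| Total | 12 | 25460375.05 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 3445.41 | 619.37 | 5.56 | 0.00 | 2082.18 | 4808.64 |
| X Variable 1 | 270.02 | 78.03 | 3.46 | 0.01 | 98.27 | 441.77 |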
Assumptions of Regression
• Errors independent of value of X
• Errors independent of value of Y
• Errors independent of prior errors
• Errors are from normal distribution
• Linearity
• We can test for validity of assumptions
Diagnostics: Residual Plot
• Points should scatter randomly around zero
• If not, a straight line is probably not appropriate
[Figure: residual plot. Residuals (-2,000 to 2,500) against X Variable 1 (0 to 15).]
Random Residuals
[Figure: Random Residuals. Residuals between (2,000.00) and 2,500.00 plotted for observations 0 to 14.]
Normal Residual
[Figure: Distribution for Normal Residual/AA6. Histogram over -4 to 4; Mean = 0.1868576; 90% of values between -1.7336 and 1.7313, with 5% in each tail.]
Other Diagnostics: Normal QQ Plot
• Plot should be a straight line
• Otherwise the residuals are not from a normal distribution
[Figure: Normal Probability Plot. Y (0 to 10,000) against Sample Percentile.]
Test for autocorrelated errors
• Autocorrelation often present in time series data
• Durbin-Watson statistic:

$$D = \frac{\sum_{t=2}^{N} (e_t - e_{t-1})^2}{\sum_{t=1}^{N} e_t^2}$$

• If residuals are uncorrelated, this is near 2
Durbin-Watson Statistic
• DW = 0.71, well below 2, indicates positive autocorrelation is present
| (1) Residual | (2) Lag Residual | (3) (e(t)-e(t-1))^2 | (4) e(t)^2 |
|---|---|---|---|
| 1,251.8 | | | 1,566,956.8 |
| 371.6 | 1,251.8 | 774,732.8 | 138,080.9 |
| (425.6) | 371.6 | 635,511.3 | 181,133.0 |
| (527.5) | (425.6) | 10,380.8 | 278,238.8 |
| (824.2) | (527.5) | 88,060.0 | 679,359.2 |
| (1,087.3) | (824.2) | 69,210.5 | 1,182,246.0 |
| (140.7) | (1,087.3) | 895,981.3 | 19,810.1 |
| 77.4 | (140.7) | 47,600.9 | 5,995.1 |
| 492.2 | 77.4 | 172,030.1 | 242,253.8 |
| 406.3 | 492.2 | 7,382.0 | 165,059.0 |
| 624.2 | 406.3 | 47,490.8 | 389,623.5 |
| (199.6) | 624.2 | 678,714.5 | 39,857.3 |
| (18.5) | (199.6) | 32,830.5 | 340.4 |
| Sum | | 3,459,925.3 | 4,888,953.9 |

DW = 3,459,925.3 / 4,888,953.9 = 0.707702582
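The same statistic can be computed directly from the residual column. This is a minimal sketch, assuming the residuals as printed above (rounded to one decimal place); `durbin_watson` is an illustrative helper name.

```python
# Residuals from column (1) of the table; parentheses on the slide mean negative.
residuals = [1251.8, 371.6, -425.6, -527.5, -824.2, -1087.3, -140.7,
             77.4, 492.2, 406.3, 624.2, -199.6, -18.5]


def durbin_watson(e):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v ** 2 for v in e)
    return num / den


dw = durbin_watson(residuals)
print(f"DW = {dw:.4f}")
```

A value near 2 would suggest uncorrelated residuals; values toward 0 suggest positive autocorrelation.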
Non-Linear Relationships
• The model fit was of the form:
  • Severity = a + b*Year
• A more common trend model is:
  • $\text{Severity}_{Year} = \text{Severity}_{Year_0} \times (1+t)^{(Year - Year_0)}$
  • t is the trend rate
  • This is an exponential trend model
  • Cannot fit it directly with a line
Transformation of Variables
• $\text{Severity}_{Year} = \text{Severity}_{Year_0} \times (1+t)^{(Year - Year_0)}$
1. Log both sides
2. $\ln(\text{Sev}_{Year}) = \ln(\text{Sev}_{Year_0}) + (Year - Year_0) \times \ln(1+t)$
3. Y = a + x * b
4. A line can be fit to the transformed variables, where the dependent variable is the log of severity
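The four steps above can be sketched in code: take logs, fit a straight line, then convert the slope back to a trend rate. This is an illustration only, assuming the 1991 to 2003 severities from the earlier trend example; `fit_exponential_trend` is a hypothetical helper name.

```python
import math

# Severities for accident years 1991-2003 from the earlier trend example.
severity = [5410, 4868, 4393, 4191, 3892, 3494, 4529,
            4977, 5453, 5727, 6687, 6885, 8855]
years = list(range(1991, 2004))


def fit_exponential_trend(years, values):
    """Fit ln(value) = a + b*(year - year0); return (base_value, trend_rate)."""
    x = [y - years[0] for y in years]       # Year - Year0
    ln_v = [math.log(v) for v in values]    # step 1: log both sides
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(ln_v) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, ln_v))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    # a estimates ln(Severity at Year0); b estimates ln(1 + t).
    return math.exp(a), math.exp(b) - 1.0


base, trend = fit_exponential_trend(years, severity)
print(f"fitted 1991 severity = {base:,.0f}, annual trend = {trend:.1%}")
```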
Exponential Trend – Cont.
• R2 declines and residuals indicate poor fit
[Figure: Plot of Residuals. Residual (-0.4 to 0.4) against Predicted Y (8.2 to 8.9).]
A More Complex Model
• Use more than one variable in model (Econometric Model)
• In this case we use a medical cost index, the consumer price index, and employment data (number employed, unemployment rate, change in number employed, change in UEP rate) to predict workers compensation severity
Data

| (1) Year | (2) WC Severity | (3) Health Ins Index | (4) CPI | (5) Employment | (6) PchangeEmp | (7) UEP Rate | (8) Cng UEP |
|---|---|---|---|---|---|---|---|
| 1991 | 5,410 | 11.7 | 136.2 | 117,718 | -0.9% | 6.8 | 1.2 |
| 1992 | 4,868 | 12.7 | 140.3 | 118,492 | 0.7% | 7.5 | 1.1 |
| 1993 | 4,393 | 13.6 | 144.5 | 120,259 | 1.5% | 6.9 | 0.9 |
Multivariable Regression
$$\beta = (X^T X)^{-1} X^T Y$$

$$\hat{Y} = X \beta$$

$$\text{Variance}(\beta) = \sigma^2 (X^T X)^{-1}$$
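As a check on the matrix formulas, the sketch below applies them to the one-variable severity trend, so the results can be compared with the Excel output shown earlier (slope 270.02, standard errors 619.37 and 78.03). It uses only the standard library; the helpers `transpose`, `matmul`, and `inverse` are illustrative, and estimating σ² as the residual sum of squares divided by n minus the number of parameters is an assumption matching the usual regression convention.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


severity = [5410, 4868, 4393, 4191, 3892, 3494, 4529,
            4977, 5453, 5727, 6687, 6885, 8855]
X = [[1.0, float(i)] for i in range(1, 14)]   # intercept column and X = 1..13
Y = [[float(v)] for v in severity]

Xt = transpose(X)
xtx_inv = inverse(matmul(Xt, X))
beta = matmul(xtx_inv, matmul(Xt, Y))         # (X'X)^-1 X'Y, a 2x1 column
fitted = matmul(X, beta)                      # Y-hat = X beta
resid = [y[0] - f[0] for y, f in zip(Y, fitted)]
n, p = len(X), len(X[0])
sigma2 = sum(e * e for e in resid) / (n - p)  # assumed estimator of sigma^2
std_err = [(sigma2 * xtx_inv[j][j]) ** 0.5 for j in range(p)]

print("coefficients:", [round(b[0], 2) for b in beta])
print("standard errors:", [round(s, 2) for s in std_err])
```

The same code works for more predictors by adding columns to `X`.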
One Approach: Regression With All Variables
• Many variables not significant
• Over-parameterization issue
• How do we get the best-fitting, most parsimonious model?

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.98808656 |
| R Square | 0.97631505 |
| Adjusted R Square | 0.952630099 |
| Standard Error | 317.0246366 |
| Observations | 13 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 6 | 24857347.33 | 4142891.2 | 41.220903 | 0.000128191 |
| Residual | 6 | 603027.7213 | 100504.62 | | |
| Total | 12 | 25460375.05 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | (46,924.96) | 25,285.89 | (1.86) | 0.113 | (108,797.36) | 14,947.43 |
| Ins Index | 945.52 | 297.55 | 3.18 | 0.019 | 217.43 | 1,673.61 |
| CPI | (523.23) | 93.98 | (5.57) | 0.001 | (753.19) | (293.27) |
| Employment | 0.92 | 0.29 | 3.17 | 0.019 | 0.21 | 1.62 |
| PchangeEmp | (42,369.42) | 39,355.61 | (1.08) | 0.323 | (138,669.21) | 53,930.37 |
| UEP Rate | 743.58 | 760.09 | 0.98 | 0.366 | (1,116.30) | 2,603.46 |
| Cng UEP | (633.27) | 3,870.51 | (0.16) | 0.875 | (10,104.09) | 8,837.54 |
Multiple Regression Statistics
SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.98808656 |
| R Square | 0.97631505 |
| Adjusted R Square | 0.952630099 |
| Standard Error | 317.0246366 |
| Observations | 13 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 6 | 24857347.33 | 4142891.2 | 41.220903 | 0.000128191 |
| Residual | 6 | 603027.7213 | 100504.62 | | |
| Total | 12 | 25460375.05 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | (46,924.96) | 25,285.89 | (1.86) | 0.113 | (108,797.36) | 14,947.43 |
| Ins Index | 945.52 | 297.55 | 3.18 | 0.019 | 217.43 | 1,673.61 |
| CPI | (523.23) | 93.98 | (5.57) | 0.001 | (753.19) | (293.27) |
| Employment | 0.92 | 0.29 | 3.17 | 0.019 | 0.21 | 1.62 |
| PchangeEmp | (42,369.42) | 39,355.61 | (1.08) | 0.323 | (138,669.21) | 53,930.37 |
| UEP Rate | 743.58 | 760.09 | 0.98 | 0.366 | (1,116.30) | 2,603.46 |
| Cng UEP | (633.27) | 3,870.51 | (0.16) | 0.875 | (10,104.09) | 8,837.54 |
Degrees of Freedom
• Related to number of observations
• One rule of thumb: subtract the number of parameters estimated from the number of observations
• The greater the number of parameters estimated, the lower the number of degrees of freedom
Degrees of Freedom
• “Degrees of freedom for a particular sum of squares is the smallest number of terms we need to know in order to find the remaining terms and thereby compute the sum”
  • Iverson and Norpoth, Analysis of Variance
• We want to keep the df as large as possible to avoid overfitting
• This concept becomes particularly important with complex data mining models
Stepwise Regression
• Partial correlation
  • Correlation of dependent variable with predictor after all other variables are in model
• F-contribution
  • Amount of change in F-statistic when variable is added to model
Stepwise regression-kinds
• Forward stepwise
  • Start with best one-variable regression and add
• Backward stepwise
  • Start with full regression and delete variables
• Exhaustive
Stepwise regression for Severity Data
Variables Entered/Removed (a)

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | Health Ins Index | . | Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100). |
| 2 | Change in UEP | . | Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100). |

a. Dependent Variable: Severity
Stepwise Regression-Excluded Variables
Excluded Variables (c)

| Model | Variable | Beta In | t | Sig. | Partial Correlation | Collinearity Statistics: Tolerance |
|---|---|---|---|---|---|---|
| 1 | CPI | -.787 (a) | -1.907 | .086 | -.517 | .113 |
| 1 | Employment | -.417 (a) | -1.353 | .206 | -.393 | .233 |
| 1 | Pct Change in Employment | -.318 (a) | -2.472 | .033 | -.616 | .984 |
| 1 | Unemployment Rate | .243 (a) | 1.572 | .147 | .445 | .882 |
| 1 | Change in UEP | .328 (a) | 2.497 | .032 | .620 | .935 |
| 1 | Lag Employment | -.364 (a) | -1.156 | .274 | -.343 | .233 |
| 1 | Lag UEP | .123 (a) | .761 | .464 | .234 | .941 |
| 2 | CPI | -.555 (b) | -1.484 | .172 | -.443 | .103 |
| 2 | Employment | -.235 (b) | -.835 | .425 | -.268 | .210 |
| 2 | Pct Change in Employment | -.150 (b) | -.408 | .693 | -.135 | .129 |
| 2 | Unemployment Rate | .114 (b) | .732 | .483 | .237 | .703 |
| 2 | Lag Employment | -.203 (b) | -.726 | .486 | -.235 | .217 |
| 2 | Lag UEP | .040 (b) | .282 | .784 | .094 | .876 |

a. Predictors in the Model: (Constant), Health Ins Index
b. Predictors in the Model: (Constant), Health Ins Index, Change in UEP
c. Dependent Variable: Severity
Econometric Model Assessment
• Standardized residuals more evenly spread around the zero line
• R2 is .91 vs .52 of simple trend regression
[Figure: Residuals: Econometric Model. Standardized residual (-2.5 to 2) against Predicted Severity (3,000 to 9,000).]
Correlation of Predictor Variables: Multicollinearity
Multicollinearity
• Predictor variables are assumed uncorrelated
• Assess with correlation matrix
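| | Ins Index | CPI | Employment | PchangeEmp | UEP Rate | Cng UEP |
|---|---|---|---|---|---|---|
| Ins Index | 1.000 | | | | | |
| CPI | 0.942 | 1.000 | | | | |
| Employment | 0.876 | 0.984 | 1.000 | | | |
| PchangeEmp | (0.125) | 0.016 | 0.092 | 1.000 | | |
| UEP Rate | (0.344) | (0.622) | (0.742) | (0.419) | 1.000 | |
| Cng UEP | 0.254 | 0.143 | 0.077 | (0.926) | 0.321 | 1.000 |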
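One simple way to read the correlation matrix is to list the predictor pairs whose correlation is large in absolute value. The sketch below does this for the matrix above; the 0.9 cutoff is an assumption chosen for illustration, not a rule from the slides, and `high_correlations` is a hypothetical helper name.

```python
names = ["Ins Index", "CPI", "Employment", "PchangeEmp", "UEP Rate", "Cng UEP"]

# Lower triangle of the correlation matrix; parentheses on the slide mean negative.
lower = [
    [1.000],
    [0.942, 1.000],
    [0.876, 0.984, 1.000],
    [-0.125, 0.016, 0.092, 1.000],
    [-0.344, -0.622, -0.742, -0.419, 1.000],
    [0.254, 0.143, 0.077, -0.926, 0.321, 1.000],
]


def high_correlations(names, lower, threshold=0.9):
    """Return (variable, variable, r) for every off-diagonal |r| >= threshold."""
    pairs = []
    for i, row in enumerate(lower):
        for j, r in enumerate(row[:i]):   # row[:i] skips the diagonal entry
            if abs(r) >= threshold:
                pairs.append((names[j], names[i], r))
    return pairs


flagged = high_correlations(names, lower)
for first, second, r in flagged:
    print(f"{first} vs {second}: r = {r:+.3f}")
```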
Remedies for Multicollinearity
• Drop one or more of the highly correlated variables
• Use factor analysis or principal components to produce a new variable which is a weighted average of the correlated variables
• Use stepwise regression to select variables to include
Exponential Smoothing
• A weighted average with more weight given to more recent values

$$\hat{Y}_t = \alpha Y_t + (1 - \alpha) \hat{Y}_{t-1}, \qquad \alpha \text{ usually between .05 and .3}$$

• Linear Exponential Smoothing: model level and trend

$$m_t = \alpha Y_t + (1 - \alpha)(m_{t-1} + r_{t-1})$$

$$r_t = \delta (m_t - m_{t-1}) + (1 - \delta) r_{t-1}$$
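The level and trend recursions can be sketched as follows. The slides do not say how the recursion is started, so the start-up values here (first observation as the level, first difference as the trend) and the default α and δ are assumptions; `linear_exponential_smoothing` is an illustrative name.

```python
def linear_exponential_smoothing(y, alpha=0.3, delta=0.3):
    """Return (levels, trends, one_step_forecasts) for a series y.

    m_t = alpha*y_t + (1 - alpha)*(m_{t-1} + r_{t-1})
    r_t = delta*(m_t - m_{t-1}) + (1 - delta)*r_{t-1}
    """
    # Assumed start-up values: level = first point, trend = first difference.
    m = [float(y[0])]
    r = [float(y[1] - y[0])]
    forecasts = [None]                      # no forecast for the first point
    for t in range(1, len(y)):
        forecasts.append(m[-1] + r[-1])     # forecast made before seeing y[t]
        m_t = alpha * y[t] + (1 - alpha) * (m[-1] + r[-1])
        r_t = delta * (m_t - m[-1]) + (1 - delta) * r[-1]
        m.append(m_t)
        r.append(r_t)
    return m, r, forecasts


severity = [5410, 4868, 4393, 4191, 3892, 3494, 4529,
            4977, 5453, 5727, 6687, 6885, 8855]
levels, trends, forecasts = linear_exponential_smoothing(severity)
print(f"forecast for the next point: {levels[-1] + trends[-1]:,.0f}")
```

On a perfectly linear series the recursion reproduces the data exactly, which is a quick sanity check on the implementation.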
Exponential Smoothing Fit
[Figure: Exponential Smoothing. Actual and Forecast values (0 to 10,000) by Data Point (1 to 13).]
Tail Development Factors: Another Regression Application
(in WC Claims Pred Seminar.xls)
• Typically involve non-linear functions:
  • Inverse Power Curve: $LDF_t = 1 + \frac{k}{(t + c)^a}$
  • Hoerl Curve: $LDF_t - 1 = K t^{\alpha} \exp(-\beta t)$
  • Probability distribution such as Gamma, Lognormal: $LDF_t = \frac{F(t+1)}{F(t)}$
Example: Inverse Power Curve
• Can use transformation of variables to fit the simplified model: $LDF = 1 + k/t^a$
  • ln(LDF-1) = a + b*ln(1/t)
• Use nonlinear regression to solve for a and c
  • Uses numerical algorithms, such as gradient descent, to solve for parameters
  • Most statistics packages let you do this
Nonlinear Regression: Grid Search Method
• Try out a number of different values for parameters and pick the ones which minimize a goodness of fit statistic
• You can use the Data Table capability of Excel to do this
  • Use regression functions linest and intercept to get k and a
  • Try out different values for c until you find the best one
Fitting non-linear function
c = -5

| Age | WC LDF | ln(ldf-1) | ln(1/(t+c)) | Fitted | Error² |
|---|---|---|---|---|---|
| 12 | 1.502 | -0.68916 | -1.94591 | 1.511531 | 9.08E-05 |
| 24 | 1.148 | -1.91054 | -2.94444 | 1.117107 | 0.000954 |
| 36 | 1.063 | -2.76462 | -3.43399 | 1.056842 | 3.79E-05 |
| 48 | 1.028 | -3.57555 | -3.7612 | 1.035063 | 4.99E-05 |
| 60 | 1.017 | -4.07454 | -4.00733 | 1.024379 | 5.45E-05 |
| 72 | 1.013 | -4.34281 | -4.20469 | 1.018217 | 2.72E-05 |
| 84 | 1.019 | -3.96332 | -4.36945 | 1.014283 | 2.23E-05 |
| 96 | 1.014 | -4.2687 | -4.51086 | 1.011592 | 5.8E-06 |
| 108 | 1.011 | -4.50986 | -4.63473 | 1.009654 | 1.81E-06 |
| Total | | | | | 0.001245 |

LDFs from www.njcrib.org

| | Value | Excel function |
|---|---|---|
| Coefficient | 1.476494 | Linest |
| Constant | 2.202778 | Intercept |
Using Data Tables in Excel
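Data Table formula cell (total squared error at the current c): 0.001245

| c | error |
|---|---|
| -7 | 0.00462 |
| -5 | 0.00124 |
| -3 | 0.00141 |
| -2 | 0.00219 |
| -1 | 0.00323 |
| 0 | 0.00446 |
| 1 | 0.00580 |
| 2 | 0.00723 |
| 5 | 0.01168 |
| 10 | 0.01887 |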
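The grid search can also be written as a short loop outside Excel. This sketch assumes the nine LDFs listed in the fitting table and the same candidate values of c as the Data Table; for each c it fits the log-transformed line by least squares and measures squared error on the LDF scale. The function name `fit_for_c` is hypothetical.

```python
import math

# Development ages (months) and WC LDFs from the fitting table.
ages = [12, 24, 36, 48, 60, 72, 84, 96, 108]
ldfs = [1.502, 1.148, 1.063, 1.028, 1.017, 1.013, 1.019, 1.014, 1.011]


def fit_for_c(c):
    """Fit ln(LDF - 1) = intercept + slope*ln(1/(t + c)) for a fixed c.

    Returns (intercept, slope, sse), where sse is the sum of squared
    differences between actual and fitted LDFs.
    """
    xs = [math.log(1.0 / (t + c)) for t in ages]
    ys = [math.log(f - 1.0) for f in ldfs]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    fitted = [1.0 + math.exp(intercept + slope * x) for x in xs]
    sse = sum((f - g) ** 2 for f, g in zip(ldfs, fitted))
    return intercept, slope, sse


candidates = [-7, -5, -3, -2, -1, 0, 1, 2, 5, 10]
results = {c: fit_for_c(c) for c in candidates}
best_c = min(results, key=lambda c: results[c][2])

for c in candidates:
    print(f"c = {c:>3}: error = {results[c][2]:.5f}")
print("best c:", best_c)
```

A finer grid around the best value, or a numerical optimizer, would refine c further; the coarse grid is kept here to mirror the Data Table.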
Use Model to Compute the Tail
(Using Prediction through 50 years of Development)

| (1) Age | (2) ln(1/(t+c)) | (3) Predicted Tail Factors, 1 + EXP(a+b*(2)) | (4) Cumulative Tail |
|---|---|---|---|
| 120 | -4.74493 | 1.0082 | 1.1054 |
| 132 | -4.84419 | 1.0071 | 1.0964 |
| 144 | -4.93447 | 1.0062 | 1.0887 |
| 156 | -5.01728 | 1.0055 | 1.0820 |
| 168 | -5.09375 | 1.0049 | 1.0761 |
| 180 | -5.16479 | 1.0044 | 1.0709 |
| 192 | -5.23111 | 1.0040 | 1.0661 |
| 204 | -5.2933 | 1.0037 | 1.0619 |
| 216 | -5.35186 | 1.0033 | 1.0580 |
| 228 | -5.40717 | 1.0031 | 1.0545 |
| 240 | -5.45959 | 1.0029 | 1.0513 |
| 252 | -5.50939 | 1.0027 | 1.0483 |
| 264 | -5.55683 | 1.0025 | 1.0455 |
| 276 | -5.60212 | 1.0023 | 1.0429 |
Fitting Non-linear functions
• Another approach is to use a numerical method
  • Newton-Raphson (one dimension)
    • $x_{n+1} = x_n - f'(x_n)/f''(x_n)$
    • $f(x_n)$ is typically a function being maximized or minimized, such as squared errors
    • x's are parameters being estimated
  • A multivariate version of Newton-Raphson or other algorithm is available to solve non-linear problems in most statistical software
  • In Excel the Solver add-in is used to do this
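The one-dimensional update can be sketched in a few lines. The objective below, f(x) = x⁴ - 3x³ + 2, is an arbitrary illustration chosen because its derivatives are easy to write down; it is not from the slides, and `newton_minimize` is a hypothetical helper.

```python
def newton_minimize(d1, d2, x0, tol=1e-10, max_iter=50):
    """Iterate x <- x - f'(x)/f''(x) until the step is smaller than tol.

    d1 and d2 are the first and second derivatives of the objective.
    """
    x = x0
    for _ in range(max_iter):
        step = d1(x) / d2(x)
        x -= step
        if abs(step) < tol:
            break
    return x


# Illustrative objective f(x) = x**4 - 3*x**3 + 2, with a minimum at x = 2.25.
def d1(x):
    return 4 * x ** 3 - 9 * x ** 2


def d2(x):
    return 12 * x ** 2 - 18 * x


x_min = newton_minimize(d1, d2, x0=3.0)
print(f"minimum found at x = {x_min:.6f}")
```

The method finds a point where f' is zero, so the starting value matters: a poor start can converge to a maximum or fail where f'' is near zero.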
Claim Count Triangle Model
• Chain ladder is common approach
Workers Compensation Claim Counts

| Year | At 12 Months | At 24 Months | At 36 Months | At 48 Months | At 60 Months | Reported as of 12/31/03 | Loss Development Factor | Ultimate Claims |
|---|---|---|---|---|---|---|---|---|
| 1998 | 112 | 134 | 136 | 136 | 136 | 136 | 1.000 | 136.0 |
| 1999 | 78 | 106 | 110 | 110 | | 110 | 1.000 | 110.0 |
| 2000 | 68 | 101 | 101 | | | 101 | 1.000 | 101.0 |
| 2001 | 101 | 123 | | | | 123 | 1.000 | 123.0 |
| 2002 | 113 | 124 | | | | 124 | 1.013 | 125.6 |
| 2003 | 114 | | | | | 114 | 1.266 | 144.4 |

| Year | 12-24 | 24-36 | 36-48 |
|---|---|---|---|
| 1998 | 1.196 | 1.015 | 1.000 |
| 1999 | 1.359 | 1.038 | 1.000 |
| 2000 | 1.485 | 1.000 | 1.000 |
| 2001 | 1.218 | 1.000 | |
| 2002 | 1.097 | | |
| Average | 1.271 | 1.013 | 1.000 |
| Wt Avg | 1.246 | 1.013 | 1.000 |
| Selected | 1.250 | 1.013 | 1.000 |
| Age to Ultimate | 1.266 | 1.013 | 1.000 |
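The chain ladder arithmetic can be sketched as below. The triangle is an assumption: it is laid out so that it reproduces the age-to-age factors in the table above, truncated at 48 months, because the flattened slide does not make every cell position certain. The code uses volume-weighted averages, whereas the slide applies selected factors (for example 1.250 for 12-24), so projected ultimates for the youngest year differ slightly from the slide.

```python
# Cumulative claim counts by accident year at 12, 24, 36 and 48 months.
# Cell placement is inferred from the age-to-age factor table (assumption).
triangle = {
    1998: [112, 134, 136, 136],
    1999: [78, 106, 110, 110],
    2000: [68, 101, 101, 101],
    2001: [101, 123, 123],
    2002: [113, 124],
    2003: [114],
}


def weighted_factors(tri):
    """Volume-weighted age-to-age factors: sum of next column / sum of current."""
    n_ages = max(len(v) for v in tri.values())
    factors = []
    for j in range(n_ages - 1):
        rows = [v for v in tri.values() if len(v) > j + 1]
        factors.append(sum(v[j + 1] for v in rows) / sum(v[j] for v in rows))
    return factors


def age_to_ultimate(factors):
    """Cumulative products of the factors from each age to the last age."""
    out = []
    cum = 1.0
    for f in reversed(factors):
        cum *= f
        out.append(cum)
    return list(reversed(out)) + [1.0]   # the oldest age needs no development


factors = weighted_factors(triangle)
cdfs = age_to_ultimate(factors)
ultimates = {yr: v[-1] * cdfs[len(v) - 1] for yr, v in triangle.items()}

print("weighted age-to-age factors:", [round(f, 3) for f in factors])
for yr, ult in ultimates.items():
    print(yr, round(ult, 1))
```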
Claim Count Development
• Another approach: additive model
$$Y_{i,j} = \mu_j + \varepsilon, \qquad Y_{i,j} = \text{incremental claims}$$

• This model is the same as a one factor ANOVA
ANOVA Model for Development
Actual Claims

| 12 | 24 | 36 |
|---|---|---|
| 168 | 25 | 1 |
| 117 | 33 | 0 |
| 102 | 42 | 0 |
| 185 | 50 | 3 |
| 170 | 0 | 6 |
| 171 | 16 | 0 |
ANOVA: Two Groups
• With only two groups we test the significance of the difference between their means
• Many years ago in a statistics course we learned how to use a t-test for this
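$$t = \frac{\bar{y}_1 - \bar{y}_2}{s_{\bar{y}_1 - \bar{y}_2}}, \qquad s_{\bar{y}_1 - \bar{y}_2} = \sqrt{s^2/n_1 + s^2/n_2}$$

$$F_{1,\, n_1 + n_2 - 2} = \frac{n_2 (\bar{y}_2 - \bar{y})^2 + n_1 (\bar{y}_1 - \bar{y})^2}{s^2}$$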
ANOVA: More than 2 Groups
$$F_{2,\, n_1 + n_2 + n_3 - 3} = \frac{\left[ n_3 (\bar{y}_3 - \bar{y})^2 + n_2 (\bar{y}_2 - \bar{y})^2 + n_1 (\bar{y}_1 - \bar{y})^2 \right] / 2}{s^2}$$
Correlation Measure: Eta
$$\eta = \sqrt{\frac{SS_{\text{between}}}{SS_{\text{total}}}}$$
![Page 54: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/54.jpg)
ANOVA Model for Development
Anova: Single Factor
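SUMMARY

| Groups | Count | Sum | Average | Variance |
|---|---|---|---|---|
| Column 1 | 6 | 913 | 152.17 | 1150.97 |
| Column 2 | 6 | 166 | 27.67 | 328.27 |
| Column 3 | 6 | 10 | 1.67 | 5.87 |

ANOVA

| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 77653 | 2 | 38826.50 | 78.43 | 0.00 | 3.68 |
| Within Groups | 7425.5 | 15 | 495.03 | | | |
| Total | 85078.5 | 17 | | | | |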
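The single-factor ANOVA output can be checked by hand. The following is a minimal sketch in Python (the slides themselves use Excel and R; the function name `one_way_anova` is made up for illustration) that recomputes the between and within sums of squares, the F statistic, and eta from the claim counts shown on the earlier slide.

```python
from math import sqrt

# Incremental claims by development age (12, 24, 36 months), as on the slide.
groups = {
    12: [168, 117, 102, 185, 170, 171],
    24: [25, 33, 42, 50, 0, 16],
    36: [1, 0, 0, 3, 6, 0],
}

def one_way_anova(groups):
    """Return (SS_between, SS_within, F, eta) for a dict of samples."""
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        ss_between += len(ys) * (mean - grand_mean) ** 2
        ss_within += sum((y - mean) ** 2 for y in ys)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta = sqrt(ss_between / (ss_between + ss_within))
    return ss_between, ss_within, f_stat, eta

ss_b, ss_w, f_stat, eta = one_way_anova(groups)
print(round(ss_b, 1), round(ss_w, 1), round(f_stat, 2), round(eta, 4))
```

The printed values should agree with the SS and F columns of the Excel output, and eta should agree with the Multiple R reported for the equivalent dummy-variable regression.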
![Page 55: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/55.jpg)
Regression With Dummy Variables
• Let Devage24=1 if development age = 24 months, 0 otherwise
• Let Devage36=1 if development age = 36 months, 0 otherwise
• Need one fewer dummy variable than the number of ages (the omitted age is the base level, captured by the intercept)
![Page 56: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/56.jpg)
Regression with Dummy Variables: Design Matrix
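| YEAR | Age | Cumulative | Claims | Devage24 | Devage36 |
|---|---|---|---|---|---|
| 1998 | 1 | 168 | 168 | 0 | 0 |
| 1999 | 1 | 117 | 117 | 0 | 0 |
| 2000 | 1 | 102 | 102 | 0 | 0 |
| 2001 | 1 | 185 | 185 | 0 | 0 |
| 2002 | 1 | 170 | 170 | 0 | 0 |
| 2003 | 1 | 171 | 171 | 0 | 0 |
| 1997 | 2 | 99 | 25 | 1 | 0 |
| 1998 | 2 | 201 | 33 | 1 | 0 |
| 1999 | 2 | 159 | 42 | 1 | 0 |
| 2000 | 2 | 152 | 50 | 1 | 0 |
| 2001 | 2 | 185 | 0 | 1 | 0 |
| 2002 | 2 | 186 | 16 | 1 | 0 |
| 1995 | 3 | 140 | 1 | 0 | 1 |
| 1996 | 3 | 121 | 0 | 0 | 1 |
| 1997 | 3 | 99 | 0 | 0 | 1 |
| 1998 | 3 | 204 | 3 | 0 | 1 |
| 1999 | 3 | 165 | 6 | 0 | 1 |
| 2000 | 3 | 152 | 0 | 0 | 1 |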
![Page 57: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/57.jpg)
Equivalent Model to ANOVA

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.955365 |
| R Square | 0.912722 |
| Adjusted R Square | 0.901085 |
| Standard Error | 22.24934 |
| Observations | 18 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 77653 | 38826.5 | 78.43209 | 1.13971E-08 |
| Residual | 15 | 7425.5 | 495.0333 | | |
| Total | 17 | 85078.5 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 152.1667 | 9.083257 | 16.75243 | 4.04E-11 | 132.806151 | 171.5272 |
| X Variable 1 | -124.5 | 12.84567 | -9.69199 | 7.53E-08 | -151.8799038 | -97.1201 |
| X Variable 2 | -150.5 | 12.84567 | -11.716 | 5.99E-09 | -177.8799038 | -123.12 |

Age 2 estimate: 27.7
Age 3 estimate: 1.7
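The equivalence with ANOVA can be seen from the coefficients: the intercept is the mean for the base age, and each dummy coefficient is the difference between that age's mean and the base mean. The sketch below (Python, standard library only; variable names are illustrative) builds those coefficients from group means and confirms that they satisfy the least-squares normal equations X'(y - Xb) = 0, rather than calling a regression routine.

```python
# Claims and development age (1, 2, 3) from the design matrix on the slide.
claims = [168, 117, 102, 185, 170, 171,
          25, 33, 42, 50, 0, 16,
          1, 0, 0, 3, 6, 0]
ages = [1] * 6 + [2] * 6 + [3] * 6

# Design matrix rows: intercept, Devage24, Devage36.
X = [[1, int(a == 2), int(a == 3)] for a in ages]

def group_mean(age):
    ys = [y for y, a in zip(claims, ages) if a == age]
    return sum(ys) / len(ys)

# Candidate coefficients built from group means.
b0 = group_mean(1)
b = [b0, group_mean(2) - b0, group_mean(3) - b0]

# Least squares requires X'(y - Xb) = 0 (the normal equations).
residuals = [y - sum(x_k * b_k for x_k, b_k in zip(row, b))
             for row, y in zip(X, claims)]
normal_eq = [sum(row[k] * r for row, r in zip(X, residuals)) for k in range(3)]

print([round(v, 4) for v in b])          # intercept, Devage24, Devage36
print([round(v, 8) for v in normal_eq])  # all approximately zero
```

Adding a dummy coefficient back to the intercept recovers the "Age 2 estimate" and "Age 3 estimate" shown under the regression output.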
![Page 58: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/58.jpg)
Apply Logarithmic Transformation
• It is reasonable to believe that variance is proportional to expected value
• Claims can only have positive values
• If we model the log of the claim values, the back-transformed predictions can’t be negative
• Regress log(Claims + .001) on dummy variables or do ANOVA on logged data
![Page 59: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/59.jpg)
Log Regression
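| YEAR | Age | Cumulative | Claims | ln(Claims) | DevAge24 | Devage36 |
|---|---|---|---|---|---|---|
| 1998 | 1 | 168 | 168 | 5.12397 | 0 | 0 |
| 1999 | 1 | 117 | 117 | 4.762182 | 0 | 0 |
| 2000 | 1 | 102 | 102 | 4.624983 | 0 | 0 |
| 2001 | 1 | 185 | 185 | 5.220361 | 0 | 0 |
| 2002 | 1 | 170 | 170 | 5.135804 | 0 | 0 |
| 2003 | 1 | 171 | 171 | 5.141669 | 0 | 0 |
| 1997 | 2 | 99 | 25 | 3.218916 | 1 | 0 |
| 1998 | 2 | 201 | 33 | 3.496538 | 1 | 0 |
| 1999 | 2 | 159 | 42 | 3.737693 | 1 | 0 |
| 2000 | 2 | 152 | 50 | 3.912043 | 1 | 0 |
| 2001 | 2 | 185 | 0 | -6.90776 | 1 | 0 |
| 2002 | 2 | 186 | 16 | 2.772651 | 1 | 0 |
| 1995 | 3 | 140 | 1 | 0.001 | 0 | 1 |
| 1996 | 3 | 121 | 0 | -6.90776 | 0 | 1 |
| 1997 | 3 | 99 | 0 | -6.90776 | 0 | 1 |
| 1998 | 3 | 204 | 3 | 1.098946 | 0 | 1 |
| 1999 | 3 | 165 | 6 | 1.791926 | 0 | 1 |
| 2000 | 3 | 152 | 0 | -6.90776 | 0 | 1 |

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.714499 |
| R Square | 0.510508 |
| Adjusted R Square | 0.445243 |
| Standard Error | 3.509039 |
| Observations | 18 |

ANOVA

| | df | SS | MS |
|---|---|---|---|
| Regression | 2 | 192.6306 | 96.31532 |
| Residual | 15 | 184.7003 | 12.31335 |
| Total | 17 | 377.3309 | |

| | Coefficients | Standard Error | t Stat |
|---|---|---|---|
| Intercept | 5.001495 | 1.432559 | 3.491301 |
| X Variable 1 | -3.29648 | 2.025945 | -1.62713 |
| X Variable 2 | -7.97339 | 2.025945 | -3.93564 |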
![Page 60: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/60.jpg)
Poisson Regression
• Log Regression assumption: errors on log scale are from normal distribution.
• But these are claims – Poisson assumption might be reasonable
• Poisson and Normal both belong to a more general class of distributions: the exponential family of distributions
![Page 61: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/61.jpg)
“Natural” Form of the Exponential Family
$$f(y_i;\theta_i,\phi) = \exp\left\{\frac{y_i\,\theta_i - b(\theta_i)}{a(\phi)} + c(y_i,\phi)\right\}$$
![Page 62: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/62.jpg)
Specific Members of the Exponential Family
• Normal (Gaussian)
• Poisson
• Negative Binomial
• Gamma
• Inverse Gaussian
![Page 63: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/63.jpg)
Some Other Members of the Exponential Family
• Natural Form
  • Binomial
  • Compound Poisson/Gamma (Tweedie)
• General Form [use ln(y) instead of y]
  • Lognormal
  • Single Parameter Pareto
![Page 64: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/64.jpg)
Poisson Distribution
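• Poisson distribution:
$$\text{Prob}(Y = y) = \frac{\mu^{y}}{y!}\,e^{-\mu}$$
$$\text{Prob}(Y = y) = \frac{\exp\left(\ln(\mu^{y}) - \mu\right)}{y!}$$
$$\text{Prob}(Y = y) = \exp\Big(y\,\underbrace{\ln\mu}_{\theta} - \underbrace{\mu}_{b(\theta)}\ \underbrace{-\,\ln(y!)}_{c(y)}\Big)$$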
![Page 65: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/65.jpg)
Poisson Distribution
• Poisson distribution:
$$\text{Prob}(Y = y) = \frac{\mu^{y}}{y!}\,e^{-\mu}$$
• Natural Form:
$$\text{Prob}(Y = y) = \exp\left\{\frac{y\ln\mu - \mu}{\phi} - \frac{y}{\phi}\ln\phi - \ln\big((y/\phi)!\big)\right\}$$
• “Over-dispersed” Poisson allows φ ≠ 1.
• Variance/Mean ratio = φ
![Page 66: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/66.jpg)
Linear Model vs GLM
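• Regression:
$$Y_i = \mu_i + \varepsilon_i, \qquad \mu_i = X_i'\mathrm{B}, \qquad \varepsilon_i \sim N(0, \sigma^2)$$
• GLM:
$$Y_i = \mu_i + \varepsilon_i, \qquad h(\mu_i) = X_i'\mathrm{B}, \qquad \varepsilon_i \sim \text{exponential family}$$
h is a link function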
![Page 67: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/67.jpg)
The Link Function
• Like transformation of variables in linear regression
  • Y = AX^B is transformed into a linear model
  • log(Y) = log(A) + B*log(X)
• This is similar to having a log link function:
  • h(Y) = log(Y)
  • denote h(Y) as η
  • η = a + bx
![Page 68: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/68.jpg)
Other Link Functions
• Identity
  • h(Y) = Y
• Inverse
  • h(Y) = 1/Y
• Logistic
  • h(Y) = log(Y/(1-Y))
• Probit
  • h(Y) = Φ⁻¹(Y), where Φ is the standard Normal cumulative distribution function
![Page 69: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/69.jpg)
The Other Parameters: Poisson Example
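Poisson:
$$\text{Prob}(Y = y) = \exp\left\{\frac{y\ln\mu - \mu}{\phi} - \frac{y}{\phi}\ln\phi - \ln\big((y/\phi)!\big)\right\}$$
Exponential family:
$$f(y_i;\theta_i,\phi) = \exp\left\{\frac{y_i\,\theta_i - b(\theta_i)}{a(\phi)} + c(y_i,\phi)\right\}$$
$$E(y) = b'(\theta) = \exp(\theta) = \mu, \quad \text{so } \theta = \ln(\mu) \quad \text{(link function)}$$
$$\text{Var}(Y) = b''(\theta)\,a(\phi), \quad \text{where } b''(\theta) \text{ is the variance function and equals } \mu \text{ for Poisson}$$
$$a(\phi) = \frac{\phi}{w}$$
With standard Poisson, φ and w are 1, but more about them later.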
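The identities E(y) = b'(θ) and Var(Y) = b''(θ) for the standard Poisson (φ = w = 1) can be checked numerically. This is a minimal Python sketch; the mean of 3.5 and the truncation of the sum at 200 are illustrative assumptions, not values from the slides.

```python
from math import exp, lgamma, log

def poisson_pmf(y, mu):
    # exp(y*ln(mu) - mu - ln(y!)): theta = ln(mu), b(theta) = exp(theta)
    return exp(y * log(mu) - mu - lgamma(y + 1))

mu = 3.5                      # illustrative mean (assumption)
ys = range(0, 200)            # truncation point; tail mass beyond is negligible
mean = sum(y * poisson_pmf(y, mu) for y in ys)
var = sum((y - mean) ** 2 * poisson_pmf(y, mu) for y in ys)

theta = log(mu)
b_prime = exp(theta)          # b'(theta) for b(theta) = exp(theta)
b_double_prime = exp(theta)   # b''(theta), the variance function
print(round(mean, 6), round(var, 6), round(b_prime, 6))
```

Both the mean and the variance computed from the probability function should match exp(θ) = μ, which is why the Poisson variance function is μ.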
![Page 70: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/70.jpg)
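Log-Likelihood for Poisson
$$f(y) = \frac{\mu^{y}\exp(-\mu)}{y!}$$
$$L(y) = \prod_{i=1}^{N}\frac{\mu_i^{y_i}\exp(-\mu_i)}{y_i!}$$
$$\log(L(y)) = \sum_{i=1}^{N}\left[y_i\ln(\mu_i) - \mu_i\right]$$
(the term $-\ln(y_i!)$ is dropped because it does not involve $\mu_i$)

with regression:
$$\log(L(y)) = \sum_{i=1}^{N}\left[y_i\ln(a + b_1x_1 + \dots + b_nx_n) - (a + b_1x_1 + \dots + b_nx_n)\right]$$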
![Page 71: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/71.jpg)
Estimating Parameters
• As with nonlinear regression, there usually is not a closed form solution for GLMs
• A numerical method is used to solve for the parameters
• For some models this could be programmed in Excel – but statistical software is the usual choice
• If you can’t spend money on the software, download R for free
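To give a feel for the kind of numerical method involved, here is a minimal Python sketch of Newton's method for the simplest possible case: a Poisson model with a log link and a single parameter, μ = exp(β). The function name `fit_log_mean` and the starting value are assumptions made for illustration; statistical packages use a more general iteratively reweighted scheme. For this one-parameter case the answer is known in closed form (β = log of the sample mean), which makes the iteration easy to check.

```python
from math import exp, log

def fit_log_mean(ys, tol=1e-12, max_iter=100):
    """Newton's method for beta in a Poisson model with mu = exp(beta)."""
    beta = log(max(ys))              # crude starting value (assumes max(ys) > 0)
    n = len(ys)
    for _ in range(max_iter):
        mu = exp(beta)
        score = sum(ys) - n * mu     # derivative of the log-likelihood in beta
        info = n * mu                # minus the second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Incremental claims at development age 24 from the earlier slides.
age_24_claims = [25, 33, 42, 50, 0, 16]
beta_hat = fit_log_mean(age_24_claims)
print(round(beta_hat, 5), round(log(sum(age_24_claims) / 6), 5))
```

The two printed numbers should coincide, showing that the iteration converges to the log of the group mean.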
![Page 72: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/72.jpg)
GLM fit for Poisson Regression

    > devage <- as.factor(AGE)
    > claims.glm <- glm(Claims ~ devage, family = poisson)
    > summary(claims.glm)
    Call:
    glm(formula = Claims ~ devage, family = poisson)
    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -10.250   -1.732   -0.500    0.507   10.626
    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  4.73540    0.02825 167.622  < 2e-16 ***
    devage2     -0.89595    0.05430 -16.500  < 2e-16 ***
    devage3     -4.32994    0.29004 -14.929  < 2e-16 ***
    devage4     -6.81484    1.00020  -6.813 9.53e-12 ***
    ---
    Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
    (Dispersion parameter for poisson family taken to be 1)
    Null deviance: 2838.65 on 36 degrees of freedom
    Residual deviance: 708.72 on 33 degrees of freedom
    AIC: 851.38
![Page 73: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/73.jpg)
Deviance: Testing Fit
• The maximum likelihood achievable is that of a full (saturated) model, with the actual data, y_i, substituted for E(y)
• The likelihood for a given model uses the model's predicted value in place of E(y)
• Twice the difference between the logs of these two likelihoods is known as the deviance
• For the Normal, this is just the sum of squared errors
• It is used to assess the goodness of fit of GLM models – thus it functions like residuals for Normal models
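For the Poisson case the deviance has a simple closed form, 2 Σ [y ln(y/μ) − (y − μ)]. The sketch below (Python; the function name and the choice of a single common mean as the fitted model are assumptions for illustration) shows that the deviance is zero for the saturated fit and positive otherwise.

```python
from math import log

def poisson_deviance(ys, mus):
    """2 * (saturated log-likelihood - model log-likelihood), Poisson case."""
    total = 0.0
    for y, mu in zip(ys, mus):
        # y*ln(y/mu) is taken as 0 when y == 0 (its limiting value).
        term = y * log(y / mu) if y > 0 else 0.0
        total += term - (y - mu)
    return 2.0 * total

claims = [25, 33, 42, 50, 0, 16]
fitted = [sum(claims) / len(claims)] * len(claims)   # one common mean

print(poisson_deviance(claims, claims))      # saturated fit: 0.0
print(poisson_deviance(claims, fitted) > 0)  # any other fit: True
```

Comparing the deviance of two nested models is the GLM counterpart of comparing residual sums of squares in ordinary regression.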
![Page 74: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/74.jpg)
A More General Model for Claim Development
$$Y_{i,j} = \mu_i + \mu_j + \varepsilon$$
or Multiplicative model
$$Y_{i,j} = B\,\mu_i\,\mu_j + \varepsilon$$
$\mu_i$ is accident year effect, $\mu_j$ is development age effect
$Y_{i,j}$ = incremental claims
![Page 75: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/75.jpg)
Design Matrix: Dev Age and Accident Year Model
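| YEAR | Incremental Claims | DevAge24 | Devage36 | Devage48 | AY94 | AY95 | AY96 | AY97 |
|---|---|---|---|---|---|---|---|---|
| 1993 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1993 | 136 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1993 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1993 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1994 | 24 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1994 | 118 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1994 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1994 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 1995 | 116 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1995 | 23 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1995 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1995 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 1996 | 99 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1996 | 22 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1996 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1996 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 1997 | 74 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1997 | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |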
![Page 76: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/76.jpg)
More General GLM Development Model

    Deviance Residuals:
         Min        1Q    Median        3Q       Max
    -10.5459   -1.4136   -0.4511    0.7035   10.2242
    Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
    (Intercept)  4.731366   0.079903  59.214  < 2e-16 ***
    devage2     -0.844529   0.055450 -15.230  < 2e-16 ***
    devage3     -4.227461   0.290609 -14.547  < 2e-16 ***
    devage4     -6.712368   1.000482  -6.709 1.96e-11 ***
    AY1994      -0.130053   0.114200  -1.139 0.254778
    AY1995      -0.158224   0.115066  -1.375 0.169110
    AY1996      -0.304076   0.119841  -2.537 0.011170 *
    AY1997      -0.504747   0.127273  -3.966 7.31e-05 ***
    AY1998       0.218254   0.104878   2.081 0.037431 *
    AY1999       0.006079   0.110263   0.055 0.956033
    AY2000      -0.075986   0.112589  -0.675 0.499742
    AY2001       0.131483   0.107294   1.225 0.220408
    AY2002       0.136874   0.107159   1.277 0.201496
    AY2003       0.410297   0.110600   3.710 0.000207 ***
    ---
    Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
    (Dispersion parameter for poisson family taken to be 1)
    Null deviance: 2838.65 on 36 degrees of freedom
    Residual deviance: 619.64 on 23 degrees of freedom
    AIC: 782.3
![Page 77: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/77.jpg)
Plot Deviance Residuals to Assess Fit
[Figure: deviance residuals (about -10 to 10) plotted against fitted values (0 to 150) for the model devage + AY]
![Page 78: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/78.jpg)
QQ Plots of Residuals
[Figure: normal QQ plot of Pearson residuals (about -5 to 10) against quantiles of the standard normal (-2 to 2)]
![Page 79: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/79.jpg)
An Overdispersed Poisson?
• The variance of a Poisson should be equal to its mean
• If it is greater than that, then we have an overdispersed Poisson
• This uses the parameter φ
• It is estimated by evaluating how much the actual variance exceeds the mean
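One common way to put a number on this is the Pearson estimate: the sum of (y − μ)²/μ over all observations, divided by the residual degrees of freedom. The Python sketch below applies it to the three-age claim data from the earlier ANOVA slides, using each age's mean as the fitted value. That choice of data and fitted model is an assumption for illustration and is not the model behind the R output shown earlier.

```python
claims_by_age = {
    1: [168, 117, 102, 185, 170, 171],
    2: [25, 33, 42, 50, 0, 16],
    3: [1, 0, 0, 3, 6, 0],
}

pearson_chi2 = 0.0
n_obs = 0
for ys in claims_by_age.values():
    mu = sum(ys) / len(ys)                      # fitted mean for this age
    pearson_chi2 += sum((y - mu) ** 2 / mu for y in ys)
    n_obs += len(ys)

n_params = len(claims_by_age)                   # one mean per development age
phi_hat = pearson_chi2 / (n_obs - n_params)
print(round(phi_hat, 2))
```

A value well above 1 suggests that the variance exceeds the mean, so standard errors from a plain Poisson fit would be too small.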
![Page 80: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/80.jpg)
Weighted Regression
• There is an additional consideration in the analysis: should the observations be weighted?
• The variability of a particular record's frequency will be inversely proportional to its exposures
• Thus, a natural weight is exposures
![Page 81: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/81.jpg)
Weighted Regression
• Example:
  • Severities more credible if weighted by number of claims they are based on
  • Frequencies more credible if weighted by exposures
• Weight inversely proportional to variance
• Like a regression with # observations equal to number of claims (policyholders) in each cell
• With GLM, specify appropriate weight variable in software
![Page 82: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/82.jpg)
Weighted GLM of Claim Frequency Development
• Weighted by exposures
• Adjusted for overdispersion
[Figure: normal QQ plot of Pearson residuals (about -20 to 40) against quantiles of the standard normal (-2 to 2)]
![Page 83: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/83.jpg)
Simulated Workers Compensation Data
• Dependent variable
  • Frequency – claims per employee
• Predictor variables
  • Occupation
  • Age
  • Size of company
• Data was created to illustrate commonly encountered data complexities
![Page 84: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/84.jpg)
Interactions
• Interactions occur when the relationship between a dependent and independent variable varies based on a third variable
• For instance suppose the relationship between age and the probability of a workers compensation injury varies by occupation
![Page 85: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/85.jpg)
Example of Interaction
Frequency vs Age by Occupation
[Figure: four panels (Occupation: Occu1, Occu2, Occu3, Occu4) plotting Frequency (about 0.04 to 0.10) against Age (0 to 80)]
![Page 86: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/86.jpg)
Interactions in Regression
• In regression, interactions are typically handled with interaction or product terms
• Interaction terms are like a combination of dummy variables and linear predictor variables
  • Let D1 be 1 if the employee is in occupation 1 and 0 otherwise
  • Let X be the employee’s age
  • I1 = D1 * X
  • I1 is the interaction term
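The construction of dummy and interaction columns can be sketched in a few lines of Python. The function name `design_row` and the column order are assumptions for illustration; the fourth occupation is treated as the base level, so all of its dummies and interaction terms are zero.

```python
def design_row(occupation, age, occupations=("Occu1", "Occu2", "Occu3")):
    """Return [D1, D2, D3, Age, Age*D1, Age*D2, Age*D3] for one record."""
    dummies = [1 if occupation == name else 0 for name in occupations]
    interactions = [d * age for d in dummies]
    return dummies + [age] + interactions

print(design_row("Occu2", 25))   # [0, 1, 0, 25, 0, 25, 0]
print(design_row("Occu4", 45))   # [0, 0, 0, 45, 0, 0, 0]
```

With these columns, the Age coefficient is the slope for the base occupation, and each interaction coefficient is the difference between an occupation's slope and the base slope.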
![Page 87: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/87.jpg)
Regression with Interaction Terms
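| Frequency | Occupation | Dummy1 | Dummy2 | Dummy3 | Age | Age*D1 | Age*D2 | Age*D3 |
|---|---|---|---|---|---|---|---|---|
| 5.7% | Occu1 | 1 | 0 | 0 | 18.5 | 18.5 | 0 | 0 |
| 4.4% | Occu1 | 1 | 0 | 0 | 25 | 25 | 0 | 0 |
| 4.4% | Occu1 | 1 | 0 | 0 | 35 | 35 | 0 | 0 |
| 3.9% | Occu1 | 1 | 0 | 0 | 45 | 45 | 0 | 0 |
| 4.0% | Occu1 | 1 | 0 | 0 | 55 | 55 | 0 | 0 |
| 1.9% | Occu1 | 1 | 0 | 0 | 65 | 65 | 0 | 0 |
| 13.3% | Occu1 | 1 | 0 | 0 | 75 | 75 | 0 | 0 |
| 0.0% | Occu1 | 1 | 0 | 0 | 85 | 85 | 0 | 0 |
| 3.0% | Occu2 | 0 | 1 | 0 | 18.5 | 0 | 18.5 | 0 |
| 1.7% | Occu2 | 0 | 1 | 0 | 25 | 0 | 25 | 0 |
| 2.5% | Occu2 | 0 | 1 | 0 | 35 | 0 | 35 | 0 |
| 0.8% | Occu2 | 0 | 1 | 0 | 45 | 0 | 45 | 0 |
| 1.3% | Occu2 | 0 | 1 | 0 | 55 | 0 | 55 | 0 |
| 0.0% | Occu2 | 0 | 1 | 0 | 65 | 0 | 65 | 0 |
| 0.0% | Occu2 | 0 | 1 | 0 | 75 | 0 | 75 | 0 |
| 13.8% | Occu3 | 0 | 0 | 1 | 18.5 | 0 | 0 | 18.5 |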
![Page 88: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/88.jpg)
Output of Regression with Interactions
SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.64659 |
| R Square | 0.418079 |
| Adjusted R Square | 0.381709 |
| Standard Error | 0.042525 |
| Observations | 120 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 7 | 0.145514 | 0.020788 | 11.49515 | 6.23E-11 |
| Residual | 112 | 0.202539 | 0.001808 | | |
| Total | 119 | 0.348053 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 0.113244 | 0.019364 | 5.848271 | 5E-08 | 0.074877 | 0.151611 |
| Dummy1 | -0.05146 | 0.027092 | -1.89929 | 0.060098 | -0.10514 | 0.002224 |
| Dummy2 | -0.09351 | 0.027384 | -3.41473 | 0.00089 | -0.14777 | -0.03925 |
| Dummy3 | 0.010024 | 0.027763 | 0.361047 | 0.718745 | -0.04499 | 0.065033 |
| Age | -0.00158 | 0.000369 | -4.28458 | 3.89E-05 | -0.00231 | -0.00085 |
| Age*D1 | 0.001254 | 0.000509 | 2.462199 | 0.015332 | 0.000245 | 0.002263 |
| Age*D2 | 0.001324 | 0.000521 | 2.539389 | 0.012476 | 0.000291 | 0.002356 |
| Age*D3 | 0.000879 | 0.000536 | 1.638323 | 0.104161 | -0.00018 | 0.001941 |
![Page 89: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/89.jpg)
Residual Plot of Interaction Regression
• Residuals indicate a pattern
• This is a result of non-linear relationships
[Figure: Age Residual Plot, residuals (-0.1 to 0.2) against Age (0 to 100)]
![Page 90: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/90.jpg)
Common Method for Modeling Interactions and Nonlinearities: Regression Trees
[Figure: regression tree. Root split: Occupation:abd; further splits on Age (<30, <50, <70) and Occupation (ab, b, bd); terminal-node frequencies range from 0.008 to 0.100]
![Page 91: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/91.jpg)
Regression Tree
• Based on sequential splitting of the data
• The first split creates the two groups that produce the “best” split of the data
• R² or F is typically the goodness of fit test
• Regression Trees are essentially computationally intensive ANOVAs
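The search for a single "best" split can be sketched as follows: try each candidate threshold on a numeric predictor, treat the two sides as the groups of a two-group ANOVA, and keep the threshold with the highest R². This Python sketch uses made-up toy data and hypothetical function names; real tree software also handles categorical predictors, stopping rules, and pruning.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, r_squared) of the best single split x < threshold.

    Assumes ys are not all equal, so the total sum of squares is positive.
    """
    total = sse(ys)
    levels = sorted(set(xs))
    best = (None, 0.0)
    for lo, hi in zip(levels, levels[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        r_squared = 1 - (sse(left) + sse(right)) / total
        if r_squared > best[1]:
            best = (threshold, r_squared)
    return best

# Toy data (not from the slides): a clean step at x = 2.5.
print(best_split([1, 2, 3, 4], [0, 0, 10, 10]))   # (2.5, 1.0)
```

A full tree repeats this search within each resulting group, which is why the method is described as computationally intensive.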
![Page 92: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/92.jpg)
Regression Tree
| Some Combinations of Occupation | R² |
|---|---|
| Occu1 and Occu2 | 0.226295 |
| Occu3 and Occu4 | 0.249184 |
| Occu1 | 0.009998 |
| Occu2 | 0.044531 |
| Occu3 | 0.401094 |
![Page 93: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/93.jpg)
First Split
Base Node Frequency = .045
Occu3 Occu1, Occu2, Occu4
Frequency = .04 Frequency = .09
![Page 94: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/94.jpg)
Next 2 Splits
[Figure: tree after three splits (Occupation:abd, Age<30, Occupation:ab) with terminal-node frequencies 0.03, 0.10, 0.02 and 0.09]
![Page 95: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/95.jpg)
Final Tree
[Figure: final regression tree. Root split: Occupation:abd; deeper splits on Occupation (ab, b, bd) and Age (<21.75, <30, <50, <60, <70, <80); terminal-node frequencies range from 0.000 to 0.100]
![Page 96: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/96.jpg)
Classification Applications
• Another type of problem:
  • Which of two groups does a new policyholder belong to?
• Example: Suppose all records are classified as high frequency or other
• We want to predict which group a new observation belongs to
![Page 97: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/97.jpg)
Revised Frequency Data: Add variable company size
[Figure: four panels (Occupation: Occu1, Occu2, Occu3, Occu4) plotting Frequency (about 0.01 to 0.12) against Age2 (0 to 80)]
![Page 98: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/98.jpg)
Example: Classify High Frequency Observations
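| Frequency | High/Low | Co Size | Occu2 | Occu3 | Occu4 | Age<35 | Interact Age | Age*occu2 | Age*occu3 | Age*occu4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.4% | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 55 | 0 | 0 |
| 0.0% | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 65 | 0 | 0 |
| 2.3% | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 75 | 0 | 0 |
| 0.0% | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 85 | 0 | 0 |
| 25.0% | 1 | 1 | 0 | 1 | 0 | 1 | 18.5 | 0 | 18.5 | 0 |
| 10.5% | 1 | 1 | 0 | 1 | 0 | 1 | 25 | 0 | 25 | 0 |
| 2.6% | 0 | 1 | 0 | 1 | 0 | 1 | 35 | 0 | 35 | 0 |
| 1.5% | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 45 | 0 |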
![Page 99: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/99.jpg)
Discriminant Analysis
• A classical procedure for classification
• Finds the combination of variables which maximizes the difference between the two groups
• Produces a score from predictor variables. Score is used for classification
• Like a regression with a binary dependent variable
![Page 100: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/100.jpg)
Discriminant Analysis cont.
• Similar to a binary regression:
  • z = a + b1x1 + b2x2 + … + bnxn
  • z is a binary dependent variable (i.e., values of 0 and 1)
  • x’s are predictor variables
![Page 101: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/101.jpg)
Score Discriminates between the Classes
Box and Whisker Plots of Discriminant Scores by Group
[Figure: box-and-whisker plots of discriminant scores from Function 1 (about -2 to 4) by predicted group (0 to 1)]
![Page 102: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/102.jpg)
Score Discriminates between the Classes
![Page 103: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/103.jpg)
Discriminant Analysis
Standardized Canonical Discriminant Function Coefficients

| Variable | Function 1 |
|---|---|
| Age | .690 |
| Size | .180 |
| Occu2 | -.109 |
| Occu3 | 2.146 |
| Occu4 | 2.084 |
| Age<35 | 2.745 |
| Age*(Age<35) | -2.376 |
| Age*occu2 | -.029 |
| Age*occu3 | -1.826 |
| Age*occu4 | -1.724 |
![Page 104: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/104.jpg)
Other Methods for Classification
• Logistic regression
  • Can be fit as a GLM
  • Uses logit link: ln(p/(1-p))
    • This is the log of the odds
  • Assumes binomial distribution
$$\ln\left(\frac{p}{1-p}\right) = B_0 + B_1X_1 + \dots + B_nX_n$$
where p is the probability of class membership given the predictors x.
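To use a fitted logistic model for classification, the linear predictor is converted back to a probability with the inverse of the logit link and compared with a cutoff. The Python sketch below shows that conversion; the function names and the 0.5 cutoff are illustrative assumptions, not part of the slides.

```python
from math import exp, log

def logit(p):
    """Log of the odds p/(1-p)."""
    return log(p / (1 - p))

def inverse_logit(z):
    """Probability corresponding to a linear predictor z."""
    return 1 / (1 + exp(-z))

def classify(z, cutoff=0.5):
    """Label 1 ("high frequency") when the fitted probability exceeds cutoff."""
    return 1 if inverse_logit(z) > cutoff else 0

print(inverse_logit(0.0))                   # 0.5
print(round(logit(inverse_logit(1.2)), 6))  # 1.2
print(classify(-2.0), classify(2.0))        # 0 1
```

In practice the cutoff is a business choice: lowering it flags more policyholders as high frequency at the cost of more false alarms.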
![Page 105: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/105.jpg)
Logistic Regression for Classification
    # use GLM function to fit logistic regression
    freq.glm <- glm(High ~ Age2 + Occupation + Miles, family = binomial, data = sumemp)
    # get predicted probabilities on the response scale
    predict.logistic <- predict(freq.glm, type = "response")
    plot(sumemp$Age2, predict.logistic)

[Figure: predict.logistic (0.0 to 0.6) plotted against Age2 (20 to 80)]
![Page 106: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/106.jpg)
Other Methods for Classification cont.
• Decision Trees/CART
  • A classification version of trees with binary dependent variable.
  • Typically a different goodness of fit measure than R².
• Support Vector Machines
![Page 107: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/107.jpg)
Which Model to Use?
• Regression with a normality assumption is one of the GLM options
• Most insurance distributions are skewed
• Might want to use a GLM with a Gamma distribution
• However, if normality is a reasonable assumption, many more diagnostic tools are available for regression
• Lognormal: the data is normal after taking logs
• Much insurance data is more heavy-tailed than any member of the exponential family
• Robust methods
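The choice between a Gamma GLM and a lognormal regression can be tried side by side. The sketch below uses simulated right-skewed data; the variable names, the Gamma shape parameter, and the coefficients 7 and 0.5 are assumptions chosen for illustration, not values from the slides.

```r
# Minimal sketch with simulated, right-skewed "claim severity" data (hypothetical).
set.seed(2)
n <- 1000
x <- runif(n, 0, 1)
# Gamma response with mean exp(7 + 0.5 * x); shape = 2 gives a skewed distribution
mu <- exp(7 + 0.5 * x)
y <- rgamma(n, shape = 2, rate = 2 / mu)

# Option 1: GLM with Gamma distribution and log link
gamma.fit <- glm(y ~ x, family = Gamma(link = "log"))

# Option 2: lognormal model, i.e. ordinary regression on log(y)
lognormal.fit <- lm(log(y) ~ x)

print(coef(gamma.fit))
print(coef(lognormal.fit))
```

Both slope estimates should land near the simulated value of 0.5. The intercepts differ because the GLM models the log of the mean of y, while the regression on log(y) models the mean of the log, so predictions from the lognormal fit need a bias adjustment when converted back to the original scale.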
![Page 108: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/108.jpg)
Which Model to Use?
• The link functions do not capture all the possible non-linear relationships
• Non-parametric methods such as regression trees
• Kernel regression
• Can use more than one model
• This is usually recommended
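As an illustration of fitting more than one model, the sketch below compares a straight-line regression with a kernel regression on simulated data. It uses `ksmooth` from base R's stats package; the sine-shaped relationship, the noise level, and the bandwidth of 0.5 are assumptions made for the example.

```r
# Minimal sketch with simulated data; the sine-shaped relationship is an assumption.
set.seed(3)
n <- 300
x <- sort(runif(n, 0, 2 * pi))
y <- sin(x) + rnorm(n, sd = 0.3)

# Parametric benchmark: straight-line regression
linear.fit <- lm(y ~ x)
linear.mse <- mean(residuals(linear.fit)^2)

# Kernel regression (Nadaraya-Watson) evaluated at the observed x values;
# x is already sorted, so the returned fit lines up with y
kernel.fit <- ksmooth(x, y, kernel = "normal", bandwidth = 0.5, x.points = x)
kernel.mse <- mean((y - kernel.fit$y)^2)

print(c(linear = linear.mse, kernel = kernel.mse))
```

The kernel fit follows the curve that the straight line misses, but its in-sample error depends on the bandwidth: a very small bandwidth chases the noise, so the bandwidth is better chosen on held-out data.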
![Page 109: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/109.jpg)
Which Model to Use?
• Classification
• Discriminant analysis is a reasonable method to start out with
• Logistic regression is more frequently used than discriminant analysis. Its use requires specialized statistical software (such as R)
• To capture data complexities such as nonlinearities, trees are a common, easy-to-understand method
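Discriminant analysis and logistic regression can be run on the same data to see how closely they agree. The sketch below assumes the MASS package (distributed with standard R installations) for `lda`; the simulated two-class data and all variable names are hypothetical.

```r
# Minimal sketch with simulated two-class data; names and parameters are hypothetical.
library(MASS)  # provides lda()

set.seed(4)
n <- 200
sim <- data.frame(
  group = factor(rep(c("A", "B"), each = n)),
  x1 = c(rnorm(n, mean = 0), rnorm(n, mean = 1.5)),
  x2 = c(rnorm(n, mean = 0), rnorm(n, mean = 1))
)

# Discriminant analysis
lda.fit <- lda(group ~ x1 + x2, data = sim)
lda.class <- predict(lda.fit)$class

# Logistic regression: glm models the probability of the second level ("B"),
# so classify as "B" when the fitted probability exceeds 0.5
logit.fit <- glm(group ~ x1 + x2, family = binomial, data = sim)
logit.class <- ifelse(predict(logit.fit, type = "response") > 0.5, "B", "A")

# Training-set accuracy of each method, and how often the two agree
lda.acc <- mean(lda.class == sim$group)
logit.acc <- mean(logit.class == sim$group)
agreement <- mean(as.character(lda.class) == logit.class)
print(c(lda = lda.acc, logistic = logit.acc, agreement = agreement))
```

With normally distributed predictors and equal variances, as simulated here, the two methods typically give very similar classifications; they can diverge more when those assumptions do not hold.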
![Page 110: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/110.jpg)
Introductory Modeling Library Recommendations
• Berry, W., Understanding Regression Assumptions, Sage University Press
• Iversen, R. and Norpoth, H., Analysis of Variance, Sage University Press
• Fox, J., Regression Diagnostics, Sage University Press
• Chatfield, C., The Analysis of Time Series, Chapman and Hall
• Fox, J., An R and S-PLUS Companion to Applied Regression, Sage Publications
• 2004 Casualty Actuarial Discussion Paper Program on Generalized Linear Models, www.casact.org
![Page 111: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/111.jpg)
Advanced Modeling Library Recommendations
• Berry, Michael J. A., and Linoff, Gordon, Data Mining Techniques, John Wiley and Sons, 1997
• Francis, Louise, 2003, “Martian Chronicles: Is MARS Better than Neural Networks” at www.casact.org/aboutcas/mdiprize.htm
• Francis, Louise, 2001, “Neural Networks Demystified” at www.casact.org/aboutcas/mdiprize.htm
![Page 112: Introduction to Predictive Modeling Modeling.pdf · 2019. 1. 23. · Change in UEP Lag Employment Lag UEP CPI Employment Pct Change in Employmeny Unemployment Rate Change in UEP Lag](https://reader033.fdocuments.net/reader033/viewer/2022051321/5ff86bf36f5c66134434e4bd/html5/thumbnails/112.jpg)