1
Loss reserving with GLMs: a case study
Greg Taylor
Taylor Fry Consulting Actuaries
Melbourne University
University of New South Wales
Gráinne McGuire
Taylor Fry Consulting Actuaries
Casualty Actuarial Society, Spring Meeting
Colorado Springs CO, May 16-19 2004
2
Purpose
• Examine loss reserving in relation to a particular data set
  • How credible are chain ladder reserves?
  • Are there any identifiable inconsistencies between the data and the assumptions underlying the chain ladder model?
  • If so, do they really matter? Or are we just making an academic mountain out of a molehill?
  • Can the chain ladder model be conveniently adjusted to eliminate any such inconsistencies?
  • If not, what shall we do?
• Lessons learnt from this specific data set are intended to be of wider applicability
3
The data set
• Auto Bodily Injury insurance
  • Compulsory
  • No coverage of property damage
• Claims data relate to the Scheme of insurance for one state of Australia
  • Pooled data for the entire state
• Scheme of insurance is state regulated but privately underwritten
  • Access to common law
    • But some restriction on payment of plaintiff costs in the case of smaller claims
  • Premium rates partially regulated
4
The data set (continued)
• Centralised database for the Scheme
• Current at 30 September 2003
• About 60,000 claims
• Individual claim records
  • Claim header file
    • Date of injury, date of notification, injury type, injury severity, etc.
  • Transaction file
    • Paid losses (corrected for wage inflation)
  • Case estimate file
5
Starting point for analysis
• Chain ladder
  • This paper is not a vendetta against the chain ladder
  • However, it is taken as the point of departure because of its
    • Simplicity
    • Wide usage
6
Chain ladder
• First, basic test of chain ladder validity
• Fundamental premise of chain ladder is constancy of expected age-to-age factors from one accident period to another
Figure: Payments in respect of settled claims: age-to-age factors (axis 1.00 to 100.00) by development quarter (1:0 to 10:9), for averaging periods of the last 1, 2, 3 and 4 years and all years
• This data set fails the test comprehensively
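The test compares age-to-age factors computed over different averaging periods. A minimal sketch in Python, assuming volume-weighted averaging (the slides do not say which average was used) and a triangle stored as a list of rows of cumulative paid losses; the function name and the toy triangle are hypothetical.

```python
from typing import List, Optional

# Cumulative paid losses: row i = accident period, column j = development period.
# Row i has n - i entries, so the latest experience diagonal is i + j = n - 1.
Triangle = List[List[float]]


def age_to_age_factors(cum: Triangle, last_n_diagonals: Optional[int] = None) -> List[float]:
    """Volume-weighted factors f_j = sum_i C[i][j+1] / sum_i C[i][j].

    Only pairs whose later value C[i][j+1] lies on one of the most recent
    `last_n_diagonals` experience diagonals are averaged; None uses all of them.
    """
    n = len(cum)
    latest = n - 1
    factors = []
    for j in range(n - 1):
        num = den = 0.0
        for i in range(n - 1 - j):          # rows where C[i][j+1] is observed
            diagonal = i + j + 1            # experience period of C[i][j+1]
            if last_n_diagonals is None or diagonal > latest - last_n_diagonals:
                num += cum[i][j + 1]
                den += cum[i][j]
        factors.append(num / den if den > 0 else float("nan"))
    return factors


cum = [
    [100.0, 150.0, 165.0],
    [110.0, 154.0],
    [120.0],
]
print(age_to_age_factors(cum))                      # all experience periods
print(age_to_age_factors(cum, last_n_diagonals=1))  # latest diagonal only
```

If expected factors were constant across accident periods, the two calls would agree up to noise; systematic differences between averaging windows are what the chart above displays.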
7
Chain ladder – does the instability matter?
• Range of variation is 19% (from $1.61B to $1.92B)
• Omitting just the last quarter’s experience increases the loss reserve by 10-15%
Averaging period | Loss reserve at 30 Sept 2003, excl. Sept 2003 accident quarter ($B)
All experience quarters | 1.61
Last 8 experience quarters | 1.68
All experience quarters except Sept 2003 (last diagonal) | 1.78
Last 8 experience quarters except Sept 2003 (last diagonal) | 1.92
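Each reserve in the table comes from projecting the latest cumulative payments with a set of averaged factors. A sketch of that projection, assuming no tail factor beyond the last development period (none is mentioned in the slides); the function name and its arguments are hypothetical.

```python
from typing import List


def chain_ladder_reserve(cum: List[List[float]], factors: List[float],
                         exclude_latest_accident_period: bool = True) -> float:
    """Sum over accident periods of (projected ultimate - latest cumulative paid).

    cum[i] holds the cumulative paid losses observed for accident period i;
    factors[j] is the age-to-age factor from development period j to j + 1.
    The latest accident period (a single cell) can be excluded, mirroring the
    "excl. Sept 2003 accident quarter" basis of the table.
    """
    rows = cum[:-1] if exclude_latest_accident_period else cum
    reserve = 0.0
    for row in rows:
        latest = row[-1]
        ultimate = latest
        for f in factors[len(row) - 1:]:   # develop from the latest observed age
            ultimate *= f
        reserve += ultimate - latest
    return reserve


cum = [[100.0, 150.0, 165.0], [110.0, 154.0], [120.0]]
print(chain_ladder_reserve(cum, [1.4, 1.1]))
```

The projection arithmetic is the same for every row of the table; only the factors fed in change with the averaging period, which is why the choice of period drives the spread.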
8
Chain ladder – does the instability matter?
• Actually, the situation is much worse than this
• Effect of September 2003 quarter (last diagonal) on loss reserve
  • Due to low age-to-age factors in the quarter
  • In turn due to low paid losses in the quarter
• Suggests
  • Not only omitting September 2003 quarter age-to-age factors from averaging
  • But also recognising that the loss reserve is increased by low paid loss experience
    • Estimate loss reserve at 30 June 2003
    • Deduct paid losses during September 2003 quarter
9
Chain ladder – does the instability matter?
• Now a 46% difference between the highest estimate ($2.35B) and the lowest in the previous table ($1.61B)
• More than an academic molehill
Loss reserve at 30 Sept 2003 (excl. Sept 2003 accident quarter)
Averaging period | Uncorrected ($B) | Corrected ($B)
All experience quarters except Sept 2003 (last diagonal) | 1.78 | 1.94
Last 8 experience quarters except Sept 2003 (last diagonal) | 1.92 | 2.35
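The percentages quoted on this slide and the previous one can be reproduced from the tabulated reserves. A short check, reading "10-15%" as the effect of dropping the last diagonal under each of the two averaging periods (our interpretation of the slide):

```python
# Loss reserves at 30 Sept 2003 ($B), taken from the two tables.
uncorrected = {
    "all": 1.61,
    "last 8": 1.68,
    "all, excl. last diagonal": 1.78,
    "last 8, excl. last diagonal": 1.92,
}
corrected = {
    "all, excl. last diagonal": 1.94,
    "last 8, excl. last diagonal": 2.35,
}


def pct_increase(high: float, low: float) -> float:
    """Percentage by which `high` exceeds `low`."""
    return 100.0 * (high / low - 1.0)


range_uncorrected = pct_increase(max(uncorrected.values()), min(uncorrected.values()))
omit_last_all = pct_increase(uncorrected["all, excl. last diagonal"], uncorrected["all"])
omit_last_8 = pct_increase(uncorrected["last 8, excl. last diagonal"], uncorrected["last 8"])
range_overall = pct_increase(max(corrected.values()), min(uncorrected.values()))

print(f"{range_uncorrected:.1f}% {omit_last_all:.1f}% {omit_last_8:.1f}% {range_overall:.1f}%")
# prints "19.3% 10.6% 14.3% 46.0%"
```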
10
Review basic facts and questions
• We have a model formulated on the assumption of certain stable parameters (expected age-to-age factors)
  • This assumption seems clearly violated
  • Data contain clear trends over time
• Various attempts at correction for this
  • Including different averaging periods
• Different corrections give widely differing loss reserves
• How might one choose the “appropriate” correction?
  • Omit just the last quarter? Last two? …
  • Including averaging period
    • Average last 4 quarters? Last 6? Last 8? …
11
Some responses to the questions
• DO NOT choose an averaging period
  • It is a statistical fundamental that one does not average in the presence of trends
• Rather, model the trend
  • This requires an understanding of the mechanics of the process generating the trend
• DO NOT try to use this understanding to assist in the choice of an averaging period
  • Rather, use it to model the finer structure of the data
• Otherwise the choice of factors is little more than numerology
• These comments apply not only to the chain ladder
  • But also to any “model” that ignores the fine structure of the data in favour of averaging some broad descriptive statistics
12
Effect on loss data of changes in underlying process
• Consider a 21x21 paid loss triangle from a fairly typical Auto Bodily Injury portfolio
  • Years numbered 0, 1, 2, …
  • Experience of all accident years identical
  • Stable age-to-age factors
• Now assume that rates of claim closure (by numbers) increase by 50% in experience years (diagonals) 11-15
• Examine the ratios of “new:old” paid losses
  • No change = 100%
13
Effect on loss data of changes in underlying process (cont’d)
• Now add superimposed inflation of 5% p.a. to experience years 14-20
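A sketch of the new:old ratio produced by the superimposed inflation step alone, assuming inflation compounds annually from experience year 14 and is capped after year 20 (the exact timing convention is our assumption; the function name is hypothetical):

```python
from typing import List


def superimposed_inflation_ratios(n: int = 21, first_year: int = 14, last_year: int = 20,
                                  rate: float = 0.05) -> List[List[float]]:
    """Ratio new:old of incremental paid losses in cell (i, j) of an n x n triangle.

    Assumes payments made in experience year k = i + j carry cumulative
    superimposed inflation of (1 + rate) for each year from first_year up to
    and including k, capped at last_year.
    """
    ratios = []
    for i in range(n):
        row = []
        for j in range(n - i):
            k = i + j
            years_inflated = max(0, min(k, last_year) - first_year + 1)
            row.append((1.0 + rate) ** years_inflated)
        ratios.append(row)
    return ratios


r = superimposed_inflation_ratios()
# The ratio depends only on the diagonal k = i + j, so within an accident year
# the ratio of one incremental payment to the previous one picks up an extra
# factor of 1.05 whenever the later cell falls in experience years 14-20.
print(r[0][13], r[0][14], r[5][9], r[10][10])
```

Because the effect runs along diagonals while chain ladder factors run along columns, the distortion shows up in different development periods for different accident years.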
14
Effect on loss data of changes in underlying process (cont’d)
• Now add a legislative change that reduces claim costs in accident years 13-20
  • 50% reduction for the earliest claims settled
  • 0% for the last 30% of claims settled
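The slide fixes the reduction only at its two ends. One way to make it concrete, assuming the reduction falls linearly with the claim's settlement order between those ends (the linear shape, parameter names and function name are all our assumptions):

```python
def legislative_multiplier(accident_year: int, op_time: float,
                           first_affected_year: int = 13,
                           max_reduction: float = 0.5,
                           unaffected_share: float = 0.3) -> float:
    """Multiplier applied to a claim's cost under the hypothetical legislative change.

    op_time is the claim's settlement position within its accident year
    (0 = first settled, 1 = last). The reduction is assumed to fall linearly
    from max_reduction at op_time 0 to zero at op_time 1 - unaffected_share,
    and to be zero thereafter.
    """
    if accident_year < first_affected_year:
        return 1.0
    cutoff = 1.0 - unaffected_share          # about 0.7: the last 30% of settlements are unaffected
    if op_time >= cutoff:
        return 1.0
    return 1.0 - max_reduction * (1.0 - op_time / cutoff)


print(legislative_multiplier(13, 0.0), legislative_multiplier(13, 0.35), legislative_multiplier(13, 0.9))
```

Since the reduction depends on accident year and on settlement order, it affects rows of the triangle differently at different development ages, on top of the diagonal effects above.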
15
Effect on loss data of changes in underlying process (cont’d)
• The ratio of modified experience to the norm (stable age-to-age factors) is now complex
• Age-to-age factors now change in a complex manner
  • Trends across diagonals
  • Further trends across rows
• Our contention is that these trends will be identifiable only by means of some form of structured and rigorous multivariate data analysis
16
How might the loss data be modelled?
Let
$i$ = accident quarter
$j$ = development quarter ($= 0, 1, 2, \ldots$)
$F_{ij}$ = incremental count of claims closed
$CF_{ij}$ = incremental paid losses in respect of these closures
$S_{ij} = CF_{ij} / F_{ij}$ = average size of these closures
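Computing $S_{ij}$ from two triangles is a cell-by-cell division; the only decision is what to do where no claims closed. A sketch, assuming such cells are simply left undefined (function and variable names are hypothetical):

```python
from typing import List, Optional


def average_claim_sizes(closures: List[List[int]],
                        paid: List[List[float]]) -> List[List[Optional[float]]]:
    """S[i][j] = CF[i][j] / F[i][j] for each cell of the triangle.

    closures holds the incremental closure counts F, paid the incremental
    paid losses CF on those closures; cells with no closures get None
    because the average size is undefined there.
    """
    return [
        [cf / f if f > 0 else None for f, cf in zip(f_row, cf_row)]
        for f_row, cf_row in zip(closures, paid)
    ]


F = [[10, 5], [0]]
CF = [[20000.0, 15000.0], [0.0]]
print(average_claim_sizes(F, CF))  # [[2000.0, 3000.0], [None]]
```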
17
How might the loss data be modelled? (cont’d)
• Modelling the loss data might consist of:
  • Fitting some structured model to the average claim sizes $S_{ij}$
  • Testing the validity of that model
• The use of average claim sizes automatically corrects for any changes in the rates of claim closure
18
Modelling the loss data
• Very simple model
  $S_{ij} \sim \log N(\beta_j, \sigma)$
  Log normal claim sizes depending on development quarter
• Fit model to data using EMBLEM software
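The fitting procedure inside EMBLEM is not described here, but this model with development quarter as a categorical factor has a closed-form maximum likelihood fit, which is handy as a cross-check. A sketch, assuming every cell gets equal weight (the fitted model may have weighted cells differently, e.g. by closure count); the function name is hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def fit_lognormal_by_dev(S: List[List[Optional[float]]]) -> Tuple[Dict[int, float], float]:
    """Maximum likelihood fit of log S_ij ~ N(beta_j, sigma^2), cells weighted equally.

    With development quarter as the only factor, beta_j is the mean of
    log S_ij over accident quarters and sigma^2 the mean squared residual.
    """
    logs_by_dev: Dict[int, List[float]] = {}
    for row in S:
        for j, s in enumerate(row):
            if s is not None and s > 0:
                logs_by_dev.setdefault(j, []).append(math.log(s))
    beta = {j: sum(v) / len(v) for j, v in logs_by_dev.items()}
    squared = [(x - beta[j]) ** 2 for j, v in logs_by_dev.items() for x in v]
    sigma = math.sqrt(sum(squared) / len(squared))
    return beta, sigma


beta, sigma = fit_lognormal_by_dev([[2000.0, 3000.0], [2500.0]])
print(beta, sigma)
```

Under this model the mean average claim size in development quarter $j$ is $\exp(\beta_j + \sigma^2/2)$, not $\exp(\beta_j)$.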
19
Dependency of average claim size on development quarter
Figure: linear predictor (axis 7 to 14) by development quarter (0 to 32)
20
Add superimposed inflation
• Define
  $k = i + j$ = calendar quarter of closure
• Extend model
  $S_{ij} \sim \log N(\beta^d_j + \beta^f_k, \sigma)$
  Log normal claim sizes depending on development quarter and closure quarter (superimposed inflation)
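Adding the closure-quarter factor removes the closed form, because development and calendar quarters overlap within the triangle. A minimal sketch of the least squares fit of $\log S_{ij} = \beta^d_j + \beta^f_k$ using backfitting, a simple iterative method chosen here for brevity and not necessarily what EMBLEM does; cells are weighted equally and all names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def fit_dev_and_calendar(S: List[List[Optional[float]]],
                         n_iter: int = 1000) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Least squares fit of log S_ij = bd[j] + bf[k], k = i + j, by backfitting.

    Alternately re-estimates development effects given calendar effects and
    vice versa; this converges to the least squares solution. Only sums
    bd[j] + bf[k] (and differences within each factor) are identified.
    """
    obs = [(j, i + j, math.log(s))
           for i, row in enumerate(S) for j, s in enumerate(row)
           if s is not None and s > 0]
    bd = {j: 0.0 for j, _, _ in obs}
    bf = {k: 0.0 for _, k, _ in obs}
    for _ in range(n_iter):
        # Development effects: mean of partial residuals y - bf[k] for each j
        sums: Dict[int, float] = {}
        counts: Dict[int, int] = {}
        for j, k, y in obs:
            sums[j] = sums.get(j, 0.0) + y - bf[k]
            counts[j] = counts.get(j, 0) + 1
        bd = {j: sums[j] / counts[j] for j in sums}
        # Calendar effects: mean of partial residuals y - bd[j] for each k
        sums, counts = {}, {}
        for j, k, y in obs:
            sums[k] = sums.get(k, 0.0) + y - bd[j]
            counts[k] = counts.get(k, 0) + 1
        bf = {k: sums[k] / counts[k] for k in sums}
    return bd, bf
```

Superimposed inflation between closure quarters $k$ and $k+1$ is read off as `bf[k + 1] - bf[k]`; the absolute level of `bf` is arbitrary, since any constant can be shifted between the two factors.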
21
Dependency of average claim size on closure quarter
• Some upward trend with closure quarter
  • Positive superimposed inflation
Figure: linear predictor (axis 8.0 to 8.9) by finalisation quarter (Mar-97 to Sep-03)
22
Modelling individual claim data
• We could continue this mode of analysis
• But why model triangulated data?
• We have individual claim data
• More natural to model individual claim sizes
23
Notation for analysis of individual claim sizes
• Time variables $i$, $j$, $k$ as before
• $Y_r$ = size of $r$-th closed claim
• $i_r$, $j_r$, $k_r$ are the values of $i$, $j$, $k$ for the $r$-th closed claim
• Also define
  $t_r$ = operational time for $r$-th claim = proportion of claims from accident quarter $i_r$ closed before the $r$-th claim
• Model
  $\log Y_r = \mathrm{fn}(i_r, j_r, k_r, t_r)$ + stochastic error
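The slide defines $t_r$ as a proportion without spelling out the denominator. A sketch, assuming it is the (estimated) total number of claims from the accident quarter, so that $t_r$ runs from 0 towards 1 as the quarter's claims are finalised; the function and argument names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def operational_times(closed_claims: Iterable[Tuple[str, int, int]],
                      claims_per_accident_quarter: Dict[int, float]) -> Dict[str, float]:
    """t_r = (claims from the same accident quarter closed before claim r) / N_i.

    closed_claims yields (claim_id, accident_quarter, closure_time) tuples;
    claims_per_accident_quarter gives N_i, the (estimated) total number of
    claims from accident quarter i. Ties in closure_time are broken by
    claim_id, which is arbitrary.
    """
    by_quarter = defaultdict(list)
    for claim_id, quarter, closed_at in closed_claims:
        by_quarter[quarter].append((closed_at, claim_id))
    t: Dict[str, float] = {}
    for quarter, items in by_quarter.items():
        items.sort()
        for n_before, (_, claim_id) in enumerate(items):
            t[claim_id] = n_before / claims_per_accident_quarter[quarter]
    return t


claims = [("a", 0, 5), ("b", 0, 2), ("c", 0, 9), ("d", 1, 3)]
print(operational_times(claims, {0: 4, 1: 2}))
```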
24
Dependency on operational time
• Model
  $\log Y_r = \mathrm{fn}(i_r, j_r, k_r, t_r)$ + stochastic error
• Specifically
  $\log Y_r \sim N(\mathrm{fn}(t_r), \sigma)$
• Divide range of $t_r$ (0-100%) into 2% bands
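Treating each 2% band as a separate level makes $\mathrm{fn}(t_r)$ a step function with 50 steps, and its maximum likelihood estimate is then the mean log claim size within each band. A sketch (names hypothetical; placing $t_r = 100\%$ in the top band is our convention):

```python
from collections import defaultdict
from typing import Dict, Sequence


def band_index(t: float, n_bands: int = 50) -> int:
    """Band of operational time t in [0, 1]; 50 bands of width 2%."""
    return min(int(t * n_bands), n_bands - 1)


def fit_banded_operational_time(log_sizes: Sequence[float], op_times: Sequence[float],
                                n_bands: int = 50) -> Dict[int, float]:
    """ML estimate of fn(t) when fn is constant within each band.

    Under log Y_r ~ N(fn(t_r), sigma^2) with one level per band, the
    estimate for a band is the mean of log Y_r over claims in that band.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for y, t in zip(log_sizes, op_times):
        b = band_index(t, n_bands)
        sums[b] += y
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


print(fit_banded_operational_time([8.0, 9.0, 10.0], [0.005, 0.015, 0.5]))
# {0: 8.5, 25: 10.0}
```

Plotting these band means against band midpoints gives a chart of the kind that follows, from which a smoother parametric form in $t_r$ can be chosen.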
25
Dependency on operational time
• Dependency close to linear over much of the range of operational time
Figure: linear predictor (axis 7.5 to 11.5) by operational time (optime)
26
Dependency on calendar quarter of closure (superimposed inflation)
• Some upward trend with closure quarter
  • Positive superimposed inflation
Figure: linear predictor (axis 7.90 to 8.40) by finalisation quarter (Jun-97 to Sep-03)
27
Log normal assumption?
• Examine residuals of log normal model
Figure, three panels: distribution of Pearson residuals (-8 to 5); Pearson residuals against fitted value (6.0 to 11.5); largest 1,000 Pearson residuals against fitted value (7.0 to 11.5)
• Considerable left skewness
28
Alternative error distribution
• Choose shorter tailed distribution from the family underlying GLMs
  • Exponential dispersion family (EDF)
• We choose EDF(2.3): $V[Y_r] = \phi \{E[Y_r]\}^{2.3}$
  • Longer tailed than gamma
  • Shorter tailed than log normal
Figure, two panels: distribution of studentized standardized deviance residuals (-8 to 8); largest 100 studentized standardized deviance residuals against fitted value (0 to 200,000)
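EDF(2.3) is a power-variance (Tweedie) model with index $p = 2.3$. A sketch of its variance function and of plain deviance residuals, written from the standard power-variance unit deviance; the studentized, standardized residuals plotted on this slide also adjust for each observation's leverage, which this sketch leaves out (function names hypothetical).

```python
import math


def edf_variance(mu: float, phi: float, p: float = 2.3) -> float:
    """Variance function of the EDF(p) model: V[Y] = phi * E[Y]**p."""
    return phi * mu ** p


def edf_unit_deviance(y: float, mu: float, p: float = 2.3) -> float:
    """Unit deviance of the power-variance family, for y > 0, mu > 0, p not in {1, 2}."""
    return 2.0 * (y ** (2 - p) / ((1 - p) * (2 - p))
                  - y * mu ** (1 - p) / (1 - p)
                  + mu ** (2 - p) / (2 - p))


def deviance_residual(y: float, mu: float, phi: float, p: float = 2.3) -> float:
    """Signed square root of the scaled unit deviance (no leverage adjustment)."""
    d = max(edf_unit_deviance(y, mu, p), 0.0)  # clip tiny negative rounding errors
    return math.copysign(math.sqrt(d / phi), y - mu)


print(deviance_residual(2000.0, 5000.0, 1.0), deviance_residual(9000.0, 5000.0, 1.0))
```

The gamma deviance is the limit of this formula as $p \to 2$; raising $p$ above 2 lengthens the tail for a given mean, which is the sense in which EDF(2.3) sits beyond the gamma.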
29
Refining the model of the data
• …and so on
• We continue to refine the model of claim size
  • Paper contains detail
• Final model includes following effects
  • Operational time (smoothed)
  • Seasonal
  • Superimposed inflation (smoothed)
    • Different rates at different operational times
    • Different rates over different intervals of calendar time
  • Accident quarter (legislative) effect
    • Diminishes with increasing operational time
    • Peters out at operational time 35%
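The paper gives the actual functional forms. As a rough illustration of how such effects combine additively on the log scale, here is a sketch with hypothetical shapes and parameter names; it omits the changes in inflation rate between calendar intervals.

```python
from typing import Sequence


def linear_predictor(op_time: float, calendar_quarter: int, affected_by_legislation: bool,
                     intercept: float, op_time_slope: float,
                     inflation_base: float, inflation_op_time_slope: float,
                     seasonal: Sequence[float], legislative_effect: float,
                     legislative_cutoff: float = 0.35) -> float:
    """Illustrative log-scale mean claim size combining the listed effect types.

    All functional forms are hypothetical stand-ins: a straight line in
    operational time instead of a smoothed curve, superimposed inflation whose
    quarterly rate varies linearly with operational time, a four-quarter
    seasonal effect, and a legislative effect that shrinks linearly to zero
    at operational time legislative_cutoff.
    """
    eta = intercept + op_time_slope * op_time
    eta += (inflation_base + inflation_op_time_slope * op_time) * calendar_quarter
    eta += seasonal[calendar_quarter % 4]
    if affected_by_legislation:
        eta += legislative_effect * max(0.0, 1.0 - op_time / legislative_cutoff)
    return eta
```

Each effect enters as a few parameters rather than one per cell or per development period, which is how a model of this kind stays parsimonious.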
30
Final estimate of liability
Loss reserve at 30 Sept 2003 (excl. Sept 2003 accident quarter)
Averaging period | Uncorrected ($B) | Corrected ($B)
All experience quarters | 1.61 |
Last 8 experience quarters | 1.68 |
All experience quarters except Sept 2003 (last diagonal) | 1.78 | 1.94
Last 8 experience quarters except Sept 2003 (last diagonal) | 1.92 | 2.35*
GLM | 2.23* |
* Quite different distributions over accident years
31
Conclusions
• GLM has successfully modelled a loss experience with considerable complexity
• Simpler model structures, e.g. chain ladder, would have little hope of doing so
  • Indeed, it is not even clear how one would approach the problem with these simpler structures
• The GLM achieves much greater parsimony
  • Chain ladder number of parameters = 73, with no recognition of any trends
  • GLM number of parameters = 13, with full recognition of trends
• GLM is fully stochastic
  • Provides a set of diagnostics for comparing candidate models and validating a selection
• Understanding of the data set
  • Assists not only reserving but also pricing and other decision making