
Parsimonious Parameterization of Age-Period-Cohort Models by Bayesian Shrinkage

Gary Venter (1) & Sule Sahin (2)

(1) Columbia University, USA - University of New South Wales, Australia
(2) University of Liverpool, UK - Hacettepe University, Turkey

European Actuarial Journal Conference 2018, Leuven

9-11 September, 2018


Outline

1 Age-Period-Cohort (APC) models - “the Hunt and Blake model”
2 Constraints
3 MCMC estimation and APC models
    Priors
    Comparing models
4 U.S. Male mortality
5 Fits
    Optimizing shrinkage
    Parameters
    Goodness of fit
6 Conclusions


Age-Period-Cohort (APC) models

Age-period-cohort (APC) models model an array of data by assigning parameters to each row, column and diagonal of the array.

Actuaries use these models for mortality, and also for emergence of claims liabilities, by age and year.

Modelled arrays have observations for

    age - column
    period refers to the observation time - diagonals
    cohort (year of origin) is the year of birth in mortality - row

We denote cohorts as n = 1, ..., N and ages (in years) as u = 0, ..., U ⇒ periods of n + u = 1, ..., N + U



y[n,u] is the log of the mortality rate in the (n,u) cell.

Mortality models:

Lee-Carter (1992) models mortality by age, with the q parameters, and by period, with r, but not by cohort:

y[n,u] = q[u] + r[n + u]s[u] + ε[n,u]

Barnett and Zehnwirth (2000) discuss an APC model for claims, and recommend parameter reduction when using it, with cohort parameters p[n]:

y[n,u] = p[n] + q[u] + r[n + u] + ε[n,u]

Renshaw and Haberman (2006) add age weights to both the period and cohort effects for mortality modelling:

y[n,u] = p[n]w[u] + q[u] + r[n + u]s[u] + ε[n,u]



Cairns et al. (2009) find problems with applying age weights to the cohorts.

Xu et al. (2015) fit age weights separately by cohort and found different age sensitivity in different cohorts.

We ended up leaving out the age weights for cohorts, but used the multi-trend model described in Hunt and Blake (2014):

y[n,u] = p[n] + q[u] + ∑_i r_i[n + u] s_i[u] + ε[n,u]
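As a minimal sketch of how the multi-trend formula assembles a fitted array, the Python below evaluates p[n] + q[u] + ∑_i r_i[n + u] s_i[u] cell by cell. The function name `multi_trend_fit`, the array layout and the toy parameter values are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def multi_trend_fit(p, q, r, s):
    """Fitted log rates y[n,u] = p[n] + q[u] + sum_i r_i[n+u] * s_i[u].

    p: (N,)      cohort effects for n = 1..N
    q: (U+1,)    base curve for u = 0..U
    r: (K, N+U)  trend levels for periods n+u = 1..N+U, one row per trend
    s: (K, U+1)  age weights, one row per trend
    """
    p, q, r, s = (np.asarray(a, dtype=float) for a in (p, q, r, s))
    n_cohorts, n_ages = len(p), len(q)
    y = np.empty((n_cohorts, n_ages))
    for n in range(1, n_cohorts + 1):       # cohorts are 1-based in the slides
        for u in range(n_ages):             # ages are 0-based in the slides
            period = n + u                  # runs over 1..N+U
            trend = np.sum(r[:, period - 1] * s[:, u])
            y[n - 1, u] = p[n - 1] + q[u] + trend
    return y

# Toy values (hypothetical): N = 2 cohorts, U = 1, one trend.
p = [0.1, 0.2]
q = [-3.0, -2.0]
r = [[0.0, -0.1, -0.2]]   # periods 1, 2, 3
s = [[1.0, 0.5]]
fitted = multi_trend_fit(p, q, r, s)
```

With a single trend row and all weights in s fixed at 1, the same function reduces to the Barnett and Zehnwirth form p[n] + q[u] + r[n + u].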



The original model with a single trend assumes that the changes causing that trend, such as advances in medical treatments, always affect the various ages in the same way although to different degrees.

However, in some cases, they differ more broadly - e.g. modernization of more primitive societies.

The additional trends allow different age take-up rates for the different trends at different times - e.g. HIV epidemic.


Constraints

There is no long-term average trend across cohorts ⇒ the long-term trend is represented by the period parameters only ⇒ the cohort parameters represent relative differences among the cohorts given that the overall trend is entirely taken up by the period parameters.

The period trend is the trend for the age or ages with the greatest mortality change over the entire period.

All these parameters are estimated simultaneously to find the combination of parameters that best fits the data under the specified model.


Constraints

The advantage of this particular choice of constraints is that each parameter group can be interpreted as having a precise meaning.

The period parameters r[n + u] are the by-period cumulative changes in mortality for the ages with the highest trend, given that there is no trend in the cohorts.

The cohort parameters p[n] are the relative cohort effects given that the overall trend is all in the period parameters.

The base mortality curve q[u] represents the starting mortality rate for each age estimated across all the observations, given that the trends and weights s[u] are as estimated.


MCMC Estimation of the APC Models

MCMC is a method to simulate (Monte Carlo) a sequence of samples from a probability distribution, where each sample is generated based on only the immediately previous sample (Markov Chain).

Its main application is generating samples from the posterior distribution of a parameter vector θ when only the prior and conditional distributions are known.

Thus, it provides a way to do Bayesian estimation without being able to specify conjugate priors, or in fact any specific form of the posterior distribution.



Bayes theorem

p(θ | X) = p(X | θ) p(θ) / p(X)

    p(θ | X): posterior of the parameters given the data
    p(X | θ) p(θ): the likelihood times the prior
    p(X): a constant for a given dataset
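Because p(X) is constant, a sampler only needs the likelihood times the prior. The following is a minimal sketch of random-walk Metropolis, one of the simplest MCMC schemes, for a single mean parameter with a double exponential prior. It is not necessarily the sampler behind the results in this talk; the data, `sigma`, `b`, step size and function names are all illustrative assumptions.

```python
import numpy as np

def log_posterior(theta, y, sigma, b):
    """Unnormalized log posterior: normal likelihood plus Laplace(0, b) prior."""
    log_lik = -0.5 * np.sum((y - theta) ** 2) / sigma**2
    log_prior = -abs(theta) / b
    return log_lik + log_prior

def metropolis(y, sigma, b, n_draws=20000, step=0.3, seed=0):
    """Random-walk Metropolis: each draw depends only on the previous one."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    current = log_posterior(theta, y, sigma, b)
    draws = np.empty(n_draws)
    for i in range(n_draws):
        proposal = theta + step * rng.normal()
        candidate = log_posterior(proposal, y, sigma, b)
        # Accept with probability min(1, posterior ratio); p(X) cancels in the ratio.
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
        draws[i] = theta
    return draws

y = np.array([0.5, 0.7, 0.6, 0.4, 0.8])      # hypothetical observations
draws = metropolis(y, sigma=0.5, b=0.1)
posterior_mean = draws[5000:].mean()          # discard the first draws as warm-up
```

In this toy setting the posterior mean lands between zero and the sample mean of `y`, which is the shrinkage effect the mean-zero prior is meant to produce.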



Some recent papers, although they do not aim at regularization, apply MCMC to APC models → Antonio et al. (2015), Chung Fung et al. (2016).

We estimate all the p, q, r, s parameters on linear splines - that is, line segments.

The slope changes at each point are the underlying parameters of the model - second differences.

The level parameters are cumulative sums of the previous slopes and the slopes are the cumulative sums of the previous slope changes.

The changes in slope at each period are the parameters that are shrunk toward zero.

To model this, each slope change is given a Laplace (double exponential) prior with small variance.
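The double cumulative sum that turns slope changes into levels can be sketched in a few lines of Python. The function name and the sample slope changes are illustrative assumptions; the point is that every slope change shrunk to zero simply extends the current line segment.

```python
import numpy as np

def spline_from_slope_changes(slope_changes):
    """Build a linear spline from its second differences.

    slopes are cumulative sums of the slope changes;
    levels are cumulative sums of the slopes.
    """
    slope_changes = np.asarray(slope_changes, dtype=float)
    slopes = np.cumsum(slope_changes)
    levels = np.cumsum(slopes)
    return slopes, levels

# Hypothetical example: only two of five slope changes are non-zero,
# so the level curve consists of two straight segments.
slope_changes = [-0.02, 0.0, 0.0, 0.01, 0.0]
slopes, levels = spline_from_slope_changes(slope_changes)
# slopes: [-0.02, -0.02, -0.02, -0.01, -0.01]
# levels: [-0.02, -0.04, -0.06, -0.07, -0.08]
```

Five level parameters are described here by two effective parameters, which is the sense in which shrinking slope changes gives a parsimonious parameterization.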


MCMC Estimation - Priors

Priors can be used to constrain the parameters or to make some sets more likely.

Wide priors just to get the process going, except for the mean-zero priors for shrinking.

Mean-zero priors come in three varieties: light, medium and heavy-tailed.

1 The normal distribution is light-tailed.
2 The double exponential is medium-tailed.
3 The Cauchy distribution (t-distribution with one degree of freedom) is heavy-tailed.

We use the double exponential shrinkage prior:

x > 0 : f(x) = e^(−x/b) / (2b)

x < 0 : f(x) = e^(x/b) / (2b)

The b parameter determines the standard deviation of the double exponential distribution, which is b√2 (an optimal value of b is around 0.04 for slope changes in the U.S. male mortality model).
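As a quick numerical check of the double exponential density, the sketch below evaluates f(x) = e^(−|x|/b) / (2b) on a grid and confirms that it integrates to one with standard deviation b√2. The grid limits and the simple Riemann sum are illustrative choices; b = 0.04 is the value quoted for the U.S. male mortality model.

```python
import numpy as np

def laplace_pdf(x, b):
    """Double exponential density with mean zero and scale b."""
    return np.exp(-np.abs(x) / b) / (2.0 * b)

b = 0.04
x = np.linspace(-1.0, 1.0, 20001)   # tails beyond |x| = 1 are negligible for b = 0.04
dx = x[1] - x[0]
density = laplace_pdf(x, b)

total_mass = np.sum(density) * dx               # Riemann sum, should be close to 1
std = np.sqrt(np.sum(x**2 * density) * dx)      # should be close to b * sqrt(2)
```

For b = 0.04 the standard deviation is about 0.057, so slope changes larger than a few hundredths per period are strongly discouraged by the prior.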


MCMC Estimation - Comparing Models

MCMC does not maximize the likelihood - rather it generates the posterior distribution of the parameters.

Cross validation → a model evaluation method

Leave-one-out cross validation (loo) → each point is left out, one at a time, and the resulting model is tested on that point.

Pareto-smoothed importance sampling → addresses the instability of the importance-sampling estimates → takes the output of an MCMC run and estimates what the likelihood would be for a data point from the parameters fit by leaving it out.

elpd_loo: expected log pointwise predictive density → the true elpd_loo is the expected value of the sum of the log of the probability densities for a new dataset not used in the fitting, where the expectation is over the actual distribution, not the fitted one.


MCMC Estimation - Comparing Models

elpd_loo is a penalised log-likelihood measure that is arguably preferable to AIC, BIC, etc., particularly in shrinkage estimation where the parameter count is problematic.

It can be used to determine the degree of shrinkage of a regularized model estimated by MCMC, which is a significant advantage of using Bayesian shrinkage instead of the classical versions.

Degree of shrinkage → i.e. the b in the Laplace prior, chosen to maximize elpd_loo → increasing b from a low starting value gradually increases elpd_loo up to a point, then increases both the loo penalty and the log-likelihood in step for considerably higher values of b, leaving elpd_loo relatively stable around its maximum.
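The relation between the in-sample log predictive density, the loo penalty and elpd_loo can be sketched from a matrix of pointwise log-likelihoods evaluated at each posterior draw. The code below uses plain importance sampling, without the Pareto smoothing step, so it is a simplified stand-in rather than the PSIS procedure itself. The function names, the synthetic data and the simulated posterior draws are all assumptions for illustration.

```python
import numpy as np

def log_mean_exp(a, axis=0):
    """Numerically stable log of the mean of exp(a) along an axis."""
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(a - m), axis=axis))

def is_loo(log_lik):
    """Raw importance-sampling LOO from log_lik with shape (draws, points).

    lpd      : in-sample log pointwise predictive density, summed over points
    elpd_loo : leave-one-out estimate, using weights proportional to 1 / p(y_i | theta)
    p_loo    : the penalty, lpd - elpd_loo
    """
    lpd = log_mean_exp(log_lik, axis=0).sum()
    elpd_loo = -log_mean_exp(-log_lik, axis=0).sum()
    return lpd, elpd_loo, lpd - elpd_loo

# Synthetic example: unit-variance normal data with an unknown mean.
rng = np.random.default_rng(0)
y = rng.normal(size=50)
# Stand-in for MCMC output: draws from the flat-prior posterior of the mean.
mu_draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(len(y)), size=4000)
log_lik = -0.5 * np.log(2.0 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2

lpd, elpd_loo, p_loo = is_loo(log_lik)
```

Since this toy model has one free parameter, the penalty should come out near one. In a shrinkage model the penalty is typically well below the nominal parameter count, which is why it serves as an effective count.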


U.S. Male Mortality

U.S. male mortality rates for ages 30-89, cohorts 1891-1975 and years 1970-2014 (Human Mortality Database)

Figure 1 - Cumulative changes in mortality by age decade (20s, 30s, 40s, ..., 80s) for 1970-2014.

Does not look much like a single trend that the different ages participate in to various degrees, due to

    slowing or even reversal of the downward trend from 1985-1995 for ages under 50 - HIV and the drug wars.
    flattening out of the downward trend starting in about 1997 for ages under 60, but less so for the 40s age groups.
    80s ages having a much lower trend than the other ages up until about 2002.

These are challenging aspects for age-period-cohort models.

However, with several trends, each with their own weights, it may be possible to model much of the data.


U.S. Male Mortality

Figure 1: Cumulative trends in mortality rate by age group 1970-2014


U.S. Male Mortality

Four key trends:

1 A base trend that applies to all ages, which get their own weights.
2 A trend from 1985 to 1995 forced to be non-negative, so towards higher mortality, for which ages get a possibly different set of weights.
3 A trend, also forced to be positive, for ages in the 80s from 1975 until 2007, with all those ages getting the same weights.
4 Another positive trend starting in 1997 which all ages under 80 participate in with their own weights.

20s ages have not been included in the model due to the volatility and differing trends.


U.S. Male Mortality - Parameters

84 cohort parameters for 1892-1975 with 1891 getting zero initially

59 age parameters for 31-89 with age 30 getting zero

44 parameters for trend for 1971-2014 with 1970 getting zero

Each age gets a weight, so there are 60 weight parameters for trend

11 parameters for HIV trend for 1985-1995

To these are applied age factors for ages 30-79, so 50 more parameters

33 parameters for the additional trend for ages in the 80s from 1975 to 2007

18 parameters for the extra trend from 1997

It applies to ages 30-79, taking 50 more parameters

+ b + constant = 411 parameters that get priors.
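The total of 411 can be checked by counting each parameter group from its year or age range. This is a bookkeeping sketch only; the dictionary labels are made up for readability, and the end year 2014 for the extra trend is an assumption taken from the data range 1970-2014.

```python
def n_inclusive(first, last):
    """Number of integers from first to last, both included."""
    return last - first + 1

counts = {
    "cohort levels, 1892-1975": n_inclusive(1892, 1975),        # 84
    "age levels, 31-89": n_inclusive(31, 89),                   # 59
    "base trend, 1971-2014": n_inclusive(1971, 2014),           # 44
    "base trend age weights, 30-89": n_inclusive(30, 89),       # 60
    "HIV trend, 1985-1995": n_inclusive(1985, 1995),            # 11
    "HIV trend age factors, 30-79": n_inclusive(30, 79),        # 50
    "80s-ages trend, 1975-2007": n_inclusive(1975, 2007),       # 33
    "extra trend, 1997-2014": n_inclusive(1997, 2014),          # 18, end year assumed
    "extra trend age factors, 30-79": n_inclusive(30, 79),      # 50
    "Laplace scale b": 1,
    "constant": 1,
}
total = sum(counts.values())
```

The nine spline groups contribute 409 parameters, and b and the constant bring the total to 411.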


Fits: Optimizing Shrinkage

elpd_loo is also a correction of the log-likelihood for in-sample bias.

R package loo is used to penalize the log-likelihood for in-sample bias.

p_loo - used as an estimate of the parameter count, to decide the reduction from the original 411 parameters the shrinkage has produced.

Multiple   σ        b       count   elpd
0.25       0.0196   0.005   175.1   6580.5
2.0        0.0160   0.032   282.0   6885.5
2.5        0.0158   0.040   285.2   6892.1
3.5        0.0159   0.056   290.1   6885.6
5.0        0.0155   0.078   304.1   6885.2
Cauchy     0.0161   -       239.4   6873.2

Table: The elpd_loo measure and p_loo count by Laplace b, itself modeled as a selected multiple of the residual standard deviation σ
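The selection of b from the table can be reproduced directly from its Laplace rows. The sketch below copies the tabulated values, picks the row with the highest elpd, and checks that each b is approximately its multiple times σ. The variable names are illustrative, and the Cauchy row is left out because it has no b.

```python
import numpy as np

# Laplace rows of the table above.
multiple = np.array([0.25, 2.0, 2.5, 3.5, 5.0])
sigma = np.array([0.0196, 0.0160, 0.0158, 0.0159, 0.0155])
b = np.array([0.005, 0.032, 0.040, 0.056, 0.078])
p_loo = np.array([175.1, 282.0, 285.2, 290.1, 304.1])
elpd_loo = np.array([6580.5, 6885.5, 6892.1, 6885.6, 6885.2])

best = int(np.argmax(elpd_loo))      # row with the highest elpd
best_b = b[best]                     # 0.040, at multiple 2.5
implied_b = multiple * sigma         # should match the b column up to rounding
reduction = 411 - p_loo[best]        # nominal count minus effective count
```

At the selected b the effective count is 285.2, a reduction of about 126 from the nominal 411 parameters.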


Fits: Parameters

Figure : Final Level Parameters



Fits: Goodness of Fit

Figure : Cumulative Trend by Age Group: 30s, 40s, 50s. Solid: Actual; Dash: Best Fit; Dots: 0.25 Fit.

Figure : Cumulative Trend by Age Group: 60s, 70s, 80s. Solid: Actual; Dash: Best Fit; Dots: 0.25 Fit.



Figure : Loglikelihood by Data Point.


Conclusions

APC modeling appears to be an area where regularization can provide better fits than MLE, if measured by penalizing the log-likelihood for over-fitting by the use of cross-validation.

Shrinkage facilitates simultaneous estimation of several trends in the Hunt and Blake model.

Laplace b parameters greater than 0.04 produce higher log-likelihoods, but the loo penalty rises in step, so elpd_loo does not improve.

We used linear splines but cubic splines could be formulated and compared by the elpd_loo measure.

Parameter constraints are critical for getting convergence to a single model - the main estimation risk is the existence of local maxima, which can be avoided by the choice of more specific priors.

The Cauchy prior seems worthy of future investigation.


References

Antonio, K., Bardoutsos, A. and Ouburg, W. (2015) Bayesian Poisson log-bilinear models for mortality projections with multiple populations. European Actuarial Journal, 5, 245-281.

Barnett, G. and Zehnwirth, B. (2000) Best estimates for reserves. PCAS, 87, 245-303.

Cairns, A., Blake, D., Dowd, K., Coughlan, G., Epstein, D., Ong, A. and Balevich, I. (2009) A quantitative comparison of stochastic mortality models using data from England and Wales and the United States. North American Actuarial Journal, 13, 1-35.

Chung Fung, M., Peters, G. and Shevchenko, P. (2016) A unified approach to mortality modelling using state-space framework: characterisation, identification, estimation and forecasting. Working Paper. arXiv:1605.09484v1.

Hunt, A. and Blake, D. (2014) A general procedure for constructing mortality models. North American Actuarial Journal, 18, 116-138.

Lee, R. D. and Carter, L. R. (1992) Modeling and forecasting U.S. mortality. Journal of the American Statistical Association, 87, 659-675.

Renshaw, A. E. and Haberman, S. (2006) A cohort-based extension to the Lee-Carter model for mortality reduction factors. Insurance: Mathematics and Economics, 38, 556-570.

Venter, G. G. (1983) Transformed beta and gamma distributions and aggregate losses. Proceedings of the Casualty Actuarial Society, LXX, 156-193.

Xu, Y., Sherris, M. and Ziveyi, J. (2015) The application of affine processes in multi-cohort mortality model. University of New South Wales Business School Research Paper No. 2015ACTL13.

