Xavier Conort, DataScience SG Meetup - Challenges in insurance pricing

Modeling Challenges for insurance pricing Xavier Conort Chief Data Scientist - DataRobot Tuesday, February 23, 2016 @


Page 1: Xavier Conort, DataScience SG Meetup - Challenges in insurance pricing

Modeling Challenges for insurance pricing

Xavier Conort, Chief Data Scientist - DataRobot

Tuesday, February 23, 2016 @

Page 2: Xavier Conort, DataScience SG Meetup - Challenges in insurance pricing

Agenda

● Preamble

● Unfriendly distribution shape of claims cost

● Regulation or operational constraints

● Need to predict the future!

● Claims that are Incurred But Not Reported or Not Enough Reported (IBNR, IBNER)

Page 3: Xavier Conort, DataScience SG Meetup - Challenges in insurance pricing

Automation is an integral part of human civilization

Page 4: Xavier Conort, DataScience SG Meetup - Challenges in insurance pricing

Next generation tools, platforms, approaches to data science

Traditional Approach

● Ivy league approach - only for the chosen ones

● Focused on activities - detached from outcomes

● Assumption based: model selection is based on the modeler's understanding of the world?

● Development is costly and limited

● Heavy dependence on programmers

Data Science Approach

● Common man approach - for everyone

● Focused on business outcome

● Validation based: a model is selected if it predicts well in the real world

● Development is crowd sourced, peer reviewed

● Automated solutions are taking care of programming

open source programming

social network of coders

automated solutions

Page 5: Xavier Conort, DataScience SG Meetup - Challenges in insurance pricing

“90% of the data in the world today has been created in the last two years alone”

and some companies have been very successful in using data thanks to data science

better service

newer product

improved operations

Page 6: Xavier Conort, DataScience SG Meetup - Challenges in insurance pricing

SO WHY HAS THE ROBOT NOT REPLACED THE ACTUARY YET?

Page 7: Xavier Conort, DataScience SG Meetup - Challenges in insurance pricing

No Open Data => Slower innovation

Page 8: Xavier Conort, DataScience SG Meetup - Challenges in insurance pricing

Unfriendly distribution shape

● Low claim frequency

● Claim severity follows a skewed distribution, sometimes with a large tail

● And ...

○ Discontinuities caused by policy limits

○ Environmental and operational changes leading to distributions constantly changing over time

○ Heterogeneity of risk within risk pools, caused by fraud and imperfect measures of risk exposure

○ ...

Page 9: Xavier Conort, DataScience SG Meetup - Challenges in insurance pricing


OUR APPROACH TO WIN 1st PLACE IN:

● High severity (presence of large claims): censor large claims

● Low frequency (0.26% claim events): downsample majority events

● Many features: get rid of noise

Let DataRobot explore the best transformations and machine learning algorithms for the data

Experiment with other models in R
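
Downsampling the majority events (policies without a claim) changes the base rate the model sees, so probabilities predicted on the downsampled data come out too high. A minimal sketch of the usual odds correction, assuming every claim row was kept and non-claim rows were kept at random with a known rate. The function name, `keep_rate` and the 5% figure are illustrative and not from the talk:

```python
def correct_downsampled_probability(p_sampled: float, keep_rate: float) -> float:
    """Map a probability predicted on downsampled data back to the full population.

    Assumes all claim rows were kept and non-claim rows were kept at random
    with probability `keep_rate` (hypothetical parameter name).
    """
    if not 0.0 < keep_rate <= 1.0:
        raise ValueError("keep_rate must be in (0, 1]")
    if p_sampled >= 1.0:
        return 1.0
    # Dropping non-claims multiplies the odds of a claim by 1 / keep_rate,
    # so multiplying the odds by keep_rate undoes the effect.
    odds_sampled = p_sampled / (1.0 - p_sampled)
    odds_true = odds_sampled * keep_rate
    return odds_true / (1.0 + odds_true)


# Illustrative numbers: 0.26% claim frequency, 5% of non-claim rows kept.
true_rate = 0.0026
keep_rate = 0.05
sampled_rate = true_rate / (true_rate + (1.0 - true_rate) * keep_rate)
recovered = correct_downsampled_probability(sampled_rate, keep_rate)
```

Here `recovered` matches `true_rate` up to floating-point error, while `sampled_rate` is many times larger.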

Page 10: Xavier Conort, DataScience SG Meetup - Challenges in insurance pricing

A more actuarial approach

● Modeling per risk type (bodily injury, damage, 3rd party...)

● Censoring large claims is well accepted, but don’t forget to reallocate the cost of large claims if you use your model for pricing

● Log transformation and downsampling are less common practice. If you use them, don’t forget to adjust for the bias

● Actuaries will mostly use GLMs (Generalized Linear Models with Poisson, Gamma and Tweedie distributions and a “log” link) and do either:

○ frequency modeling x severity modeling, but be aware that this makes the strong assumption that frequency and severity are independent

○ or cost modeling directly
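
The censoring advice can be sketched as follows: cap each claim, then compute a loading that puts the removed excess back so the total cost is preserved. Proportional reallocation over all policies is only one simple choice, assumed here for illustration, and the claim amounts and cap are made up:

```python
def cap_and_reallocate(claims, cap):
    """Cap each claim at `cap` and return (capped_claims, loading_factor).

    loading_factor spreads the removed excess proportionally over the capped
    costs so that the total cost is unchanged. Proportional reallocation is
    an assumption made for this sketch, not the only option.
    """
    capped = [min(c, cap) for c in claims]
    capped_total = sum(capped)
    if capped_total == 0:
        raise ValueError("no positive claims to reallocate to")
    return capped, sum(claims) / capped_total


claims = [0.0, 1200.0, 800.0, 250000.0, 3000.0]  # made-up claim costs
capped, loading = cap_and_reallocate(claims, cap=50000.0)
# A model fitted on `capped` predicts capped cost; multiplying its
# predictions by `loading` brings the premium back to the full cost level.
```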

Page 11: Xavier Conort, DataScience SG Meetup - Challenges in insurance pricing

Can ML algos support Poisson, Gamma and Tweedie loss functions?

● Yes! As an example, Regularized GLMs and Gradient Boosting Machines (GBMs) can support any distribution of the exponential family

● But … open source implementations rarely support Gamma and Tweedie loss functions…

● Good news!

○ The Poisson loss function is supported by XGBoost, R gbm and R glmnet

○ Open source algorithms are open, so you can patch them. In some cases, only a few lines of code are necessary
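
To give a feel for the "few lines of code", here is a sketch of the first and second derivatives of the Poisson, Gamma and Tweedie negative log-likelihoods with respect to the raw score f = log(mu), which is the kind of information a gradient boosting implementation needs from a loss function. The function names are hypothetical, dispersion is taken as 1, and the Tweedie power `p` is assumed to lie between 1 and 2:

```python
import math


def poisson_grad_hess(y, f):
    """Loss exp(f) - y * f, with mu = exp(f)."""
    mu = math.exp(f)
    return mu - y, mu


def gamma_grad_hess(y, f):
    """Loss y * exp(-f) + f, with mu = exp(f)."""
    ratio = y / math.exp(f)
    return 1.0 - ratio, ratio


def tweedie_grad_hess(y, f, p=1.5):
    """Loss -y * mu**(1-p) / (1-p) + mu**(2-p) / (2-p), with mu = exp(f)."""
    a = math.exp((1.0 - p) * f)
    b = math.exp((2.0 - p) * f)
    grad = -y * a + b
    hess = -y * (1.0 - p) * a + (2.0 - p) * b
    return grad, hess
```

In all three cases the gradient vanishes when the predicted mean exp(f) equals the observed value y, and the second derivative is positive there, which is what a Newton-style boosting step relies on.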

Page 12: Xavier Conort, DataScience SG Meetup - Challenges in insurance pricing

Regulation or operational constraints

Regulation or operational constraints might force you to keep control of the model built and/or keep it simple

➢ Example of sum insured: the premium should be monotonically increasing with sum insured, otherwise people will just purchase more cover and pay less...

➢ Many insurance companies use pricing tables

❖ use ML for feature selection and to get insight on nonlinear relationships and interactions. Then integrate this insight into your GLMs, where you have full control

❖ eliminate undesirable features

❖ patch ML algos to add monotonicity (R gbm already does it!) or interaction constraints
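
Constraining the learner itself is what the slide recommends. As a different and simpler illustration of the monotonicity requirement, the sketch below repairs an already fitted pricing table after the fact with the pool-adjacent-violators algorithm (isotonic regression). The function name and the premiums are made up:

```python
def monotone_increasing(values, weights=None):
    """Weighted pool-adjacent-violators: the closest non-decreasing sequence
    (in weighted least squares) to `values`, taken in order of sum insured."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total_w = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total_w, total_w, n1 + n2])
    smoothed = []
    for mean, _, n in blocks:
        smoothed.extend([mean] * n)
    return smoothed


# Made-up premiums per sum-insured band; the third band is cheaper than the second.
raw_premiums = [100.0, 120.0, 110.0, 150.0]
fixed_premiums = monotone_increasing(raw_premiums)
```

The two offending bands are pooled to their average, so buying more cover never lowers the premium.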

Page 13: Xavier Conort, DataScience SG Meetup - Challenges in insurance pricing

Predict the future

[Figure: decision tree prediction versus actual risk as a function of sum insured, over the historical range and for new values]

Models in general are not very good at predicting the future

2 things to keep in mind when you use GBMs or RFs, or bin continuous variables:

● Decision trees don’t extrapolate to new values. They won't predict higher claim sizes for sums insured higher than those seen in history

● Machines can be naive and lazy

○ see 2 examples in the next slide
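
The extrapolation point can be reproduced with the simplest piecewise-constant model, a binned mean, which behaves like a tree with fixed splits on one feature. A minimal sketch with made-up data in which claim cost is exactly 1% of the sum insured; all names and numbers are illustrative:

```python
from bisect import bisect_right


def fit_binned_means(x, y, edges):
    """Average y within the bins of x defined by sorted interior `edges`."""
    sums = [0.0] * (len(edges) + 1)
    counts = [0] * (len(edges) + 1)
    for xi, yi in zip(x, y):
        k = bisect_right(edges, xi)
        sums[k] += yi
        counts[k] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def predict_binned(means, edges, x_new):
    """Return the mean of the bin that x_new falls into."""
    return means[bisect_right(edges, x_new)]


# History only goes up to a sum insured of 400,000.
sum_insured = [50_000, 100_000, 150_000, 200_000,
               250_000, 300_000, 350_000, 400_000]
claim_cost = [s / 100 for s in sum_insured]
edges = [100_000, 200_000, 300_000]
means = fit_binned_means(sum_insured, claim_cost, edges)

inside = predict_binned(means, edges, 400_000)
beyond = predict_binned(means, edges, 2_000_000)  # far above anything in history
```

`beyond` equals `inside`: the model returns the top bin's average for any new sum insured above the historical range, although the true cost in this toy data keeps growing.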

Page 14: Xavier Conort, DataScience SG Meetup - Challenges in insurance pricing

Naive Machine

● One example is GE Flight Quest. To win the competition, my I2R colleagues and I hid the airport names to force the Gradient Boosting Machine

○ to learn the reasons for the delays

○ and not to learn that one airport never had delays in the past 3 months and conclude that it will never have delays in the future

● A real life example of this in insurance is when an actuary uses the policy number as one of the features to predict the claims cost. A naive predictive model could use the policy number as a proxy for both inflationary effects and tenure effects.

Page 15: Xavier Conort, DataScience SG Meetup - Challenges in insurance pricing

Backtesting

To get good prediction models, you need to be fully aware of the model limitations and of effects changing over time.

To get this insight, backtesting is an important step.

Backtesting is the process of applying an analytical method to historical data to see how accurately the method would have predicted actual results.

It should help you uncover underperformance due to poor modeling or environmental and operational changes.

Machine Learning can be used to automate this experience analysis. And this time, there are no regulatory or operational constraints!
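
A minimal sketch of such a backtest: for each test year, fit on earlier years only and compare actual to expected cost. The function names, the naive mean-cost "model" and the data (10% yearly inflation) are all made up for illustration:

```python
def backtest_actual_vs_expected(history, predict, first_test_year):
    """Rolling-origin backtest.

    history: dict mapping year -> list of observed claim costs per policy.
    predict: function taking the training years (a dict like `history`) and
             returning an expected cost per policy; a stand-in for any model.
    Returns {year: actual / expected} for each test year.
    """
    ratios = {}
    for year in sorted(history):
        if year < first_test_year:
            continue
        # Only data strictly before the test year is visible to the model.
        train = {y: costs for y, costs in history.items() if y < year}
        expected = predict(train)
        actual = sum(history[year]) / len(history[year])
        ratios[year] = actual / expected
    return ratios


def mean_cost_model(train):
    """Naive model: average historical cost, blind to inflation."""
    costs = [c for year_costs in train.values() for c in year_costs]
    return sum(costs) / len(costs)


history = {
    2012: [100.0, 100.0],
    2013: [110.0, 110.0],
    2014: [121.0, 121.0],
    2015: [133.1, 133.1],
}
ratios = backtest_actual_vs_expected(history, mean_cost_model, first_test_year=2014)
```

Actual-to-expected ratios persistently above 1 point to an effect changing over time (here inflation) that the model ignores.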

Page 16: Xavier Conort, DataScience SG Meetup - Challenges in insurance pricing

IBNR and IBNER

Long claim developments make the modeling process harder

● It can take a long time to know that an event happened and even longer to know the final cost. The life cycle of a claim typically includes event occurrence, reporting, initial estimation, case estimate review, payment, recoveries... and the reported losses are usually uncertain

● Risks with such long claim developments are called long-tail risks. Personal injury compensation schemes (workers’ compensation and motor accident insurance) are typically long-tail risks.

Advice:

● first focus on short-tail risks

● recruit an experienced actuary

● or, more exciting, estimate IBNER on individual claims using machine learning
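
The talk suggests machine learning on individual claims. For comparison, the classical aggregate baseline is the chain-ladder method, sketched below on a made-up triangle of cumulative reported costs. This is a different technique from the one on the slide and is shown only to make the idea of claims development concrete:

```python
def chain_ladder_ultimates(triangle):
    """Project ultimate costs with volume-weighted development factors.

    triangle[i][j] is the cumulative reported cost of accident year i at
    development period j. Rows get shorter for more recent years, and the
    oldest year (first row) is assumed to be fully developed.
    """
    n_dev = len(triangle[0])
    factors = []
    for j in range(n_dev - 1):
        # Use only the years observed at both period j and period j + 1.
        rows = [r for r in triangle if len(r) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    ultimates = []
    for row in triangle:
        value = row[-1]
        # Apply the factors for the periods this year has not reached yet.
        for j in range(len(row) - 1, n_dev - 1):
            value *= factors[j]
        ultimates.append(value)
    return ultimates


triangle = [
    [100.0, 150.0, 165.0],  # oldest accident year
    [110.0, 165.0],
    [120.0],                # most recent accident year
]
ultimates = chain_ladder_ultimates(triangle)
# Projected cost not yet reported, per accident year.
reserves = [u - row[-1] for u, row in zip(ultimates, triangle)]
```

The most recent year has reported the least, so it carries the largest projected reserve.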

Page 17: Xavier Conort, DataScience SG Meetup - Challenges in insurance pricing

Key takeaways

● Modeling insurance pricing is not easy

● Machine Learning can definitely help in the modeling process

● Open source solutions are not designed for insurance pricing and innovation will be slower as insurance is not an open world

● Plenty of other exciting modeling projects in insurance:

○ Scan through millions of potential consumers to choose the right few

○ Predict lapses and build price elasticity models

○ Select the top 10% of your selected risks to manually review / inspect further

○ Identify claimants with the highest likelihood of being fraudulent and review them manually

○ Text mine the beneficiary clause of life insurance contracts

○ Estimate which claims are likely to become problem claims that require special attention

○ Decide who should handle a claim or whether to pay it without further checks

○ Geographic features / spatial smoothing

○ Detect changes in mix of business

○ ….

Page 18: Xavier Conort, DataScience SG Meetup - Challenges in insurance pricing

Automation is inevitable

The Economist, May 2015

Harvard Business Review, June 2015

Page 19: Xavier Conort, DataScience SG Meetup - Challenges in insurance pricing