Regression Model for the Impact of a Data Breach for a Financial Institution

Thomas Lee (2), Jason Hegland (1), Spencer Graves (2)

1. Stanford University Law School, Stanford Security Litigation Analytics, Palo Alto, California
2. VivoSecurity Inc., Los Altos, CA

Introduction

• An impactful data breach is a tail event often addressed in scenario analysis using expert judgment.

• Questionnaire-based studies, which give a fixed cost per record, are often used.

• We have performed a regression analysis to aid expert judgment.

• We find that the variables considered but eliminated, as well as the variables retained, provide insight to the expert.

Sources & Methods

Summary

Sources (Source: Purpose)
• Advisen: Principal source; data breach cost; lawsuit probability
• EDGAR: Research cost in 10-K SEC filings
• California Attorney General: Comparison analysis
• Maryland Attorney General: Comparison analysis; estimate total number of data breaches
• HHS: Comparison analysis
• Interviews: Challenge assumptions

Tools (Tool: Purpose)
• SQL Server: Filtering; aggregation
• R-Studio: Modeling
• Python: Additional variables
• Excel: Manual corrections

Process (Step: Purpose)
• Data Correction: Created independent and meaningful variables
• BMA: Screen many variables
• LM: Final cost modeling; establish SE
• GLM: Lawsuit probability
• Monte Carlo: Combine cost model with lawsuit probability model

[Slide callout: "Critical"]

Establish Scope: Total Cost of PII Data Breach

• Impact from a breach of PII that triggers legal reporting requirements:
  • PHI, PII, CHD, PFI

• Does not include:
  • Impact from intellectual property loss
  • Denial of service
  • Ransomware
  • Data corruption or loss
  • Fraud

Total Cost of PII Data Breach

[Diagram with two column headings: Public & Other Businesses; Breach Company]

Cost items shown:
• Investigation
• Notification
• Call center
• Remediation
• Business loss; theft of money & goods
• Credit monitoring & privacy insurance
• Fines & settlements

o Business loss
o Damage to personal credit
o Theft of money & goods
o Credit card replacement costs

Arrow labels: Mitigate; Transfer via suits
Side labels: Total costs; Function of Incident Type; Function of people affected

Suitability of Data: No Significant Size vs Frequency Bias

[Figure, two panels, log-scale axes: Number of Incidents vs. Breach Size; legend: 2008, 2013]
• Characterization of Size vs Frequency, whole United States, based upon Maryland Attorney General
• Characterization of Size vs Frequency, for Advisen Data

Suitability of Data: Significant Drop in Incidents per Year

[Figure: Incidents and Lawsuits per Year, through 2016; second panel: Maryland Breach Notices]

Annotations on the figure:
• Unexplained drop-off in incidents
• Data breaches are increasing
• Could affect lawsuit probability

Data Transformation

Each of the following variables is best transformed as log:
• Affected Count
• Calculated Total
• Lawsuits
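A minimal sketch of these log transformations, assuming natural logs and assuming that a count which can be zero (lawsuits) is shifted by one before taking the log, as the ln(1 + Lawsuits) term in the cost model suggests. The function name and the sample values are illustrative and are not taken from the data.

```python
import math

def log_transform(affected, total_cost, lawsuits):
    """Return log-scale versions of three breach variables.

    Assumes affected > 0 and total_cost > 0. lawsuits may be 0,
    so it is transformed as ln(1 + lawsuits).
    """
    return {
        "log_affected": math.log(affected),
        "log_total_cost": math.log(total_cost),
        "log1p_lawsuits": math.log1p(lawsuits),
    }

# Illustrative values only (not from the Advisen data).
row = log_transform(affected=10_000, total_cost=250_000.0, lawsuits=0)
print(row)
```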

Best Cost Model

Full Formula

$$\text{Cost} = \exp\!\left(12 - 3.6\,\text{BrchY} - 1.4\,\text{IncidentOther} + 0.7\,\ln(1 + \text{Lawsuits}) + \text{BrchY}\cdot\frac{\ln(\text{Affctd})}{2}\right)$$

Variable: Description; Pr(>|t|)
• BrchY: Data was breached, leading to 1) notification, 2) call center, 3) privacy insurance, 4) possible lawsuits. Pr(>|t|) = 4.9e-4
• Lawsuits: Number of lawsuits filed as a result of the data breach. Pr(>|t|) = 5.1e-2
• Affctd: Number of people affected by the data breach (number of people with exposed PII data). Pr(>|t|) = 1.7e-9
• IncidentOther: Data breach was caused by any of the following: Malicious Insider, Lost or Stolen Device, or Accident. Data breach was NOT caused by a Malicious Outsider. Pr(>|t|) = 6.0e-3

R-squared: 0.69
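As a sketch, the formula can be evaluated directly. This reads the exponent as 12 − 3.6·BrchY − 1.4·IncidentOther + 0.7·ln(1 + Lawsuits) + BrchY·ln(Affctd)/2, where treating the trailing 2 as a divisor is an assumption based on the deck's statement that cost increases with the square root of people affected. The function name and the example inputs are illustrative; the coefficients are the rounded values shown on the slide.

```python
import math

def breach_cost(brch_y, incident_other, lawsuits, affctd):
    """Cost forecast from the slide's rounded coefficients.

    brch_y:         1 if data was breached, else 0
    incident_other: 1 if NOT caused by a malicious outsider, else 0
    lawsuits:       number of lawsuits filed (>= 0)
    affctd:         number of people affected (> 0)
    """
    log_cost = (
        12.0
        - 3.6 * brch_y
        - 1.4 * incident_other
        + 0.7 * math.log1p(lawsuits)
        + brch_y * math.log(affctd) / 2.0  # assumed reading: ln(Affctd) / 2
    )
    return math.exp(log_cost)

# Illustrative inputs: one million people affected, no lawsuits.
mo = breach_cost(1, 0, 0, 1_000_000)
other = breach_cost(1, 1, 0, 1_000_000)
print(f"Malicious outsider: ${mo:,.0f}")
print(f"Other:              ${other:,.0f}")
print(f"Ratio:              {mo / other:.2f}")
```

Because the model is linear on the log scale, each indicator variable acts as a constant multiplier on cost, which is why the ratio between the two incident types does not depend on the number of people affected.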

Forecast Accuracy

• 10 observations from training set, NAICS=52

[Figure label: 2x range]

Forecast Accuracy

• 19 observations found after model development
• Red bars are median costs
• Error bars are 2x range

Interpretation

• Based upon Variables
• Based upon Confidence Interval

Interpretation: Condensing of Incident Type

Initially Modeled, and the category it maps to in the Final Model:
• Malicious Outsider: Malicious Outsider
• Malicious Insider: Other
• Lost/Stolen: Other
• Accident: Other

[Total Cost of PII Data Breach diagram repeated, with a callout: "The only costs that would be different"]

Interpretation: Based upon Variables

Breached=Y vs Breached=N

[Figure: Malicious Outsider vs. Other, for Breached=No and Breached=Yes]

• The cost relationship between Malicious Outsider (MO) and Other is the same for Breached=Yes and Breached=No, which suggests that investigation cost is an important difference between MO and Other.

[Total Cost of PII Data Breach diagram repeated, with a callout: "Only costs when Breached=No"]

Overview: Breached=Y

$$\text{Cost}_{MO} = 4{,}500 \times (1 + \text{Lawsuits})^{0.7} \times \sqrt{\text{Affected}}$$

$$\text{Cost}_{Other} = 1{,}100 \times (1 + \text{Lawsuits})^{0.7} \times \sqrt{\text{Affected}}$$

• Two types of PII breach:
  • Malicious Outsider
  • Other
• Malicious Outsider is 4x costlier
• Cost increases by sqrt of people affected
• One lawsuit doubles the cost
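The two simplified forms appear to follow from the full formula with BrchY = 1, assuming the exponent reads 12 − 3.6·BrchY − 1.4·IncidentOther + 0.7·ln(1 + Lawsuits) + BrchY·ln(Affctd)/2. A worked check of the constants under that assumption:

```latex
\begin{align*}
\text{Cost}_{MO}
  &= e^{12 - 3.6}\,(1 + \text{Lawsuits})^{0.7}\,\text{Affected}^{1/2}
   \approx 4{,}447\,(1 + \text{Lawsuits})^{0.7}\sqrt{\text{Affected}} \\
\text{Cost}_{Other}
  &= e^{12 - 3.6 - 1.4}\,(1 + \text{Lawsuits})^{0.7}\,\text{Affected}^{1/2}
   \approx 1{,}097\,(1 + \text{Lawsuits})^{0.7}\sqrt{\text{Affected}} \\
\frac{\text{Cost}_{MO}}{\text{Cost}_{Other}}
  &= e^{1.4} \approx 4.06 \\
\frac{\text{Cost}(\text{Lawsuits} = 1)}{\text{Cost}(\text{Lawsuits} = 0)}
  &= 2^{0.7} \approx 1.62
\end{align*}
```

The constants round to the 4,500 and 1,100 shown, and the ratio matches "4x costlier". With the rounded exponent 0.7, the first lawsuit multiplies cost by about 1.6, so "doubles" is best read as an approximation.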

Interestingly, we see the same bifurcation of incident types when looking at size vs frequency

[Figure: Number of Incidents (log scale, 0.01 to 100) vs. Number of People Affected (power-of-two bins from 1 to 4,194,304); series: Malicious Outsider, Other]
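The horizontal axis of this chart is labeled in powers of two (1, 2, 4, ..., 4,194,304). A minimal sketch of how incidents could be counted into such bins follows. It assumes each label is the lower edge of its bin, which the slide does not state; the function names and sample sizes are hypothetical.

```python
from collections import Counter

def power_of_two_bin(affected):
    """Return the lower edge of the power-of-two bin containing `affected`.

    Bins are [1, 2), [2, 4), [4, 8), ...; `affected` must be >= 1.
    """
    if affected < 1:
        raise ValueError("affected must be at least 1")
    return 1 << (int(affected).bit_length() - 1)

def size_frequency(sizes):
    """Count incidents per power-of-two bin, sorted by bin edge."""
    counts = Counter(power_of_two_bin(s) for s in sizes)
    return dict(sorted(counts.items()))

# Hypothetical breach sizes, for illustration only.
sample = [1, 3, 3, 900, 1_024, 1_500, 4_000_000]
print(size_frequency(sample))
```

Running the same count separately for each incident type would give the two series that the chart compares.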

Interpretation: Based upon Variables

Variable (Importance): Interpretation, Guide the Expert
• Year (Small): We can draw upon lessons learned from past data breaches.
• NAICS=52 (Small): We can draw upon lessons learned across industries. Look towards costs that are the same across industries: investigation, notification, credit monitoring, call center.
• Company Demographic (Small): Suggests reputation damage may be less important.
• Data.Breached (Large): Consistent with investigation cost being significant when the variable is considered together with Incident Type.
• Data Type (Small): Suggests damage to customers is small. Look towards costs that are independent of the data exposed: investigation, notification, call center.
• Incident Type (Large): Consistent with investigation costs being significant.
• Affected Count (Large, sqrt): Do not use a constant multiplier per person affected, since there is efficiency with scale. Use people affected, not record count. Consistent with investigation and notification costs being major costs.
• Lawsuits (Large): Focus on reducing the probability of lawsuits for large data breaches. Most costs will be experienced over several years.
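To illustrate why a constant per-person multiplier misleads, the sketch below applies the deck's simplified Malicious Outsider form, 4,500 × (1 + Lawsuits)^0.7 × sqrt(Affected), at three breach sizes. The function name and the chosen sizes are illustrative; the coefficients are the rounded slide values.

```python
import math

def cost_malicious_outsider(affected, lawsuits=0):
    """Simplified slide formula: 4,500 * (1 + lawsuits)^0.7 * sqrt(affected)."""
    return 4_500 * (1 + lawsuits) ** 0.7 * math.sqrt(affected)

# Per-person cost falls as 1 / sqrt(affected) under this form.
for affected in (10_000, 1_000_000, 100_000_000):
    total = cost_malicious_outsider(affected)
    print(f"{affected:>11,} people: total ${total:>12,.0f}, "
          f"per person ${total / affected:,.2f}")
```

Under this form, every 100-fold increase in people affected raises total cost 10-fold and cuts per-person cost 10-fold, so a flat cost per record would overstate large breaches and understate small ones.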

[Total Cost of PII Data Breach diagram repeated]

Interpretation: Based upon Confidence Interval

Cost of a Data Breach Affecting 10M People, caused by Malicious Outsider

[Figure: Likelihood of Cost vs. Breach Cost (thousands of dollars, $0 to $350,000)]
• Median Cost: $23,688,721
• 80% Confidence: $122,557,359
• 90% Confidence: $289,370,426

[Figure: Probability of Lawsuits; Probability (0% to 100%) vs. Number of Lawsuits (0, >0, 1, 2, 3)]
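A minimal sketch of how a Monte Carlo step could combine the cost model with a lawsuit-count distribution to produce a median and upper percentiles. The lawsuit probabilities and the log-scale residual standard error below are placeholders, since the slides do not give them, so the output is not expected to match the figures above. Reading "80% Confidence" as the 80th percentile of simulated cost is also an assumption.

```python
import math
import random
import statistics

# PLACEHOLDER inputs: the slides do not give these values.
LAWSUIT_PROBS = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}  # hypothetical P(number of lawsuits)
RESIDUAL_SE = 1.0  # hypothetical standard error of the model on the log scale

def simulate_costs(affected, n_draws=20_000, seed=0):
    """Draw breach costs for a malicious-outsider breach (BrchY = 1)."""
    rng = random.Random(seed)
    counts = list(LAWSUIT_PROBS)
    weights = list(LAWSUIT_PROBS.values())
    costs = []
    for _ in range(n_draws):
        # Sample a lawsuit count, then add lognormal model error.
        lawsuits = rng.choices(counts, weights=weights)[0]
        log_cost = (12.0 - 3.6
                    + 0.7 * math.log1p(lawsuits)
                    + math.log(affected) / 2.0
                    + rng.gauss(0.0, RESIDUAL_SE))
        costs.append(math.exp(log_cost))
    return costs

costs = simulate_costs(affected=10_000_000)
cuts = statistics.quantiles(costs, n=10)  # nine decile cut points
print(f"median:          ${cuts[4]:,.0f}")
print(f"80th percentile: ${cuts[7]:,.0f}")
print(f"90th percentile: ${cuts[8]:,.0f}")
```

Replacing the placeholders with a fitted lawsuit-probability model and the regression's estimated standard error would be needed before comparing the result against the figures reported on the slide.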

Interpretation: Based upon Confidence Interval

[Cost distribution figure repeated: Median Cost, $23,688,721; 80% Confidence, $122,557,359; 90% Confidence, $289,370,426]

[Figure: Number of Data Breaches (log scale) vs. Number of People Affected; Malicious Outsider, United States, 2015]

• A data breach affecting 10M people is a rare event across all industries.
• Since a data breach affecting 10M people is rare, a cost at the 80% confidence level is extremely rare.

Interpretation: Based upon Confidence Interval

[Cost distribution figure repeated: Median Cost, $23,688,721; 80% Confidence, $122,557,359; 90% Confidence, $289,370,426]

• Expert judgment can select the confidence interval based upon the company's incident response plan vis-à-vis industry best practice and industry peers.
• Most costs are within the control of the company and are managed as part of the incident response plan.

[Total Cost of PII Data Breach diagram repeated, with a callout: "Most Important"]

Interpretation: Based upon Confidence Interval

Incident Response best practice:
• All access logs turned on
• Access logs saved in a read-only manner
• Access logs saved in a uniform format
• Supporting policies, procedures, training and records
• Tools to aid investigation: Carbon Black, end-point detection and response technology
• Experienced third-party evaluation of readiness

Conclusion

• It is possible to develop a model that characterizes the cost of a PII data breach.
• The forecasting accuracy of such a model is acceptable over a large range of Affected Count and incident types.
• It is important to characterize incident type independent of the methods used to cause the data breach:
  • Malicious Insider
  • Malicious Outsider
  • Accidents
  • Lost/Stolen
• Both the variables eliminated and the variables kept can inform expert judgment.
• Expert judgment can assess the most reasonable confidence interval by evaluating the incident response plan.