Regression Model for the Impact of a Data Breach for a Financial Institution
Thomas Lee², Jason Hegland¹, Spencer Graves²
1. Stanford University Law School, Stanford Security Litigation Analytics, Palo Alto, California
2. VivoSecurity Inc., Los Altos, CA
Introduction
• An impactful data breach is a tail event often addressed in scenario analysis using expert judgment.
• Questionnaire studies, which yield a fixed cost per record, are often used.
• We have performed a regression analysis to aid expert judgment.
• We find that both the variables considered but eliminated and the variables retained provide insights to the expert.
Sources & Methods: Summary

Sources (source: purpose)
• Advisen: principal source; data breach cost; lawsuit probability
• EDGAR: research cost in 10-K SEC filings
• California Attorney General: comparison analysis
• Maryland Attorney General: comparison analysis; estimate total number of data breaches
• HHS: comparison analysis
• Interviews: challenge assumptions

Tools (tool: purpose)
• SQL Server: filtering; aggregation
• R-Studio: modeling
• Python: additional variables
• Excel: manual corrections

Process (step: purpose)
• Data Correction: created independent and meaningful variables
• BMA: screen many variables
• LM: final cost modeling; establish SE
• GLM: lawsuit probability
• Monte Carlo: combine cost model with lawsuit probability model
Critical
Establish Scope: Total Cost of PII Data Breach
• Impact from breach of PII that triggers legal reporting requirements:
  o PHI, PII, CHD, PFI
• Does not include:
  o Impact from intellectual property loss
  o Denial of services
  o Ransomware
  o Data corruption or loss
  o Fraud
Total Cost of PII Data Breach
[Figure: cost diagram with two groups linked by arrows labeled "Mitigate" and "Transfer via suits".]
• Breach Company: investigation; notification; call center; remediation; credit monitoring & privacy insurance; fines & settlements
• Public & Other Businesses: business loss; damage to personal credit; theft of money & goods; credit card replacement costs
• Annotations on total costs: "Function of Incident Type" and "Function of people affected"
Suitability of Data: No Significant Size vs Frequency Bias
[Figure, two panels, each plotting Number of Incidents (log scale) against Data Breach Size; years labeled 2008 and 2013.]
• Characterization of Size vs Frequency, whole United States, based upon Maryland Attorney General
• Characterization of Size vs Frequency, for Advisen Data
Suitability of Data: Significant drop in Incidents per Year
[Figure: Incidents and Lawsuits per Year, 2000 to 2016, on dual axes.]
• Unexplained drop-off in incidents
• Could affect lawsuit probability
[Figure: Maryland Breach Notices.]
• Data breaches are increasing
Data Transformation
• Affected Count: best transformed as log
• Calculated Total: best transformed as log
• Lawsuits: best transformed as log
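The log transformations can be sketched in a few lines of Python. This is an illustration only: the function name and the sample values are hypothetical, and the use of ln(1 + Lawsuits) for the lawsuit count is taken from the cost formula in this deck, where it keeps a breach with zero lawsuits well defined.

```python
import math

def log_transform(affected, total_cost, lawsuits):
    """Return log-scale versions of the three skewed variables.

    Names and inputs are illustrative; they are not the authors' code.
    """
    return {
        "log_affected": math.log(affected),      # requires affected > 0
        "log_total": math.log(total_cost),       # requires total_cost > 0
        "log1p_lawsuits": math.log1p(lawsuits),  # ln(1 + lawsuits), defined at 0
    }

# Hypothetical breach record, used only to exercise the function.
row = log_transform(affected=10_000, total_cost=250_000.0, lawsuits=0)
print(row)
```

On the log scale, a breach ten times larger moves the value by a constant step, which is what lets a linear model capture multiplicative effects.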
Best Cost Model
Full formula:
\[
\mathit{Cost} = \exp\!\Bigl(12 - 3.6 \times \mathit{BrchY} - 1.4 \times \mathit{IncidentOther} + 0.7 \times \ln(1 + \mathit{Lawsuits}) + \frac{\mathit{BrchY} \times \ln(\mathit{Affctd})}{2}\Bigr)
\]
Variable (Pr(>|t|)): description
• BrchY (4.9e-4): Data was breached, leading to 1) notification, 2) call center, 3) privacy insurance, 4) possible lawsuits.
• Lawsuits (5.1e-2): Number of lawsuits filed as a result of the data breach.
• Affctd (1.7e-9): Number of people affected by the data breach (number of people with exposed PII data).
• IncidentOther (6.0e-3): Data breach was caused by any of the following: Malicious Insider, Lost or Stolen Device, or Accident. Data breach was NOT caused by a Malicious Outsider.
R-squared: 0.69
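As a rough sketch, the fitted formula can be evaluated directly. The function below uses the coefficients shown on this slide (12, 3.6, 1.4, 0.7) and reads the trailing 2 in the formula as a divisor on BrchY × ln(Affctd), an interpretation that matches the square-root scaling stated elsewhere in the deck. The function and argument names are hypothetical.

```python
import math

def breach_cost(brch_y, incident_other, lawsuits, affctd):
    """Evaluate the log-linear cost model from the slide.

    brch_y and incident_other are 0/1 indicators; lawsuits is a count;
    affctd is the number of people affected (must be > 0).
    """
    log_cost = (
        12.0
        - 3.6 * brch_y
        - 1.4 * incident_other
        + 0.7 * math.log1p(lawsuits)          # 0.7 * ln(1 + Lawsuits)
        + brch_y * math.log(affctd) / 2.0     # BrchY * ln(Affctd) / 2
    )
    return math.exp(log_cost)

# Leading coefficients for a breach (BrchY = 1) with one person, no lawsuits.
outsider = breach_cost(1, 0, 0, 1)   # Malicious Outsider
other = breach_cost(1, 1, 0, 1)      # Other incident types
print(round(outsider), round(other), round(outsider / other, 2))
```

The two leading values are close to the 4,500 and 1,100 coefficients quoted on the overview slide, and their ratio is about 4.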
Forecast Accuracy
• 10 observations from training set, NAICS=52
• 19 observations found after model development
• Red bars are median costs; error bars are 2x range

Interpretation
• Based upon Variables
• Based upon Confidence Interval
Interpretation: Condensing of Incident Type
• Initially modeled: Malicious Outsider; Malicious Insider; Lost/Stolen; Accident
• Final model: Malicious Outsider; Other (Malicious Insider, Lost/Stolen, Accident)
[Figure: the Total Cost of PII Data Breach diagram, repeated, with the annotation "The only costs that would be different".]
Interpretation: Based upon Variables. Breached=Y vs Breached=N
[Figure: cost comparison of Malicious Outsider vs. Other, for Breached=No and Breached=Yes.]
The same cost relationship between Malicious Outsider (MO) and Other under both Breached=No and Breached=Yes suggests that investigation cost is an important difference between MO and Other.
[Figure: the Total Cost of PII Data Breach diagram, repeated, with the annotation "Only costs when Breached=No".]
Overview: Breached=Y
\[
\mathit{Cost}_{MO} = 4{,}500 \times (1 + \mathit{Lawsuits})^{0.7} \times \sqrt{\mathit{Affected}}
\]
\[
\mathit{Cost}_{Other} = 1{,}100 \times (1 + \mathit{Lawsuits})^{0.7} \times \sqrt{\mathit{Affected}}
\]
• Two types of PII breach:
  o Malicious Outsider
  o Other
• Malicious Outsider is 4x costlier
• Cost increases by the square root of people affected
• One lawsuit roughly doubles the cost (a factor of 2^0.7 ≈ 1.6)
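For reference, the two overview formulas follow from the full cost model by setting BrchY = 1. The derivation below is a sketch that assumes the full model's last term is BrchY × ln(Affctd) / 2, and that Affected here is the same quantity as Affctd there.

```latex
\begin{aligned}
\mathit{Cost}
  &= \exp\!\Bigl(12 - 3.6 - 1.4\,\mathit{IncidentOther}
       + 0.7\ln(1+\mathit{Lawsuits}) + \tfrac{1}{2}\ln(\mathit{Affctd})\Bigr) \\
  &= e^{\,8.4 - 1.4\,\mathit{IncidentOther}}
       \times (1+\mathit{Lawsuits})^{0.7} \times \sqrt{\mathit{Affctd}} \\[4pt]
\mathit{IncidentOther}=0:&\quad e^{8.4} \approx 4{,}447 \approx 4{,}500
  \quad\text{(Malicious Outsider)} \\
\mathit{IncidentOther}=1:&\quad e^{7.0} \approx 1{,}097 \approx 1{,}100
  \quad\text{(Other)} \\[4pt]
\frac{\mathit{Cost}_{MO}}{\mathit{Cost}_{Other}} &= e^{1.4} \approx 4.06,
  \qquad (1+1)^{0.7} = 2^{0.7} \approx 1.62
\end{aligned}
```

The ratio of about 4 is the "4x costlier" statement, and the factor 2^0.7 is the multiplier applied by the first lawsuit.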
Interestingly, we see the same bifurcation of incident types when looking at size vs frequency.
[Figure: Number of Incidents (log scale, 0.01 to 100) vs. Number of People Affected (1 to 4,194,304, doubling at each bin), with separate series for Malicious Outsider and Other.]

Interpretation: Based upon Variables
Variable (importance): interpretation, guide the expert
• Year (small): We can draw upon lessons learned from past data breaches.
• NAICS=52 (small): We can draw upon lessons learned across industries. Look towards costs that are the same across industries: investigation, notification, credit monitoring, call center.
• Company Demographic (small): Suggests reputation damage may be less important.
• Data.Breached (large): Consistent with investigation cost being significant when the variable is considered together with Incident Type.
• Data Type (small): Suggests damage to customers is small. Look towards costs that are independent of data exposed: investigation, notification, call center.
• Incident Type (large): Consistent with investigation costs being significant.
• Affected Count (large, square root): Do not use a constant multiplier per person affected, since there is efficiency with scale. Use people affected, not record count. Consistent with investigation and notification costs being major costs.
• Lawsuits (large): Focus on reducing the probability of lawsuits for large data breaches. Most costs will be experienced over several years.
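The point about a constant per-person multiplier can be illustrated numerically. The sketch below assumes the Malicious Outsider overview formula with no lawsuits, so total cost is 4,500 × sqrt(people affected); the function name is hypothetical.

```python
import math

def cost_per_person(affected, base=4_500.0):
    """Cost per affected person when total cost is base * sqrt(affected).

    base=4,500 is the Malicious Outsider coefficient from the overview
    slide, with zero lawsuits assumed.
    """
    return base * math.sqrt(affected) / affected

# Per-person cost falls by 10x for every 100x increase in breach size.
for n in (100, 10_000, 1_000_000):
    print(n, round(cost_per_person(n), 2))
```

Because the per-person figure shrinks as the breach grows, any fixed cost-per-record number will overstate large breaches and understate small ones under this model.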
Interpretation: Based upon Confidence Interval
[Figure: the Total Cost of PII Data Breach diagram, repeated.]
[Figure: Cost of a Data Breach Affecting 10M People, caused by Malicious Outsider. Likelihood of Cost vs. Breach Cost (thousands of dollars, $0 to $350,000).]
• Median Cost: $23,688,721
• 80% Confidence: $122,557,359
• 90% Confidence: $289,370,426
[Figure: Probability of Lawsuits. Probability (0% to 100%) vs. Number of Lawsuits (0, >0, 1, 2, 3).]
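A cost distribution of this kind can be produced by Monte Carlo simulation: draw a lawsuit count, add regression noise on the log scale, and read off percentiles. The sketch below shows the mechanics only. The lawsuit probabilities and the residual standard error are hypothetical placeholders, since the deck does not publish them, so the output will not reproduce the figures quoted above.

```python
import math
import random

# Hypothetical inputs: NOT taken from the deck.
LAWSUIT_PROBS = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}  # assumed P(number of lawsuits)
RESIDUAL_SE = 1.0                                   # assumed SE of the log-cost model

def simulate_costs(affected, n_draws=20_000, seed=0):
    """Simulate Malicious Outsider breach costs; returns a sorted list."""
    rng = random.Random(seed)
    counts = list(LAWSUIT_PROBS)
    weights = list(LAWSUIT_PROBS.values())
    costs = []
    for _ in range(n_draws):
        lawsuits = rng.choices(counts, weights=weights)[0]
        # Overview formula on the log scale: 4,500 * (1 + L)^0.7 * sqrt(affected)
        log_cost = (math.log(4_500.0)
                    + 0.7 * math.log1p(lawsuits)
                    + 0.5 * math.log(affected))
        log_cost += rng.gauss(0.0, RESIDUAL_SE)  # model uncertainty
        costs.append(math.exp(log_cost))
    costs.sort()
    return costs

def percentile(sorted_values, q):
    """Simple empirical percentile of an already sorted list, 0 <= q < 1."""
    idx = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[idx]

costs = simulate_costs(affected=10_000_000)
median, p80, p90 = (percentile(costs, q) for q in (0.5, 0.8, 0.9))
print(f"median={median:,.0f}  80%={p80:,.0f}  90%={p90:,.0f}")
```

Because the noise is added on the log scale, the simulated distribution is right-skewed, which is why the upper percentiles sit far above the median.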
Interpretation: Based upon Confidence Interval
[Figure: Number of Data Breaches (log scale, 0.01 to 10) vs. Number of People Affected; Malicious Outsider, United States, 2015.]
• A data breach affecting 10M people is a rare event across all industries.
• Since a data breach affecting 10M people is rare, a cost at the 80% confidence level is extremely rare.
• Expert judgment can select the confidence interval based upon the company's incident response plan vis-à-vis industry best practice and industry peers.
• Most costs are within the control of the company and are managed as part of the incident response plan.
Interpretation: Based upon Confidence Interval
[Figure: the Total Cost of PII Data Breach diagram, repeated, with the annotation "Most Important".]
Incident response best practice:
• All access logs turned on
• Access logs saved in a read-only manner
• Access logs saved in a uniform format
• Supporting policies, procedures, training and records
• Tools to aid investigation: Carbon Black, end-point detection and response technology
• Experienced third-party evaluation of readiness
Conclusion
• It is possible to develop a model that characterizes the cost of a PII data breach.
• The forecasting accuracy of such a model is acceptable over a large range of Affected Count and incident types.
• It is important to characterize incident type independently of the methods used to cause the data breach:
  o Malicious Insider
  o Malicious Outsider
  o Accidents
  o Lost/Stolen
• Both variables eliminated and variables kept can inform expert judgment.
• Expert judgment can assess the most reasonable confidence interval by evaluating the incident response plan.