Prevalence of MS in the US Using Big Data
William J Culpepper II, PhD, MA & Mitchell T Wallin, MD, MPH
PVA Summit, August 2018, Dallas, TX
Prevalence of MS in the US
• Learning Objectives
1. Define prevalence and describe why it is important.
2. Describe the merits and limitations of using healthcare claims datasets for estimating prevalence.
3. Define case ascertainment of MS in healthcare claims datasets.
DISCLOSURES: WJ Culpepper
• Work discussed in this presentation was supported by a National MS Society grant (# HC-1508-05693)
• The presenter is PI on two other NMSS grants that are unrelated to this presentation
DISCLOSURES: M Wallin
• Work discussed in this presentation was supported by a National MS Society grant (# HC-1508-05693)
• The presenter is PI/co-PI on two NMSS grants and a VA Merit Review grant that are unrelated to this presentation
• The presenter receives funding from the VA MS Center of Excellence
Epidemiology overview : PREVALENCE
• Prevalence is the proportion of ALL cases (new and old) of a specified disease or condition occurring within a defined population over a prescribed period of time
– Point prevalence: prevalence occurring on a specific date
– Period prevalence: prevalence occurring within a specified time period (e.g., 1 year)
– Lifetime prevalence: prevalence occurring over the life span up to the time of ascertainment
Epidemiology overview : PREVALENCE
• Most commonly reported as prevalence rate (PR): number of cases per 100,000 population
– Provides a standardized measure of prevalence
– Easy to generate the number affected
• Assess the scope or burden of disease
– Acute conditions: prevalence ≈ incidence (e.g., ALS, pancreatic CA)
– Chronic conditions: prevalence > incidence (e.g., MS, PD)
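As a minimal sketch of the prevalence rate defined above, the Python below converts a case count to cases per 100,000 and back to a number affected. The function names and the example counts are hypothetical and are not taken from the presentation.

```python
def prevalence_rate(cases: int, population: int, per: int = 100_000) -> float:
    """Return prevalent cases per `per` population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * per / population


def number_affected(rate: float, population: int, per: int = 100_000) -> float:
    """Invert a prevalence rate to the expected number of people affected."""
    return rate * population / per


# Hypothetical example: 1,500 cases in a population of 1,000,000.
rate = prevalence_rate(1_500, 1_000_000)
print(rate)                              # 150.0 cases per 100,000
print(number_affected(rate, 2_000_000))  # 3000.0 people in a population twice as large
```

Because the rate is standardized, the same figure can be applied to any population size to estimate the number affected.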
Epidemiology overview : PREVALENCE
• Rarely have data on entire population of interest
– Prevalence estimates are frequently derived from a representative sample of the population
– Sample selection is critical in terms of representativeness
– Size of the sample important, especially for rare diseases like MS
– Numerator and denominator drawn from same data source(s) to avoid bias
Epidemiology overview : PREVALENCE
• Rigorous ascertainment of prevalent cases required
– Single occurrence of an ICD DX code rarely reliable
• Validation of ascertainment algorithm (if none exists)
– Formal analysis required
  • Sensitivity
  • Specificity
  • Positive & negative predictive values
  • Accuracy & inter-rater agreement
– Comparator
  • Chart review determined DX
  • Following accepted DX criteria
  • Conducted by qualified & experienced clinician / abstractor
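The validation statistics listed above can be sketched from a 2x2 table that compares the algorithm against chart review. This is an illustrative Python sketch only: the class name, function name, and counts are hypothetical, and Youden's J (sensitivity + specificity − 1) is included as a common summary index.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # algorithm positive, chart review confirms MS
    fp: int  # algorithm positive, chart review says not MS
    fn: int  # algorithm negative, chart review confirms MS
    tn: int  # algorithm negative, chart review says not MS


def validation_metrics(c: Confusion) -> dict:
    """Compute standard test statistics for a case-finding algorithm."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn),
        "youden_j": sensitivity + specificity - 1,
    }


# Hypothetical chart-review sample of 150 patients.
m = validation_metrics(Confusion(tp=90, fp=5, fn=10, tn=45))
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```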
Problem: MS Prevalence in the US
• MS Prevalence Workgroup sponsored by NMSS
– Existing estimates of MS prevalence are dated
  • Anecdotally believed to underestimate the “true” prevalence of MS in the US
– Convened June 2014 to discuss strategies and methods
  • Traditional epidemiology (medical office charts, registries)
  • Surveys (BRFSS)
  • Administrative healthcare claims datasets (CMS, commercial)
– Consensus was to use administrative healthcare claims datasets as they provide
  • Ability to capture largest segment of US population by using datasets with overlapping coverage
  • The most time and cost efficient approach
MS Prevalence in the US: Case Ascertainment
• Case ascertainment algorithms for use in administrative healthcare claims datasets
– VHA: Culpepper et al. J Rehab Res Dev 2006; 43(1): 17-24.
– Manitoba, Canada: Marrie et al., Neurology 2010; 74: 465-471.
– Have slightly different specifications
– Need a unified algorithm that performs equally well across disparate datasets
Case Definition Name* Number and Type of Claims
MS_A ≥2 IP or ≥3 OP
MS_B ≥2 IP or ≥4 OP
MS_C ≥2 IP or ≥5 OP
MS_D ≥2 IP or ≥3 OP or ≥1 DMT
MS_E (IP + OP + DMT) ≥ 3
IP = inpatient admission; OP = outpatient visit; DMT = disease modifying therapy.
*The performance of each algorithm was evaluated based on both a 1-year and a 2-year time period.
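The five candidate case definitions in the table above can be written as simple rules on per-person claim counts. The sketch below is illustrative only: it assumes the counts of inpatient admissions (ip), outpatient visits (op), and DMT claims (dmt) with an MS code have already been tallied for one person within the ascertainment period; how claims are tallied is not specified on the slide, and the function and variable names are hypothetical.

```python
from typing import Callable, Dict

# Each rule takes per-person counts within the ascertainment period
# (ip = inpatient admissions, op = outpatient visits, dmt = DMT claims).
CASE_DEFINITIONS: Dict[str, Callable[[int, int, int], bool]] = {
    "MS_A": lambda ip, op, dmt: ip >= 2 or op >= 3,
    "MS_B": lambda ip, op, dmt: ip >= 2 or op >= 4,
    "MS_C": lambda ip, op, dmt: ip >= 2 or op >= 5,
    "MS_D": lambda ip, op, dmt: ip >= 2 or op >= 3 or dmt >= 1,
    "MS_E": lambda ip, op, dmt: (ip + op + dmt) >= 3,
}


def classify(ip: int, op: int, dmt: int) -> Dict[str, bool]:
    """Return, for each case definition, whether the person qualifies as an MS case."""
    return {name: rule(ip, op, dmt) for name, rule in CASE_DEFINITIONS.items()}


# Hypothetical person: no admissions, two outpatient visits, one DMT claim.
print(classify(ip=0, op=2, dmt=1))
# {'MS_A': False, 'MS_B': False, 'MS_C': False, 'MS_D': True, 'MS_E': True}
```

The example person is missed by the definitions that ignore DMT use and captured by the two that include it.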
MS Prevalence in the US: Case Ascertainment
Algorithm performance by data source, 1-year ascertainment period

MS Algorithm                           Data Source  Sensitivity  Specificity  PPV   NPV   Accuracy  Youden’s J
MS_A-1: ≥2 IP or ≥3 OP                 VA           86.1         82.5         97.8  39.6  85.7      0.69
                                       KPSC         78.9         74.9         95.5  34.2  78.3      0.54
                                       MB           89.7         67.2         95.3  46.5  87.0      0.57
MS_B-1: ≥2 IP or ≥4 OP                 VA           81.5         89.5         98.6  34.8  82.3      0.71
                                       KPSC         66.9         82.3         96.4  26.9  69.0      0.50
                                       MB           79.0         77.9         96.4  33.2  78.9      0.57
MS_C-1: ≥2 IP or ≥5 OP                 VA           76.1         90.4         98.6  29.4  77.5      0.67
                                       KPSC         55.3         87.4         96.7  22.4  59.5      0.43
                                       MB           68.5         85.6         97.3  26.7  70.5      0.54
MS_D-1: ≥2 IP or ≥3 OP or ≥1 DMT       VA           86.6         82.5         97.8  40.4  86.2      0.69
                                       KPSC         87.4         73.0         95.7  46.1  85.6      0.60
                                       MB           93.2         66.7         95.4  56.8  90.1      0.60
MS_E-1: (IP + OP + DMT) ≥ 3            VA           87.2         82.2         97.8  41.4  86.7      0.69
                                       KPSC         85.5         76.2         96.1  43.6  84.3      0.62
                                       MB           93.4         66.1         95.4  56.8  90.1      0.60

Sensitivity, specificity, PPV, NPV, and accuracy are percentages. PPV = positive predictive value; NPV = negative predictive value; Youden’s J = sensitivity + specificity − 1. VA = Veterans Affairs; KPSC = Kaiser Permanente Southern California; MB = Manitoba.
Results for 2-year ascertainment period not shown as there was little to no improvement in test statistics.
MS Prevalence in the US: Case Ascertainment
• Summary of algorithms
– Performed consistently across 3 disparate datasets
– With minor exceptions, algorithms performed consistently across strata
– Adding DMT improved ascertainment
– 2-year ascertainment period showed little improvement
• Best algorithm(s)
– MS_D 1-year: ≥ 2 IP or ≥ 3 OP or ≥ 1 DMT
– MS_E 1-year: (IP + OP + DMT) ≥ 3
• Selected MS_E (IP + OP + DMT) ≥ 3
– Had best test statistics overall
– Ease of implementation
Problem: MS Prevalence in the US
• Target population – United States
• Sample Selection
– No single dataset adequately represents the US population
– Multiple datasets needed
– Datasets from different segments (commercial and government) with broad geographical coverage a must
[Figure: segments of the US population relative to administrative data. Labels: Nursing homes; Prisons; VHA & DOD; Medicare & Medicaid; Commercially insured; Self-insured / Self-pay; ERISA; Undiagnosed; Aboriginal; Not under care; Admin. data]
MS Prevalence in the US: Dataset Selection
• Commercial datasets
– Optum (15 million covered)
– Truven (33 million covered)
• Government datasets
– Veterans Administration (11 million covered lives)
– Medicare (27 million covered lives)
– Medicaid (28 million covered lives)
• Sample Selection & Time Frame
– Commercial and VHA datasets restricted to < 65 years
– Due to high costs of data, acquired data from 2008 through 2010 to synch with most recent census data (2010)
– Datasets cover approximately 1/3 of the US population in 2010
MS Prevalence in the US: Dataset Selection
• Applied preferred algorithm annually in each dataset
– (IP + OP + DMT) ≥ 3
• Computed cumulative prevalence for period of 2008 – 2010
• Observed 2010 prevalence
– Truven: 194 / 100,000
– Medicaid: 134 / 100,000
[Figure: cumulative prevalence (MS cases / 100,000 population) by year, 2008 to 2010, for Optum, Truven, Medicaid, Medicare, and VHA]
MS Prevalence in the US: Dataset Selection
• To obtain a national estimate of prevalence of MS in the US, need to
– Combine estimates across datasets
– Adjust estimate for the uninsured
– Standardize to the 2010 US Census
– Adjust for having a short (3 yr) ascertainment period
– Project rise in prevalence to derive estimate of life-time prevalence as of 2017
• These issues are detailed in the next presentation.
US MS Prevalence Estimates 1976-2014 (cases per 100,000)
[Figure: bar chart of published US MS prevalence estimates (cases per 100,000): Baum 1976; Nelson 1986; Anderson 1990; Noonan 1989-94; NMSS 2000; Campbell 2016]
Goals of the MS Prevalence Initiative (National MS Society)
• Primary: Obtain scientifically sound & economically feasible US national prevalence estimate
• Secondary: Understand the burden of disease
• Tertiary: Develop a long-term strategy to assess MS prevalence at regular intervals
MS Case Finding in the United States
Step 1: Identify dataset(s) covering publicly and privately insured populations in the US
Step 2: Develop and validate a highly accurate MS case-finding algorithm that can be applied in a standard way in all administrative health care databases
Step 3: Apply the case definition algorithm to estimate the number of MS patients and the prevalence for each database
Step 4: Combine MS prevalence estimates into a single estimate of the prevalence of MS for the United States, weighted according to the number of insured persons in each health insurance segment
Health Insurance Source in the US by Age (Current Population Survey, 2009)
[Figure: two bar-chart panels, Age < 65 and Age > 65, showing % of population with each insurance type (Employer, Purchased, Medicaid, Medicare, Military, Uninsured) for 2008, 2009, and 2010. Annotations mark which dataset's estimates were applied: commercial claims (Optum® & Truven®), Medicare, VA, and Medicaid, for ages < 65 and > 65.]
Observation Period Effects on Prevalence Estimates: Systemic Lupus Erythematosus (Ng R, et al. J Rheum 2013)
• Prevalence and incidence for SLE assessed with algorithm in Quebec health administrative databases (1989-2003)
• Using a 15-year observation period:
– SLE incidence: 5.6 per 100,000
– SLE prevalence: 59.1 per 100,000
• When using a 3-year observation period:
– SLE incidence overestimate: 238%
– SLE prevalence underestimate: 66%
• 10 years of data required for stable estimates of prevalence and incidence using administrative datasets for chronic conditions
Effect of short period of observation
• Cumulative prevalence applies to our case finding approach within datasets
• Once an individual meets the MS case definition for a given year, they are counted as a case for subsequent years through 2010 (if they remain alive)
• Long periods of observation (10 yrs) are required to approach lifetime prevalence for MS (relapsing-remitting nature)
• Undercount adjustment factors derived from our datasets comparing 3 vs 10 yrs:
– VA: 37% (1.37)
– Manitoba: 47% (1.47)
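The cumulative counting rule described above (once a person meets the case definition, they stay counted in later years while alive) can be sketched as follows. This is an illustrative sketch, not the authors' code: the data layout, the names, and the convention that a person who dies in year d is no longer counted from year d onward are all assumptions.

```python
from typing import Dict, Iterable, Optional


def cumulative_cases(
    first_met: Dict[str, int],
    years: Iterable[int],
    death_year: Optional[Dict[str, int]] = None,
) -> Dict[int, int]:
    """Count cumulative cases per year.

    first_met maps a person ID to the first year they met the case definition.
    death_year (optional) maps a person ID to their year of death; by assumption
    a person is not counted in their year of death or afterwards.
    """
    death_year = death_year or {}
    counts = {}
    for year in years:
        counts[year] = sum(
            1
            for person, met in first_met.items()
            if met <= year and death_year.get(person, year + 1) > year
        )
    return counts


# Hypothetical people and years.
first_met = {"p1": 2008, "p2": 2009, "p3": 2009, "p4": 2010}
print(cumulative_cases(first_met, [2008, 2009, 2010]))
# {2008: 1, 2009: 3, 2010: 4}
print(cumulative_cases(first_met, [2008, 2009, 2010], death_year={"p2": 2010}))
# {2008: 1, 2009: 3, 2010: 3}
```

Dividing each yearly count by that year's enrolled population, times 100,000, would give the cumulative prevalence rate.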
[Figure: two panels of prevalence (# / 100,000) by year, 1998 to 2016. Panel a: VA cumulative vs. VA 2008-2010, difference 37%. Panel b: MB cumulative vs. MB 2008-2010, difference 47%.]
Algorithm-based MS Prevalence Cumulated over 3 Years: US National Datasets, 2010

Dataset   Age group       Enrollees in dataset (2010)  Prevalence per 100,000 (95% CI)  Female prevalence (95% CI)  Male prevalence (95% CI)
Optum     18 to > 65 yrs  7,648,528                    183 (180-186)                    275 (270-280)               88 (85-91)
Truven    18 to > 65 yrs  36,391,252                   186 (185-187)                    269 (267-271)               92 (91-94)
VA        18-64 yrs       4,989,121                    178 (174-182)                    382 (366-398)               153 (149-157)
Medicaid  18 to > 65 yrs  18,997,545                   155 (153-157)                    177 (175-180)               102 (100-105)
Medicare  > 65 yrs        21,735,979                   129 (128-131)                    167 (165-170)               69 (67-71)
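The slide does not say how the 95% confidence intervals were computed. As a hedged illustration, a simple normal-approximation (Wald) interval for a proportion reproduces the Optum interval when the case count is back-calculated from the reported rate and enrollment; that case count (about 13,997) is an assumption derived here, not a figure from the presentation, and the function name is hypothetical.

```python
import math
from typing import Tuple


def prevalence_ci(
    cases: int, population: int, per: int = 100_000, z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (rate, lower, upper) per `per` population using a Wald interval."""
    p = cases / population
    se = math.sqrt(p * (1 - p) / population)
    return p * per, (p - z * se) * per, (p + z * se) * per


# Optum: 183 per 100,000 among 7,648,528 enrollees implies roughly 13,997 cases
# (back-calculated assumption, not reported on the slide).
rate, lower, upper = prevalence_ci(13_997, 7_648_528)
print(round(rate), round(lower), round(upper))  # 183 180 186
```

With enrollments in the millions the intervals are narrow, which is why sampling error is a minor concern here compared with ascertainment and coverage bias.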
National MS Prevalence Estimate Analytic Steps
• Optum and Truven treated as random samples drawn from the same underlying population & age and sex stratified estimates were pooled
• US Census data used to determine the total size of the US population in each age, sex and health insurance stratum
• Stratum-specific estimate was multiplied by the total insured US population in that stratum to determine the number of individuals affected
• Uninsured population estimated to be 5.0% (Slifka database)
• Number of individuals identified with MS by algorithm summed in each stratum then divided by US population denominator for 2010
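The weighting steps above amount to a population-weighted combination of stratum-specific rates. The sketch below is a simplified illustration with hypothetical strata and numbers; the names are assumptions, and the real analysis stratified by age, sex, and health insurance type using US Census denominators.

```python
from typing import List, Tuple

# Each stratum: (label, prevalence per 100,000 in that stratum, US population in that stratum)
Stratum = Tuple[str, float, int]


def national_prevalence(strata: List[Stratum], per: int = 100_000) -> Tuple[float, float]:
    """Return (total expected cases, overall prevalence per `per` population)."""
    # Expected cases in each stratum = stratum rate x stratum population.
    total_cases = sum(rate * population / per for _, rate, population in strata)
    total_population = sum(population for _, _, population in strata)
    return total_cases, total_cases * per / total_population


# Hypothetical two-stratum example.
strata = [
    ("women, privately insured", 200.0, 1_000_000),
    ("men, privately insured", 100.0, 3_000_000),
]
cases, overall = national_prevalence(strata)
print(cases, overall)  # 5000.0 125.0
```

The overall rate falls between the stratum rates and is pulled toward the larger stratum, which is the intended effect of weighting by stratum size.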
National MS Prevalence Estimate Analytic Steps (cont.)
• Undercount adjustment factors for the 10-year cumulative prevalence were required:
– Estimated range 1.37 (lower bound) to 1.47 (upper bound)
– Applied these factors to derive estimates for the 2010 prevalence of MS cumulated over 10 years
• Average annual growth in the MS prevalence rate between 2010-2017 for two AHC datasets was 2.3% per year. We applied this growth rate annually from 2010 onward to estimate “life-time” prevalence in 2017
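The two adjustments above are simple multiplications, sketched here in Python. The function names and the 3-year input rate of 200 per 100,000 are hypothetical; the projection example assumes annual compounding and uses the 2010 upper-bound estimate (309.2 per 100,000) reported later in this presentation, which compounds at 2.3% per year over 7 years to approximately the reported 2017 figure.

```python
def adjust_undercount(rate_3yr: float, factor: float) -> float:
    """Scale a 3-year cumulative prevalence to approximate a 10-year one."""
    return rate_3yr * factor


def project(rate: float, annual_growth: float, years: int) -> float:
    """Compound a prevalence rate forward at a fixed annual growth rate."""
    return rate * (1 + annual_growth) ** years


# Hypothetical 3-year rate, adjusted with the lower and upper factors.
low = adjust_undercount(200.0, 1.37)
high = adjust_undercount(200.0, 1.47)
print(round(low, 1), round(high, 1))  # 274.0 294.0

# 2010 upper-bound estimate projected to 2017 at 2.3% per year.
print(round(project(309.2, 0.023, 7), 1))  # 362.6
```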
MS Prevalence in the United States (Wallin M et al. MSJ –ECTRIMS, 2017)
• Validated MS algorithm applied to national administrative health claims datasets in the US: private insurance (Optum, Truven, Kaiser So.CA), government (Medicaid, Medicare, VA).
• Estimated 10-year period prevalence (2000-2010, upper bound) for MS:
– Overall: 309.2 per 100,000 (95% CI: 308.1-310.1)
– Cases with MS: 727,344
• Estimated “Life-time” prevalence for 2017
– Overall: 362.6 per 100,000
– Cases with MS: 914,651
10-year US Period Prevalence (2000-2010) for MS per 100,000 (2010 US Census)
[Map: high estimate of 2010 prevalence of MS in the US per 100,000 population by census region (2010 US Census)]
• West: 272.7 per 100,000 (95% CI: 271.0-274.4); F:M ratio 2.8
• Midwest: 353.1 per 100,000 (95% CI: 351.1-355.2); F:M ratio 3.0
• Northeast: 377.4 per 100,000 (95% CI: 375.2-379.7); F:M ratio 2.8
• South: 272.6 per 100,000 (95% CI: 271.2-273.9); F:M ratio 2.7
Wallin M et al. MSJ –ECTRIMS, 2017
Global Burden of Neurological Disorders 1990-2015 (GBD 2015 Neurol Disease Collaborators, Lancet Neurol 2017)
• Neurological disorders ranked as the leading cause group of DALYs in 2015 (10.2%)
• Neurological disorders were the second leading cause of mortality (16.8% global deaths)
• Most prevalent neurological conditions (millions):
– Tension type headache (1,506)
– Migraine (959)
– Medication overuse headache (59)
– Dementia (46)
Global Burden of Multiple Sclerosis (GBD 2015, Neurol Disease Coll Group)
• MS morbidity estimates derived with GBD methods using 123 unique sources on prevalence and 65 unique sources on incidence (prevalence, DALYs, YLL, mortality)
• 2,221,188 MS cases in 2016 worldwide (95% UI: 2,033,866-2,436,858)
• Geographic gradients are diffusing but persisted in 2016
• MS ranked 14th among major neurological conditions as a cause of DALYs
Advancing Research for Neurological Diseases Act of 2015 (H.R. 292/S.849)
• One in six Americans suffers from a neurologic condition, yet the U.S. lacks a coordinated system to collect and analyze data on these conditions
• This legislation proposes a data collection system managed by the CDC that would track the incidence & prevalence of neurological diseases, including MS
• Bipartisan support in 114th Congress and authorized by the 21st Century Cures Act, signed into law December 2016 (Public Law 114-255)
• No funding appropriated to date
United States MS Prevalence Workgroup (NMSS)
Mitch Wallin, MD, MPH - Chair
Stephen Buka, ScD
Jonathan Campbell, PhD
Joel Culpepper, PhD
Gary Cutter, PhD
Weyman Johnson
Wendy Kaye, PhD
Nicholas LaRocca
Annette Langer-Gould, MD, PhD
Albert Lo, MD, PhD
Ruth Ann Marrie, MD, PhD
Robert McBurney, PhD
Oleg Muravov, MD, PhD
Lorene Nelson, PhD
Leslie Ritter
Helen Tremlett, PhD
Prevalence of MS in the United States: Conclusions
• MS prevalence in the US in 2010 is comparable to Canadian estimates and much higher than previously reported
• Trends for worldwide increases in MS prevalence and burden are notable and contribute significantly to the global burden of disease
• Modifiable risk factors for MS onset should be targeted for MS prevention: smoking, obesity, sunlight exposure
• Use of algorithms with large population datasets will provide basis for efficient morbidity and mortality surveillance in the future (US Neurological Disease Surveillance System, Public Law 114-255)
Using Big Data: Prevalence of MS in the United States
QUESTIONS
CE/CME Credit
If you would like to receive continuing education credit for this activity, please visit:
http://PVA.cds.pesgce.com