Prevalence of MS in the US Using Big Data

William J Culpepper II, PhD, MA & Mitchell T Wallin, MD, MPH

PVA Summit, August 2018, Dallas, TX

Prevalence of MS in the US

• Learning Objectives

1. Define prevalence and describe why it is important.

2. Describe the merits and limitations of using healthcare claims datasets for estimating prevalence.

3. Define case ascertainment of MS in healthcare claims datasets.

DISCLOSURES: WJ Culpepper

• Work discussed in this presentation was supported by a National MS Society grant (# HC-1508-05693)

• The presenter is PI on two other NMSS grants that are unrelated to this presentation

DISCLOSURES: M Wallin

• Work discussed in this presentation was supported by a National MS Society grant (# HC-1508-05693)

• The presenter is PI/co-PI on two NMSS grants and a VA Merit Review grant that are unrelated to this presentation

• The presenter receives funding from the VA MS Center of Excellence

Epidemiology overview: PREVALENCE

• Prevalence is the proportion of ALL cases (new and old) of a specified disease or condition occurring within a defined population over a prescribed period of time

– Point prevalence: prevalence occurring on a specific date

– Period prevalence: prevalence occurring within a specified time period (e.g., 1 year)

– Lifetime prevalence: prevalence occurring over the life span up to the time of ascertainment

• Most commonly reported as prevalence rate (PR): number of cases per 100,000 population

– Provides a standardized measure of prevalence

– Makes it easy to generate the number affected

• Used to assess the scope or burden of disease

– Acute conditions: prevalence ≈ incidence (e.g., ALS, pancreatic CA)

– Chronic conditions: prevalence > incidence (e.g., MS, PD)
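As a rough illustration of the prevalence rate defined above, the following Python sketch converts a case count and a population size into cases per 100,000, and converts a rate back into an expected number affected. The function names and the example numbers are illustrative assumptions, not figures from this presentation.

```python
def prevalence_per_100k(cases: int, population: int) -> float:
    """Prevalence rate: prevalent cases per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


def expected_cases(rate_per_100k: float, population: int) -> float:
    """Number affected implied by a prevalence rate in a population of a given size."""
    return rate_per_100k / 100_000 * population


# Hypothetical example: 1,500 prevalent cases in a population of 1,000,000.
rate = prevalence_per_100k(1_500, 1_000_000)
print(rate)                            # cases per 100,000
print(expected_cases(rate, 250_000))   # cases expected among 250,000 people
```

Because the rate is standardized to a common denominator, the same figure can be applied to any population size to estimate the number affected.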


• Rarely have data on entire population of interest

– Prevalence estimates are frequently derived from a representative sample of the population

– Sample selection is critical in terms of representativeness

– Size of the sample important, especially for rare diseases like MS

– Numerator and denominator drawn from same data source(s) to avoid bias


• Rigorous ascertainment of prevalent cases required

– A single occurrence of an ICD DX code is rarely reliable

• Validation of ascertainment algorithm (if none exists)

– Formal analysis required: sensitivity, specificity, positive & negative predictive values, accuracy & inter-rater agreement

– Comparator: chart review-determined DX, following accepted DX criteria, conducted by a qualified & experienced clinician / abstractor
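The validation statistics listed above all follow from a 2x2 comparison of the algorithm against the chart-review diagnosis. The sketch below uses the standard definitions; the class name, function name, and counts are hypothetical and are not taken from the validation study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # algorithm positive, chart review confirms MS
    fp: int  # algorithm positive, chart review does not confirm MS
    fn: int  # algorithm negative, chart review confirms MS
    tn: int  # algorithm negative, chart review does not confirm MS


def validation_metrics(c: ConfusionCounts) -> dict:
    """Standard test statistics for a case-finding algorithm."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / total,
    }


# Hypothetical chart-review sample of 150 patients.
metrics = validation_metrics(ConfusionCounts(tp=90, fp=5, fn=10, tn=45))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```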

Problem: MS Prevalence in the US

• MS Prevalence Workgroup sponsored by NMSS

– Existing estimates of MS prevalence are dated; anecdotally believed to underestimate the “true” prevalence of MS in the US

– Convened June 2014 to discuss strategies and methods: traditional epidemiology (medical office charts, registries), surveys (BRFSS), administrative healthcare claims datasets (CMS, commercial)

– Consensus was to use administrative healthcare claims datasets, as they provide (1) the ability to capture the largest segment of the US population by using datasets with overlapping coverage and (2) the most time- and cost-efficient approach

MS Prevalence in the US: Case Ascertainment

• Case ascertainment algorithms for use in administrative healthcare claims datasets

– VHA: Culpepper et al. J Rehab Res Dev 2006; 43(1): 17-24.

– Manitoba, CA: Marrie et al., Neurology 2010; 74: 465-471.

– The two have slightly different specifications

– Need a unified algorithm that performs equally well across disparate datasets

Case Definition Name*    Number and Type of Claims
MS_A                     ≥2 IP or ≥3 OP
MS_B                     ≥2 IP or ≥4 OP
MS_C                     ≥2 IP or ≥5 OP
MS_D                     ≥2 IP or ≥3 OP or ≥1 DMT
MS_E                     (IP + OP + DMT) ≥ 3

IP = inpatient admission; OP = outpatient visit; DMT = disease modifying therapy.

*The performance of each algorithm was evaluated based on both a 1-year and a 2-year time period.
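The five candidate case definitions can be written as simple rules over per-person claim counts. This is a minimal sketch, assuming the counts of MS-coded inpatient admissions, outpatient visits, and DMT claims within the ascertainment period have already been tallied; the function name and argument names are hypothetical.

```python
def meets_case_definition(name: str, ip: int, op: int, dmt: int) -> bool:
    """Apply one of the candidate MS case definitions.

    ip, op, dmt: counts of MS-coded inpatient admissions, outpatient visits,
    and disease modifying therapy claims within the ascertainment period.
    """
    rules = {
        "MS_A": lambda: ip >= 2 or op >= 3,
        "MS_B": lambda: ip >= 2 or op >= 4,
        "MS_C": lambda: ip >= 2 or op >= 5,
        "MS_D": lambda: ip >= 2 or op >= 3 or dmt >= 1,
        "MS_E": lambda: ip + op + dmt >= 3,
    }
    if name not in rules:
        raise KeyError(f"unknown case definition: {name}")
    return rules[name]()


# Hypothetical person with one admission, one outpatient visit, one DMT claim.
for definition in ("MS_A", "MS_B", "MS_C", "MS_D", "MS_E"):
    print(definition, meets_case_definition(definition, ip=1, op=1, dmt=1))
```

In this example only MS_D and MS_E flag the person, which shows how counting DMT claims can capture cases that the visit-only definitions miss.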

MS Prevalence in the US: Case Ascertainment

1-year ascertainment period

MS Algorithm                          Data Source  Sensitivity  Specificity  PPV   NPV   Accuracy  Youden’s J
MS_A-1: ≥2 IP or ≥3 OP                VA           86.1         82.5         97.8  39.6  85.7      0.69
                                      KPSC         78.9         74.9         95.5  34.2  78.3      0.54
                                      MB           89.7         67.2         95.3  46.5  87.0      0.57
MS_B-1: ≥2 IP or ≥4 OP                VA           81.5         89.5         98.6  34.8  82.3      0.71
                                      KPSC         66.9         82.3         96.4  26.9  69.0      0.50
                                      MB           79.0         77.9         96.4  33.2  78.9      0.57
MS_C-1: ≥2 IP or ≥5 OP                VA           76.1         90.4         98.6  29.4  77.5      0.67
                                      KPSC         55.3         87.4         96.7  22.4  59.5      0.43
                                      MB           68.5         85.6         97.3  26.7  70.5      0.54
MS_D-1: ≥2 IP or ≥3 OP or ≥1 DMT      VA           86.6         82.5         97.8  40.4  86.2      0.69
                                      KPSC         87.4         73.0         95.7  46.1  85.6      0.60
                                      MB           93.2         66.7         95.4  56.8  90.1      0.60
MS_E-1: (IP + OP + DMT) ≥ 3           VA           87.2         82.2         97.8  41.4  86.7      0.69
                                      KPSC         85.5         76.2         96.1  43.6  84.3      0.62
                                      MB           93.4         66.1         95.4  56.8  90.1      0.60

Results for the 2-year ascertainment period are not shown, as there was little to no improvement in test statistics. PPV = positive predictive value; NPV = negative predictive value; Youden’s J = sensitivity + specificity − 1.
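Youden's J is conventionally defined as sensitivity + specificity − 1. As a quick check, the sketch below recomputes J from the sensitivity and specificity reported for the MS_E-1 rows of the table above and compares it with the reported J. The function name is hypothetical, and values agree only to rounding.

```python
# (sensitivity %, specificity %, reported J) for MS_E-1, copied from the table.
ms_e_1 = {
    "VA": (87.2, 82.2, 0.69),
    "KPSC": (85.5, 76.2, 0.62),
    "MB": (93.4, 66.1, 0.60),
}


def youden_j(sensitivity_pct: float, specificity_pct: float) -> float:
    """Youden's J from sensitivity and specificity given as percentages."""
    return (sensitivity_pct + specificity_pct - 100.0) / 100.0


for source, (sens, spec, reported) in ms_e_1.items():
    j = youden_j(sens, spec)
    print(f"{source}: computed J = {j:.3f}, reported J = {reported:.2f}")
```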

MS Prevalence in the US: Case Ascertainment

• Summary of algorithms

– Performed consistently across 3 disparate datasets

– With minor exceptions, algorithms performed consistently across strata

– Adding DMT improved ascertainment

– 2-year ascertainment period showed little improvement

• Best algorithm(s)

– MS_D, 1 year: ≥2 IP or ≥3 OP or ≥1 DMT

– MS_E, 1 year: (IP + OP + DMT) ≥ 3

• Selected MS_E, (IP + OP + DMT) ≥ 3

– Had best test statistics overall

– Ease of implementation

Problem: MS Prevalence in the US

• Target population – United States

• Sample Selection

– No single dataset adequately represents the US population

– Multiple datasets needed

– Datasets from different segments (commercial and government) with broad geographical coverage a must

[Figure: population segments considered for case capture: nursing homes; prisons; VHA & DOD; Medicare & Medicaid; commercially insured; self-insured / self-pay; ERISA; undiagnosed; Aboriginal; not under care; admin. data]

MS Prevalence in the US: Dataset Selection

• Commercial datasets

– Optum (15 million covered)

– Truven (33 million covered)

• Government datasets

– Veterans Administration (11 million covered lives)

– Medicare (27 million covered lives)

– Medicaid (28 million covered lives)

• Sample Selection & Time Frame

– Commercial and VHA datasets restricted to < 65 years

– Due to the high cost of data, acquired data from 2008 through 2010 to sync with the most recent census data (2010)

– Datasets cover approximately 1/3 of the US population in 2010


• Applied preferred algorithm annually in each dataset

– (IP + OP + DMT) ≥ 3

• Computed cumulative prevalence for period of 2008 – 2010

• Observed 2010 prevalence

– Truven: 194 / 100,000

– Medicaid: 134 / 100,000

[Figure: cumulative prevalence (MS cases / 100,000 population; y-axis 80 to 220) by year (2008, 2009, 2010) for Optum, Truven, Medicaid, Medicare, and VHA]


• To obtain a national estimate of the prevalence of MS in the US, need to

– Combine estimates across datasets

– Adjust estimate for the uninsured

– Standardize to the 2010 US Census

– Adjust for having a short (3 yr) ascertainment period

– Project rise in prevalence to derive an estimate of life-time prevalence as of 2017

• These issues are detailed in the next presentation.

US MS Prevalence Estimates, 1976-2014 (cases per 100,000)

[Figure: US MS prevalence estimates (y-axis 0 to 160 cases per 100,000) by study: Baum 1976; Nelson 1986; Anderson 1990; Noonan 1989-94; NMSS 2000; Campbell 2016]

Goals of the MS Prevalence Initiative (National MS Society)

• Primary: Obtain scientifically sound & economically feasible US national prevalence estimate

• Secondary: Understand the burden of disease

• Tertiary: Develop a long-term strategy to assess MS prevalence at regular intervals

MS Case Finding in the United States

Step 1: Identify dataset(s) covering publicly and privately insured populations in the US

Step 2: Develop and validate a highly accurate MS case-finding algorithm that can be applied in a standard way in all administrative health care databases

Step 3: Apply case definition algorithm to estimate the number of MS patients and the prevalence for each database

Step 4: Combine MS prevalence estimates into a single estimate of the prevalence of MS for the United States, weighted according to the number of insured persons in each health insurance segment

Health Insurance Source in the US by Age (Current Population Survey, 2009)

[Figure: two panels, Age < 65 and Age > 65, showing % of population with each insurance type (Employer, Purchased, Medicaid, Medicare, Military, Uninsured) for 2008, 2009, and 2010. Annotations mark which dataset is applied to each segment: commercial claims (Optum® & Truven®), Medicare, Medicaid, and VA, each for < 65 and > 65.]

Observation Period Effects on Prevalence Estimates: Systemic Lupus Erythematosus (Ng R, et al. J Rheum 2013)

• Prevalence and incidence for SLE assessed with algorithm in Quebec health administrative databases (1989-2003)

• Using a 15-year observation period:

– SLE incidence: 5.6 per 100,000

– SLE prevalence: 59.1 per 100,000

• When using a 3-year observation period:

– SLE incidence overestimate: 238%

– SLE prevalence underestimate: 66%

• 10 years of data required for stable estimates of prevalence and incidence using administrative datasets for chronic conditions

Effect of short period of observation

• Cumulative prevalence applies to our case finding approach within datasets

• Once an individual meets the MS case definition for a given year, they are counted as a case for subsequent years through 2010 (if they remain alive)

• Long periods of observation (10 yrs) are required to approach lifetime prevalence for MS (relapsing-remitting nature)

• Undercount adjustment factors derived from our datasets comparing 3 vs 10 yrs:

– VA: 37% (1.37)

– Manitoba: 47% (1.47)
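The cumulative counting rule and the undercount adjustment described above can be sketched as follows. This is an illustration under stated assumptions: the function names, the example people, and the 200 per 100,000 starting rate are hypothetical, and a person who dies in year d is assumed to be counted only in years before d (the slides say only "if they remain alive").

```python
from typing import Dict, Iterable, Optional


def cumulative_case_counts(
    first_case_year: Dict[str, int],
    years: Iterable[int],
    death_year: Optional[Dict[str, int]] = None,
) -> Dict[int, int]:
    """Count a person as a case from the first year they meet the case
    definition onward, for as long as they remain alive.

    Assumption: a person who dies in year d is counted only in years before d.
    """
    deaths = death_year or {}
    counts: Dict[int, int] = {}
    for year in years:
        counts[year] = sum(
            1
            for person, first in first_case_year.items()
            if first <= year and (person not in deaths or deaths[person] > year)
        )
    return counts


def adjust_for_short_observation(rate_per_100k: float, factor: float) -> float:
    """Scale a 3-year cumulative rate by an undercount adjustment factor."""
    return rate_per_100k * factor


# Hypothetical people: the year each first met the case definition.
first_met = {"a": 2008, "b": 2009, "c": 2010, "d": 2008}
died = {"d": 2010}
yearly = cumulative_case_counts(first_met, [2008, 2009, 2010], died)
print(yearly)

# Hypothetical 3-year rate of 200 per 100,000 with the VA and Manitoba factors.
lower = adjust_for_short_observation(200.0, 1.37)
upper = adjust_for_short_observation(200.0, 1.47)
print(lower, upper)
```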

[Figure: two panels of prevalence (# / 100,000; y-axis 80 to 300) by year, 1998 to 2016. Panel a: VA cumulative vs. VA 2008-2010, annotated 37%. Panel b: MB cumulative vs. MB 2008-2010, annotated 47%.]

Algorithm-based MS Prevalence Cumulated over 3 years, US National Datasets 2010

Group                              Optum          Truven         VA             Medicaid       Medicare
Age range                          18->65 yrs     18->65 yrs     18-64 yrs      18->65 yrs     >65 yrs
Enrollees in dataset (2010)        7,648,528      36,391,252     4,989,121      18,997,545     21,735,979
Prevalence per 100,000 (95% CI)    183 (180-186)  186 (185-187)  178 (174-182)  155 (153-157)  129 (128-131)
Female prevalence (95% CI)         275 (270-280)  269 (267-271)  382 (366-398)  177 (175-180)  167 (165-170)
Male prevalence (95% CI)           88 (85-91)     92 (91-94)     153 (149-157)  102 (100-105)  69 (67-71)
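The slides do not say how the 95% confidence intervals were computed. As one plausible reading, the sketch below applies a normal-approximation (Wald) interval for a proportion to the reported rate and enrollee count; this choice of method and the function name are assumptions. With the Optum and Truven figures from the table, it gives intervals that round to the reported ones.

```python
import math
from typing import Tuple


def wald_ci_per_100k(
    rate_per_100k: float, enrollees: int, z: float = 1.96
) -> Tuple[float, float]:
    """Normal-approximation 95% CI for a prevalence rate per 100,000.

    Assumption: the slides do not state the CI method; this is one common choice.
    """
    p = rate_per_100k / 100_000
    se = math.sqrt(p * (1 - p) / enrollees)
    return (p - z * se) * 100_000, (p + z * se) * 100_000


# Overall prevalence and 2010 enrollees for Optum and Truven, from the table.
optum_lo, optum_hi = wald_ci_per_100k(183, 7_648_528)
truven_lo, truven_hi = wald_ci_per_100k(186, 36_391_252)
print(f"Optum:  {optum_lo:.1f} to {optum_hi:.1f} per 100,000")
print(f"Truven: {truven_lo:.1f} to {truven_hi:.1f} per 100,000")
```

The narrow intervals reflect the very large denominators; they describe sampling precision only, not error from case misclassification.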

National MS Prevalence Estimate Analytic Steps

• Optum and Truven were treated as random samples drawn from the same underlying population, and their age- and sex-stratified estimates were pooled

• US Census data used to determine the total size of the US population in each age, sex and health insurance stratum

• Stratum-specific estimate was multiplied by the total insured US population in that stratum to determine the number of individuals affected

• Uninsured population estimated to be 5.0% (Slifka database)

• Number of individuals identified with MS by algorithm summed in each stratum then divided by US population denominator for 2010
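The weighting steps above amount to multiplying each stratum-specific rate by the stratum's population, summing the resulting case counts, and dividing by the overall population. The sketch below shows that arithmetic only; the stratum keys, rates, and population sizes are hypothetical, and the handling of the uninsured and the pooling of Optum with Truven are not modeled.

```python
from typing import Dict, Optional, Tuple

Stratum = Tuple[str, str, str]  # (age group, sex, insurance segment)


def national_estimate(
    rate_per_100k: Dict[Stratum, float],
    stratum_population: Dict[Stratum, int],
    us_population: Optional[int] = None,
) -> Tuple[float, float]:
    """Return (estimated cases, prevalence per 100,000).

    If us_population is not given, the sum of the stratum populations
    is used as the denominator.
    """
    cases = sum(
        rate_per_100k[s] / 100_000 * stratum_population[s]
        for s in stratum_population
    )
    if us_population is None:
        us_population = sum(stratum_population.values())
    return cases, cases / us_population * 100_000


# Hypothetical strata, not values from the presentation.
rates = {
    ("18-64", "F", "commercial"): 270.0,
    ("18-64", "M", "commercial"): 90.0,
}
populations = {
    ("18-64", "F", "commercial"): 1_000_000,
    ("18-64", "M", "commercial"): 1_000_000,
}
cases, national_rate = national_estimate(rates, populations)
print(round(cases), round(national_rate, 1))
```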

National MS Prevalence Estimate Analytic Steps (cont.)

• Undercount adjustment factors for the 10-year cumulative prevalence were required:

– Estimated range 1.37 (lower bound) to 1.47 (upper bound)

– Applied these factors to derive estimates for the 2010 prevalence of MS cumulated over 10 years

• Average annual growth in the MS prevalence rate between 2010-2017 for two AHC datasets was 2.3% per year. We applied this growth rate annually from 2010 onward to estimate “life-time” prevalence in 2017

MS Prevalence in the United States (Wallin M et al. MSJ – ECTRIMS, 2017)

• Validated MS algorithm applied to national administrative health claims datasets in the US: private insurance (Optum, Truven, Kaiser So. CA), government (Medicaid, Medicare, VA).

• Estimated 10-year period prevalence (2000-2010, upper bound) for MS:

– Overall: 309.2 per 100,000 (95% CI: 308.1-310.1)

– Cases with MS: 727,344 cases

• Estimated “life-time” prevalence for 2017:

– Overall: 362.6 per 100,000

– Cases with MS: 914,651
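The 2017 figure can be related to the 2010 upper-bound estimate through the 2.3% annual growth rate reported earlier. The sketch below assumes simple annual compounding over the 7 years from 2010 to 2017; the compounding form and the function name are assumptions, since the slides do not spell out the formula. Under that assumption the projection lands within rounding of the reported 362.6 per 100,000.

```python
def project_prevalence(base_rate: float, annual_growth: float, years: int) -> float:
    """Compound a prevalence rate forward by a fixed annual growth rate."""
    return base_rate * (1 + annual_growth) ** years


# 2010 upper-bound estimate (309.2 per 100,000) grown at 2.3% per year to 2017.
projected_2017 = project_prevalence(309.2, 0.023, 2017 - 2010)
print(f"{projected_2017:.1f} per 100,000")
```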

10-year US Period Prevalence (2000-2010) for MS per 100,000 (2010 US Census)

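[Figure: US map by census region (West, Midwest, South, Northeast), each labeled with its prevalence estimate and F:M ratio. Labels on the map:]

– 272.7 per 100,000 (95% CI: 271.0-274.4); F:M ratio 2.8

– 353.1 per 100,000 (95% CI: 351.1-355.2); F:M ratio 3.0

– 377.4 per 100,000 (95% CI: 375.2-379.7); F:M ratio 2.8

– 272.6 per 100,000 (95% CI: 271.2-273.9); F:M ratio 2.7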

High estimate 2010 prevalence of MS in the US per 100,000 population by census region (2010 US Census)

Wallin M et al. MSJ –ECTRIMS, 2017

Global Burden of Neurological Disorders, 1990-2015 (GBD 2015 Neurol Disease Collaborators, Lancet Neurol 2017)

• Neurological disorders ranked as the leading cause group of DALYs in 2015 (10.2%)

• Neurological disorders were the second leading cause of mortality (16.8% global deaths)

• Most prevalent neurological conditions (millions):

– Tension type headache (1,506)

– Migraine (959)

– Medication overuse headache (59)

– Dementia (46)

Global Burden of Multiple Sclerosis (GBD 2015, Neurol Disease Coll Group)

• MS morbidity estimates derived with GBD methods using 123 unique sources on prevalence and 65 unique sources on incidence (prevalence, DALYs, YLL, mortality)

• 2,221,188 MS cases in 2016 worldwide (95% UI: 2,033,866-2,436,858)

• Geographic gradients are diffusing but persisted in 2016

• MS ranked 14th among major neurological conditions as a cause of DALYs

Advancing Research for Neurological Diseases Act of 2015 (H.R. 292/S.849)

• One in six Americans suffers from a neurologic condition, yet the U.S. lacks a coordinated system to collect and analyze data on these conditions

• This legislation proposes a data collection system managed by the CDC that would track the incidence & prevalence of neurological diseases, including MS

• Bipartisan support in 114th Congress and authorized by the 21st Century Cures Act, signed into law December 2016 (Public Law 114-255)

• No funding appropriated to date

United States MS Prevalence Workgroup (NMSS)

Mitch Wallin, MD, MPH - Chair

Stephen Buka, ScD

Jonathan Campbell, PhD

Joel Culpepper, PhD

Gary Cutter, PhD

Weyman Johnson

Wendy Kaye, PhD

Nicholas LaRocca

Annette Langer-Gould, MD, PhD

Albert Lo, MD, PhD

Ruth Ann Marrie, MD, PhD

Robert McBurney, PhD

Oleg Muravov, MD, PhD

Lorene Nelson, PhD

Leslie Ritter

Helen Tremlett, PhD

Prevalence of MS in the United States: Conclusions

• MS prevalence in the US in 2010 is comparable to Canadian estimates and much higher than previously reported

• Trends for worldwide increases in MS prevalence and burden are notable and contribute significantly to the global burden of disease

• Modifiable risk factors for MS onset should be targeted for MS prevention: smoking, obesity, sunlight exposure

• Use of algorithms with large population datasets will provide basis for efficient morbidity and mortality surveillance in the future (US Neurological Disease Surveillance System, Public Law 114-255)

Using Big Data: Prevalence of MS in the United States

QUESTIONS

CE/CME Credit

If you would like to receive continuing education credit for this activity, please visit:

http://PVA.cds.pesgce.com
