Evaluation considerations for EHR-based phenotyping algorithms: A case study for Drug Induced Liver Injury
Casey Overby, Chunhua Weng, Krystl Haerian, Adler Perotte, Carol Friedman, George Hripcsak
Department of Biomedical Informatics
Columbia University
AMIA TBI paper presentation, March 20th, 2013
Published genome-wide associations through 09/2011: 1,596 published GWA at p ≤ 5×10⁻⁸ for 249 traits
NHGRI GWA Catalog: www.genome.gov/GWAStudies
Success due in part to GWAS consortia that obtain the needed sample sizes
Background and Motivation
There are added challenges to studying pharmacological traits
Drug response is complex, e.g., risk factors in the pathogenesis of drug-induced liver injury (DILI)
Sample sizes are small compared to typical association studies of common disease (adverse drug events, responder types)
Source: Tujios & Fontana. Nat. Rev. Gastroenterol. Hepatol. 2011
Consortium recruitment approaches
Protocol-driven recruitment: recruit and phenotype participants prospectively
EHR phenotyping: electronic health records (EHR) linked with DNA biorepositories
Successes developing EHR algorithms within eMERGE
Type 2 diabetes, peripheral arterial disease, atrial fibrillation, Crohn disease, multiple sclerosis, rheumatoid arthritis
High PPV
(Kho et al. JAMIA 2012; Kullo et al. JAMIA 2010; Ritchie et al. AJHG 2010; Denny et al. Circulation 2010; Peissig et al. JAMIA 2012)
Source: www.phekb.org
Unique characteristics of DILI
Rare condition of low prevalence
Complex condition: a drug is the causal agent of liver injury; different drugs can cause DILI; the pattern of liver injury varies between drugs
Pattern of liver injury is based on liver enzyme elevations; no tests confirm drug causality (though some assessment tools exist)
High PPV may be challenging
Why study DILI?
DILI accounts for up to 15% of acute liver failure cases in the U.S., of which 75% require liver transplant or lead to death
Most frequent reason for withdrawal of approved drugs from the market
Underlying mechanisms of DILI are poorly understood
Computerized approaches can reduce the burden of traditional approaches to screening for rare conditions (Jha AK et al. JAMIA 1998; Thadani SR et al. JAMIA 2009)
Overview of EHR phenotyping process
Case definition (e.g., liver injury)
(Re-)Design EHR phenotyping algorithm (e.g., ICD-9 codes for acute liver injury, decreased liver function labs)
Implement EHR phenotyping algorithm
Evaluate EHR phenotyping algorithm
Disseminate EHR phenotyping algorithm
If the algorithm needs improvement, return to (re-)design; if the algorithm is sufficient to be useful, disseminate.
Methods and Results
Overview of methods to develop & evaluate the initial algorithm
DILI case definition (iSAEC)
Design EHR phenotyping algorithm
Implement EHR phenotyping algorithm
Evaluate EHR phenotyping algorithm
Disseminate EHR phenotyping algorithm
Alongside the above: develop an evaluation framework; report lessons learned. Lessons inform evaluator approach and algorithm design changes.
Lessons learned
DILI case definition
1. Liver injury diagnosis (A1)
   a. Acute liver injury (C1–C4)
   b. New liver injury (B)
2. Caused by a drug
   a. New drug (A2)
   b. Not by another disease (D)
Initial DILI EHR phenotyping algorithm (applied to the clinical data warehouse; any "no" answer excludes the patient):
A1. Diagnosed with liver injury?
B. New liver injury? (consider chronicity)
C1–C4. Laboratory criteria: ALP ≥ 2× ULN (C1), ALT ≥ 5× ULN (C2), or ALT ≥ 3× ULN (C3) with bilirubin ≥ 2× ULN (C4)
A2. Exposure to a drug?
D. Other diagnoses? (if yes, exclude)
Successive filters narrowed the cohort from 18,423 to 13,972 to 2,375 to 1,264 and finally to 560 patients meeting drug-induced liver injury criteria.
Ref: Aithal GP, et al. Case Definition and Phenotype Standardization in Drug-Induced Liver Injury. Clin Pharmacol Ther. 2011 Jun;89(6):806-15.
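The criteria cascade above can be sketched in code. This is a minimal illustration, not the study's implementation; the record fields (`liver_injury_dx`, `alt_x_uln`, etc.) are hypothetical, with lab values expressed as multiples of the upper limit of normal (ULN).

```python
# Minimal sketch of the initial DILI phenotyping filter (illustrative only).
# Lab values are expressed as multiples of the upper limit of normal (ULN);
# all field names are hypothetical stand-ins for clinical data warehouse data.

def meets_lab_criteria(rec):
    """C1-C4: liver enzyme elevations per the Aithal et al. case definition."""
    return (rec["alp_x_uln"] >= 2             # C1: ALP >= 2x ULN
            or rec["alt_x_uln"] >= 5          # C2: ALT >= 5x ULN
            or (rec["alt_x_uln"] >= 3         # C3: ALT >= 3x ULN ...
                and rec["bili_x_uln"] >= 2))  # C4: ... with bilirubin >= 2x ULN

def meets_dili_criteria(rec):
    """Apply the A1 -> B -> C1-C4 -> A2 -> D cascade; any failed step excludes."""
    return (rec["liver_injury_dx"]            # A1: diagnosed with liver injury
            and rec["new_liver_injury"]       # B:  new (not chronic) injury
            and meets_lab_criteria(rec)       # C1-C4: enzyme elevations
            and rec["new_drug_exposure"]      # A2: exposure to a new drug
            and not rec["other_diagnosis"])   # D: no competing diagnosis

patients = [
    {"liver_injury_dx": True, "new_liver_injury": True, "alp_x_uln": 1.0,
     "alt_x_uln": 6.2, "bili_x_uln": 1.1, "new_drug_exposure": True,
     "other_diagnosis": False},
    {"liver_injury_dx": True, "new_liver_injury": False, "alp_x_uln": 2.4,
     "alt_x_uln": 1.0, "bili_x_uln": 0.8, "new_drug_exposure": True,
     "other_diagnosis": False},
]
cohort = [p for p in patients if meets_dili_criteria(p)]
print(len(cohort))  # 1: the first patient passes, the second fails step B
```

In the real pipeline each step ran as a query against the clinical data warehouse, so the cohort shrank at each stage rather than being filtered in memory.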
Initial algorithm results: 100 patients randomly selected for manual review from the 560 flagged
Estimated positive predictive value: TP = 27, FP = 42, NA = 30
PPV = TP/(TP+FP) = 27/(27+42) = 39%
Preliminary kappa coefficient: 0.50 (moderate agreement)
Interpretation of the PPV is unclear given only moderate agreement among reviewers
[Figure: chart review assignment. The 100 records were divided into five sets of 20 and distributed among Reviewers 1–4.]
Included measurement and demonstration studies
Measurement study goal: "to determine the extent and nature of the errors with which a measurement is made using a specific instrument" → evaluator effectiveness
Demonstration study goal: "establishes a relation – which may be associational or causal – between a set of measured variables" → algorithm performance
Definitions from: Friedman & Wyatt, Evaluation Methods in Medical Informatics, 2006
Methods and Results
Included quantitative and qualitative assessment
Quantitative data: inter-rater reliability assessment; PPV
Qualitative data: perceptions of evaluation approach effectiveness (e.g., review tool, artifacts reviewed); perceptions of benefit of results (e.g., correct for the case definition?)
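The quantitative measures can be reproduced with a short sketch. PPV follows directly from the TP/FP counts; Cohen's kappa is shown for two raters as a simplification of the four-reviewer design, and the rating lists below are invented for illustration (they are not the study's data).

```python
# Sketch of the two quantitative measures: PPV and Cohen's kappa.
from collections import Counter

def ppv(tp, fp):
    """Positive predictive value: fraction of flagged cases that are true."""
    return tp / (tp + fp)

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    return (observed - chance) / (1 - chance)

# PPV from the reported counts: 27 TPs and 42 FPs among reviewable cases.
print(round(ppv(27, 42), 2))  # 0.39

# Hypothetical labels for two raters, for illustration only.
a = ["TP", "TP", "FP", "FP", "TP", "FP", "TP", "FP"]
b = ["TP", "FP", "FP", "FP", "TP", "TP", "TP", "FP"]
print(round(cohens_kappa(a, b), 2))  # 0.5: moderate agreement
```

Kappa of 0.50 sits in the "moderate" band of the commonly used Landis & Koch scale, which is why the PPV estimate alone was considered hard to interpret.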
An evaluation framework and results
Measurement study (evaluator effectiveness):
  Quantitative results: kappa coefficient 0.50
  Qualitative results: perceptions of evaluation approach effectiveness (differences between evaluation platforms; visualizing lab values; availability of notes; discharge summary vs. other notes)
Demonstration study (algorithm performance):
  Quantitative results: TP: 27, FP: 42, NA: 30; PPV = TP/(TP+FP) = 39%
  Qualitative results: perceptions of benefit of results, i.e., themes in FPs (babies; patients who died; overdose patients; patients who had a liver transplant)
Lesson learned: what's correct for the algorithm may not be correct for the case definition
Are we measuring what we mean to measure?
Case definition: liver injury due to a medication, not another disease
Many FPs were transplant patients: correct for the algorithm, but their liver enzymes were elevated due to the procedure
Revision: exclude transplant patients
Improved algorithm design given themes in FPs
Added exclusions: babies; overdose patients; patients who died; transplant patients
("A collaborative approach to develop an EHR phenotyping algorithm for DILI", in preparation)
Lesson learned: evaluator effectiveness influences the ability to draw appropriate inferences about algorithm performance
How does our evaluation approach influence performance estimates?
Agreement among algorithm reviewers was only moderate, so interpretation of the PPV is unclear
Revision: improve the evaluator approach
Improved evaluator approach: consensus among 4 reviewers
Assign TP/FP status by:
1. First-pass review of the temporal relationship: assign preliminary TP, FP, or unknown status
2. Chart review: confirm suspected TPs; assign TP/FP where status was unknown in the first pass
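The two-pass procedure above might be sketched as follows. The status labels and the case fields (`enzyme_peak_after_drug_start`, `chart_supports_dili`, etc.) are hypothetical stand-ins for the reviewers' judgments, not the study's actual criteria.

```python
# Illustrative sketch of the two-pass review protocol (not the authors' code).
# Pass 1 uses only the temporal relationship between drug start and enzyme
# elevation; pass 2 (chart review) confirms suspected TPs and resolves unknowns.

def first_pass(case):
    """Preliminary status from the drug/lab temporal relationship alone."""
    if case["enzyme_peak_after_drug_start"] and case["plausible_latency"]:
        return "TP?"        # suspected true positive, to be confirmed
    if not case["enzyme_peak_after_drug_start"]:
        return "FP?"        # injury predates the exposure
    return "unknown"        # temporal picture alone is inconclusive

def second_pass(case, preliminary):
    """Chart review: confirm suspected TPs; assign TP/FP to unknowns."""
    if preliminary in ("TP?", "unknown"):
        return "TP" if case["chart_supports_dili"] else "FP"
    return "FP"             # FP? cases stay false positives

def review(case):
    return second_pass(case, first_pass(case))

case = {"enzyme_peak_after_drug_start": True, "plausible_latency": True,
        "chart_supports_dili": True}
print(review(case))  # TP
```

Splitting the work this way lets a cheap first pass over lab timelines triage most cases, reserving full chart review for confirmation and for the ambiguous remainder.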
[Figure: example lab-value time-series plots for one patient (March to July 2012), showing ALP, ALT, and bilirubin values over time, as visualized during first-pass review of the temporal relationship.]
Summary of findings
Lessons learned from applying our evaluation framework:
What's correct for the algorithm may not be correct for the case definition (are we measuring what we mean to measure?)
Evaluator effectiveness influences the ability to draw appropriate inferences about algorithm performance
Demonstrated that our evaluation framework is useful: informed improvements in algorithm design and in the evaluator approach; likely most useful for rare conditions
Demonstrated that EHR phenotyping algorithm development is an iterative process; the complexity of the algorithm may influence this process
Acknowledgments
Dr. Yufeng Shen, Serious Adverse Event Consortium collaborator
eMERGE collaborators: Mount Sinai (Drs. Omri Gottesman, Erwin Bottinger, and Steve Ellis); Mayo Clinic (Drs. Jyotishman Pathak, Sean Murphy, Kevin Bruce, Stephanie Johnson, Jay Talwalker, Christopher G. Chute, Iftikhar J. Kullo); Northwestern (Dr. Abel Kho); Vanderbilt (Dr. Josh Denny)
DILIN collaborator: UNC-CH (Dr. Ashraf Farrag)
Columbia Training in Biomedical Informatics (NIH NLM #T15 LM007079)
The eMERGE network, U01 HG006380-01 (Mount Sinai)
Questions?
Casey L. Overby casey@dbmi.columbia.edu