High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from...

47
High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical Informatics Program and Services Kern Center for the Science of Health Care Delivery Associate Professor of Biomedical Informatics Department of Health Sciences Research December 9 th , 2014 Quality Registers Symposium Stockholm, Sweden

Transcript of High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from...

Page 1: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

High-Throughput Phenotyping from Electronic Health Records for Research

Jyotishman Pathak, PhD Director, Clinical Informatics Program and Services Kern Center for the Science of Health Care Delivery Associate Professor of Biomedical Informatics Department of Health Sciences Research December 9th, 2014 Quality Registers Symposium Stockholm, Sweden

Page 2: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

DISCLOSURES

Relevant Financial Relationship(s) None

Off Label Usage

None

©2014 MFMER | slide-2

Page 3: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Electronic health records (EHRs) driven phenotyping

•  EHRs are becoming more and more prevalent within the U.S. healthcare system • Meaningful Use is one of the major drivers

• Overarching goal •  To develop high-throughput semi-automated

techniques and algorithms that operate on normalized EHR data to identify cohorts of potentially eligible subjects on the basis of disease, symptoms, or related findings both retrospectively and prospectively

©2014 MFMER | slide-3

Page 4: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm ©2014 MFMER | slide-4

http://gwas.org [eMERGE Network]

Page 5: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

EHR-driven Phenotyping Algorithms - I

•  Typical components •  Billing and diagnoses codes •  Procedure codes •  Labs •  Medications •  Phenotype-specific co-variates (e.g., Demographics,

Vitals, Smoking Status, CASI scores) •  Pathology •  Radiology

• Organized into inclusion and exclusion criteria

©2014 MFMER | slide-5

[eMERGE Network]

Page 6: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Data Transform Transform

EHR-driven Phenotyping Algorithms - II

Phenotype Algorithm

Visualization

Evaluation

NLP, SQL

Rules

Mappings [eMERGE Network]

©2014 MFMER | slide-6

Page 7: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

No secondary causes (e.g., pregnancy, ablation)

No ICD-9s for Hypothyroidism

No Abnormal TSH/FT4

No  An&bodies  for  TTG/TPO  

ICD-­‐9s  for  Hypothyroidism

An&bodies for TTG or TPO (anti-thyroglobulin, anti-thyroperidase)

Abnormal TSH/FT4

No thyroid-altering medications (e.g., Phenytoin, Lithium)

Thyroid replace. meds

Case  1   Case  2  

No thyroid replace. meds

Control  

2+ non-acute visits in 3 yrs

No Hx of myasthenia gravis

Example: Hypothyroidism Algorithm

©2014 MFMER | slide-7

[Denny et al., ASHG, 2012; 89:529-542]

Page 8: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Example: Hypothyroidism Algorithm

©2012 MFMER | slide-8 [Conway et al. AMIA 2011: 274-83]

Page 9: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Example: Hypothyroidism Algorithm

©2012 MFMER | slide-9 [Conway et al. AMIA 2011: 274-83]

Drugs

Labs

Diagnosis

NLP

Proce- dures

Demo- graphics

Vitals

Page 10: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

Phenotyping Algorithms

Data Categories used to define EHR-driven Phenotyping Algorithms

Clinical gold standard

EHR-derived phenotype

Validation (PPV/NPV)

Sensitivity (Case/Cntrl)

Alzheimer’s Dementia

Demographics, clinical examination of mental status, histopathologic

examination

Diagnoses, medications 73%/55% 37.1%/99%

Cataracts Clinical exam finding

(Ophthalmologic examination)

Diagnoses, procedure codes 98%/98% 99.1%/93.6%

Peripheral Arterial Disease

Clinical exam finding (ankle-brachial index

or arteriography)

Diagnoses, procedure codes,

medications, radiology test

results

94%/99% 85.5%/81.6%

Type 2 Diabetes

Laboratory Tests Diagnoses,

laboratory tests, medications

98%/100% 100%/100%

Cardiac Conduction

ECG measurements ECG report results 97% (case only algorithm)

96.9% (case only algorithm) [eMERGE Network]

Page 11: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

0.5 5

Genotype-Phenotype Association Results

0.5 50.5 5.0 1.0

Odds Ratio

rs2200733 Chr. 4q25 rs10033464 Chr. 4q25 rs11805303 IL23R rs17234657 Chr. 5 rs1000113 Chr. 5 rs17221417 NOD2 rs2542151 PTPN22 rs3135388 DRB1*1501 rs2104286 IL2RA rs6897932 IL7RA rs6457617 Chr. 6 rs6679677 RSBN1 rs2476601 PTPN22 rs4506565 TCF7L2 rs12255372 TCF7L2 rs12243326 TCF7L2 rs10811661 CDKN2B rs8050136 FTO rs5219 KCNJ11 rs5215 KCNJ11 rs4402960 IGF2BP2

Atrial fibrillation

Crohn's disease

Multiple sclerosis

Rheumatoid arthritis

Type 2 diabetes

disease gene / region marker

2.0 [Ritchie et al. AJHG 2010; 86(4):560-72]

observed published

©2014 MFMER | slide-11

Page 12: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Key lessons learned from eMERGE •  Algorithm design and transportability

•  Non-trivial; requires significant expert involvement •  Highly iterative process •  Time-consuming manual chart reviews •  Representation of “phenotype logic” is critical

•  Standardized data access and representation •  Importance of unified vocabularies, data elements, and value sets •  Questionable reliability of ICD & CPT codes (e.g., billing the wrong

code since it is easier to find) •  Natural Language Processing (NLP) is critical

[Kho et al. Sc. Trans. Med 2011; 3(79): 1-7]

©2014 MFMER | slide-12

Page 13: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

SHARPn: Secondary Use of EHR Data

©2012 MFMER | slide-13

Page 14: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Data Transform Transform

Algorithm Development Process - Modified

Phenotype Algorithm

Visualization

Evaluation

NLP, SQL

Rules

Mappings

Semi-Automatic Execution

•  Standardized representation of clinical data

•  Create new and re-use existing clinical element models (CEMs)

•  Standardized and structured representation of phenotype definition criteria

•  Use the NQF Quality Data Model (QDM)

•  Conversion of structured phenotype criteria into executable queries

•  Use JBoss® Drools (DRLs)

[Welch et al., JBI 2012; 45(4):763-71] [Pathak et al., JAMIA 2013; 20(e2): e341-8]

©2014 MFMER | slide-14

Page 15: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Data Transform Transform

Algorithm Development Process - Modified

Phenotype Algorithm

Visualization

Evaluation

NLP, SQL

Rules

Mappings

Semi-Automatic Execution

•  Standardized representation of clinical data

•  Create new and re-use existing clinical element models (CEMs)

[Welch et al., JBI 2012; 45(4):763-71] [Pathak et al., JAMIA 2013; 20(e2): e341-8]

©2014 MFMER | slide-15

Page 16: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Clinical Element Models Higher-Order Structured Representations

©2014 MFMER | slide-16

[Stan Huff, IHC]

Page 17: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm [Stan Huff, IHC]

CEMs available for patient demographics, medications, lab measurements, procedures etc.

Page 18: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

SHARPn data normalization pipeline

©2014 MFMER | slide-18

Page 19: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm ©2014 MFMER | slide-19

[Welch et al., JBI 2012; 45(4):763-71] [Pathak et al., JAMIA 2013; 20(e2): e341-8]

Page 20: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Data Transform Transform

Algorithm Development Process - Modified

Phenotype Algorithm

Visualization

Evaluation

NLP, SQL

Rules

Mappings

Semi-Automatic Execution

•  Standardized representation of clinical data

•  Create new and re-use existing clinical element models (CEMs)

•  Standardized and structured representation of phenotype definition criteria

•  Use the NQF Quality Data Model (QDM)

[Welch et al., JBI 2012; 45(4):763-71] [Pathak et al., JAMIA 2013; 20(e2): e341-8]

©2014 MFMER | slide-20

Page 21: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Example algorithm: Hypothyroidism

©2012 MFMER | slide-21

Page 22: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

NQF Quality Data Model (QDM) •  Standard of the National Quality Forum (NQF)

•  A structure and grammar to represent quality measures and phenotype definitions in a standardized format

•  Groups of codes in a code set (ICD-9, etc.) •  "Diagnosis, Active: steroid induced diabetes" using

"steroid induced diabetes Value Set GROUPING (2.16.840.1.113883.3.464.0001.113)”

•  Supports temporality & sequences •  AND: "Procedure, Performed: eye exam" > 1 year(s)

starts before or during "Measurement end date"

•  Implemented as a set of XML schemas •  Links to standardized terminologies (ICD-9, ICD-10,

SNOMED-CT, CPT-4, LOINC, RxNorm etc.)

©2014 MFMER | slide-22

Page 23: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Example: Diabetes & Lipid Mgmt. - I

©2012 MFMER | slide-23

Page 24: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Example: Diabetes & Lipid Mgmt. - II

©2014 MFMER | slide-24

Human readable HTML

Page 25: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Example: Diabetes & Lipid Mgmt. - III

©2012 MFMER | slide-25

Page 26: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Example: Diabetes & Lipid Mgmt. - IV

©2014 MFMER | slide-26

Computer readable HQMF XML (based on HL7 v3 RIM)

Page 27: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Data Transform Transform

Algorithm Development Process - Modified

Phenotype Algorithm

Visualization

Evaluation

NLP, SQL

Rules

Mappings

Semi-Automatic Execution

•  Standardized representation of clinical data

•  Create new and re-use existing clinical element models (CEMs)

•  Standardized and structured representation of phenotype definition criteria

•  Use the NQF Quality Data Model (QDM)

•  Conversion of structured phenotype criteria into executable queries

•  Use JBoss® Drools (DRLs)

[Welch et al., JBI 2012; 45(4):763-71] [Pathak et al., JAMIA 2013; 20(e2): e341-8]

©2014 MFMER | slide-27

Page 28: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm ©2014 MFMER | slide-28

A Modular Architecture for Electronic Health Record-Driven Phenotyping Luke V. Rasmussen1; Richard C. Kiefer2, Huan Mo, MD, MS3, Peter Speltz3, William K. Thompson, PhD4,

Guoqian Jiang, MD, PhD2, Jennifer A. Pacheco1, Jie Xu, MS1, Qian Zhu, PhD5, Joshua C. Denny, MD, MS3, Enid Montague, PhD1, Jyotishman Pathak, PhD2

1Northwestern University, Chicago, IL; 2Mayo Clinic, Rochester, MN; 3Vanderbilt University, Nashville, TN;

4NorthShore University HealthSystem, Evanston, IL; 5University of Maryland Baltimore County, Baltimore, MD

Abstract Increasing interest in and experience with electronic health record (EHR)-driven phenotyping has yielded multiple challenges that are at present only partially addressed. Many solutions require the adoption of a single software platform, often with an additional cost of mapping existing patient and phenotypic data to multiple representations. We propose a set of guiding design principles and a modular software architecture to bridge the gap to a standardized phenotype representation, dissemination and execution. Ongoing development leveraging this proposed architecture has shown its ability to address existing limitations.

Introduction Electronic health record (EHR)-based phenotyping is a rich source of data for clinical and genomic research1. Interest in this area, however, has uncovered many challenges including the quality and semantics of the data collected, as well as the disparate modalities in which the information is stored across institutions and EHR systems2, 3. Significant effort has gone into addressing these challenges, and researchers have an increased understanding of the nuances of EHR data and the importance of multiple information sources such as diagnostic codes coupled with natural language processing (NLP)4, 5.

The process of developing a phenotype algorithm is complex – requiring a diverse team engaged in multiple iterations of hypothesis generation, data exploration, algorithm generation and validation 6. The technical tools used during these phases vary across teams and institutions, driven in part by availability, licensing costs, technical infrastructure supported at an institution, and personal preference. For example, initial feasibility queries may be conducted using a system like i2b27 followed by in-depth analysis using statistical software (e.g. SAS) or structured query language (SQL) statements directly against a database. A researcher may document the algorithm by listing steps in Microsoft Word, later converting it to an executable workflow using KNIME (https://www.knime.org/).

Teams creating algorithms need to be able to quickly explore EHR data using tools that they are intimately familiar and comfortable with. However, once an algorithm has been established, the ability to represent and share the algorithm with other institutions in a standardized way currently poses some limitations. While phenotype algorithms are currently published along with scientific publications or in repositories such as the Phenotype KnowledgeBase (PheKB, http://phekb.org), the representation is not standardized and executable at other institutions. Phenotypes from the electronic Medical Records and Genomics (eMERGE) network8 were represented as textual documents that needed to be interpreted by other institutions and adapted for local implementation9.

A solution first requires unambiguous representation of algorithm logic and semantically rich patient data. Next it needs an execution layer that combines the two, generating sharable and computable results. Finally it needs a repository to share phenotypes and execution results for collaborative research. In this work, we describe our approach towards creating a robust Phenotype Execution and Modeling Architecture (PhEMA; http://informatics.mayo.edu/PhEMA) to address these challenges.

Background Several system architectures and implementations are reported in the informatics literature for electronic phenotyping. One example is the i2b2 system, which leverages a service oriented architecture (SOA) to compose a system of individual “cells” 7. Each cell is responsible for a particular function (e.g. project management, terminology management, query execution), and interacts via established application programming interfaces (APIs). While i2b2 is not built on any formal standards, the broad adoption of the platform and its growing community has arguably made its database schema and APIs a “de facto” standard. The i2b2 system has been adopted by many institutions, and it benefits from an active community that has extended the system with new functionality via additional cells and other extensions. For example, the Eureka! platform includes a separate cohort authoring interface, which then feeds into an i2b2 instance for additional analysis10. Work has also been done to allow i2b2 to interoperate with the Health Quality Measure Format (HQMF), providing a more standardized option

Figure 1. Phenotype Execution and Modeling Architecture (PhEMA)

Discussion We propose PhEMA to support the multiple facets of EHR-based phenotyping. There are notable similarities to other architectures and systems, but also clear differences. For example, PhEMA leverages existing standards as much as possible for the interfaces between its components, while other systems primarily define their own. Also, we propose a comprehensive suite of components, including those not as formally defined in other systems (namely the Library and Validation). These components are necessary to produce standardized phenotype definitions that may be shared with other sites, and to verify their accuracy. The Validation system provides a way to define a gold standard against which the phenotype definition may be verified. This is important to not only provide feedback on correct use of the Authoring system, but to assist in identifying breaking changes if the phenotype is modified.

Software architectures themselves provide a critical blueprint for how a system may be developed, yet are not tangible. Architectures such as PhEMA benefit from a reference implementation for multiple reasons. First, the creation of actual software proves the validity of the proposed architecture. More importantly, it allows institutions to adopt those systems for their own use. It provides concrete source code showing the interoperability between components, aiding future developers in their own work. As described, a reference implementation is currently under development, based on these recommended approaches and standards. Ultimately this will include demonstrating adaptation of an existing system and the creation of a novel component. For example – linking the Execution environment to an existing i2b2 instance and an institutional data warehouse, and developing a new Authoring environment while also demonstrating the ability to process logic created by the MAT. We feel this approach is necessary to fully validate the modularity of the architecture, and to demonstrate the ability to use established systems with other novel components.

While PhEMA provides a reference implementation, the details of each component are ultimately left to the adopter. For example, a Library component may be implemented as a simple database, a shared portal like PheKB.org, or tied to a version control system. The Validation component may be one that uses a simulated patient chart, or may be tied to an actual EHR and chart abstraction system.

The architecture is intended to support the diverse team of phenotype authors, and may be leveraged at any point in the phenotyping process. At a minimum, we envision it will be used once a phenotype is fully defined and validated, and ready to be authored in a standardized manner. Our intent is to facilitate collaborative research by increasing the number of standardized phenotypes that may be shared and executed across institutions.

[Rasmussen et al. AMIA 2015]

Page 29: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

[Peterson AMIA 2014]

©2014 MFMER | slide-29

Scalable and High-Throughput Execution of Clinical Quality Measures fromElectronic Health Records using MapReduce and the JBoss R

� Drools Engine

Kevin J. Peterson, MS1,2 and Jyotishman Pathak, PhD1

1 Dept. of Health Sciences Research, Mayo Clinic, Rochester, MN2 Dept. of Computer Science & Engineering, University of Minnesota, Twin Cities, MN

ABSTRACT

Automated execution of electronic Clinical Quality Measures (eCQMs) from electronic health records (EHRs) on largepatient populations remains a significant challenge, and the testability, interoperability, and scalability of measure ex-ecution are critical. The High Throughput Phenotyping (HTP; http://phenotypeportal.org) project alignswith these goals by using the standards-based HL7 Health Quality Measures Format (HQMF) and Quality Data Model(QDM) for measure specification, as well as Common Terminology Services 2 (CTS2) for semantic interpretation. TheHQMF/QDM representation is automatically transformed into a JBoss R� Drools workflow, enabling horizontal scal-ability via clustering and MapReduce algorithms. Using Project Cypress, automated verification metrics can thenbe produced. Our results show linear scalability for nine executed 2014 Center for Medicare and Medicaid Services(CMS) eCQMs for eligible professionals and hospitals for >1,000,000 patients, and verified execution correctness of96.4% based on Project Cypress test data of 58 eCQMs.

1 INTRODUCTION

Secondary use of electronic health record (EHR) data is a broad domain that includes clinical quality measures, ob-servational cohorts, outcomes research and comparative effectiveness research. A common thread across all these usecases is the design, implementation, and execution of “EHR-driven phenotyping algorithms” for identifying patientsthat meet certain criteria for diseases, conditions and events of interest (e.g., Type 2 Diabetes), and the subsequentanalysis of the query results. The core principle for implementing and executing the phenotyping algorithms com-prises extracting and evaluating data from EHRs, including but not limited to, diagnosis, procedures, vitals, laboratoryvalues, medication use, and NLP-derived observations. While we have successfully demonstrated the applicabil-ity of such algorithms for clinical and translational research within the Electronic Medical Records and Genomics(eMERGE)[1] and Strategic Health IT Advance Research Project (SHARPn)[2, 3], an important aspect that has re-ceived limited attention is scalable and high-throughput execution of electronic Clinical Quality Measures (eCQMs)using EHR data. Briefly, eCQMs are Center for Medicare and Medicaid Services (CMS) defined quality measuresfor eligible professionals and eligible hospitals for use in the EHR Incentive program for electronic reporting[4]. Thee-specifications include the data elements, logic and definitions for that measure in an Health Level Seven (HL7)standard - Health Quality Measures Format (HQMF) - which represents a clinical quality measure as an electronicXML document modeled using the Quality Data Model (QDM). While significant advancements have been made inthe clarity of measure logic and coded value sets for the 2014 set of eCQMs for Meaningful Use (MU) Stage 2, thesemeasures often required human interpretation and translation into local queries in order to extract necessary data andproduce calculations. There have been two challenges in particular: (1) local data elements in an EHR may not benatively represented in a format consistent with HQMF/QDM including the required terminologies and value sets; and(2) an EHR typically does not natively have the capability to automatically consume and execute measure logic. Inother words, work has been needed locally to translate the logic (e.g., using an approach such as SQL) and to map localdata elements and codes (e.g., mapping an element using a proprietary code to SNOMED). This erodes the advantagesof the eMeasure approach, but has been the reality for many vendors and institutions in the first stage of MU[5].

In our prior work[3], we have devised solutions for EHR data normalization and standardization, including generationof structured data templates using natural language processing. In this work, we address the second challenge for scal-able and high-throughput execution of eCQMs using MapReduce, the open-source JBoss R� Drools Rules engine, andCommon Terminology Services 2 (CTS2)[6, 7]. Specifically, we have developed a set of architectural principles thatguide the functional and non-functional requirements of our proposed system that supports interoperability, testability,adaptability, and scalability for automated execution of eCQMs. To evaluate and test our system for performance,

Page 30: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Example: Initial Patient Population criteria for CMS eMeasure (CMS163V1)

©2012 MFMER | slide-30

Page 31: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm ©2014 MFMER | slide-31

Scalable and High-Throughput Execution of Clinical Quality Measures fromElectronic Health Records using MapReduce and the JBoss R

� Drools Engine

Kevin J. Peterson, MS1,2 and Jyotishman Pathak, PhD1

1 Dept. of Health Sciences Research, Mayo Clinic, Rochester, MN2 Dept. of Computer Science & Engineering, University of Minnesota, Twin Cities, MN

ABSTRACT

Automated execution of electronic Clinical Quality Measures (eCQMs) from electronic health records (EHRs) on largepatient populations remains a significant challenge, and the testability, interoperability, and scalability of measure ex-ecution are critical. The High Throughput Phenotyping (HTP; http://phenotypeportal.org) project alignswith these goals by using the standards-based HL7 Health Quality Measures Format (HQMF) and Quality Data Model(QDM) for measure specification, as well as Common Terminology Services 2 (CTS2) for semantic interpretation. TheHQMF/QDM representation is automatically transformed into a JBoss R� Drools workflow, enabling horizontal scal-ability via clustering and MapReduce algorithms. Using Project Cypress, automated verification metrics can thenbe produced. Our results show linear scalability for nine executed 2014 Center for Medicare and Medicaid Services(CMS) eCQMs for eligible professionals and hospitals for >1,000,000 patients, and verified execution correctness of96.4% based on Project Cypress test data of 58 eCQMs.

1 INTRODUCTION

Secondary use of electronic health record (EHR) data is a broad domain that includes clinical quality measures, ob-servational cohorts, outcomes research and comparative effectiveness research. A common thread across all these usecases is the design, implementation, and execution of “EHR-driven phenotyping algorithms” for identifying patientsthat meet certain criteria for diseases, conditions and events of interest (e.g., Type 2 Diabetes), and the subsequentanalysis of the query results. The core principle for implementing and executing the phenotyping algorithms com-prises extracting and evaluating data from EHRs, including but not limited to, diagnosis, procedures, vitals, laboratoryvalues, medication use, and NLP-derived observations. While we have successfully demonstrated the applicabil-ity of such algorithms for clinical and translational research within the Electronic Medical Records and Genomics(eMERGE)[1] and Strategic Health IT Advance Research Project (SHARPn)[2, 3], an important aspect that has re-ceived limited attention is scalable and high-throughput execution of electronic Clinical Quality Measures (eCQMs)using EHR data. Briefly, eCQMs are Center for Medicare and Medicaid Services (CMS) defined quality measuresfor eligible professionals and eligible hospitals for use in the EHR Incentive program for electronic reporting[4]. Thee-specifications include the data elements, logic and definitions for that measure in an Health Level Seven (HL7)standard - Health Quality Measures Format (HQMF) - which represents a clinical quality measure as an electronicXML document modeled using the Quality Data Model (QDM). While significant advancements have been made inthe clarity of measure logic and coded value sets for the 2014 set of eCQMs for Meaningful Use (MU) Stage 2, thesemeasures often required human interpretation and translation into local queries in order to extract necessary data andproduce calculations. There have been two challenges in particular: (1) local data elements in an EHR may not benatively represented in a format consistent with HQMF/QDM including the required terminologies and value sets; and(2) an EHR typically does not natively have the capability to automatically consume and execute measure logic. Inother words, work has been needed locally to translate the logic (e.g., using an approach such as SQL) and to map localdata elements and codes (e.g., mapping an element using a proprietary code to SNOMED). This erodes the advantagesof the eMeasure approach, but has been the reality for many vendors and institutions in the first stage of MU[5].

In our prior work[3], we have devised solutions for EHR data normalization and standardization, including generationof structured data templates using natural language processing. In this work, we address the second challenge for scal-able and high-throughput execution of eCQMs using MapReduce, the open-source JBoss R� Drools Rules engine, andCommon Terminology Services 2 (CTS2)[6, 7]. Specifically, we have developed a set of architectural principles thatguide the functional and non-functional requirements of our proposed system that supports interoperability, testability,adaptability, and scalability for automated execution of eCQMs. To evaluate and test our system for performance,

[Peterson AMIA 2014]

Page 32: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Validation using Project Cypress

©2014 MFMER | slide-32

[Peterson AMIA 2014]

Page 33: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

http://phenotypeportal.org [Endle et al., AMIA 2012]

Page 34: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm ©2014 MFMER | slide-34

http://api.phenotypeportal.org

Page 35: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

How is this research applied at Mayo? •  Transfusion related acute lung injury (Kor) •  Pharmacogenomics of depression (Weinshilboum) •  Pharmacogenomics of breast cancer (Wang) •  Pharmacogenomics of heart failure (Pereira) • Multi-morbidity in depression and heart failure

(Pathak) •  Lumbar image reporting with epidemiology

(Kallmes) •  Active surveillance for celiac disease (Murray) •  Phenome-wide association studies (Pathak)

©2014 MFMER | slide-35

Page 36: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

How is this research applied at Mayo? •  Transfusion related acute lung injury (Kor) •  Pharmacogenomics of depression (Weinshilboum) •  Pharmacogenomics of breast cancer (Wang) •  Pharmacogenomics of heart failure (Pereira) • Multi-morbidity in depression and heart failure

(Pathak) •  Lumbar image reporting with epidemiology

(Kallmes) •  Active surveillance for celiac disease (Murray) •  Phenome-wide association studies (Pathak)

©2014 MFMER | slide-36

Page 37: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Blood transfusion management

Transfusion-Related Acute Lung Injury (TRALI) Transfusion-Associated Circulatory Overload (TACO)

©2014 MFMER | slide-37

Page 38: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

[Clifford et al. Transfusion 2013]

©2014 MFMER | slide-38

Page 39: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

Missing reported TRALI/TACO cases

Page 40: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Missing reported TRALI/TACO cases

Of the 88 TRALI cases correctly identified by the CART algorithm, only 11 (12.5%) of these were reported to the blood bank by the clinical service.

Of the 45 TACO cases correctly identified by the CART algorithm, only 5 (11.1%) were reported to the blood bank by the clinical service.

[Clifford et al. Transfusion 2013]

©2014 MFMER | slide-40

Page 41: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Active surveillance for TRALI/TACO

©2012 MFMER | slide-41

TRALI/TACO “sniffers”

[Herasavich et al. Healthcare Info. 2011;1:42-45]

Page 42: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Active surveillance for TRALI/TACO

©2012 MFMER | slide-42

TRALI/TACO “sniffers”

[Herasavich et al. Healthcare Info. 2011;1:42-45]

Example of a data-driven learning health care system

Page 43: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Concluding remarks •  EHRs contain a wealth of phenotypes for clinical

and translational research

•  EHRs represent real-world data, and hence has challenges with interpretation, wrong diagnoses, and compliance with medications • Handling referral patients even more so

•  Standardization and normalization of clinical data and phenotype definitions is critical

•  Phenotyping algorithms are often transportable between multiple EHR settings •  Validation is an important component

©2012 MFMER | slide-43

Page 44: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

It takes a village…

©2014 MFMER | slide-44

Page 45: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

Acknowledgment •  Mayo Clinic eMERGE Phenotyping team

•  Christopher Chute, MD, DrPH •  Suzette Bielinski, PhD •  Mariza de Andrade, PhD •  John Heit, MD •  Hayan Jouni, MD •  Adnan Khan, MBBS •  Sunghwan Sohn, PhD •  Kevin Bruce •  Sean Murphy

•  Mayo Clinic SHARPn Phenotyping team •  Christopher Chute, MD, DrPH •  Dingcheng Li, PhD •  Cui Tao, PhD (now @ UTexas) •  Gyorgy Simon, PhD (now @ UMN) •  Craig Stancl •  Cory Endle •  Sahana Murthy •  Dale Suesse •  Kevin Peterson

©2012 MFMER | slide-45

•  PhEMA and DEPTH collaborators •  Joshua Denny, MD •  William Thompson, PhD •  Guoqian Jiang, MD, PhD •  Enid Montague, PhD •  Luke Rasmussen, MS •  Richard Kiefer •  Peter Speltz •  Jennifer Pachecho •  Huan Mo

•  CSHCD Clinical Informatics Program •  Daryl Kor, MD •  Maryam Panahiazhar, PhD •  Che Ngufor, PhD •  Dennis Murphree, PhD •  Sudhi Upadhyaya

Funding •  NIH R01 GM105688 (PI – Pathak): PheMA •  AHRQ R01 HS23077 (PI – Pathak): DEPTH •  NIH R01 GM103859 (PI – Pathak): PGx •  NIH R25 EB020381 (PI – Pathak): BDC4CM •  NIH U01 HG006379 (PI – Chute, Kullo): eMERGE •  ONC 90TR002 (PI – Chute): SHARPn •  Mayo Clinic Center for the Science of Healthcare Delivery (CSHCD)

Page 46: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

“Essentially, we’re going to be moving from an electronic medical record …

John Noseworthy, M.D. Mayo Clinic President and CEO

Spring 2010 [http://mayoweb.mayo.edu/mayo-today/QAjf10Moving.html]

which initially was just an electronic version of a paper record ...

©2012 MFMER | slide-46

Page 47: High-Throughput Phenotyping from Electronic Health Records ... · High-Throughput Phenotyping from Electronic Health Records for Research Jyotishman Pathak, PhD Director, Clinical

2014 Quality Registers Symposium, Stockholm

Thank You!

©2014 MFMER | slide-47

[email protected] http://jyotishman.info