SALUS
“Scalable, Standard based Interoperability Framework for Sustainable Proactive Post Market Safety Studies”

SPECIFIC TARGETED RESEARCH PROJECT
PRIORITY Objective ICT-2011.5.3b) Tools and environments enabling the re-use of electronic health records

SALUS D6.2.1 Toolsets for Enabling Signal Detection on EHRs based on temporal patterns

Due Date: May 31, 2013
Actual Submission Date: May 31, 2013
Project Start Date: February 01, 2012
Project End Date: January 31, 2015
Project Duration: 36 months
Deliverable Leader: UMC

Project co-funded by the European Commission within the Seventh Framework Programme (2007-2013)

Dissemination Level
PU | Public | X
PP | Restricted to other programme participants (including the Commission Services) |
RE | Restricted to a group specified by the consortium (including the Commission Services) |
CO | Confidential, only for members of the consortium (including the Commission Services) |

FP7-287800 SALUS

SALUS-FP7-287800 • D6.2.1 • Version 1.0, dated May 31, 2013 Page 2 of 34

Document History:

Version | Date | Changes | From | Review
v0.1 | 2013-04-15 | Initial document | UMC | All
v0.2 | 2013-05-17 | Section on Case Series Characterization added | SRDC | All
v0.3 | 2013-05-29 | Remaining section completed | UMC | All
v0.4 | 2013-05-30 | Section added about the use of EHRs for Post Marketing Safety Studies and additional clarifications | SRDC | All
v1.0 | 2013-05-31 | Final deliverable | UMC |

Contributors (Benef.): Tomas Bergvall (UMC), Hanna Lindroos (UMC), Suat Gonul (SRDC), Gokce Banu Laleci Erturkmen (SRDC)

Responsible Author: Tomas Bergvall
Email: [email protected]
Beneficiary: UMC
Phone: +46-18-656060

SALUS Consortium Contacts:

Beneficiary | Name | Phone | Fax | E-Mail
SRDC | Gokce Banu Laleci Erturkmen | +90-312-2101763 | +90(312)2101837 | [email protected]
EUROREC | Georges De Moor | +32-9-2101161 | +32-9-3313350 | [email protected]
UMC | Niklas Norén | +4618656060 | +46 18 65 60 80 | [email protected]
OFFIS | Wilfried Thoben | +49-441-9722131 | +49-441-9722111 | [email protected]
AGFA | Dirk Colaert | +32-3-4448408 | +32 3 444 8401 | [email protected]
ERS | Gerard Freriks | +31 620347088 | +31 847371789 | [email protected]
LISPA | Alberto Daprà | +390239331605 | +39 02 39331207 | [email protected]
INSERM | Marie-Christine Jaulent | +33142346983 | +33153109201 | marie-[email protected]
TUD | Peter Schwarz | +49 351 458 2715 | +49 351 458 7319 | Peter.Schwarz@uniklinikum-dresden.de
ROCHE | Jamie Robinson | +41-61-687 9433 | +41 61 68 88412 | [email protected]

EXECUTIVE SUMMARY

The purpose of this document is to describe the analytical framework proposed for signal detection and evaluation based on the data sources available through SALUS. Five tools were envisioned for this purpose and are described in detail in this document.

The Case Series Characterization (CSC) tool enables the user to contrast the characteristics of one population, e.g. the patients experiencing an adverse drug reaction (ADR), labelled as cases, with a background population not having the ADR, in order to find factors that may explain why the cases experience the ADR.

The Temporal Association Screening (TAS) tool enables broad scale screening for signals in the electronic health record (EHR) data available through SALUS. A statistical measure is used as a threshold for what can be suspected to be a causal association between a drug and a potential ADR.

The Temporal Pattern Characterization (TPC) tool gives the user a visual representation of the temporal pattern of a specific drug and event in the patient population. The visual representation is called a chronograph and is especially useful for detecting potential confounding factors such as the indication for treatment.

The Patient History tool is useful when the suspicion of a potential signal is strong and the user wants to see whether there are other confounding factors for specific patients that were not detected using the summarized statistics. The patient history is presented as a simple line listing of the drugs, lab tests, events and demographic information from the underlying data source.

We will also make the EHR data available for secondary use in Post Marketing safety studies, where inclusion and exclusion criteria will be defined to extract relevant information from the EHR sources so that observational studies can be performed.

TABLE OF CONTENTS

EXECUTIVE SUMMARY
TABLE OF CONTENTS
1 PURPOSE
2 REFERENCE DOCUMENTS
  2.1 Definitions and Acronyms
3 Introduction
  3.1 What is a signal?
  3.2 Methods for signal detection using electronic health records
  3.3 Signal detection and evaluation using electronic health records
  3.4 OMOP Common Data Model (CDM)
    3.4.1 Background
    3.4.2 CDM data domains
    3.4.3 Details of the tables used in SALUS safety analysis tool
    3.4.4 Data Model for Standard Vocabulary of OMOP
  3.5 SALUS Interoperability architecture
4 SALUS Safety analysis tools
  4.1 Case Series Characterization tool
    4.1.1 Use case
    4.1.2 Workflow
    4.1.3 Implementation
  4.2 Temporal Association Screening tool
    4.2.1 Background
    4.2.2 Use case
    4.2.3 Workflow
  4.3 Temporal Pattern Characterization tool
    4.3.2 Use case
    4.3.3 Workflow
  4.4 Patient History tool
    4.4.1 Use case
    4.4.2 Workflow
  4.5 Using EHRs as secondary use data sources for Post Marketing safety studies
    4.5.1 Use case
    4.5.2 Workflow
5 References
6 Appendix
  6.1 Description of the RESTful API created for the Temporal Association Screening and Temporal Pattern Characterization tools

1 PURPOSE

Within task 6.2, the SALUS project will enable exploratory analysis and signal qualification studies of hypotheses generated either from SALUS or from other data sources such as VigiBase™, on the traces of electronic health records (EHRs) accumulated in the SALUS registry. To find interesting patterns in the available data, algorithms will be executed and visualization tools will be developed to help researchers find unusual behaviour that deviates from the expected effects of drugs in post marketing safety studies. The purpose of deliverable 6.2.1 is to provide a toolset for enabling signal detection on EHRs based on temporal patterns.

2 REFERENCE DOCUMENTS

The following documents were used or referenced in the development of this document:

• SALUS D3.3.1 Requirement Specification of the SALUS Architecture
• SALUS D3.4.1 Conceptual Design of the SALUS Architecture
• SALUS D4.3.1-SALUS Harmonized Ontology for Post Market Safety Studies –R1
• SALUS D4.4.1-SALUS Semantic Mediation Framework–R1
• SALUS D8.1.1 Pilot Application Scenario and Requirement Specifications
• OMOP CDM Specification

2.1 Definitions and Acronyms

Table 1 List of Abbreviations and Acronyms

Abbreviation/Acronym | Definition
ADE | Adverse Drug Event
ADR | Adverse Drug Reaction
CDM | Common Data Model
CIM | Common Information Model
CSC | Case Series Characterization
DWH | Data Warehouse
EHR | Electronic Health Record
IC | Information Component
ICSR | Individual Case Safety Report
OE | Observed-to-expected
OMOP | The Observational Medical Outcomes Partnership project (http://omop.fnih.org/)
PCP | Primary Care Physician
PROTECT | The Pharmacoepidemiological Research on Outcomes of Therapeutics by a European Consortium (http://www.imi-protect.eu/)
REST | Representational State Transfer
TAS | Temporal Association Screening
TPC | Temporal Pattern Characterization

3 INTRODUCTION

A signal is reported information that suggests a possible relationship between a drug and an adverse event. Signal detection in EHRs is a relatively new area, but a few different methods exist for signal detection analysis on EHR data sources, and the choice of method depends on the output of interest. The SALUS project is exploring the possibilities for broad scale signal detection and screening of EHRs. In the SALUS project, the statistical methods for signal detection and screening of EHRs will run on the common data model (CDM) proposed by the Observational Medical Outcomes Partnership (OMOP) [i] as a common format for all electronic records. It should be noted that the SALUS pilot sites that act as EHR sources, TUD and LISPA, do not serve their EHR data in the OMOP CDM. It is the role of the SALUS Semantic Interoperability Layer and Technical Interoperability Layer to seamlessly collect the required population data and transform it to the OMOP CDM, as explained briefly in Section 3.5.

[i] http://omop.fnih.org/

3.1 What is a signal?

The term signal is commonly used in the pharmacovigilance community, and multiple definitions are in use today. In 2002 the World Health Organization [1] defined a signal as:

“Reported information on a possible causal relationship between an adverse event and a drug, the relationship being unknown or incompletely documented previously. Usually more than a single report is required to generate a signal, depending upon the seriousness of the event and the quality of the information.”

Working Group VIII of the Council for International Organizations of Medical Sciences (CIOMS VIII) [2] defines a signal as:

“Information that arises from one or multiple sources (including observations and experiments), which suggests a new potentially causal association, or a new aspect of a known association, between an intervention and an event or set of related events, either adverse or beneficial, that is judged to be of sufficient likelihood to justify verificatory action.”

The main message is that the pharmacovigilance community is searching for previously unknown side effects of drugs. An overview of the processes around signal detection at UMC can be found in deliverable 8.1.1, “Pilot Application Scenario and Requirement Specifications of the Pilot Application”.

3.2 Methods for signal detection using electronic health records

A number of different methods exist for signal detection analysis on EHR data sources [Madigan 2010]. One of the main differences between the methods is the aim of the study. If the study focuses on a few drugs and conditions, an epidemiological study can most often be tailor-made for that purpose. However, if the aim is to screen the data for unknown potential signals, a generic method needs to be used. Such generic methods can be divided into the following groups:

Disproportionality-based
Signal detection on spontaneous reports is based on disproportionality analysis, using measures such as the Empirical Bayes Geometric Mean (EBGM), Information Component (IC), Reporting Odds Ratio (ROR) and Proportional Reporting Ratio (PRR). These methods can be adjusted to work on EHR data. The benefit of using them on EHR data is that the denominator can be calculated without approximations, since e.g. the number of prescriptions in the population is available in EHR data.

Cohort-based
Cohort-based studies begin with a group of people (a cohort) free of disease. The people in the cohort are grouped by whether or not they are exposed to a potential cause of disease. The whole cohort is followed over time to see whether the development of new cases of the disease (or other outcome) differs between the groups with and without exposure.

Case-based
A case-based study begins with the selection of cases (people with a disease) and controls (people without the disease). The controls should represent people who would have been study cases if they had developed the disease (the population at risk). The controls are often matched to the cases by some similarity measure, e.g. propensity scores.

Examples of open-source implementations of these methods can be found at http://omop.fnih.org/MethodsLibrary. These methods were developed as part of the Observational Medical Outcomes Partnership (OMOP) and have been evaluated for performance on the data sets available in that project.
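To make the disproportionality measures named above concrete, the following minimal sketch computes PRR, ROR and an IC-style measure from a 2x2 contingency table. The function name, the variable names and the counts are illustrative assumptions, and the IC is written in one commonly used shrinkage form, log2((observed + 0.5) / (expected + 0.5)); none of this is taken from the SALUS or OMOP implementations.

```python
import math


def disproportionality(a, b, c, d):
    """Compute PRR, ROR and a shrunk IC from a 2x2 table (illustrative sketch).

    a: records with the drug and the event
    b: records with the drug, without the event
    c: records without the drug, with the event
    d: records without the drug and without the event
    Zero cells are not handled here; a real tool would need to guard them.
    """
    n = a + b + c + d
    prr = (a / (a + b)) / (c / (c + d))          # proportional reporting ratio
    ror = (a * d) / (b * c)                      # reporting odds ratio
    expected = (a + b) * (a + c) / n             # expected count under independence
    ic = math.log2((a + 0.5) / (expected + 0.5))  # assumed shrinkage form of the IC
    return {"prr": prr, "ror": ror, "expected": expected, "ic": ic}


# Made-up counts, chosen only to exercise the formulas.
result = disproportionality(a=20, b=80, c=100, d=9800)
print(result)
```

With EHR data, the cells of the table can be counted directly from prescriptions and recorded events, which is the benefit described in the text.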

3.3 Signal detection and evaluation using electronic health records

The field of signal detection and evaluation in EHRs is currently in its infancy, and only one product (SafetyWorks by United Biosource Corporation, http://unitedbiosource.com/scientific/safety/software/saeftyworks.aspx) exists for broad scale screening of EHRs. UMC, OMOP and the Sentinel Initiative [ii] all have ongoing activities working towards broad scale signal detection and screening of EHRs, with prototype implementations under development. One of the main benefits of having such a system in place on the SALUS architecture would be its scalability: with millions of patient lives available for screening, even very rare adverse drug reactions could be detected, which is not possible with current standards.

3.4 OMOP Common Data Model (CDM)

The information in Section 3.4 is adapted from the Observational Medical Outcomes Partnership Common Data Model Specifications, Version 4.0, http://omop.fnih.org/CDMvocabV4.

3.4.1 Background

No single observational data source can meet all expected outcome analysis needs, so there is a demand for assessing and analyzing multiple data sources concurrently. The OMOP Common Data Model (CDM) can be used for that purpose: it standardizes the data format by generating a separate CDM instance for each source dataset. The CDM needs to support research to identify and evaluate associations between interventions (drug exposure, procedures, healthcare policy changes etc.) and outcomes caused by these interventions (condition occurrences, procedures, drug exposure etc.). The CDM is designed to store observational data under the following principles:

• Data protection. The CDM aims at providing data storage optimal for analysis. In addition, all data that might jeopardize the identity and protection of patients, such as names, precise birthdays etc. are limited. Exceptions are possible where the research expressly requires more detailed information, such as precise birth dates for the study of infants.

• Reuse of existing models. In designing the CDM, industry-leading data modeling efforts are leveraged, such as HL7 RIM, the HIMSS EHR Definitional Model, the i2b2 Hive framework, the HMORN Virtual Data Warehouse, etc.

• Design of domains. The domains are modeled in a person-centric relational data model, where for each record the identity of the person and a date is captured as a minimum.

• Standard vocabulary. To standardize the content of those records, the CDM relies on a Standard Vocabulary containing all necessary and appropriate corresponding standard healthcare concepts.

• Technology neutrality. The CDM does not require a specific technology. It can be realized in any relational database, such as Oracle, MySQL etc., or as SAS analytical datasets.

• Scalability. The CDM is optimized for data processing and computational analysis to accommodate data sources that vary in size, up to and including databases with tens of millions of persons and billions of clinical observations.

[ii] http://www.fda.gov/safety/FDAsSentinelInitiative/ucm2007250.htm

3.4.2 CDM data domains

The CDM includes all observational data elements that are relevant for the identification of demographic information, health care interventions and outcomes. These data domains are listed in Table 2.

Table 2-OMOP CDM data domains

Figure 1-The CDM conceptual model with interrelations of the CDM data domains. The light red fields will be used by the SALUS safety analysis tool.

3.4.3 Details of the tables used in SALUS safety analysis tool

The CDM defines table structures for each of the data domains in a Person- and Provider-centric model. Almost all tables have a foreign key into the Person table and a date. This allows for a longitudinal view of all the healthcare-relevant events. In addition, Providers carrying out healthcare are linked to many of the events as well. Both are linked to healthcare organizations (hospitals, independent physician associations), care sites (doctor's offices, hospital departments etc.) and physical locations (addresses).

3.4.3.1 Person

The Person table is one of the basic four mandatory dimensions of analysis and, when combined with the Drug Exposure, Condition, Observation, and Procedure entities, presents the framework for active drug surveillance. The source data for the Person table comes from person demographics data that will be de-identified to comply with the Design Principles. Accordingly, the precise date of birth will only be stored if other measures are taken to protect the patient information. Only the year of birth is mandatory, and no identifiers are stored that could be used to re-identify the Person data.

3.4.3.2 Drug Exposure

Drug Exposure contains individual records that reflect drug utilization from within the source data. Indicators of Drug Exposure include drug details, drug quantity, number of days' supply, period of exposure, and prescription refill data. A Drug Type is assigned to each Drug Exposure to track from what source the data were drawn or inferred. If possible, the visit in which the drug was prescribed or delivered is recorded through a reference to the visit table. As a minimum, the Person ID, Drug Concept ID, Start Date and Drug Type need to be available for a valid record.

3.4.3.3 Drug Era

A Drug Era is defined as a span of time when the Person is assumed to be exposed to a particular drug. A Drug Era is not the same as a Drug Exposure: Exposures are individual records corresponding to the source when a drug was delivered to the Person, while successive periods of Drug Exposure are combined under certain rules to produce continuous Drug Eras. Each Drug Era corresponds to one or many Drug Exposures that form a continuous interval. The Drug Era Start Date is the start date of the first Drug Exposure and the Drug Era End Date is the end date of the last Drug Exposure.

3.4.3.4 Condition Occurrence

Condition Occurrences record individual instances of the conditions suffered by Persons, as extracted from source data. As a minimum, the Person ID, Condition Concept ID, Start Date and Condition Type need to be available for a valid record.

3.4.3.5 Condition Era

Similar to Drug Eras, Condition Eras are chronological periods of Condition Occurrence. Combining individual Condition Occurrences into a single Condition Era serves two purposes:

• It allows aggregation of chronic conditions that require frequent ongoing care, instead of treating each Condition Occurrence as an independent event.

• It allows aggregation of multiple, closely timed doctor visits for the same condition to avoid double-counting the Condition Occurrences.

For example, consider a Person who visits his Primary Care Physician (PCP) and receives a diagnosis that leads to a referral to a specialist. One week later, the Person visits the specialist, who confirms the PCP’s diagnosis and provides the appropriate treatment to resolve the condition. These two independent doctor visits should be aggregated into one Condition Era. Persistence Windows can be applied to the period of time between the end date of one occurrence and the start date of the following occurrence. OMOP uses Persistence Windows of 30 days.

A Condition Era represents the span of time for which a Person has an episode of care for a given condition. An example is illustrated graphically in Figure 2.

Figure 2-Condition Era Examples. A Person who has been diagnosed with Condition A and Condition B, with Condition A four times (A1, A2, A3, A4), and with Condition B twice (B1, B2).
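The aggregation of occurrences into eras can be sketched as a simple interval merge. This is a minimal illustration under stated assumptions: the function name `build_eras`, the tuple layout and the dates are hypothetical, and the exact OMOP rule (for instance whether a gap of exactly 30 days still merges) is assumed here rather than taken from the specification. The same idea would apply to combining Drug Exposures into Drug Eras.

```python
from datetime import date


def build_eras(occurrences, persistence_days=30):
    """Merge (start, end) date pairs for one person and one concept into eras.

    Two occurrences are placed in the same era when the gap between the end
    of the era so far and the start of the next occurrence is at most
    persistence_days (assumed rule, for illustration only).
    """
    eras = []
    for start, end in sorted(occurrences):
        if eras and (start - eras[-1][1]).days <= persistence_days:
            # Within the persistence window: extend the current era.
            eras[-1][1] = max(eras[-1][1], end)
        else:
            # Gap too large (or first occurrence): open a new era.
            eras.append([start, end])
    return [tuple(era) for era in eras]


# Made-up visits: PCP diagnosis, specialist one week later, and a much later visit.
visits = [
    (date(2013, 1, 7), date(2013, 1, 7)),
    (date(2013, 1, 14), date(2013, 1, 14)),
    (date(2013, 6, 3), date(2013, 6, 3)),
]
eras = build_eras(visits)
print(eras)
```

In this toy input the PCP and specialist visits fall into one era, while the visit several months later starts a new one.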

3.4.3.6 Procedure Occurrence

Procedure Occurrences record individual instances of procedures performed on Persons, as extracted from the source data. One Procedure Occurrence is recorded for each procedure performed on a Person. If possible, the visit in which the procedure was performed is recorded through a reference to the visit table. As a minimum, the Person ID, the Procedure Concept ID, the Procedure Type Concept ID and the Date need to be available for a valid record.
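The minimum-field requirements stated for Drug Exposure, Condition Occurrence and Procedure Occurrence records can be checked mechanically. The sketch below is only an illustration: the snake_case field names, the table keys and the sample values are assumptions made for this example and are not the column names of the OMOP CDM specification.

```python
# Minimum fields per record type, as listed in the text (names are hypothetical).
REQUIRED_FIELDS = {
    "drug_exposure": ("person_id", "drug_concept_id", "start_date", "drug_type"),
    "condition_occurrence": (
        "person_id", "condition_concept_id", "start_date", "condition_type",
    ),
    "procedure_occurrence": (
        "person_id", "procedure_concept_id", "procedure_type_concept_id", "date",
    ),
}


def missing_fields(table, record):
    """Return the required fields that are absent or None in the record."""
    return [name for name in REQUIRED_FIELDS[table] if record.get(name) is None]


# Made-up records; the concept IDs are placeholders, not real vocabulary codes.
complete = {
    "person_id": 1,
    "drug_concept_id": 1234,
    "start_date": "2013-01-07",
    "drug_type": "prescription",
}
incomplete = {"person_id": 1, "condition_concept_id": 5678}

print(missing_fields("drug_exposure", complete))
print(missing_fields("condition_occurrence", incomplete))
```

A record with an empty result list satisfies the minimum described above; anything else would be rejected or flagged before analysis.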

3.4.3.7 Observation Period

The Observation Period table is designed to capture the time intervals in which data are being recorded for the Person. An Observation Period is the span of time during which Drug and Condition information can be expected to be recorded for a Person. For claims data, observation periods are equivalent to periods of enrollment in a plan.

3.4.4 Data Model for Standard Vocabulary of OMOP

The Standard Vocabulary is a semantic network containing all of the Concepts, Concept-to-Concept Relationships and other metadata necessary to describe the meanings and structures of the data within the CDM. The Vocabulary will accommodate Concepts for each of the entities of interest relative to drugs, conditions, procedures, visits, demographics, etc. The conceptual data model of the OMOP Vocabulary is a standardized format designed to integrate and standardize terminologies for observational analysis.

3.5 SALUS Interoperability architecture

Within the scope of the SALUS project, we aim to enable querying of, and subscription to, subsets of medical summaries from EHR systems and supporting data warehouses, and to provide the collected medical data sets to specialized SALUS pilot applications: Adverse Drug Event (ADE) notification, Individual Case Safety Report (ICSR) reporting tools, and drug safety analysis methods such as Case Series Characterization, Temporal Pattern Characterization and Temporal Association Screening.

For collecting the medical summaries from the underlying EHR systems, we have chosen to comply with well-defined EHR interface standards, namely HL7 Clinical Document Architecture Release 2 (CDA) based templates and ISO/CEN EN 13606 EHRExtract based archetypes and templates. In addition, we allow EHR systems to open up SPARQL endpoints to expose anonymized medical data sets.

On the research side, each of the applications and methods listed above may need to retrieve medical data sets in a different format. Based on our initial analysis in D8.1.1, the Temporal Pattern Characterization, Temporal Association Screening and Patient History tools prefer to retrieve data conforming to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) [iii], while the ICSR Reporting tool will produce case safety reports according to the E2B(R2) [iv] specifications, along with local models such as the ICSR template provided by the Italian Medicines Agency (AIFA).

The SALUS Interoperability Layer, composed of the Semantic Interoperability Layer (SIL) and the Technical Interoperability Layer (TIL), allows population data to be queried and collected from the underlying EHR sources through the native interfaces supported by these EHR systems, and translates the collected data to the model preferred by the safety analysis methods. The details of SALUS SIL are described in SALUS D4.4.1, while details of SALUS TIL are described in SALUS D5.1.1 and D5.2.1.

[iii] http://omop.fnih.org/CDMvocabV4
[iv] ICH guideline E2B (R2), Electronic transmission of individual case safety reports - Message specification (ICH ICSR DTD Version 2.1), Final Version 2.3, Document Revision Feb. 1, 2001.

SALUS SIL is responsible for semantically mediating the data collected in source formats to the selected target formats. Instead of defining n*n mappings between source and target models, a hub-and-spoke model is followed: the mediation is achieved through the use of the Common Information Model (CIM) as the hub model. Details of SALUS CIM are described in SALUS D4.3.1.

In the context of this deliverable, SALUS SIL is responsible for collecting the data from the EHR sources in the SALUS project, namely LISPA (a regional health data warehouse (DWH) maintained in the Lombardy Region in Italy) and TUD (the EHR system at Uniklinikum Dresden (UKD)), and for translating the collected data first to the SALUS CIM and then to the OMOP CDM, so that the TPC and TAS methods can readily run on these population data. Through this mechanism it also becomes possible to run the other open-source safety analysis methods defined in the OMOP project on top of the SALUS EHR sources.
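The hub-and-spoke idea can be illustrated with a toy mediator in which every source format is converted into a hub record and every target format is produced from it. This is a minimal sketch, not the SALUS SIL: the dictionaries `TO_HUB` and `FROM_HUB`, the format keys and all field names are hypothetical stand-ins for the real CDA, EN 13606, CIM, OMOP CDM and E2B models.

```python
from typing import Callable, Dict

Record = Dict[str, object]
Converter = Callable[[Record], Record]

# One converter into the hub model per source format (hypothetical fields).
TO_HUB: Dict[str, Converter] = {
    "cda": lambda r: {"person": r["patientId"], "drug": r["substance"]},
    "en13606": lambda r: {"person": r["subject"], "drug": r["medication"]},
}

# One converter out of the hub model per target format (hypothetical fields).
FROM_HUB: Dict[str, Converter] = {
    "omop_cdm": lambda h: {"person_id": h["person"], "drug_name": h["drug"]},
    "e2b": lambda h: {"patient": h["person"], "suspect_drug": h["drug"]},
}


def translate(record: Record, source: str, target: str) -> Record:
    """Mediate a record from a source format to a target format via the hub."""
    hub_record = TO_HUB[source](record)
    return FROM_HUB[target](hub_record)


def mapping_counts(n_sources: int, n_targets: int) -> Dict[str, int]:
    """Compare the number of mappings needed with and without a hub model."""
    return {"pairwise": n_sources * n_targets, "hub": n_sources + n_targets}


omop_row = translate({"patientId": "p1", "substance": "nifedipine"}, "cda", "omop_cdm")
counts = mapping_counts(3, 3)
print(omop_row, counts)
```

Adding a new source or target format then requires one new converter instead of one per counterpart format, which is the motivation for using the CIM as the hub.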

4 SALUS SAFETY ANALYSIS TOOLS

4.1 Case Series Characterization tool

The Case Series Characterization Tool (CSCT) enables the query of data sources for EHR extracts of selected patient populations, in order to characterize ADE cases that originate from SALUS EHR data and to compare their statistics against a custom background population. It provides a graphical interface to define the eligibility criteria of the selected patient populations and to list the required statistical comparisons between the collected data and the background population. It also provides a graphical interface to present the resulting statistics to the safety analyst.

4.1.1 Use case

The basis for the CSCT is to enable the safety analyst to contrast a patient group of interest against another group of patients. A typical question to answer would be “What differs between the patients having a myocardial infarction within two weeks of Nifedipine intake and all the other patients taking Nifedipine?” or “What differs between the patients having a myocardial infarction within two weeks of Nifedipine intake that were reported as adverse drug reactions and the other patients taking Nifedipine and having a myocardial infarction within two weeks that were not reported as adverse reactions?”

4.1.2 Workflow

Figure 3 depicts the overall architecture of the case series characterization scenario. We will not go into the details of the whole system depicted in Figure 3, since that is not the aim of this document, but focus on the CSCT related parts, i.e. the CSCT itself and the interaction between the CSCT and the Safety Analysis Query Manager.

Figure 3-Case Series Characterization workflow

4.1.2.1 Eligibility Query Preparation

The scenario starts with the safety analyst wanting to use the SALUS CSCT to investigate the relation between a medication and a medical event: s/he wants to compare the specified foreground population (e.g. patients using a medication of interest and having a condition of interest) with the selected background population (e.g. other patients on the same drug or other patients with the same disease). The CSCT enables the user to define detailed inclusion/exclusion criteria for both the background and foreground populations, as depicted in Figure 4 and Figure 5.

[Figure 3 diagram components: Case Series Characterization Tool; Safety Analysis Query Manager; Query Result Calculator; SIL-DS (LISPA); EHR RDF Service; TIDSQS; LISPA Connector; LISPA SALUS DWH; TUD SPARQL Endpoint; TUD ORBIS System; interaction steps numbered 1 to 6]

Figure 4-Background selection

4.1.2.1.1 SALUS CIM Query Format

We have used the Health Quality Measure Format (HQMF) [v] as the basis of the SALUS Common Information Model (CIM) query format to enable the construction of queries that retrieve a population of patients. HQMF allows different criteria to be grouped within criteria groups by linking them with logical operators and temporal relationships, and so does the SALUS CIM query format. We have extended the SALUS CIM ontology with new data elements to be used in constructing eligibility queries, and have also reused some of the data elements already defined in the SALUS CIM. Eligibility criteria can be defined on the following clinical statements, all of which are SALUS CIM classes. Definitions of these clinical statements can be found in Deliverable 4.3.1 - Harmonized Ontology for Post Market Safety Studies. For each clinical statement, we list the properties that can be used to construct an eligibility query.

• Patient:
  o gender
  o dateOfBirth

• Condition:
  o problemCode
  o problemStatus
  o problemSeverity
  o problemDate.low (Start date)
  o problemDate.high (End date)

• Medication:
  o medicationInformation.codedActiveIngredient
  o medicationInformation.codedProductName
  o medicationInformation.codedBrandName
  o dose.value
  o dose.unit
  o productForm
  o indicateMedicationStartStop.low (Start date)
  o indicateMedicationStartStop.high (End date)

• Procedure:
  o procedureType
  o procedureStatus
  o procedureDateTime (an interval is possible, again with low and high)

• Result:
  o resultType
  o resultValue
  o resultDateTime (an interval is possible, again with low and high)

• VitalSign (again represented by the Result class):
  o resultType
  o resultValue
  o resultDateTime (an interval is possible, again with low and high)

• SocialHistory:
  o socialHistoryObservationCode
  o socialHistoryObservationValue
  o socialHistoryDate.low (Start date)
  o socialHistoryDate.high (End date)

• Encounter:
  o encounterDateTime.low (Start date)
  o encounterDateTime.high (End date)

[v] http://www.hl7.org/implement/standards/product_brief.cfm?product_id=97

The HQMF standard also introduces a complete set of temporal constraints. However, in the first version of the CSCT, we use the following subset of these constraints:

• SAS (starts after start of)
• SBS (starts before start of)
• SAE (starts after end of)
• SBE (starts before end of)
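The four constraints above can be read as comparisons between the start of one clinical statement and the start or end of another. The following is a minimal sketch of such a check, not the CSCT implementation: the `Interval` class, the `satisfies` function and the optional `within` bound are hypothetical names, and treating a same-day start as satisfying the relation is an assumption made for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Interval:
    low: date   # start date of the clinical statement
    high: date  # end date of the clinical statement


def satisfies(type_code: str, a: Interval, b: Interval,
              within: Optional[timedelta] = None) -> bool:
    """Return True if statement `a` is related to statement `b` by `type_code`.

    `within` optionally limits the allowed distance between the two dates.
    """
    if type_code not in ("SAS", "SBS", "SAE", "SBE"):
        raise ValueError(f"unsupported temporal constraint: {type_code}")
    # SAS/SBS compare against the start of b, SAE/SBE against its end.
    anchor = b.low if type_code in ("SAS", "SBS") else b.high
    # "after" relations measure a.low - anchor, "before" relations the reverse.
    if type_code in ("SAS", "SAE"):
        gap = a.low - anchor
    else:
        gap = anchor - a.low
    if gap < timedelta(0):
        return False
    return within is None or gap <= within


if __name__ == "__main__":
    nifedipine = Interval(date(2013, 1, 1), date(2013, 3, 1))
    infarction = Interval(date(2013, 1, 10), date(2013, 1, 10))
    print(satisfies("SAS", infarction, nifedipine, within=timedelta(weeks=2)))
```

In this usage example the infarction starts nine days after the start of the medication, so the "starts after start of" relation with a two-week bound holds.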

4.1.2.1.2 Populating the Foreground and Background Criteria

The safety analyst performs the following steps to populate the background and foreground queries depicted in Figure 4 and Figure 5 respectively.

For the background population:

• S/he chooses the "nifedipine" 5th level term (code C08CA05) from the ATC terminology system as the active ingredient code of a medication, using the typeahead search facility of the tool (which is integrated with our terminology server)

For the foreground population:

• S/he chooses the "Myocardial infarction" Preferred Term (PT) (code 10028596) from the MedDRA terminology system as the problem

• S/he chooses the "nifedipine" 5th level term (code C08CA05) from the ATC terminology system as the active ingredient code of a medication, using the typeahead search facility of the tool

• S/he adds a temporal constraint between the medication and condition statements s/he has just created, stating that the condition shall occur within two weeks after the medication

Figure 5-Foreground selection

As explained previously, the query sub-model of the SALUS CIM ontology builds on HL7 HQMF as a declarative means to express eligibility criteria. Figure 6 shows how the input provided by the safety analyst through the CSCT GUI is transformed into a formal query representation. There are two criteria, each referring to a clinical statement: one salus:Medication instance with an active ingredient code referring to nifedipine from the ATC terminology system, and one salus:Condition instance with a problem code referring to myocardial infarction from the MedDRA terminology system. A temporal relation is defined from the salus:Condition to the salus:Medication, indicating that the myocardial infarction shall "start after start of" (typeCode: SAS) the nifedipine intake, with an allowed time interval of two weeks. Finally, these two criteria are grouped with an AND operator inside a criteria group and assigned to the inclusion criteria at the beginning of Figure 6.

Figure 6-N3 representation of the example eligibility query
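For readers without access to the figure, the structure of the example query can be approximated as a nested data structure. The sketch below is an illustration only: it is not the N3 representation from Figure 6, and the key names `id`, `statement`, `temporalRelation`, `target`, `maxInterval`, `operator` and `criteria` are assumptions. Only the class names, the property names `medicationInformation.codedActiveIngredient` and `problemCode`, the codes, the SAS type code and the AND grouping come from the text.

```python
import json


def build_example_query() -> dict:
    """Build the nifedipine / myocardial infarction inclusion criteria."""
    medication = {
        "id": "criterion-medication",  # hypothetical identifier
        "statement": "salus:Medication",
        "medicationInformation.codedActiveIngredient": {
            "codeSystem": "ATC",
            "code": "C08CA05",  # nifedipine
        },
    }
    condition = {
        "id": "criterion-condition",  # hypothetical identifier
        "statement": "salus:Condition",
        "problemCode": {
            "codeSystem": "MedDRA",
            "code": "10028596",  # myocardial infarction (PT)
        },
        # The condition shall start after the start of the medication,
        # within an allowed interval of two weeks.
        "temporalRelation": {
            "typeCode": "SAS",
            "target": medication["id"],
            "maxInterval": {"value": 2, "unit": "wk"},
        },
    }
    return {
        "inclusionCriteria": {
            "operator": "AND",
            "criteria": [medication, condition],
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_example_query(), indent=2))
```

A plain dictionary is used here because it serializes directly to JSON, one of the two exchange formats mentioned for the CSCT.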

4.1.2.1.3 Statistical Configurations for the Patient Population

Apart from defining the inclusion/exclusion criteria, the safety analyst also chooses the statistics used to stratify the data sets of the selected foreground and background populations, from among the following options shown in Figure 7:

• Age and gender distribution
• Country of origin
• Common medications/events prior to/after the medication/medical event of interest

Figure 7-Statistical configurations for the results

4.1.2.1.4 Risk Factor Definition

Apart from the common conditions and drugs, the safety analyst may also wish to compare the presence of specific conditions that are risk factors for the conditions selected in the inclusion criteria of the foreground and background populations, as shown in Figure 8. Risk factors are defined in the same way as the foreground and background queries. The analyst chooses a suitable clinical statement type (e.g. condition, social history, etc.) from the left panel and populates it with the required constraints. In our example, the safety analyst defines a risk factor for myocardial infarction, again from the Preferred Term (PT) level of MedDRA, by choosing "Diabetes mellitus" (code 10012601) via the typeahead search of the CSCT.

Figure 8-Risk factor selection

Once the analyst has prepared the eligibility queries, s/he initiates the process by pressing the "Run Query" button shown in Figure 8, and the query is delegated to the Safety Analysis Query Manager.

4.1.2.2 Communication with the Safety Analysis Query Manager

The CSCT communicates with the Safety Analysis Query Manager (SAQM) through RESTful services. The CSCT initiates concurrent RESTful calls for the background and foreground populations, since they are obtained with separate queries. These calls are further divided among the different EHR sources by the SAQM. This means that, in the current SALUS architecture, 4 calls are initiated for the following populations and executed in parallel:

• LISPA Background population
• LISPA Foreground population
• TUD Background population
• TUD Foreground population
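The fan-out into four parallel calls can be sketched as follows. This is an illustration under stated assumptions rather than the CSCT code, which is written in Java: `run_query` is a hypothetical stand-in for the RESTful call, and for brevity the sketch flattens the two levels (CSCT per population, SAQM per EHR source) into a single pool of four jobs.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Dict, List, Tuple

SOURCES = ("LISPA", "TUD")
POPULATIONS = ("background", "foreground")


def run_query(source: str, population: str) -> Dict[str, object]:
    """Hypothetical stand-in for one RESTful call; returns a dummy result."""
    return {"source": source, "population": population, "patients": []}


def run_all() -> List[Dict[str, object]]:
    """Run one query per (source, population) pair in parallel."""
    jobs: List[Tuple[str, str]] = list(product(SOURCES, POPULATIONS))
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        # pool.map preserves the order of `jobs` in the returned results.
        return list(pool.map(lambda job: run_query(*job), jobs))


if __name__ == "__main__":
    for result in run_all():
        print(result["source"], result["population"])
```

A thread pool fits this case because each job mostly waits on a remote service, so the four calls overlap instead of running one after another.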

The CSCT sends a SAQMCSCRequest object to the RESTful service of the SAQM containing the following parameters:

• Eligibility criteria
• Criteria source, indicating whether the eligibility criteria are for the foreground or for the background population
• Statistical options, to be processed by the Query Result Calculator when it applies statistical safety analysis queries to the data sets once they are collected from the EHR sources
• Risk factors, to be processed by the Query Result Calculator when it applies statistical safety analysis queries to the data sets once they are collected from the EHR sources

Before being sent to the SAQM, a SAQMCSCRequest object is serialized into JSON or RDF/Turtle format, which is compatible with the SALUS CIM query format; once received, it is deserialized by the SAQM for processing. The SAQM delegates the query to the rest of the SALUS system, i.e. the Semantic Interoperability Layer - Data Service (SIL-DS), to obtain the patient data. SIL-DS accepts queries in the SALUS CIM query format. It converts the query into a form that can be processed by the EHR source and conveys it to the EHR source itself. For instance, for the TUD endpoint the query is converted into SPARQL, whereas for the LISPA endpoint it is converted into the QEDExt format. Once the query has been processed by the EHR sources, SIL-DS converts the results into the SALUS CIM format and returns them to the client, i.e. the SAQM. Details of the SIL-DS component can be found in Deliverable 4.4.1 - Semantic Mediation Framework.

Once the patient data has been obtained from the SIL-DS, it is passed to the Query Result Calculator. This component applies additional calculations to the retrieved patient population based on the given statistical options (e.g. it calculates the gender distribution or the common conditions after the medication of interest). The query result calculations are performed on the patient data obtained from each of the EHR sources. These steps are also explained in detail in Deliverable 4.4.1 - Semantic Mediation Framework.

The results of each query initiated by the CSCT are returned to the CSCT in JSON format in separate responses, which the CSCT then merges. The object carrying the results is named ResultItem. An instance of the ResultItem object includes the following pieces of information, as shown in Figure 9:

• A display name for the result calculated for a certain statistical option
• A percentage indicating the occurrence of the result in the target population (either foreground or background)
• The number of patients associated with a certain result

Figure 9-Details for a ResultItem object
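As an illustration of how a statistical option could be turned into result items, the following minimal sketch computes a gender distribution. The `ResultItem` fields mirror the three pieces of information listed above, but the field names, the `gender_distribution` function and the patient record layout are assumptions, not the actual Query Result Calculator code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ResultItem:
    display_name: str   # label of the result, e.g. a gender value
    percentage: float   # share of the target population, in percent
    patient_count: int  # number of patients associated with the result


def gender_distribution(patients: Iterable[Dict[str, str]]) -> List[ResultItem]:
    """Summarize the `gender` field of each patient as ResultItem objects."""
    counts = Counter(patient["gender"] for patient in patients)
    total = sum(counts.values())
    # An empty population yields an empty list, so no division by zero occurs.
    return [
        ResultItem(gender, 100.0 * count / total, count)
        for gender, count in sorted(counts.items())
    ]


if __name__ == "__main__":
    population = [{"gender": "F"}, {"gender": "F"}, {"gender": "F"}, {"gender": "M"}]
    for item in gender_distribution(population):
        print(item)
```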

4.1.2.3 Result Presentation

For each of the statistical options (presented in Figure 7) that the analyst has chosen via the CSCT GUI, a ResultItem is presented on the result page, as shown in Figure 10.

Figure 10-Aggregated Representation of ResultItem objects

4.1.3 Implementation

This section gives the technical details of the CSCT implementation. The CSCT is a multi-module project composed of two main modules. These modules are introduced briefly below; more details are given in Section 4.1.3.2 and Section 4.1.3.3.

• Core Module: This module contains the CSCT specific model classes, database related classes and classes to access to the SAQM.

• Web Module: This module is a web application deploying the GUI and RESTful services of the CSCT.

4.1.3.1 Build & Run

Since we use Maven to organize and compile the different modules throughout the SALUS Project, each module of the CSCT is also a Maven project. It is therefore enough to build the CSCT modules with the following Maven command:

    mvn install

Once the build is done, the web module is packaged as a war file, which should be run on an application server. In the current setup, we use the embedded Jetty server to deploy the war file of the CSCT. When the following command is run in the web module, the CSCT is deployed on port 8082:

    mvn jetty:run

4.1.3.2 Core Module

The main functionality of this module is to provide a database management layer for the eligibility criteria. This layer provides basic CRUD functionality for the eligibility criteria. The module contains a JPA based Java model of the SALUS CIM query model; more specifically, the Hibernate implementation of the JPA specification is used. For the time being, we use an embedded H2 database, which is constructed from the JPA mappings of the query model. The core module also organizes the parallel execution of eligibility queries against the different EHR sources, by sending concurrent queries to each of the EHR sources and merging the results of the initiated queries.

4.1.3.3 Web Module

As already introduced, the web module of the CSCT is basically a web application composed of two main parts:

• JavaScript GUI
• Java Server

The JavaScript GUI renders the graphical user interface, screenshots of which are presented in the previous figures, i.e. from Figure 3 to Figure 10. Along with HTML5 and CSS3, the following third party JavaScript libraries are used in this part:

• Bootstrap
• Backbone / Marionette
• RequireJS
• jQuery
• Underscore

The other main part, named Java Server, provides the RESTful services used by the GUI. This means that the communication between the GUI and the server of the CSCT is realized through RESTful services. There are two main services provided by the server part. The first one provides CRUD functionality for the eligibility criteria and the other one is used to initiate the case series characterization analysis. The GUI part operates on the JSON representation of the SALUS CIM query model. The serialization and deserialization of the data exchanged between the GUI and the server is done by the Jersey framework, which is the JAX-RS implementation used in the CSCT.

4.2 Temporal Association Screening tool

The Temporal Association Screening tool (TAS) enables broad scale screening of SALUS EHR data for potential signals. The tool consists of three main parts: the client, the web service and the method. The client has not been implemented yet. The web service is built as a RESTful API, described below, and the method is built with R scripts that handle all of the calculations needed for the statistical measure, ICΔ, that is used to detect potential signals. The method runs on top of the SALUS Clinical Data Repository, which contains the population data in OMOP CDM format. The integration of the TAS tool with the SALUS Semantic and Technical Interoperability Layers has not been finalized yet; for this reason, simulated population data is used to test the developed methods, as explained in Section 4.2.1.3.

4.2.1 Background

4.2.1.1 Disproportionality measure - IC

The IC measure of disproportionality was previously described by Norén [3] and Bate [4]. Associations between a single drug and a single event, called pairwise associations, are what large-scale signal detection frameworks most commonly search for. Similar methods can be used to detect associations between groups of drugs and groups of events. Most methods for pairwise associations are based on the same contingency table, where x usually denotes the occurrence of the drug and y the occurrence of the event:

Table 3. 2x2 table of number of reports used in disproportionality analysis

             x        not x
    y        a        b
    not y    c        d

A simple observed-to-expected (OE) ratio for the association between x and y can be computed as the ratio of f(y | x), the relative frequency of the event y conditional on the occurrence of the drug x, to f(y), the marginal relative frequency of y:

\[ \frac{f(y \mid x)}{f(y)} = \frac{a/(a+c)}{(a+b)/(a+b+c+d)} \tag{1} \]

It can be re-expressed as an OE ratio:

\[ OE = \frac{a}{(a+b)\cdot(a+c)/(a+b+c+d)} = \frac{\text{Observed}}{\text{Expected}} \tag{2} \]

The Information Component, IC, is calculated with shrinkage towards zero as:

\[ IC = \log_2 \frac{\text{Observed} + \alpha}{\text{Expected} + \alpha} \tag{3} \]

where α = 0.5 represents the standard IC method.
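As a worked illustration of equations (1) to (3), the following minimal sketch computes the expected count, the OE ratio and the IC from the cells of the 2x2 table. The function names and the example counts are invented for illustration; the production method is implemented in R scripts and is not reproduced here.

```python
import math


def expected(a: float, b: float, c: float, d: float) -> float:
    """Expected count of the drug-event pair: (a+b)(a+c)/(a+b+c+d)."""
    return (a + b) * (a + c) / (a + b + c + d)


def oe_ratio(a: float, b: float, c: float, d: float) -> float:
    """Observed-to-expected ratio, equation (2); the observed count is a."""
    return a / expected(a, b, c, d)


def information_component(a: float, b: float, c: float, d: float,
                          alpha: float = 0.5) -> float:
    """IC with shrinkage towards zero, equation (3)."""
    return math.log2((a + alpha) / (expected(a, b, c, d) + alpha))


if __name__ == "__main__":
    # Invented counts: a = drug and event, b = event without drug,
    # c = drug without event, d = neither.
    a, b, c, d = 20, 80, 180, 9720
    print(expected(a, b, c, d))
    print(oe_ratio(a, b, c, d))
    print(information_component(a, b, c, d))
```

With these invented counts the expected count is (100 · 200) / 10000 = 2, so the OE ratio is 20 / 2 = 10, while the shrinkage pulls the IC to log2(20.5 / 2.5), slightly below log2(10).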

4.2.1.2 Disproportionality measure for EHR data - ICΔ

The ICΔ measure of disproportionality was previously described by Norén [5]. Patterns over time, i.e. when an event occurs in relation to a drug prescription, are important for the analysis of electronic health records. Increased occurrence of an event shortly after drug prescription in the population could indicate a potential safety issue of the drug. Decreased occurrence of an event can indicate beneficial effects or contraindications. The method is similar to the IC measure, but adds a comparison to time periods before the start of a drug prescription, which serve as controls. Simultaneous consideration of separate time periods allows for a distinction between true temporal association and an underlying tendency of the drug and the event to occur in the same patients. The ICΔ measure contrasts the OE ratio between y and x in a time period of interest v with the corresponding OE ratio in a pre-defined control period u. α is the shrinkage factor, and the same value (0.5) is used as for the IC:

\[ IC_\Delta = \log_2 \frac{O_v + \alpha}{E_v \cdot O_u / E_u + \alpha} \tag{4} \]

The baseline hypothesis is that there is no difference between the IC values before and after the drug prescription. To test that hypothesis, the ICΔ method compares the OE ratio in a time period after the drug prescription, v, to the OE ratio in a control period prior to the drug prescription, u. Any deviation suggests a temporal association between the drug and the event. By selecting appropriate time periods, many different temporal patterns can be detected.
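Equation (4) can be illustrated with a minimal sketch. The function name and the example counts are invented for illustration and assume a non-zero expected count in the control period; this is not the R implementation used by the tool.

```python
import math


def ic_delta(o_v: float, e_v: float, o_u: float, e_u: float,
             alpha: float = 0.5) -> float:
    """ICΔ as in equation (4).

    o_v, e_v: observed and expected counts in the period of interest v.
    o_u, e_u: observed and expected counts in the control period u (e_u > 0).
    """
    # The expected count in v is rescaled by the OE ratio of the control period.
    adjusted_expected = e_v * o_u / e_u
    return math.log2((o_v + alpha) / (adjusted_expected + alpha))


if __name__ == "__main__":
    # Same OE ratio (2.0) in both periods: no temporal association, ICΔ is 0.
    print(ic_delta(o_v=20, e_v=10, o_u=8, e_u=4))
    # Higher OE ratio after prescription than in the control period: ICΔ > 0.
    print(ic_delta(o_v=30, e_v=10, o_u=12, e_u=12))
```

The first call shows the baseline case: when the OE ratio is identical before and after the prescription, the numerator and denominator coincide and ICΔ is zero, even though the drug and the event co-occur more often than expected in both periods.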

4.2.1.3 Sample population used

The OMOP project created a simulation method called the Observational Medical Dataset Simulator (OSIM), in which the simulated data is modeled after real observational data. The data consists of hypothetical persons who use fictional drugs and experience fictional medical events. Because of the hypothetical nature of the simulated data, no clinical conclusions can be drawn from it. For the SALUS project, a cohort of 10 000 simulated persons was used and stored in a local OMOP CDM Oracle database. It should be noted that, during the second year of the project, the TAS will be integrated with the SALUS Semantic Interoperability Layer and Technical Interoperability Layer, and the patient data from the EHR sites (TUD and LISPA) will be dynamically collected in the SALUS Clinical Data Repository in conformance with the OMOP CDM format.

4.2.2 Use case

The basis for the Temporal Association Screening tool is to screen EHR data broadly for potential drug related problems without any prior hypothesis. Some common research questions are “I am interested in the safety of vancomycin, can you give me all the events that are likely to be temporally related?” or “I would like to know all drugs that could cause acidosis and do not have acidosis in the label”.

4.2.3 Workflow

Before the TAS is initiated, the SALUS Semantic Interoperability Layer and Technical Interoperability Layer will already have collected the patient data in the OMOP CDM format in the Clinical Data Repository. First, the safety analyst defines the inclusion criteria: a specific drug, a specific event, all drugs and events, or a specific combination of a drug and an event. The user then clicks on the run button and waits for the results. The execution times can be long if the inclusion criteria include many drugs and events. The query is sent to the Temporal Association Screening web service, described below, which handles the input and creates the necessary configuration files for the Temporal Association Screening method. The web service starts the execution of the method on top of the Clinical Data Repository in OMOP CDM format. The results become available once the method has completed and are returned when the user asks for them. The method generates the statistical measure, ICΔ, for all of the selected drug and event combinations; the safety analyst can review these and decide whether or not a drug and event combination has the potential to become a signal.

4.2.3.1 Temporal Association Screening REST web service

The TAS API follows the Representational State Transfer (REST) style and is built using .NET 4 and C#. The four methods of the API are described in the Appendix.

4.3 Temporal Pattern Characterization tool

The statistical measure derived from the Temporal Association Screening tool is a measure of disproportionality that takes some confounders into account. To further analyze a specific drug-event pair, a visual representation of the temporal pattern, called a chronograph, was introduced by Norén et al. [6] (Figure 11).

Figure 11-Chronograph visualizing the temporal pattern between a prescription of a drug and the occurrence of a medical event.

The lower half of Figure 11 shows a frequency diagram of occurrences of medical events in different time periods in relation to the drug prescription. In this example there are almost 4000 medical events in the first time period after drug prescription. The time periods can in principle be of a custom defined length, but the standard implementation only allows for time periods of 30 days. The graph in the upper half shows the IC value in the different time periods and allows for easier identification of changes in the ratio of observed to expected number of cases. Analyzing a chronograph gives the safety analyst a visual representation of the empirical basis for a possible association between a drug prescription and an event. A consistently high IC value indicates a higher incidence of the event in this population compared to a background. A temporary increase of the IC value in the months prior to drug prescription could indicate reverse causality, i.e. the event is causing the prescription of the drug, as is the case for indications. An increase of the IC value in the months after the drug prescription could be a sign of a potential signal. However, such patterns do not necessarily imply causality, since a number of other reasons can explain the pattern.
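The frequency diagram in the lower half of a chronograph rests on grouping events into 30-day periods relative to the prescription date. A minimal sketch of that grouping is shown below. The function names are hypothetical, and the convention that day 0 (the prescription day) belongs to the first period after prescription is an assumption made for this example.

```python
from collections import Counter
from typing import Dict, Iterable


def period_index(offset_days: int, period_length: int = 30) -> int:
    """Map a day offset relative to the prescription to a period index.

    Offsets 0..29 fall in period 0, 30..59 in period 1, and -30..-1 in
    period -1, because floor division rounds towards negative infinity.
    """
    return offset_days // period_length


def event_counts(offsets: Iterable[int], period_length: int = 30) -> Dict[int, int]:
    """Count events per period, ordered from earliest to latest period."""
    counts = Counter(period_index(offset, period_length) for offset in offsets)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Day offsets of events relative to the prescription date (invented).
    print(event_counts([-31, -30, -1, 0, 29, 30]))
```

The observed count per period produced this way would then be combined with an expected count to obtain the IC value plotted in the upper half of the chronograph.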

4.3.1.1 Sample population used

To test the TPC method, the same population as for the Temporal Association Screening tool was used. See Section 4.2.1.3 for more information. It should be noted that, during the second year of the project, the TPC will be integrated with the SALUS Semantic Interoperability Layer and Technical Interoperability Layer, and the patient data from the EHR sites (TUD and LISPA) will be dynamically collected in the SALUS Clinical Data Repository in conformance with the OMOP CDM format.

4.3.2 Use case

The basis for the Temporal Pattern Characterization tool is to enable safety analysts to analyze temporal patterns using a visual representation that makes it easier to strengthen or weaken a potential signal. The following scenario is an example of a research question that the tool will be able to answer: “A safety analyst at the UMC discovered a potential signal between vancomycin and acidosis using either the Temporal Association Screening tool or the in-house technology to find new signals from ICSRs. The evidence in the cases found was not strong enough to warrant a real signal; more information was needed. The safety analyst logged on to the Temporal Pattern Characterization tool, defined the criteria for the query and analyzed the chronograph. The chronograph showed a clear temporal pattern with an increased IC value in the month after vancomycin prescription, giving the safety analyst increased confidence of a real problem.”

4.3.3 Workflow

Based on the previously described use case, the safety analyst needs to specify which drug and event to search for, according to the terminology specified by the user (e.g. ATC and MedDRA). When the inclusion criteria are specified, the safety analyst starts the method execution by clicking the run button. The interface sends the query to the Temporal Pattern Characterization RESTful API, which is the same API as for the Temporal Association Screening. The API creates the necessary configuration files that the method scripts use to calculate the statistics needed to create the chronographs. The chronograph image is returned to the interface to be displayed to the user. Although not yet implemented, the user interface could look like Figure 12. The necessary input is the drug and reaction selection, and the output is the chronograph as depicted in Figure 11.

Figure 12-Mockup of the graphical user interface for the Temporal Pattern Characterization tool

4.3.3.1 Temporal Pattern Characterization RESTful web service

The TPC web service uses the same RESTful API as described in Section 4.2.3.1. The only difference is that the OrderType in the input XML needs to be set to ‘PatternCharacterzation’ instead of ‘TemporalAssociation’; otherwise the same methods apply.

4.4 Patient History tool

During signal evaluation, the possibility to view individual patient histories is essential for ensuring that the evidence of a possible causal association between a drug and an event is not confounded by patient-specific factors. The patient history tool would enable the safety analyst to view the drugs, events, lab tests and patient demographics for single patients in order to confirm or refute the evidence of potential signals. During the course of the SALUS project, it was decided to postpone the implementation of the patient history tool to the next release of the deliverable, in order to focus our attention on the signal detection and characterization part of the project.

4.4.1 Use case

The basis for the patient history tool is that, during signal evaluation, a safety analyst often would like to see more information about the individual patients to rule out any confounding factors hidden by the summarized statistics shown in other tools. A typical research question could be “I have a potential signal with 8 cases diagnosed with Stevens-Johnson syndrome within 3 weeks of first prescription of the antibiotic Cefotaxime. To rule out any confounding factors I need to see if there were any signs of the disease prior to the drug prescription, if the patient was currently taking drugs known to cause the disease or if there was other patient demographic information that could influence the signal/no signal decision”.

4.4.2 Workflow

The workflow starts with the input of an id that is unique for a specific patient. The id could be the worldwide unique identifier from an individual case safety report (ICSR) or a patient id from the clinical data warehouse (the OMOP CDM database), where only de-identified patient information is stored. In other words, this patient id is already a generated pseudonym. The pseudonym can, for example, be obtained using the same queries as for the inclusion criteria of the Temporal Association Screening tool; instead of returning the summarized statistics, the clinical data warehouse returns the pseudonyms. The query including the pseudonym is sent to the clinical data warehouse, and the requested information is returned after the required de-identification methods are run, i.e. problems/allergies/diagnoses, drug prescriptions, lab values and patient demographics such as age, weight, height and gender. The results are returned to the interface and presented to the user in a table whose columns can be sorted by value.


Figure 13-Interface mock-up of the Patient History tool

4.5 Using EHRs as secondary use data sources for Post Marketing safety studies

The SALUS architecture can also be used to run safety analysis studies focusing on a few drugs and conditions. Roche has defined an example study to collect a data set “to estimate incidence rates of Congestive Heart Failure (CHF) in diabetic patients with a recent acute coronary syndrome (ACS) event and to estimate incidence rates of CHF in patients on different diabetic medications”. During the course of the SALUS project it was decided to postpone the implementation of the Post Marketing Safety Analysis Tool to the second year of the project.

4.5.1 Use case

Roche is conducting clinical trials both in acute coronary syndrome (ACS) patients and in ACS patients with diabetes. While the trials are blinded, it is important to compare the observed overall incidence rate of an important adverse event such as CHF in the trials with that in similar background populations. Such a comparison provides context for the observed incidence and makes it possible to identify potential safety concerns earlier (e.g. if the observed incidence in the trial is greater than the background).
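The comparison in this use case amounts to computing two incidence rates and relating them. The sketch below shows the arithmetic in Python. The function names and all counts are hypothetical illustrations, not trial or EHR results, and a real analysis would also need confidence intervals and adjustment for differences between the populations.

```python
def incidence_rate(events, person_years, per=1000.0):
    """Events per `per` person-years of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return per * events / person_years


def rate_ratio(trial_events, trial_py, background_events, background_py):
    """Ratio of the trial incidence rate to the background incidence rate."""
    background = incidence_rate(background_events, background_py)
    if background == 0:
        raise ValueError("background rate is zero; the ratio is undefined")
    return incidence_rate(trial_events, trial_py) / background


if __name__ == "__main__":
    # Made-up numbers: 12 CHF events over 800 person-years in the trial,
    # 90 CHF events over 9000 person-years in the background population.
    print(incidence_rate(12, 800))       # trial rate per 1000 person-years
    print(incidence_rate(90, 9000))      # background rate per 1000 person-years
    print(rate_ratio(12, 800, 90, 9000)) # a value above 1 means the trial rate is higher
```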

4.5.2 Workflow

In SALUS we would like to enable a semi-automatic approach to collecting data sets from EHR sources, given the eligibility criteria and the data collection set definition. The envisioned flow is as follows:


• Safety analyst defines the data collection set and eligibility criteria through the local data elements maintained in the pharmaceutical company, which are defined in reference to CDISC data set definitions such as SDTM variables. In other words, SDTM variables are used to annotate the meaning of data elements in data set definitions.

• SALUS Services take this machine-processable data set definition, query the EHR sources, retrieve the medical summaries of the eligible patients and de-identify them.

• By making use of the annotations in the "data collection set" definition (possibly SDTM variables), SALUS Services process the medical summaries automatically, extract the relevant information from them and construct the data set collections for eligible patients. The data set collection is converted by SALUS Services to a format chosen by the safety analyst (for example spreadsheets or SAS files) and sent back to the safety analyst.

• Safety analyst imports these data set collections to the local system and runs the analysis methods implemented either in “R” or as SAS Methods.
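The steps above can be sketched end to end in a few lines of Python. This is only an illustration under stated assumptions. The structure of the medical summaries, the eligibility rule, the de-identification step (here simply dropping one field) and all function names are hypothetical. USUBJID, AGE and SEX are used as examples of SDTM variables that could annotate the data elements.

```python
import csv
import io

# Hypothetical data set definition: summary field -> annotating SDTM variable.
DATA_SET_DEFINITION = {"pseudonym": "USUBJID", "age": "AGE", "sex": "SEX"}

# Hypothetical medical summaries as they might arrive from an EHR source.
MEDICAL_SUMMARIES = [
    {"pseudonym": "P-001", "name": "(direct identifier)", "age": 67, "sex": "F",
     "diagnoses": ["diabetes", "ACS"]},
    {"pseudonym": "P-002", "name": "(direct identifier)", "age": 54, "sex": "M",
     "diagnoses": ["ACS"]},
    {"pseudonym": "P-003", "name": "(direct identifier)", "age": 71, "sex": "M",
     "diagnoses": ["diabetes", "ACS", "CHF"]},
]


def is_eligible(summary, required_diagnoses):
    """Eligibility criterion: the patient has all required diagnoses."""
    return set(required_diagnoses) <= set(summary["diagnoses"])


def de_identify(summary):
    """Placeholder de-identification: drop the direct identifier field."""
    return {key: value for key, value in summary.items() if key != "name"}


def extract_row(summary, definition):
    """Pick the annotated fields and rename them to their SDTM variables."""
    return {sdtm: summary[field] for field, sdtm in definition.items()}


def build_data_set(summaries, definition, required_diagnoses):
    """Return the data set for eligible patients as CSV text."""
    rows = [
        extract_row(de_identify(summary), definition)
        for summary in summaries
        if is_eligible(summary, required_diagnoses)
    ]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(definition.values()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(build_data_set(MEDICAL_SUMMARIES, DATA_SET_DEFINITION,
                         ["diabetes", "ACS"]))
```

CSV is used here only because it imports easily into both R and SAS. The output format in SALUS is left to the safety analyst.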



6 APPENDIX

6.1 Description of the RESTful API created for the Temporal Association Screening and Temporal Pattern Characterization tools.

CreateOrder takes the parameters used in the TAS method as XML input, creates the necessary files that are used by the TAS method and starts the execution of the TAS method.
Input:

    <OrderClass>
      <DrugExposureType>6</DrugExposureType>
      <ConditionOccurrenceType>64</ConditionOccurrenceType>
      <Shrinkage>0.5</Shrinkage>
      <IcPercentile>0.25</IcPercentile>
      <DrugEraTable>DRUG_ERA</DrugEraTable>
      <ConditionEraTable>CONDITION_ERA</ConditionEraTable>
      <Metric>IC</Metric>
      <ControlPeriodStart>-1080</ControlPeriodStart>
      <ControlPeriodEnd>-361</ControlPeriodEnd>
      <MultipleControlPeriod>110</MultipleControlPeriod>
      <MultipleObsPeriod>1000</MultipleObsPeriod>
      <EndOfTreatmentCensoring>0</EndOfTreatmentCensoring>
      <BackgroundSelection>0</BackgroundSelection>
      <Domain>UMC</Domain>
      <User>tomasb</User>
      <OrderType>AssociationScreening</OrderType>
      <ConditionOfInterests>
        <ConditionOfInterest>
          <ConditionConceptID>123456</ConditionConceptID>
        </ConditionOfInterest>
      </ConditionOfInterests>
      <DrugOfInterests>
        <DrugOfInterest>
          <DrugConceptID>654654</DrugConceptID>
        </DrugOfInterest>
      </DrugOfInterests>
    </OrderClass>
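For clients not written in C#, the same input document can be assembled with any XML library. The following Python sketch builds an OrderClass document from a dictionary and checks a few numeric parameters before sending. The function names are hypothetical, and the ranges are an interpretation of the valid values listed in the specification for this call (reading "0.01 -> 1000" as an inclusive range). Element order follows the insertion order of the dictionary, on the assumption that the service expects the order shown in the example.

```python
import xml.etree.ElementTree as ET

# Assumed inclusive ranges, read from the input specification of CreateOrder.
RANGES = {
    "Shrinkage": (0.01, 1000),
    "IcPercentile": (0.01, 1),
    "ControlPeriodStart": (-99999, 0),
    "ControlPeriodEnd": (-99999, 0),
    "BackgroundSelection": (0, 4),
}


def validate(params):
    """Return a list of messages for parameters outside their assumed range."""
    errors = []
    for name, (low, high) in RANGES.items():
        if name in params and not low <= float(params[name]) <= high:
            errors.append(f"{name}={params[name]} is outside {low}..{high}")
    return errors


def build_order_xml(params, condition_ids, drug_ids):
    """Serialize the order parameters and concept ids as an OrderClass document."""
    root = ET.Element("OrderClass")
    for name, value in params.items():
        ET.SubElement(root, name).text = str(value)
    conditions = ET.SubElement(root, "ConditionOfInterests")
    for concept_id in condition_ids:
        item = ET.SubElement(conditions, "ConditionOfInterest")
        ET.SubElement(item, "ConditionConceptID").text = str(concept_id)
    drugs = ET.SubElement(root, "DrugOfInterests")
    for concept_id in drug_ids:
        item = ET.SubElement(drugs, "DrugOfInterest")
        ET.SubElement(item, "DrugConceptID").text = str(concept_id)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    order = {
        "Shrinkage": 0.5,
        "IcPercentile": 0.25,
        "ControlPeriodStart": -1080,
        "ControlPeriodEnd": -361,
        "OrderType": "AssociationScreening",
    }
    print(validate(order))
    print(build_order_xml(order, [123456], [654654]))
```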

The specification for the XML input is as follows:

    DrugExposureType          valid: (6 or 7)
    ConditionOccurrenceType   valid: (6 or 7)
    Shrinkage                 valid: (0.01 -> 1000)
    IcPercentile              valid: (0.01 -> 1)
    DrugEraTable              valid: (name of drug era table in OMOP CDM)
    ConditionEraTable         valid: (name of condition era table in OMOP CDM)
    ControlPeriodStart        valid: (-99999_to_0)
    ControlPeriodEnd          valid: (-99999_to_0)
    MultipleControlPeriod     valid: (000_to_111)
    MultipleObsPeriod         valid: (00000_to_11111)
    EndOfTreatmentCensoring   valid: (0 or 1)
    BackgroundSelection       valid: (0 -> 4)
    Domain                    valid: (any string)
    User                      valid: (any string)
    OrderType                 valid: (AssociationScreening or PatternCharacterization)
    ConditionConceptID        valid: (integer corresponding to OMOP CDM concept ids)
    DrugConceptID             valid: (integer corresponding to OMOP CDM concept ids)

Output: A string identifier (OrderID) for the order created, which can be used in subsequent calls to identify the order.

Example:

    // Namespaces required by the examples in this appendix.
    using System;
    using System.Collections.Generic;
    using System.Configuration;
    using System.Globalization;
    using System.IO;
    using System.Net;
    using System.Text;
    using System.Xml.Linq;

    public string CreateOrder(OrderClass orderClass)
    {
        var req = WebRequest.Create(ConfigurationManager.AppSettings["Umc.Salus.SafetyAnalysis.App.Client.CreateOrder.Url"].ToString(CultureInfo.InvariantCulture));
        req.Method = "POST";
        req.ContentType = @"application/xml; charset=utf-8";
        WriteOrderClassXml(req, orderClass);

        string result = string.Empty;
        var resp = req.GetResponse() as HttpWebResponse;
        if (resp != null && resp.StatusCode == HttpStatusCode.OK)
        {
            using (var respStream = resp.GetResponseStream())
            {
                if (respStream != null)
                {
                    var reader = new StreamReader(respStream, Encoding.UTF8);
                    result = reader.ReadToEnd();
                    // The OrderID is the text content of the root element of the response.
                    XElement element = XElement.Parse(result);
                    result = element.Value;
                }
            }
        }
        else
        {
            if (resp != null)
            {
                result = string.Format("Status Code: {0}, Status Description: {1}", resp.StatusCode, resp.StatusDescription);
            }
        }
        return result;
    }

    private static void WriteOrderClassXml(WebRequest req, OrderClass orderClass)
    {
        // XmlSerializer is presumably a project helper class here; the
        // framework's System.Xml.Serialization.XmlSerializer has no such static method.
        string serialized = XmlSerializer.SerializeNoNameSpace(orderClass).ToString();
        byte[] payload = Encoding.UTF8.GetBytes(serialized);
        req.ContentLength = payload.Length;
        using (var stream = req.GetRequestStream())
        {
            stream.Write(payload, 0, payload.Length);
        }
    }

OrderCompleted can be used to check whether the order has completed or not.
Input: OrderID as a string

Output: A boolean that indicates whether the order is completed or not.

Example:

    public bool OrderCompleted(string orderID)
    {
        string webRequestUrl = string.Format("{0}/{1}",
            ConfigurationManager.AppSettings["Umc.Salus.SafetyAnalysis.App.Client.OrderCompleted.Url"].ToString(CultureInfo.InvariantCulture), orderID);
        var req = WebRequest.Create(webRequestUrl);
        req.Method = "GET";

        bool result = false;
        var resp = req.GetResponse() as HttpWebResponse;
        if (resp != null && resp.StatusCode == HttpStatusCode.OK)
        {
            using (var respStream = resp.GetResponseStream())
            {
                if (respStream != null)
                {
                    var reader = new StreamReader(respStream, Encoding.UTF8);
                    var rt = reader.ReadToEnd();
                    // The root element's text is "true" once the order has completed.
                    XElement element = XElement.Parse(rt);
                    result = element.Value == "true";
                }
            }
        }
        return result;
    }

GetFileList retrieves all files created by the specified user.
Input: domain and user as strings

Output: A list of all files created by the user. The list contains the name and creation date of the files.

Example:

    public List<FileInformation> GetFileList(string domain, string user)
    {
        string webRequestUrl = string.Format("{0}/{1}/{2}",
            ConfigurationManager.AppSettings["Umc.Salus.SafetyAnalysis.App.Client.GetFileList.Url"].ToString(CultureInfo.InvariantCulture), domain, user);
        var req = WebRequest.Create(webRequestUrl);
        req.Method = "GET";

        var result = new List<FileInformation>();
        var resp = req.GetResponse() as HttpWebResponse;
        if (resp != null && resp.StatusCode == HttpStatusCode.OK)
        {
            using (var respStream = resp.GetResponseStream())
            {
                if (respStream != null)
                {
                    var reader = new StreamReader(respStream, Encoding.UTF8);
                    var rt = reader.ReadToEnd();
                    result = XmlSerializer.Deserialize<List<FileInformation>>(rt);
                }
            }
        }
        return result;
    }

    public class FileInformation
    {
        public string Name { get; set; }
        public DateTime CreateDate { get; set; }
    }

GetFile gets a specific file.
Input: filename as a string

Output: The file content

Example:

    public FileContent GetFile(string fileName)
    {
        string webRequestUrl = string.Format("{0}/{1}",
            ConfigurationManager.AppSettings["Umc.Salus.SafetyAnalysis.App.Client.GetFile.Url"].ToString(CultureInfo.InvariantCulture), fileName);
        var req = WebRequest.Create(webRequestUrl);
        req.Method = "GET";

        var result = new FileContent();
        var resp = req.GetResponse() as HttpWebResponse;
        if (resp != null && resp.StatusCode == HttpStatusCode.OK)
        {
            using (var respStream = resp.GetResponseStream())
            {
                if (respStream != null)
                {
                    var reader = new StreamReader(respStream, Encoding.UTF8);
                    var rt = reader.ReadToEnd();
                    result = XmlSerializer.Deserialize<FileContent>(rt);
                }
            }
        }
        return result;
    }

    public class FileContent
    {
        public string Name { get; set; }
        public string MIMEType { get; set; }
        public string Type { get; set; }
        public byte[] Content { get; set; }
    }
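Taken together, the calls in this appendix suggest a simple client loop: create the order, poll OrderCompleted until it returns true, then fetch the result files. The sketch below illustrates the first two steps in Python. The URL dictionary, the function names, the polling interval and the response element names used in the usage note are assumptions. The only behaviour taken from the examples above is that the OrderID and the completion flag are read from the text of the response's root element.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET


def http_call(url, body=None):
    """Send a GET, or a POST with an XML body, and return the response text."""
    data = body.encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url, data=data, method="POST" if body is not None else "GET"
    )
    if body is not None:
        req.add_header("Content-Type", "application/xml; charset=utf-8")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def root_text(xml_text):
    """Concatenated text of the root element, similar to XElement.Value."""
    return "".join(ET.fromstring(xml_text).itertext())


def run_order(urls, order_xml, call=http_call, poll_seconds=5.0,
              max_polls=120, sleep=time.sleep):
    """Create an order and wait until the service reports it as completed."""
    order_id = root_text(call(urls["create_order"], order_xml))
    for _ in range(max_polls):
        status = call(f"{urls['order_completed']}/{order_id}")
        if root_text(status) == "true":
            return order_id
        sleep(poll_seconds)
    raise TimeoutError(f"order {order_id} did not complete in time")
```

The `call` and `sleep` parameters exist so that the loop can be exercised without a running service, for example with a stub that returns `<string>order-1</string>` followed by `<boolean>false</boolean>` and `<boolean>true</boolean>`. Retrieving the files with GetFileList and GetFile would follow the same pattern once the XML layout of their responses is known.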