Date posted: 23-Jul-2018
Multi-modal Patient Cohort Identification from EEG Report and Signal Data
Travis R. Goodwin and Sanda M. Harabagiu
The University of Texas at Dallas
Human Language Technology Research Institute
http://www.hlt.utdallas.edu
Conflicts
Nothing to disclose.
Presentation Outline
1. Introduction
2. Data
3. Methods
4. Evaluation
5. Conclusions
Introduction: The Problem
Background:
An electroencephalogram (EEG) measures the electrical activity of the brain.
EEGs are an important investigational tool in the diagnosis and management of epilepsies and other types of brain disorders.
Problem:
EEG interpretation is known to have only moderate agreement (Beniczky et al., 2013).
Introduction: The Solution
Solution:
Leverage Big Data of EEG reports
Improve interpretation agreement by allowing neurologists to search for patients that exhibit similar EEG characteristics
Automatically identifying patient cohorts can inform the clinical decisions of neurologists and enable comparative clinical effectiveness research
Introduction: Examples
Scenario 1:
Neurologist suspects their patient may have epilepsy and wants to review similar cases
Query: Patients with a history of seizures and EEG with TIRDA without sharps, spikes, or electrographic seizures.
Scenario 2:
Neurologist wants to investigate effective interventions for epilepsy accompanied by mental health disorders.
Query: Patients with a history of Alzheimer and abnormal EEG
TIRDA: Temporal Intermittent Rhythmic Delta Activity
Multi-modal Encephalogram Patient Cohort Discovery (MERCuRY)
Goal: Identify patients satisfying cohort criteria expressed in a natural language query.
MERCuRY System:
Considers both the EEG report and the EEG signal data
Natural language processing to identify inclusion/exclusion criteria in the query
Deep learning to represent EEG signals and produce a multi-modal index of EEG information
Ranks patients using relevance models
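As a rough illustration of what identifying inclusion/exclusion criteria in a query can mean, the toy sketch below splits the Scenario 1 query on the cue word "without" and filters common function words. The cue word, the stop-word list, and the function name `split_criteria` are assumptions made for this example; the slides do not describe MERCuRY's actual query analysis at this level of detail.

```python
import re

# Hypothetical stop-word list, chosen only for this example.
STOP_WORDS = {"a", "of", "and", "or", "with", "the", "patients"}


def split_criteria(query: str) -> dict:
    """Toy split of a cohort query into inclusion and exclusion terms."""
    # Assumption: a single cue word "without" starts the exclusion part.
    included, _, excluded = query.partition(" without ")

    def terms(text: str) -> list:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        return [t for t in tokens if t not in STOP_WORDS]

    return {"include": terms(included), "exclude": terms(excluded)}


query = ("Patients with a history of seizures and EEG with TIRDA "
         "without sharps, spikes, or electrographic seizures.")
criteria = split_criteria(query)
print(criteria)
```

A real system would need proper negation detection and concept recognition; the point here is only that one query yields two term sets with opposite polarity.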
Data: TUH EEG Corpus
Temple University Hospital (TUH) EEG Corpus
Largest publicly available dataset of EEG data
25,000 EEG sessions
15,000 patients
Collected over 12 years
Contains both EEG Reports and EEG signal data
Data: EEG Reports
American Clinical Neurophysiology Society (ACNS) Guidelines for writing EEG reports
Clinical History: patient's age, gender, relevant medical conditions and medications
Introduction: EEG technique/configuration (e.g. digital video EEG, standard 10-20 system with 1 channel EKG)
Description: describes any notable waveform activity, patterns, or EEG events (e.g. sharp wave, burst suppression pattern, very quick jerks of the head)
Impression: interpretation of whether the EEG indicates normal or abnormal brain activity, as well as a list of contributing epileptiform phenomena (e.g. abnormal EEG due to background slowing)
Clinical correlation: relates the EEG findings to the overall clinical picture of the patient (e.g. very worrisome prognostic features)
Data: EEG Signals
Each report is associated with its EEG signal recording
EEG signals contain 24 to 36 channels with an additional annotation channel identifying events of interest to the physicians and/or technicians
Sampled at a rate of 250 Hz or 256 Hz using 16 bits per sample
Each EEG recording contains roughly 20 MB of raw data!
Uses the European Data Format (EDF+) schema
Methods: Overview
Query: processed by natural language analysis (term filtering, query formulation, query expansion)
Relevance Model: identifies the relevant EEG reports
Case 1: EEG report only
Case 2: EEG report + EEG signal (multi-modal)
Index: the relevance model relies on a multi-modal index (term/phrase dictionary, tiered inverted lists, EEG signal fingerprints)
EEG Report Processing: section identification, medical language processing
EEG Signal Processing: deep neural learning for fingerprint detection
Methods: MERCuRY System Overview
[Figure: MERCuRY system architecture. The EEG cohort description (query) passes through Analysis (1. term filtering, 2. query formulation, 3. query expansion). EEG reports pass through EEG Report Processing (1. section identification, 2. medical language processing) and EEG signals pass through EEG Signal Processing (deep neural network). Both feed the Multi-Modal EEG Index (term/phrase dictionary, tiered inverted lists, EEG signal fingerprints). The Relevance Model produces the patient cohort for Case 1 and for Case 2.]
Methods: Indexing EEG Reports
Concept dictionary
Identified 5 types of medical concepts:
PROBLEMS (e.g. seizure)
TREATMENTS (e.g. Topamax)
TESTS (e.g. video EEG)
EEG PATTERNS/ACTIVITIES (e.g. focal slowing, polyspike)
EEG EVENTS (e.g. blink, jerk of the head)
Links each medical concept to the terms expressing it: e.g., "spike and slow wave" is linked to spike, and, slow, wave
Methods: Indexing EEG Reports
Term Dictionary
Each term is associated with two inverted lists:
positive polarity: mentions in which the term is asserted
negative polarity: mentions in which the term is negated
Each cell in the inverted list contains:
the EEG report containing the mention
the name of the section containing the mention
the position of the mention within the section (e.g. term offset)
position(s) of the term in any associated medical concepts
the EEG signal fingerprint associated with the EEG report
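The list-cell layout described above can be sketched as a small in-memory structure. This is a minimal illustration, not the system's storage format: the class names `Posting` and `TermEntry`, the helper `add_mention`, and all identifiers such as `report-001` are hypothetical, and each cell is simplified to reference at most one medical concept.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Posting:
    """One cell of an inverted list (field names are assumptions)."""
    report_id: str                   # EEG report containing the mention
    section: str                     # name of the report section
    position: int                    # term offset within the section
    concept_id: Optional[str]        # associated medical concept, if any
    concept_position: Optional[int]  # position of the term in that concept
    fingerprint_id: str              # EEG signal fingerprint of the report


@dataclass
class TermEntry:
    """A term's two inverted lists, one per polarity."""
    positive: List[Posting] = field(default_factory=list)
    negative: List[Posting] = field(default_factory=list)


index = defaultdict(TermEntry)


def add_mention(term: str, posting: Posting, negated: bool) -> None:
    """Append a mention to the positive or negative list of a term."""
    entry = index[term]
    (entry.negative if negated else entry.positive).append(posting)


add_mention("spike",
            Posting("report-001", "DESCRIPTION", 12, "spike and slow wave", 0, "fp-001"),
            negated=False)
add_mention("spike",
            Posting("report-002", "IMPRESSION", 4, None, None, "fp-002"),
            negated=True)

print(len(index["spike"].positive), len(index["spike"].negative))
```

Keeping the fingerprint identifier in every cell means a text match can be followed directly to the signal representation of the same report.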
Overview of the Tiered Multi-Modal Index
[Figure: Tiered multi-modal index. The term dictionary (e.g. alpha, beta, hypertension, lovenox, seizure, sharp, slow, spike, stroke, wave) maps each term ID to a positive-polarity and a negative-polarity inverted list. Each list cell stores the EEG report ID, the EEG signal fingerprint ID, the report section, the position within the report section, the medical concept ID, and the concept position, followed by a pointer to the next cell. The medical concept dictionary maps each medical concept ID (e.g. alpha, sharp and slow wave) to its concept type and its term IDs. The EEG signal fingerprints are stored alongside the tiered inverted lists.]
Methods: Fingerprinting EEG Signals
EEG signal encoded as a dense floating-point matrix, $D \in \mathbb{R}^{N \times L}$
$N$ is the number of electrode channels
$L$ is the number of samples in the recording (e.g. $L / 250$ = duration of the recording in seconds)
One pass over the EEG signals requires considering over 1.8 terabytes of information!
We need a more compact representation: EEG fingerprints
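To make the sizes concrete, the arithmetic can be worked through for one recording. The 32 channels and 20-minute duration are illustrative values, not figures from the slides, and the 4-byte float width of the dense matrix is an assumption.

```python
def recording_stats(channels: int, samples: int, sample_rate_hz: int = 250):
    """Return duration (s), raw size (bytes), and dense-matrix size (bytes)."""
    duration_s = samples / sample_rate_hz
    raw_bytes = channels * samples * 2    # 16 bits per sample, as stored
    dense_bytes = channels * samples * 4  # assumption: 32-bit floats
    return duration_s, raw_bytes, dense_bytes


# Hypothetical recording: 32 channels, 300,000 samples at 250 Hz.
duration, raw, dense = recording_stats(channels=32, samples=300_000)
print(f"{duration / 60:.0f} min, {raw / 1e6:.1f} MB raw, {dense / 1e6:.1f} MB dense")
```

Under these assumptions a 20-minute recording comes to about 19 MB of raw data, in line with the roughly 20 MB per recording quoted earlier.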
Methods: Learning EEG Fingerprints
Deep neural learning
Process EEG signals in a matter of hours rather than weeks
Reduce each EEG signal from 20 MB to a few hundred bytes
Recurrent Neural Network
Consider the EEG signal as a sequence of samples
For each sample $t$, learn to predict the next sample $t + 1$
Long Short-Term Memory
Can learn long-range interactions in the EEG signal
Maintains and updates an internal memory
Final internal memory becomes the EEG fingerprint
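The idea that the final internal memory becomes the fingerprint can be sketched with a single LSTM cell in plain Python. This is a forward pass only: the weights are random and untrained, whereas the approach described above would first train the network to predict the next sample. The class name `TinyLSTM`, the hidden size of 8, and the synthetic 3-channel sine signal are all assumptions made for the example.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single LSTM cell with random (untrained) weights, for illustration."""

    def __init__(self, n_inputs: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        width = n_inputs + n_hidden + 1  # inputs, previous hidden state, bias
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        self.w = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                   for _ in range(n_hidden)]
            for gate in "ifog"
        }

    def step(self, x, h, c):
        z = list(x) + list(h) + [1.0]

        def gate(name, activation):
            return [activation(sum(w * v for w, v in zip(row, z)))
                    for row in self.w[name]]

        i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
        g = gate("g", math.tanh)
        # Update the internal memory, then derive the new hidden state from it.
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

    def fingerprint(self, signal):
        """Run over all samples and return the final internal memory."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for sample in signal:
            h, c = self.step(sample, h, c)
        return c


# Synthetic signal: 500 samples, each with 3 channel values.
signal = [[math.sin(0.1 * t + ch) for ch in range(3)] for t in range(500)]
fp = TinyLSTM(n_inputs=3, n_hidden=8).fingerprint(signal)
print(len(fp))
```

However long the recording is, the fingerprint has a fixed length equal to the hidden size, which is what makes it compact enough to store in the index.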
Methods: Relevance Model
Purpose: measure the relevance between a query and an EEG report
Case 1: consider EEG reports only
BM25F ranking function
Gives a different weight to query matches in each field, and for each polarity
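A simplified form of BM25F shows how per-field weights enter the score. In this sketch each (section, polarity) pair is treated as its own field, which is one possible reading of weighting by field and by polarity. The weights, the `k1` and `b` values, the document frequencies, and the three toy documents are hypothetical and are not the settings used in the experiments.

```python
import math


def bm25f_score(query_terms, doc_fields, field_weights, avg_field_len,
                doc_freq, n_docs, k1=1.2, b=0.75):
    """Simplified BM25F: combine weighted, length-normalised field frequencies."""
    score = 0.0
    for term in query_terms:
        pseudo_tf = 0.0
        for name, tokens in doc_fields.items():
            tf = tokens.count(term)
            if tf == 0:
                continue
            norm = 1.0 - b + b * len(tokens) / avg_field_len[name]
            pseudo_tf += field_weights[name] * tf / norm
        if pseudo_tf == 0.0:
            continue
        df = doc_freq.get(term, 0)
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        score += idf * pseudo_tf / (k1 + pseudo_tf)
    return score


# Hypothetical weights: fields are (section, polarity) pairs.
weights = {("impression", "pos"): 2.0,
           ("description", "pos"): 1.0,
           ("description", "neg"): 0.0}
avg_len = {name: 4.0 for name in weights}
doc_freq = {"focal": 2, "slowing": 2}
query_terms = ["focal", "slowing"]

doc_a = {("impression", "pos"): ["abnormal", "eeg", "focal", "slowing"]}
doc_b = {("description", "pos"): ["focal", "slowing", "seen", "left"]}
doc_c = {("description", "neg"): ["focal", "slowing", "seen", "left"]}

scores = [bm25f_score(query_terms, d, weights, avg_len, doc_freq, n_docs=10)
          for d in (doc_a, doc_b, doc_c)]
print(scores)
```

With these weights a match in the impression outranks the same match in the description, and a negated mention contributes nothing.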
Case 2: consider EEG report + EEG fingerprint
Retrieve initial set of EEG reports as in Case 1
Identify the top-ranked EEG reports
Look up the most similar EEG fingerprints for the top-ranked reports
In our experiments, = ; =
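The Case 2 steps can be sketched as a re-ranking pass over the Case 1 results. Several parts are assumptions: cosine similarity as the fingerprint comparison, the strategy of placing each neighbour directly after its seed report, and the parameter names `top_k` and `n_similar` with the values used in the demo. None of these are taken from the experiments.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rerank_with_fingerprints(ranked_ids, fingerprints, top_k, n_similar):
    """Expand the top-ranked reports with their nearest fingerprint neighbours."""
    output, seen = [], set()

    def emit(report_id):
        if report_id not in seen:
            seen.add(report_id)
            output.append(report_id)

    for seed in ranked_ids[:top_k]:
        emit(seed)
        others = [r for r in fingerprints if r != seed]
        others.sort(key=lambda r: cosine(fingerprints[seed], fingerprints[r]),
                    reverse=True)
        for report_id in others[:n_similar]:
            emit(report_id)
    # Keep the remainder of the text-only ranking in its original order.
    for report_id in ranked_ids[top_k:]:
        emit(report_id)
    return output


# Hypothetical two-dimensional fingerprints for four reports.
fingerprints = {"r1": [1.0, 0.0], "r2": [0.0, 1.0],
                "r3": [0.9, 0.1], "r4": [0.1, 0.9]}
reranked = rerank_with_fingerprints(["r1", "r2"], fingerprints,
                                    top_k=1, n_similar=1)
print(reranked)
```

Here report r3 was not retrieved by the text query at all, but it enters the cohort because its signal fingerprint is close to that of the top-ranked report.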
Evaluation: Queries
Asked neurologists to provide patient cohort descriptions (queries)
Patient Cohort Descriptions (Queries)
1. History of seizures and EEG with TIRDA without sharps, spikes, or electrographic seizures
2. History of Alzheimer dementia and normal EEG
3. Patients with altered mental status and EEG showing nonconvulsive status epilepticus (NCSE)
4. Patients under 18 years old with absence seizures
5. Patients over age 18 with history of developmental delay and EEG with electrographic seizures
Evaluation: Patient Cohort Quality
For each query:
Identified the 10 most relevant EEG reports
Drew a random sample of 10 EEG reports retrieved between ranks 11 and 100
Asked neurologists to judge whether each EEG report was relevant:
1: the patient described in the report definitely belongs to the cohort
0: the patient described in the report does not belong to the cohort
Measured using standard information retrieval metrics:
Mean Average Precision (MAP)
Normalized Discounted Cumulative Gain (NDCG)
Precision at Rank 10 (P@10)
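For reference, the three metrics can be computed from a ranked list of binary judgments as sketched below. The judgment list is made up for illustration. One assumption to note: average precision is normalised here by the number of relevant reports found in the judged list, since the total number of relevant reports in the corpus is not known.

```python
import math


def precision_at_k(rels, k=10):
    """Fraction of the top-k results judged relevant."""
    return sum(rels[:k]) / k


def average_precision(rels):
    """Mean of the precision values at each relevant rank."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(runs):
    """MAP: average precision averaged over all queries."""
    return sum(average_precision(r) for r in runs) / len(runs)


def ndcg(rels):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    def dcg(values):
        return sum(v / math.log2(rank + 1)
                   for rank, v in enumerate(values, start=1))
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal else 0.0


# Hypothetical judgments for one query (1 = relevant, 0 = not relevant).
judgments = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(precision_at_k(judgments), average_precision(judgments), ndcg(judgments))
```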
Evaluation: Patient Cohort Quality
Relevance Model        MAP      NDCG     P@10
Baseline 1: BM25       52.05%   66.41%   80.00%
Baseline 2: LMD        50.37%   65.90%   80.00%
Baseline 3: DFR        46.22%   59.35%   70.00%
MERCuRY: Case 1 (a)    58.59%   72.14%   90.00%
MERCuRY: Case 1 (b)    57.95%   70.34%   90.00%
MERCuRY: Case 2 (a)    70.43%   84.62%   100.00%
MERCuRY