Multi-modal Patient Cohort Identification from EEG Report and Signal Data


  • Date posted: 23-Jul-2018

  • Multi-modal Patient Cohort Identification from EEG Report and Signal Data

    Travis R. Goodwin and Sanda M. Harabagiu

    The University of Texas at Dallas

    Human Language Technology Research Institute

    http://www.hlt.utdallas.edu

  • Conflicts

    Nothing to disclose.

  • Presentation Outline

    1. Introduction

    2. Data

    3. Methods

    4. Evaluation

    5. Conclusions

  • Introduction: The Problem

    Background:

    An electroencephalogram (EEG) measures the electrical activity of the brain.

    EEGs are an important investigational tool in the diagnosis and management of epilepsies and other types of brain disorders.

    Problem:

    EEG interpretation is known to have only moderate inter-rater agreement (Beniczky et al., 2013).

  • Introduction: The Solution

    Solution:

    Leverage big data from EEG reports

    Improve interpretation agreement by allowing neurologists to search for patients who exhibit similar EEG characteristics

    Automatically identifying patient cohorts can:

    inform the clinical decisions of neurologists

    enable comparative clinical effectiveness research

  • Introduction: Examples

    Scenario 1:

    A neurologist suspects a patient has epileptic potential and wants to review similar cases

    Query: Patients with a history of seizures and EEG with TIRDA without sharps, spikes, or electrographic seizures.

    (TIRDA: Temporal Intermittent Rhythmic Delta Activity)

    Scenario 2:

    A neurologist wants to investigate effective interventions for epilepsy accompanied by mental health disorders.

    Query: Patients with a history of Alzheimer and abnormal EEG

  • Multi-modal Encephalogram Patient Cohort Discovery (MERCuRY)

    Goal: Identify patients who satisfy the cohort criteria expressed in a natural language query.

    MERCuRY System:

    Considers both the EEG report and the EEG signal data

    Uses natural language processing to identify inclusion/exclusion criteria in the query

    Uses deep learning to represent EEG signals and produce a multi-modal index of EEG information

    Ranks patients using relevance models


  • Data: TUH EEG Corpus

    Temple University Hospital (TUH) EEG Corpus

    Largest publicly available dataset of EEG data

    25,000 EEG sessions

    15,000 patients

    Collected over 12 years

    Contains both EEG Reports and EEG signal data

  • Data: EEG Reports

    American Clinical Neurophysiology Society (ACNS) guidelines for writing EEG reports:

    Clinical History: patient's age, gender, relevant medical conditions, and medications

    Introduction: EEG technique/configuration (e.g., digital video EEG, standard 10-20 system with 1-channel EKG)

    Description: any notable waveform activity, patterns, or EEG events (e.g., sharp wave, burst suppression pattern, very quick jerks of the head)

    Impression: interpretation of whether the EEG indicates normal or abnormal brain activity, as well as a list of contributing epileptiform phenomena (e.g., abnormal EEG due to background slowing)

    Clinical Correlation: relates the EEG findings to the overall clinical picture of the patient (e.g., very worrisome prognostic features)
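A minimal sketch of how the ACNS sections above could be split out of a report (the "section identification" step of report processing). It assumes each section starts on its own line with an upper-case heading followed by a colon. The heading variants in the hypothetical SECTION_PATTERNS table are illustrative, not MERCuRY's actual rules.

```python
import re

# Hypothetical heading variants for the ACNS sections; real TUH reports vary more.
SECTION_PATTERNS = {
    "clinical_history": r"CLINICAL\s+HISTORY",
    "introduction": r"INTRODUCTION",
    "description": r"DESCRIPTION(?:\s+OF\s+THE\s+RECORD)?",
    "impression": r"IMPRESSION",
    "clinical_correlation": r"CLINICAL\s+CORRELATION",
}

# One alternation with a named group per section; a heading must start a line and end with ":".
HEADING_RE = re.compile(
    r"^\s*(?:" + "|".join(f"(?P<{name}>{pat})" for name, pat in SECTION_PATTERNS.items()) + r")\s*:",
    re.IGNORECASE | re.MULTILINE,
)


def split_sections(report: str) -> dict[str, str]:
    """Map each recognized section name to the text between its heading and the next one."""
    matches = list(HEADING_RE.finditer(report))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[m.lastgroup] = report[m.end():end].strip()  # lastgroup = the section that matched
    return sections
```

Sections that are missing from a report are simply absent from the returned dictionary, so later indexing steps can record the section name of each mention only when one was found.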

  • Data: EEG Signals

    Each report is associated with its EEG signal recording

    EEG signals contain 24 to 36 channels with an additional annotation channel identifying events of interest to the physicians and/or technicians

    Sampled at a rate of 250 Hz or 256 Hz using 16 bits per sample

    Each EEG recording contains roughly 20 MB of raw data!

    Stored in the European Data Format (EDF+)
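The following is a back-of-the-envelope check of the ~20 MB figure. It assumes a 20-minute routine recording; that duration is not stated above. It counts only the 16-bit samples and ignores the EDF+ header and the annotation channel. raw_recording_bytes is a hypothetical helper.

```python
def raw_recording_bytes(channels: int, sample_rate_hz: float, duration_s: float,
                        bits_per_sample: int = 16) -> int:
    """Bytes occupied by the raw samples alone (EDF+ header and annotations ignored)."""
    samples_per_channel = round(sample_rate_hz * duration_s)
    return channels * samples_per_channel * bits_per_sample // 8


# Assumed 20-minute recording at the two ends of the channel/rate ranges quoted above.
for channels, rate in [(24, 250), (36, 256)]:
    size = raw_recording_bytes(channels, rate, 20 * 60)
    print(f"{channels} channels @ {rate} Hz: {size / 1e6:.1f} MB")
```

Under that assumption the raw payload comes to 14.4 MB for 24 channels at 250 Hz and about 22.1 MB for 36 channels at 256 Hz. These two values bracket the quoted figure.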


  • Methods: Overview

    Queries are processed by natural language analysis: term filtering, query formulation, and query expansion

    EEG reports are identified by a relevance model:

    Case 1: EEG report only

    Case 2: EEG report + EEG signal (multi-modal)

    The relevance model relies on a multi-modal index: term/phrase dictionary, tiered inverted lists, and EEG signal fingerprints

    EEG report processing: section identification and medical language processing

    EEG signal processing: deep neural learning for fingerprint detection

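Below is a toy version of the query analysis step. It rests on simple assumed rules:

- A stopword list handles term filtering.
- The cue words "without", "no", and "not" open an exclusion scope that runs to the end of the clause.
- A one-entry expansion table covers TIRDA.

All word lists and the analyze_query name are hypothetical. Query formulation, which combines the terms into a structured query, is left out.

```python
import re

# Hypothetical word lists; the slide does not describe MERCuRY's actual filtering rules.
STOPWORDS = {"a", "an", "and", "the", "of", "with", "or", "patients"}
NEGATION_CUES = {"without", "no", "not"}
EXPANSIONS = {"tirda": ["temporal", "intermittent", "rhythmic", "delta", "activity"]}


def analyze_query(query: str) -> tuple[list[str], list[str]]:
    """Split a cohort description into inclusion (positive) and exclusion (negative) terms."""
    positive, negative = [], []
    negated = False
    for token in re.findall(r"[a-z0-9]+|[.;]", query.lower()):
        if token in {".", ";"}:
            negated = False                  # a clause boundary closes any exclusion scope
        elif token in NEGATION_CUES:
            negated = True                   # later terms in this clause are exclusion criteria
        elif token not in STOPWORDS:         # term filtering
            target = negative if negated else positive
            target.append(token)
            target.extend(EXPANSIONS.get(token, []))  # query expansion
    return positive, negative
```

For the first evaluation query this yields two sets of terms. The inclusion terms are history, seizures, eeg, and tirda plus its expansion. The exclusion terms are sharps, spikes, electrographic, and seizures.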

  • Methods: MERCuRY System Overview

    [Figure: MERCuRY system architecture.

    Inputs: EEG cohort description (query), EEG report, EEG signal.

    Components:
    - EEG report processing (1. section identification, 2. medical language processing)
    - EEG signal processing (deep neural network)
    - Query analysis (1. term filtering, 2. query formulation, 3. query expansion)
    - Multi-modal EEG index (term/phrase dictionary, tiered inverted lists, EEG signal fingerprints)
    - Relevance model (Case 1, Case 2)

    Outputs: patient cohort (Case 1) and patient cohort (Case 2).]

  • Methods: Indexing EEG Reports

    Concept dictionary

    Identified 5 types of medical concepts:

    PROBLEMS (e.g., seizure)

    TREATMENTS (e.g., Topamax)

    TESTS (e.g., video EEG)

    EEG PATTERNS/ACTIVITIES (e.g., focal slowing, polyspike)

    EEG EVENTS (e.g., blink, jerk of the head)

    Links each medical concept to the terms that express it, e.g., the concept "spike and slow wave" is linked to the terms spike, and, slow, and wave
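The sketch below shows one possible in-memory shape for the concept dictionary. It assumes whitespace tokenization and sequential integer IDs. ConceptType, Concept, and ConceptDictionary are hypothetical names. Terms shared between concepts (and, slow, wave) reuse a single term ID.

```python
from dataclasses import dataclass
from enum import Enum


class ConceptType(Enum):
    PROBLEM = "problem"
    TREATMENT = "treatment"
    TEST = "test"
    EEG_PATTERN = "eeg_pattern"   # EEG patterns/activities
    EEG_EVENT = "eeg_event"


@dataclass
class Concept:
    concept_id: int
    text: str
    ctype: ConceptType
    term_ids: list[int]           # IDs of the terms expressing the concept, in order


class ConceptDictionary:
    def __init__(self) -> None:
        self.term_to_id: dict[str, int] = {}
        self.concepts: dict[int, Concept] = {}

    def term_id(self, term: str) -> int:
        """Return a term's ID, assigning the next free ID the first time the term is seen."""
        return self.term_to_id.setdefault(term, len(self.term_to_id))

    def add_concept(self, text: str, ctype: ConceptType) -> Concept:
        """Register a concept and link it to each of its whitespace-separated terms."""
        ids = [self.term_id(t) for t in text.lower().split()]
        concept = Concept(len(self.concepts), text, ctype, ids)
        self.concepts[concept.concept_id] = concept
        return concept
```

Adding "spike and slow wave" and then "sharp and slow wave" yields the term ID lists [0, 1, 2, 3] and [4, 1, 2, 3].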

  • Methods: Indexing EEG Reports

    Term Dictionary

    Each term is associated with two inverted lists:

    positive polarity: mentions in which the term is affirmed

    negative polarity: mentions in which the term is negated

    Each cell in an inverted list contains:

    the EEG report containing the mention

    the name of the section containing the mention

    the position of the mention within the section (e.g., term offset)

    the position(s) of the term in any associated medical concepts

    the EEG signal fingerprint associated with the EEG report
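The following is a simplified sketch of the per-term polarity lists. It uses a plain Python list per polarity instead of tiered linked lists. Posting and PolarityIndex are hypothetical names; their fields mirror the cell contents listed above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Posting:
    """One cell of an inverted list."""
    report_id: str
    section: str
    position: int                                    # term offset within the section
    concept_positions: tuple[tuple[int, int], ...]   # (concept ID, position of the term in it)
    fingerprint_id: Optional[str]                    # EEG signal fingerprint of the report


class PolarityIndex:
    def __init__(self) -> None:
        # term -> {"positive": [...], "negative": [...]}
        self.lists = defaultdict(lambda: {"positive": [], "negative": []})

    def add(self, term: str, negated: bool, posting: Posting) -> None:
        self.lists[term]["negative" if negated else "positive"].append(posting)

    def postings(self, term: str, negated: bool = False) -> list[Posting]:
        entry = self.lists.get(term)   # .get does not create an entry for unseen terms
        return [] if entry is None else entry["negative" if negated else "positive"]
```

Keeping the two polarities apart means a query for reports with spikes never has to filter out "no spikes" mentions at ranking time.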

  • Overview of the Tiered Multi-Modal Index

    [Figure: structure of the tiered multi-modal index.

    Term dictionary: maps each term ID (e.g., alpha, beta, hypertension, …, lovenox, seizure, sharp, slow, spike, stroke, …, wave) to a positive-polarity and a negative-polarity tiered inverted list.

    Inverted-list cells: each cell holds the EEG report ID, EEG signal fingerprint ID, report section, report section position, medical concept ID, concept position, and a pointer to the next cell.

    Medical concept dictionary: maps each medical concept ID to its concept type, its text (e.g., alpha, …, sharp and slow wave), and its term IDs.

    The index also contains the EEG signal fingerprints.]

  • Methods: Fingerprinting EEG Signals

    EEG signal encoded as a dense floating-point matrix $\mathbf{S} \in \mathbb{R}^{N \times L}$, where

    $N$ is the number of electrode channels

    $L$ is the number of samples in the recording (e.g., $L / 250$ = duration of the recording in seconds at 250 Hz)

    One pass over the EEG signals requires considering over 1.8 terabytes of information!

    We need a more compact representation: EEG fingerprints
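EDF+ stores every sample as a 16-bit integer. For each channel, the header records a digital minimum/maximum and a physical minimum/maximum, and the EDF specification maps digital values to physical values linearly between those bounds. The sketch below applies that mapping to build the dense float matrix (channels × samples). It then computes the recording duration as samples / 250. Reading the file itself is out of scope. to_signal_matrix and duration_seconds are hypothetical helpers.

```python
import numpy as np


def to_signal_matrix(digital, dig_min, dig_max, phys_min, phys_max) -> np.ndarray:
    """Map 16-bit EDF samples (N channels x L samples) to a dense float32 matrix.

    The four bound arguments are per-channel sequences of length N from the EDF header.
    """
    # Reshape each per-channel bound into a column so it broadcasts across the L samples.
    d_lo, d_hi, p_lo, p_hi = (np.asarray(v, dtype=np.float64)[:, None]
                              for v in (dig_min, dig_max, phys_min, phys_max))
    gain = (p_hi - p_lo) / (d_hi - d_lo)
    physical = p_lo + (np.asarray(digital, dtype=np.float64) - d_lo) * gain
    return physical.astype(np.float32)


def duration_seconds(S: np.ndarray, fs: float = 250.0) -> float:
    """Recording length L / fs, where L is the number of samples (columns)."""
    return S.shape[1] / fs
```

Converting to float32 doubles the 16-bit payload, which is one reason a compact per-recording representation is attractive at corpus scale.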

  • Methods: Learning EEG Fingerprints

    Deep neural learning:

    Process EEG signals in a matter of hours rather than weeks

    Reduce each EEG signal from 20 MB to a few hundred bytes

    Recurrent Neural Network:

    Consider the EEG signal as a sequence of samples

    For each sample $\mathbf{s}_t$, learn to predict the next sample $\mathbf{s}_{t+1}$

    Long Short-Term Memory (LSTM):

    Can learn long-range interactions in the EEG signal

    Maintains and updates an internal memory

    The final internal memory becomes the EEG fingerprint
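Here is a forward-pass sketch of the idea in NumPy. It makes the following assumptions:

- a single-layer LSTM with 64 hidden units;
- a linear read-out that predicts the next sample from the hidden state;
- random, untrained weights.

Training is omitted; it would, for example, minimize the returned next-sample error by backpropagation through time. The fingerprint is the final memory cell. 64 float32 values take 256 bytes, in line with the "few hundred bytes" above. LSTMFingerprinter is a hypothetical name, not MERCuRY's code.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class LSTMFingerprinter:
    """Hypothetical untrained LSTM that reads the signal one sample (column) at a time."""

    def __init__(self, n_channels: int, hidden: int = 64, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.hidden = hidden
        # Stacked weights for the input, forget, candidate, and output gates.
        self.W = rng.normal(0.0, 1.0 / np.sqrt(n_channels + hidden), (4 * hidden, n_channels + hidden))
        self.b = np.zeros(4 * hidden)
        # Linear read-out: hidden state -> prediction of the next sample.
        self.V = rng.normal(0.0, 1.0 / np.sqrt(hidden), (n_channels, hidden))

    def run(self, S: np.ndarray) -> tuple[np.ndarray, float]:
        """Return (fingerprint, mean squared next-sample prediction error) for an N x L signal."""
        H = self.hidden
        h, c = np.zeros(H), np.zeros(H)
        sq_err = 0.0
        n_channels, L = S.shape
        for t in range(L - 1):
            z = self.W @ np.concatenate([S[:, t], h]) + self.b
            i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
            g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
            c = f * c + i * g          # update the internal memory
            h = o * np.tanh(c)
            pred = self.V @ h          # predicted s_{t+1}
            sq_err += float(np.sum((pred - S[:, t + 1]) ** 2))
        # The final memory cell serves as the fingerprint.
        return c.astype(np.float32), sq_err / ((L - 1) * n_channels)
```

The prediction objective only drives learning. What is kept per recording is the fixed-size memory vector, regardless of how many samples the recording has.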

  • Methods: Relevance Model

    Purpose: measure the relevance between a query and an EEG report

    Case 1: consider EEG reports only, using the BM25F ranking function

    Gives a different weight to query matches in each field, and for each polarity

    Case 2: consider the EEG report + EEG fingerprint

    Retrieve an initial set of EEG reports as in Case 1

    Identify the top-ranked EEG reports

    Look up the most-similar EEG fingerprints for the top-ranked reports

    In our experiments, = ; =
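A sketch of both cases under stated assumptions:

- Case 1 uses a common BM25F formulation in which each (section, polarity) pair is a field, so a negated query term matches only negated mentions. The field weights, k1 = 1.2, and b = 0.75 are placeholders, not the values used in the experiments.
- Case 2 is one reading of the slide. It keeps the top-K reports, then appends the reports whose fingerprints are most similar (by cosine) to theirs. After that, it falls back to the BM25F order.
- Every report is assumed to have a non-zero fingerprint.
- bm25f and case2_rank are hypothetical names.

```python
import math
from collections import Counter

import numpy as np


def bm25f(query, reports, weights, k1=1.2, b=0.75):
    """BM25F scores for every report.

    query:   list of (term, polarity) pairs, polarity in {"pos", "neg"}
    reports: {report_id: {(section, polarity): [term, ...]}}
    weights: {(section, polarity): field weight}; unlisted fields get 1.0
    """
    n = len(reports)
    fields = {f for rep in reports.values() for f in rep}
    avg_len = {f: sum(len(rep.get(f, ())) for rep in reports.values()) / n for f in fields}

    def mentions(rep, term, pol):
        return any(term in terms for (_, p), terms in rep.items() if p == pol)

    # Document frequency per (term, polarity): a negated mention never matches a positive query term.
    idf = {}
    for term, pol in query:
        df = sum(mentions(rep, term, pol) for rep in reports.values())
        idf[term, pol] = math.log((n - df + 0.5) / (df + 0.5) + 1.0)

    scores = {}
    for rid, rep in reports.items():
        counts = {f: Counter(terms) for f, terms in rep.items()}
        score = 0.0
        for term, pol in query:
            # Field-weighted, length-normalised term frequency over fields of the matching polarity.
            tf = sum(
                weights.get(f, 1.0) * counts[f][term] / (1 - b + b * len(terms) / avg_len[f])
                for f, terms in rep.items()
                if f[1] == pol and terms
            )
            score += idf[term, pol] * tf / (k1 + tf)
        scores[rid] = score
    return scores


def case2_rank(scores, fingerprints, top_k=10, n_neighbors=5):
    """Keep the top-K reports, append their nearest reports by fingerprint, then the rest."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    ids = list(fingerprints)
    F = np.stack([fingerprints[i] for i in ids]).astype(np.float64)
    F /= np.linalg.norm(F, axis=1, keepdims=True)   # unit rows, so dot product = cosine
    out = list(ranked[:top_k])
    for rid in ranked[:top_k]:
        sims = F @ F[ids.index(rid)]
        nearest = [ids[j] for j in np.argsort(-sims) if ids[j] != rid][:n_neighbors]
        out += [r for r in nearest if r not in out]
    out += [r for r in ranked if r not in out]       # remaining reports keep their BM25F order
    return out
```

Consider the query [("abnormal", "pos"), ("spikes", "neg")]. A report that affirms "abnormal" and negates "spikes" outranks one that mentions spikes positively.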


  • Evaluation: Queries

    Asked neurologists to provide patient cohort descriptions (queries)

    Patient Cohort Descriptions (Queries):

    1. History of seizures and EEG with TIRDA without sharps, spikes, or electrographic seizures

    2. History of Alzheimer dementia and normal EEG

    3. Patients with altered mental status and EEG showing nonconvulsive status epilepticus (NCSE)

    4. Patients under 18 years old with absence seizures

    5. Patients over age 18 with history of developmental delay and EEG with electrographic seizures

  • Evaluation: Patient Cohort Quality

    For each query:

    identified the 10 most relevant EEG reports

    drew a random sample of 10 EEG reports retrieved between ranks 11 and 100

    Asked neurologists to judge whether each EEG report was relevant:

    1: the patient described in the report definitely belongs to the cohort

    0: the patient described in the report does not belong to the cohort

    Measured using standard information retrieval metrics:

    Mean Average Precision (MAP)

    Normalized Discounted Cumulative Gain (NDCG)

    Precision at Rank 10 (P@10)
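For reference, here are the three metrics with binary judgments (1 = belongs to the cohort, 0 = does not). This sketch computes them over a single judged ranking. The slides do not state how the judged top-10 and the random sample from ranks 11 to 100 were combined for MAP and NDCG. Treat the code as an illustration of the metric definitions only. Here NDCG is normalized by the ideal ordering of the same judged list, which is a simplifying assumption.

```python
import math
from typing import Optional


def average_precision(rels: list[int], n_relevant: Optional[int] = None) -> float:
    """AP of one ranked list of binary judgments.

    n_relevant is the number of relevant reports for the query; by default only
    the relevant reports that appear in the list are counted.
    """
    total = sum(rels) if n_relevant is None else n_relevant
    hits, summed = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            summed += hits / rank    # precision at each relevant rank
    return summed / total if total else 0.0


def mean_average_precision(runs: list[list[int]]) -> float:
    """MAP: the mean of AP over all queries."""
    return sum(average_precision(r) for r in runs) / len(runs)


def ndcg(rels: list[int], k: Optional[int] = None) -> float:
    """NDCG with a log2(rank + 1) discount, normalized by the ideal ordering of the same judgments."""
    rels = rels[:k] if k is not None else rels

    def dcg(xs):
        return sum(x / math.log2(i + 2) for i, x in enumerate(xs))

    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal else 0.0


def precision_at(rels: list[int], k: int = 10) -> float:
    return sum(rels[:k]) / k
```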

  • Evaluation: Patient Cohort Quality

    Relevance Model          MAP      NDCG     P@10
    Baseline 1: BM25         52.05%   66.41%   80.00%
    Baseline 2: LMD          50.37%   65.90%   80.00%
    Baseline 3: DFR          46.22%   59.35%   70.00%
    MERCuRY: Case 1 (a)      58.59%   72.14%   90.00%
    MERCuRY: Case 1 (b)      57.95%   70.34%   90.00%
    MERCuRY: Case 2 (a)      70.43%   84.62%   100.00%
    MERCuRY