#CMIMI18 · 2018. 9. 26.

Deep Learning for the Automatic Generation of Radiology Impressions from Radiology Findings

Yuhao Zhang, Daisy Yi Ding, Tianpei Qian, Curtis P. Langlotz
Stanford University

Radiology Impressions

Impressions are important
- Summarize clinically important radiology findings
- > 50% of referring physicians read only the impressions (Lafortune et al., 1988)

However, writing impressions is
- Time-consuming and repetitive
- Error-prone (Gershanik et al., 2011)

Findings: PA and lateral chest radiographs were obtained. Midline sternotomy wires and mediastinal clips are in unchanged position. The cardiomediastinal silhouette remains stable. The lungs remain clear. No pleural effusion or pneumothorax.

Impression: No interval change. No acute cardiopulmonary process.

A typical chest X-ray radiology report consisting of findings and impression

Research Question

Question: Can we automate the generation of radiology impressions with deep learning and natural language processing?

Opportunity: Neural sequence-to-sequence learning

Our Model

Overall architecture

Input: free-text findings sequence

Word vectors: mapping words to pretrained vectors
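
The lookup step can be pictured as a table from tokens to vectors. The sketch below is a minimal illustration, assuming a toy three-dimensional table with made-up values and a hypothetical `<unk>` fallback for out-of-vocabulary words. It is not the authors' pipeline and does not use their actual pretrained vectors.

```python
# Toy embedding table; the values and the 3-d size are made up for illustration.
PRETRAINED = {
    "lungs": [0.2, -0.1, 0.7],
    "remain": [0.0, 0.3, -0.2],
    "clear": [0.5, 0.1, 0.4],
    "<unk>": [0.0, 0.0, 0.0],  # hypothetical fallback for unseen words
}

def embed(tokens, table=PRETRAINED):
    """Map each token to its pretrained vector, falling back to <unk>."""
    return [table.get(tok.lower(), table["<unk>"]) for tok in tokens]

# "The" is not in the toy table, so it maps to the <unk> vector.
vectors = embed("The lungs remain clear".split())
```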

Encoder: an LSTM network encodes the input into vectors
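
To make the encoding step concrete, here is a minimal pure-Python sketch of a single-layer, unidirectional LSTM run over a short sequence of word vectors. The weights are random, the sizes are tiny, and the function names (`init_params`, `lstm_step`, `encode`) are hypothetical; the slides do not state the layer count, direction, or dimensions of the actual encoder.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_params(input_size, hidden_size, seed=0):
    """Random weights for the four LSTM gates (input, forget, output, candidate)."""
    rng = random.Random(seed)
    cols = input_size + hidden_size
    return {
        gate: {
            "W": [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                  for _ in range(hidden_size)],
            "b": [0.0] * hidden_size,
        }
        for gate in ("i", "f", "o", "g")
    }

def lstm_step(x, h, c, params):
    """One LSTM step: returns the new hidden state and the new cell state."""
    z = x + h  # concatenate the input vector and the previous hidden state

    def affine(gate):
        p = params[gate]
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(p["W"], p["b"])]

    i = [sigmoid(v) for v in affine("i")]    # input gate
    f = [sigmoid(v) for v in affine("f")]    # forget gate
    o = [sigmoid(v) for v in affine("o")]    # output gate
    g = [math.tanh(v) for v in affine("g")]  # candidate cell values
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def encode(word_vectors, params, hidden_size):
    """Run the LSTM over the input and collect one hidden vector per word."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    states = []
    for x in word_vectors:
        h, c = lstm_step(x, h, c, params)
        states.append(h)
    return states

params = init_params(input_size=3, hidden_size=4)
toy_inputs = [[0.2, -0.1, 0.7], [0.0, 0.3, -0.2], [0.5, 0.1, 0.4]]
states = encode(toy_inputs, params, hidden_size=4)
```

Each entry of `states` is the vector for one input word; these per-word vectors are what an attention mechanism would later weigh.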

Decoder: a decoder LSTM predicts one impression word from the vocabulary at each step, given the previously predicted word as input
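
The feedback loop in decoding can be sketched as follows. This is a simplified illustration: a toy lookup table with invented probabilities stands in for the decoder LSTM and its softmax layer (a real decoder also carries a hidden state), and greedy selection is an assumption, since the slides do not say whether greedy or beam search is used.

```python
# Toy next-word table standing in for the decoder LSTM plus softmax layer.
# All probabilities are invented for illustration.
TOY_DECODER = {
    "<s>": {"no": 0.7, "stable": 0.3},
    "no": {"acute": 0.6, "interval": 0.4},
    "acute": {"cardiopulmonary": 0.9, "</s>": 0.1},
    "cardiopulmonary": {"process": 0.8, "</s>": 0.2},
    "process": {"</s>": 1.0},
}

def greedy_decode(step_fn, max_len=20):
    """Feed the previously predicted word back in and pick the most probable next word."""
    prev, output = "<s>", []
    for _ in range(max_len):
        dist = step_fn(prev)             # distribution over the vocabulary
        prev = max(dist, key=dist.get)   # greedy choice (an assumption)
        if prev == "</s>":               # stop at the end-of-sequence token
            break
        output.append(prev)
    return output

impression = greedy_decode(lambda prev: TOY_DECODER[prev])
```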

Attention: at each decoder step, an “attention” distribution over the input is calculated and used for decoding
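
A minimal sketch of one attention step is shown below. It assumes dot-product scoring between the decoder state and each encoder state, chosen here only for brevity; the slides do not specify the scoring function, and the vectors are toy values.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return (attention distribution over input positions, context vector)."""
    # Dot-product scoring is an assumption made for this sketch.
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the attention-weighted average of the encoder states.
    context = [sum(w * enc[k] for w, enc in zip(weights, encoder_states))
               for k in range(dim)]
    return weights, context

encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy encoder outputs
weights, context = attend([2.0, 0.0], encoder_states)
```

Input positions whose encoder state aligns with the current decoder state receive more weight, so here the first and third positions outweigh the second.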

Copy mechanism
- At each step of decoding, allows the model to “copy” a word from the input findings (See et al., 2017)
- Combine the generation probability and copy probability with:
  P(“abnormality”) = P(generating “abnormality”) + P(copying “abnormality”)
- Ease optimization and improve results
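
The combination of generation and copy probabilities can be sketched as below. The sketch follows the pointer-generator formulation of See et al. (2017), in which a gate `p_gen` weights the vocabulary distribution and `1 - p_gen` weights the attention mass on matching input words; whether the model here uses exactly this weighting is an assumption, and all numbers are toy values.

```python
from collections import defaultdict

def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generation and copy probabilities into one distribution over words."""
    final = defaultdict(float)
    for word, p in vocab_dist.items():
        final[word] += p_gen * p                # probability of generating the word
    for weight, token in zip(attention, source_tokens):
        final[token] += (1.0 - p_gen) * weight  # probability of copying the word
    return dict(final)

vocab_dist = {"no": 0.5, "abnormality": 0.3, "acute": 0.2}  # toy decoder softmax
source_tokens = ["sternotomy", "wires", "abnormality"]      # toy findings tokens
attention = [0.6, 0.1, 0.3]                                 # toy attention weights
dist = final_distribution(0.8, vocab_dist, attention, source_tokens)
```

A word such as "sternotomy" that is absent from the toy vocabulary still receives probability through the copy term, which is the practical benefit for rare clinical terms.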

Experiments: Data

Stanford Hospital Dataset
- Radiograph reports from 2000-2014
- Keep only top 12 body parts
- Exclude reports where no findings or impression can be found

Dataset Split   # Examples   % of all
Train           60,990       70
Dev             8,712        10
Test            17,425       20
Total           87,127       -
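
As a quick consistency check and illustration, the snippet below recomputes the percentages from the reported counts and sketches a 70/10/20 split. The shuffled random split, the seed, and the helper name `split_reports` are assumptions; the slides do not describe how the split was drawn, so the sketch will not reproduce the exact reported counts.

```python
import random

# Counts as reported in the dataset table.
counts = {"Train": 60990, "Dev": 8712, "Test": 17425}
total = sum(counts.values())
percent = {name: round(100 * n / total) for name, n in counts.items()}

def split_reports(reports, train_frac=0.7, dev_frac=0.1, seed=0):
    """Shuffle and split into train/dev/test; the remainder goes to test."""
    shuffled = list(reports)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_dev],
            shuffled[n_train + n_dev:])

# Integer IDs stand in for the reports themselves.
train, dev, test = split_reports(range(total))
```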

Experiments

Two extractive baseline models
- Latent Semantic Analysis (Steinberger and Jezek, 2004)
- LexRank (Erkan and Radev, 2004)
- Both models work by scoring sentences in the findings and selecting the top-scored sentence
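
The score-and-select idea behind an extractive baseline can be sketched as follows. This is a simplified, LexRank-style illustration: it uses plain bag-of-words cosine similarity and a PageRank-like power iteration, without the IDF weighting or similarity threshold of the published method, so it is not expected to reproduce the baseline's actual outputs. The function names and the damping value are assumptions.

```python
import math
import re
from collections import Counter

def tokenize(sentence):
    return re.findall(r"[a-z]+", sentence.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def top_sentence(sentences, damping=0.85, iterations=50):
    """Score sentences by centrality in a similarity graph; return the best one."""
    bags = [Counter(tokenize(s)) for s in sentences]
    n = len(bags)
    sim = [[cosine(bags[i], bags[j]) for j in range(n)] for i in range(n)]
    # Row-normalize similarities so each row is a transition distribution.
    trans = []
    for row in sim:
        row_total = sum(row) or 1.0
        trans.append([v / row_total for v in row])
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [(1 - damping) / n
                  + damping * sum(scores[j] * trans[j][i] for j in range(n))
                  for i in range(n)]
    best = max(range(n), key=scores.__getitem__)
    return sentences[best], scores

findings = [
    "PA and lateral chest radiographs were obtained.",
    "Midline sternotomy wires and mediastinal clips are in unchanged position.",
    "The cardiomediastinal silhouette remains stable.",
    "The lungs remain clear.",
    "No pleural effusion or pneumothorax.",
]
summary, scores = top_sentence(findings)
```

Because the output is always a sentence copied verbatim from the findings, such a baseline cannot produce a synthesized statement of the kind a radiologist writes in an impression.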

Word vectors
- Pretrained using the GloVe algorithm (Pennington et al., 2014) on 4.5 million Stanford reports

Results

Standard ROUGE scores (Lin, 2004)

ROUGE Scores (With Human-written Impressions)

System                               ROUGE-1   ROUGE-2   ROUGE-L
Baseline: Latent Semantic Analysis   29.4      16.3      27.4
Baseline: LexRank                    30.5      17.1      28.5
Our deep learning model              46.5      33.4      45.0

* All scores have a confidence interval of at most (-0.5, +0.5) calculated with the official ROUGE script
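
For readers unfamiliar with the metric, the following is a simplified sketch of ROUGE-1, ROUGE-2, and ROUGE-L as F1 scores over lowercase tokens. It omits the stemming, multi-reference handling, and confidence-interval estimation of the official script, so its numbers are not comparable to the table; the example strings are one human/model pair from this deck. Scores here are on a 0 to 1 scale, whereas the table reports them multiplied by 100.

```python
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams(toks, n):
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0:
        return 0.0
    p, r = overlap / cand_total, overlap / ref_total
    return 2 * p * r / (p + r)

def rouge_n(candidate, reference, n):
    c, r = ngrams(tokens(candidate), n), ngrams(tokens(reference), n)
    overlap = sum((c & r).values())  # clipped n-gram matches
    return f1(overlap, sum(c.values()), sum(r.values()))

def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]

def rouge_l(candidate, reference):
    c, r = tokens(candidate), tokens(reference)
    return f1(lcs_length(c, r), len(c), len(r))

human = "No acute abnormality within the chest."
model = "No evidence of acute cardiopulmonary process."
r1 = rouge_n(model, human, 1)  # shared unigrams: "no", "acute"
r2 = rouge_n(model, human, 2)  # no shared bigrams
rl = rouge_l(model, human)     # longest common subsequence: "no acute"
```

The pair illustrates a limit of lexical overlap: two impressions that a clinician might consider equivalent share only two of six words.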

Results

Findings: PA and lateral chest radiographs were obtained. Midline sternotomy wires and mediastinal clips are in unchanged position. The cardiomediastinal silhouette remains stable. The lungs remain clear. No pleural effusion or pneumothorax.

Human Impression: No interval change. No acute cardiopulmonary process.

LexRank Baseline: Midline sternotomy wires and mediastinal clips are in unchanged position.

Our Model: Stable appearance of the chest.

Human Evaluation

A board-certified radiologist
- Reviewed shuffled human-written and system-predicted impressions
- Selected the better one (or equal)

System predictions are at least as good as human in 67% of examples

Figure: Human Evaluation Interface

Category                 Percentage
Human Summary Wins       33
System Prediction Wins   16
Roughly Equal Quality    51
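
The 67% figure is the sum of the two categories in which the system prediction was not judged worse. A short check, using the percentages reported above:

```python
# Percentages as reported in the human evaluation table.
results = {
    "Human Summary Wins": 33,
    "System Prediction Wins": 16,
    "Roughly Equal Quality": 51,
}

# "At least as good" = system wins + roughly equal quality.
at_least_as_good = results["System Prediction Wins"] + results["Roughly Equal Quality"]
```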

Conclusion & Future Directions

A deep learning-based sequence-to-sequence model to automatically summarize radiology report findings showed high lexical overlap with human-written summaries and good clinical validity.

Future Directions
- Improve recall with an external model that recognizes important findings
- Improve clinical usability with a model that learns to select and edit templates

Thank you!

More Examples

Findings: 3 views of the left shoulder demonstrate a shallow, somewhat dentate clinoid with mild glenohumeral osteoarthritis. … 3 views of the cervical spine demonstrate mild multilevel degenerative disc disease as well as multiple surgical clips in the anterior neck. No definite evidence of spondylolysis or spondylolisthesis.

Human Impression: Degenerative changes of the shoulder and cervical spine.

LexRank Baseline: 3 views of the left shoulder demonstrate a shallow, somewhat dentate clinoid with mild glenohumeral osteoarthritis.

Our Model: Shallow multilevel degenerative disc disease of the cervical spine.

More Examples

Findings: The lungs are clear. There is no evidence of pleural effusion. The mediastinum and the cardiac silhouette are within normal limits. The bony structures are unremarkable.

Human Impression: No acute abnormality within the chest.

LexRank Baseline: There is no evidence of pleural effusion.

Our Model: No evidence of acute cardiopulmonary process.