#CMIMI18
Deep Learning for the Automatic Generation of Radiology
Impressions from Radiology Findings
Yuhao Zhang, Daisy Yi Ding, Tianpei Qian, Curtis P. Langlotz
Stanford University
Radiology Impressions
Impressions are important
  Summarize clinically important radiology findings
  More than 50% of referring physicians read only the impressions (Lafortune et al., 1988)
However, writing impressions is
  Time-consuming and repetitive
  Error-prone (Gershanik et al., 2011)
Findings: PA and lateral chest radiographs were obtained. Midline sternotomy wires and mediastinal clips are in unchanged position. The cardiomediastinal silhouette remains stable. The lungs remain clear. No pleural effusion or pneumothorax.
Impression: No interval change. No acute cardiopulmonary process.
A typical chest X-ray radiology report consisting of findings and impression
Research Question
Question: Can we automate the generation of radiology impressions with deep learning and natural language processing?
Opportunity: Neural sequence-to-sequence learning
Our Model
Overall architecture
Input: free-text findings sequence
Word vectors: mapping words to pretrained vectors
Encoder: an LSTM network encodes the input into vectors
Decoder: a decoder LSTM that predicts one impression word from the vocabulary at each step, given the previously predicted word as input
Attention: at each decoder step, an “attention” distribution over the input is calculated and used for decoding
Copy mechanism
  At each step of decoding, allows the model to “copy” a word from the input findings (See et al., 2017)
  Combines the generation probability and copy probability with:
    P(“abnormality”) = P(generating “abnormality”) + P(copying “abnormality”)
  Eases optimization and improves results
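
The attention and copy steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes dot-product attention scores and the pointer-generator mixing of See et al. (2017), where a gate `p_gen` weights the two terms of the sum. The function names, the gate value, and all numbers are made up for the example.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(decoder_state, encoder_states):
    """Attention distribution over input positions (dot-product scoring is an assumption)."""
    scores = [sum(d * h for d, h in zip(decoder_state, enc)) for enc in encoder_states]
    return softmax(scores)

def final_distribution(p_vocab, attn, source_words, p_gen):
    """P(w) = p_gen * P_vocab(w) + (1 - p_gen) * attention mass on positions holding w."""
    p_final = {w: p_gen * p for w, p in p_vocab.items()}
    for word, a in zip(source_words, attn):
        p_final[word] = p_final.get(word, 0.0) + (1.0 - p_gen) * a
    return p_final

# Toy input: three findings words with 2-dimensional encoder states (made-up numbers).
source_words = ["no", "acute", "abnormality"]
encoder_states = [[0.1, 0.0], [0.2, 0.1], [1.0, 0.9]]
decoder_state = [1.0, 1.0]

attn = attention(decoder_state, encoder_states)
# "abnormality" is deliberately missing from the output vocabulary.
p_vocab = {"no": 0.5, "acute": 0.3, "process": 0.2}
p_final = final_distribution(p_vocab, attn, source_words, p_gen=0.6)
```

In this toy setup "abnormality" has zero generation probability, yet it still receives probability mass through the copy term, which is the behavior the slide describes.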
Experiments: Data
Stanford Hospital Dataset
  Radiograph reports from 2000-2014
  Keep only top 12 body parts
  Exclude reports where no findings or impression can be found

Dataset Split   # Examples   % of all
Train           60,990       70
Dev             8,712        10
Test            17,425       20
Total           87,127       -
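
As a quick consistency check on the split table, the counts can be summed and converted to rounded percentages. The variable names are illustrative; the slides do not say how the split itself was drawn.

```python
# Example counts taken from the split table.
split_sizes = {"Train": 60990, "Dev": 8712, "Test": 17425}

total = sum(split_sizes.values())
# Share of each split, rounded to the nearest whole percent.
percentages = {name: round(100 * n / total) for name, n in split_sizes.items()}
```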
Experiments
Two extractive baseline models
  Latent Semantic Analysis (Steinberger and Jezek, 2004)
  LexRank (Erkan and Radev, 2004)
  Both models work by scoring sentences in the findings and selecting the top-scored sentence
Word vectors
  Pretrained using the GloVe algorithm (Pennington et al., 2014) on 4.5 million Stanford reports
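
The "score sentences, select the top one" idea behind the extractive baselines can be illustrated with a simplified LexRank-style sketch. It is not the baseline used in the experiments: it assumes raw term-frequency vectors (no IDF weighting), a fully connected similarity graph, and a PageRank-style power iteration with a damping factor of 0.85. All function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(sentence):
    return re.findall(r"[a-z]+", sentence.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def lexrank_top_sentence(sentences, damping=0.85, iterations=50):
    """Score sentences by centrality in a similarity graph and return the best one."""
    bags = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    sim = [[cosine(bags[i], bags[j]) for j in range(n)] for i in range(n)]
    # Row-normalize similarities into transition probabilities (uniform row if empty).
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([v / total for v in row] if total else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(scores[j] * trans[j][i] for j in range(n))
            for i in range(n)
        ]
    best = max(range(n), key=lambda i: scores[i])
    return sentences[best], scores

findings = [
    "PA and lateral chest radiographs were obtained.",
    "Midline sternotomy wires and mediastinal clips are in unchanged position.",
    "The cardiomediastinal silhouette remains stable.",
    "The lungs remain clear.",
    "No pleural effusion or pneumothorax.",
]
top, scores = lexrank_top_sentence(findings)
```

Because the output is always a sentence copied from the findings, an extractive baseline of this kind cannot produce a summary phrase that does not appear in the findings.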
Results
Standard ROUGE scores (Lin, 2004)
ROUGE scores (with human-written impressions)
System                               ROUGE-1   ROUGE-2   ROUGE-L
Baseline: Latent Semantic Analysis   29.4      16.3      27.4
Baseline: LexRank                    30.5      17.1      28.5
Our deep learning model              46.5      33.4      45.0
* All scores have a confidence interval of at most (-0.5, +0.5) calculated with the official ROUGE script
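
For readers unfamiliar with ROUGE, the three metrics in the table can be approximated as below. This is a simplified sketch, not the official ROUGE script: it assumes lowercase alphanumeric tokenization, no stemming, and a plain F1 of precision and recall; the function names are illustrative. Scores in the table are reported on a 0-100 scale, while the sketch returns values between 0 and 1.

```python
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams(toks, n):
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

def f1(overlap, cand_total, ref_total):
    if overlap == 0:
        return 0.0
    precision, recall = overlap / cand_total, overlap / ref_total
    return 2 * precision * recall / (precision + recall)

def rouge_n(candidate, reference, n):
    """F1 over n-gram overlap (ROUGE-1 for n=1, ROUGE-2 for n=2)."""
    c, r = ngrams(tokens(candidate), n), ngrams(tokens(reference), n)
    overlap = sum((c & r).values())  # clipped n-gram matches
    return f1(overlap, sum(c.values()), sum(r.values()))

def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

def rouge_l(candidate, reference):
    """F1 over the longest common subsequence of tokens."""
    c, r = tokens(candidate), tokens(reference)
    return f1(lcs_length(c, r), len(c), len(r))

human = "No acute abnormality within the chest."
system = "No evidence of acute cardiopulmonary process."
r1 = rouge_n(system, human, 1)
r2 = rouge_n(system, human, 2)
rl = rouge_l(system, human)
```

On this pair the two impressions share only the unigrams "no" and "acute" and no bigrams, which shows how ROUGE can score a clinically reasonable paraphrase low.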
Results
Findings: PA and lateral chest radiographs were obtained. Midline sternotomy wires and mediastinal clips are in unchanged position. The cardiomediastinal silhouette remains stable. The lungs remain clear. No pleural effusion or pneumothorax.
Human Impression: No interval change. No acute cardiopulmonary process.
LexRank Baseline: Midline sternotomy wires and mediastinal clips are in unchanged position.
Our Model: Stable appearance of the chest.
Human Evaluation
A board-certified radiologist
  Reviewed shuffled human-written and system-predicted impressions
  Selected the better one (or equal)
System predictions are at least as good as human-written impressions in 67% of examples
Human Evaluation Interface

Category                 Percentage
Human Summary Wins       33
System Prediction Wins   16
Roughly Equal Quality    51
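
The 67% figure follows from the table: it is the share of examples where the system prediction either wins or is judged roughly equal. A small check, with illustrative variable names:

```python
# Percentages from the human evaluation table.
outcomes = {
    "Human Summary Wins": 33,
    "System Prediction Wins": 16,
    "Roughly Equal Quality": 51,
}

# "At least as good as human" = system wins + roughly equal.
at_least_as_good = outcomes["System Prediction Wins"] + outcomes["Roughly Equal Quality"]
```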
Conclusion & Future Directions
A deep learning-based sequence-to-sequence model to automatically summarize radiology report findings showed high lexical overlap with human-written summaries and good clinical validity.
Future Directions
  Improve recall with an external model that recognizes important findings
  Improve clinical usability with a model that learns to select and edit templates
Thank you!
More Examples
Findings: 3 views of the left shoulder demonstrate a shallow, somewhat dentate clinoid with mild glenohumeral osteoarthritis. … 3 views of the cervical spine demonstrate mild multilevel degenerative disc disease as well as multiple surgical clips in the anterior neck. No definite evidence of spondylolysis or spondylolisthesis.
Human Impression: Degenerative changes of the shoulder and cervical spine.
LexRank Baseline: 3 views of the left shoulder demonstrate a shallow, somewhat dentate clinoid with mild glenohumeral osteoarthritis.
Our Model: Shallow multilevel degenerative disc disease of the cervical spine.
More Examples
Findings: The lungs are clear. There is no evidence of pleural effusion. The mediastinum and the cardiac silhouette are within normal limits. The bony structures are unremarkable.
Human Impression: No acute abnormality within the chest.
LexRank Baseline: There is no evidence of pleural effusion.
Our Model: No evidence of acute cardiopulmonary process.