Pneumoperitoneum on Chest X-Ray · 2018. 9. 26. · Pneumoperitoneum refers to pneumatosis, or free...

Post on 06-Sep-2020

#CMIMI18

Pneumoperitoneum on Chest X-Ray

A DCNN Approach to Automated Detection and Localization Using Saliency and Class Activation Maps

Jack W. Luo, Jia L. Liu, Jaron Chong MD

Department of Radiology, McGill University, Montreal, QC, Canada

Outline

▪ Introduction
▪ Methodology
▪ Cohort Selection
▪ Data Pipeline
▪ Model Design
▪ Results
▪ Examples
▪ Conclusion

Introduction

▪ Pneumoperitoneum refers to pneumatosis, or free air, inside the peritoneal cavity

▪ Has surgical and non-surgical causes, e.g. recent laparoscopy, perforated duodenal or peptic ulcers, diverticulitis, trauma, etc.

Free air under right hemidiaphragm

Introduction

▪ Rare, often incidental finding

▪ Of critical clinical importance, as pneumoperitoneum often warrants urgent surgical intervention

▪ X-ray reads are often delayed because CT/MRI is prioritized

▪ Deep learning can help triage chest X-rays for free air

Methodology: Cohort Selection

▪ Challenge: No standardized nomenclature for pneumoperitoneum reporting in radiology reports (e.g. pneumoperitoneum vs “free air under the right hemidiaphragm”)

▪ Pneumoperitoneum severity (e.g. mild, moderate, severe) is variable, labeling is inconsistent (e.g. small vs. mild), and grading is subjective

Methodology: Cohort Selection

▪ Solution: use highly specific RIS keyphrases that correlate with presence of pneumoperitoneum

▪ Use keywords that minimize negation in the sentence, then use NegEx to review positive reports for sentence negation (e.g. no evidence of free air under the right hemidiaphragm)

▪ The number and breadth of keywords make the search more sensitive

Methodology: Cohort Selection

▪ Keyphrase examples:

“suspected free air” -"regression" -"resolution"
“there is evidence of pneumoperitoneum”
“free air under the right hemidiaphragm”
“tiny|small|mild|moderate|severe|medium|large pneumoperitoneum”
“no free air”
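A minimal Python sketch of keyphrase-based cohort selection along the lines described above. The patterns are taken from the keyphrase examples; the `QUERIES` structure, the `NEGATION_CUES` list and the function names are hypothetical, and the negation check is a crude stand-in for NegEx, not the NegEx algorithm itself. The “no free air” keyphrase is left out because the slides do not state how it was used.

```python
import re

# (pattern, excluded terms) pairs modeled on the keyphrase examples above.
# The actual RIS query syntax and full keyword list are not given in the slides.
QUERIES = [
    (re.compile(r"suspected free air", re.I), ("regression", "resolution")),
    (re.compile(r"there is evidence of pneumoperitoneum", re.I), ()),
    (re.compile(r"free air under the right hemidiaphragm", re.I), ()),
    (re.compile(r"(tiny|small|mild|moderate|severe|medium|large) pneumoperitoneum", re.I), ()),
]

# Hypothetical, deliberately short cue list; real NegEx uses a larger trigger
# vocabulary plus scope rules.
NEGATION_CUES = re.compile(r"\b(no|not|without|negative for)\b", re.I)


def is_negated(sentence, match_start):
    """Crude negation check: any cue appearing before the keyphrase."""
    return NEGATION_CUES.search(sentence[:match_start]) is not None


def report_is_positive(report_text):
    """Return True if some sentence contains a non-negated keyphrase hit."""
    lowered = report_text.lower()
    for sentence in re.split(r"(?<=[.!?])\s+", report_text):
        for pattern, excluded_terms in QUERIES:
            match = pattern.search(sentence)
            if match is None:
                continue
            if any(term in lowered for term in excluded_terms):
                continue  # mirrors the -"regression" -"resolution" exclusions
            if is_negated(sentence, match.start()):
                continue
            return True
    return False
```

Reports flagged this way would still need manual image review to form a clean subset, since a keyphrase hit only says what the report text contains.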

Methodology: Data Pipeline

▪ Frontal X-rays extracted from McGill PACS with data from 2006-2017, covering 2 academic hospitals

▪ Clean subset with manually reviewed X-rays as baseline
▪ Highly imbalanced, 10:1 ratio of normal to positive cases

Full dataset: N = 10,751 (968 positive, 9,783 negative)

Clean dataset (manually reviewed images): N = 1,288 (268 positive, 1,020 negative)

Methodology: Data Pipeline

Data conversion
▪ Raw DICOM -> 299x299px
▪ Standard windowing, no histogram normalization, no position alignment

Data augmentation
▪ 10° rotation, random cropping
▪ No horizontal or vertical flipping

Dataset examples

Model Design

Latest advances in computer vision
▪ InceptionResnetV2 network
  ▪ Residual summation across layers allows for deeper DCNN networks
  ▪ 3.7% top-5 error on ImageNet
▪ Cosine Annealing
  ▪ Get closer to global minima by cycling learning rate up and down instead of monotonically decreasing it and getting stuck at saddle point
▪ Snapshot Ensembling
  ▪ Take the best N epochs of your network and make a free ensemble out of one training, don’t train network N times for nothing
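A rough sketch of a cosine-annealed learning rate with warm restarts, using the numbers stated later in the deck (100 epochs, starting LR 0.003, 5 cosine annealing schedules). Equal-length cycles, a minimum LR of 0, and taking one snapshot at the end of each cycle are assumptions here; the slides report 10 snapshots without stating how they were picked. The function names are hypothetical.

```python
import math


def cosine_annealing_lr(epoch, total_epochs=100, num_cycles=5,
                        lr_max=0.003, lr_min=0.0):
    """Learning rate for a 0-indexed epoch under equal-length cosine cycles."""
    cycle_length = total_epochs // num_cycles
    position = epoch % cycle_length  # epochs elapsed since the last restart
    cosine = math.cos(math.pi * position / cycle_length)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + cosine)


def snapshot_epochs(total_epochs=100, num_cycles=5):
    """Last epoch of each cycle, where the learning rate is at its lowest."""
    cycle_length = total_epochs // num_cycles
    return [cycle_length * (i + 1) - 1 for i in range(num_cycles)]


if __name__ == "__main__":
    for epoch in (0, 10, 19, 20):
        print(epoch, round(cosine_annealing_lr(epoch), 6))
    print(snapshot_epochs())
```

The learning rate falls along a half cosine within each cycle and jumps back to its starting value at every restart; weights saved just before a restart are the natural snapshot candidates.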

Model Design

I. Loshchilov and F. Hutter. SGDR: Stochastic gradient descent with warm restarts. In Proceedings of the International Conference on Learning Representations (ICLR), 2017.

Model Design

G. Huang, Y. Li, G. Pleiss, Z. Liu, J. E. Hopcroft, and K. Q. Weinberger. Snapshot ensembles: Train 1, get M for free. In Proceedings of the International Conference on Learning Representations (ICLR), 2017.

Model Design

▪ Weighted binary cross-entropy loss function
  ▪ A false negative is penalized 10.11x more than a false positive
▪ Splits: 70% training, 10% validation, 20% test
  ▪ No patient or study overlap
▪ 100 epochs
▪ ImageNet pre-training
▪ SGD + Nesterov momentum
▪ Starting LR = 0.003
▪ 5 cosine annealing schedules, 10 snapshots
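A minimal sketch of a weighted binary cross-entropy for a single example. The 10.11x penalty matches the negative-to-positive ratio of the full dataset (9783 / 968); treating that ratio as the origin of the weight is an assumption, as are the function name and the clipping constant.

```python
import math

N_POSITIVE = 968
N_NEGATIVE = 9783
POS_WEIGHT = N_NEGATIVE / N_POSITIVE  # about 10.11


def weighted_bce(y_true, p_pred, pos_weight=POS_WEIGHT, eps=1e-7):
    """Binary cross-entropy where errors on positive cases cost pos_weight times more."""
    p = min(max(p_pred, eps), 1 - eps)  # clip to avoid log(0)
    positive_term = pos_weight * y_true * math.log(p)
    negative_term = (1 - y_true) * math.log(1 - p)
    return -(positive_term + negative_term)


if __name__ == "__main__":
    missed_positive = weighted_bce(1, 0.1)  # confident false negative
    false_alarm = weighted_bce(0, 0.9)      # equally confident false positive
    print(missed_positive / false_alarm)
```

With equally confident mistakes, the loss for a missed positive is larger than the loss for a false alarm by the factor `pos_weight`.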

Results

Clean subset
▪ 99.6% test accuracy
▪ 0.998 test AUC

Clean dataset:

             Accuracy   AUC
Validation   0.997      0.998
Test         0.996      0.998

Results

Full dataset
▪ 97.8% test accuracy
▪ 0.988 test AUC
▪ Snapshot ensembling bumps test AUC to 0.991
▪ 5-way voting ensembling (same data) bumps test AUC to 0.992
▪ Not a statistically significant difference (p > 0.05)

Full dataset (no ground truth checking):

             Accuracy   AUC
Validation   0.986      0.991
Test         0.978      0.988
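A small sketch of how an ensemble AUC can be computed: per-image scores from several models are combined and the AUC is taken as the fraction of positive/negative pairs ranked correctly. Probability averaging is used here as a simple stand-in; the slides mention snapshot and voting ensembles without giving the combination rule. The function names and the toy scores are made up and say nothing about the reported results.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def average_ensemble(per_model_scores):
    """Average the scores that each model assigns to the same image."""
    return [sum(column) / len(column) for column in zip(*per_model_scores)]


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 0]                  # toy ground truth
    model_a = [0.90, 0.40, 0.50, 0.20, 0.10]  # toy scores, snapshot A
    model_b = [0.60, 0.70, 0.30, 0.65, 0.20]  # toy scores, snapshot B
    combined = average_ensemble([model_a, model_b])
    print(roc_auc(labels, model_a), roc_auc(labels, model_b), roc_auc(labels, combined))
```

Averaging helps only when the models make different mistakes, which is consistent with the deck's later remark that highly correlated networks gained little from ensembling.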

Results

▪ Single network, full dataset

▪ Network shows excellent sensitivity and specificity for free air detection

Example: Severe

▪ Use saliency + class activation maps to understand what the network sees or focuses on

▪ Model successfully locates pneumoperitoneum under right hemidiaphragm
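A minimal sketch of a class activation map in the original CAM formulation: a weighted sum of the last convolutional feature maps, using the classifier weights of the target class. The slides do not say which CAM or saliency variant was used, so this is an assumption; the function names and toy values are hypothetical, and plain nested lists stand in for real feature tensors.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each an H x W grid) with K class weights."""
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for feature_map, weight in zip(feature_maps, class_weights):
        for row in range(height):
            for col in range(width):
                cam[row][col] += weight * feature_map[row][col]
    return cam


def peak_location(cam):
    """(row, col) of the strongest activation, i.e. where the network 'looks'."""
    best = max(
        (value, (row, col))
        for row, values in enumerate(cam)
        for col, value in enumerate(values)
    )
    return best[1]


if __name__ == "__main__":
    # Two toy 2x2 feature maps and their weights for the positive class.
    maps = [[[0, 1], [0, 0]], [[0, 0], [2, 0]]]
    weights = [1.0, 0.25]
    cam = class_activation_map(maps, weights)
    print(cam, peak_location(cam))
```

In practice the coarse map is upsampled to the input resolution and overlaid on the X-ray so the highlighted region can be compared with the location of the free air.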

Example: Mild

▪ Training on free air cases ranging from tiny to severe lets the network generalize the free air finding across varying sizes

Conclusion

▪ DCNN architecture reaches 0.988 test AUC and 97.8% test accuracy on the full dataset for pneumoperitoneum classification, and localizes the free air

▪ A large positive class is not needed to train a performant network

▪ Neither conventional nor snapshot ensembling increased AUC in a statistically significant way
  ▪ Too high correlation between networks
  ▪ Ensembling only useful for broad, multi-label classification tasks

Conclusion

Further Investigations
▪ Validate network AUC against inter-human performance
▪ Generalize results to external, non-McGill studies
▪ Multi-view (frontal + lateral) semi-supervised learning
  ▪ Bypasses noisy label issue in radiology reports, can generate 100k+ cases from a small labeled bootstrap
▪ Make network output rough bounding boxes denoting regions of attention from gross labels only (unsupervised segmentation)

Thank You!