Pneumoperitoneum on Chest X-Ray · 2018. 9. 26. · Pneumoperitoneum refers to pneumatosis, or free...
#CMIMI18
Pneumoperitoneum on Chest X-Ray
A DCNN Approach to Automated Detection and Localization Using Saliency and Class Activation Maps
Jack W. Luo, Jia L. Liu, Jaron Chong MD
Department of Radiology, McGill University,
Montreal, QC, Canada
Outline
▪ Introduction
▪ Methodology
  ▪ Cohort Selection
  ▪ Data Pipeline
  ▪ Model Design
▪ Results
▪ Examples
▪ Conclusion
Introduction
▪ Pneumoperitoneum refers to pneumatosis, or free air, inside the peritoneal cavity
▪ Causes may be surgical or non-surgical: recent laparoscopy, perforated duodenal or peptic ulcers, diverticulitis, trauma, etc.
Free air under right hemidiaphragm
Introduction
▪ Rare, often incidental finding
▪ Of critical clinical importance, as pneumoperitoneum often warrants urgent surgical intervention
▪ X-ray reads are often delayed because CT/MRI studies are prioritized
▪ Deep learning can help triage chest X-rays for free air
Methodology: Cohort Selection
▪ Challenge: No standardized nomenclature for pneumoperitoneum reporting in radiology reports (e.g. pneumoperitoneum vs “free air under the right hemidiaphragm”)
▪ Pneumoperitoneum severity (e.g. mild, moderate, severe) is variable, labeling is inconsistent (e.g. small vs mild), and grading is subjective
Methodology: Cohort Selection
▪ Solution: use highly specific RIS keyphrases that correlate with presence of pneumoperitoneum
▪ Use keywords that minimize negation in the sentence, then use NegEx to review positive reports for sentence negation (e.g. “no evidence of free air under the right hemidiaphragm”)
▪ A larger and broader set of keyphrases makes the search more sensitive
Methodology: Cohort Selection
▪ Keyphrase examples:
“suspected free air” -"regression" -"resolution"
“there is evidence of pneumoperitoneum”
“free air under the right hemidiaphragm”
“tiny|small|mild|moderate|severe|medium|large pneumoperitoneum”
“no free air”
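The keyphrase-then-negation review described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the pattern list only mirrors the positive examples on this slide, the trigger list is a small hypothetical stand-in for the much larger NegEx trigger set, and `label_report` is a name invented here.

```python
import re

# Hypothetical keyphrase patterns modeled on the positive examples in the slides.
POSITIVE_PATTERNS = [
    r"there is evidence of pneumoperitoneum",
    r"free air under the right hemidiaphragm",
    r"(tiny|small|mild|moderate|severe|medium|large) pneumoperitoneum",
]

# Simplified negation triggers (assumption); the real NegEx list is far larger.
NEGATION_TRIGGERS = r"\b(no|without|negative for|no evidence of)\b"


def label_report(text):
    """Return 'positive', 'negated', or 'no_match' for one report."""
    found_negated = False
    for sentence in re.split(r"(?<=[.!?])\s+", text.lower()):
        for pattern in POSITIVE_PATTERNS:
            match = re.search(pattern, sentence)
            if not match:
                continue
            # A trigger before the keyphrase in the same sentence negates it.
            if re.search(NEGATION_TRIGGERS, sentence[:match.start()]):
                found_negated = True
            else:
                return "positive"
    return "negated" if found_negated else "no_match"
```

A report such as "No evidence of free air under the right hemidiaphragm." matches a keyphrase but is flagged as negated, so it would be routed away from the positive cohort.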
Methodology: Data Pipeline
▪ Frontal X-rays extracted from McGill PACS with data from 2006-2017, covering 2 academic hospitals
▪ Clean subset with manually reviewed X-rays as baseline
▪ Highly imbalanced, 10:1 ratio of normal to positive cases
Full dataset: N = 10,751 (968 positive, 9,783 negative)
Clean dataset (manually reviewed images): N = 1,288 (268 positive, 1,020 negative)
Methodology: Data Pipeline
Data conversion
▪ Raw DICOM -> 299x299px
▪ Standard windowing, no histogram normalization, no position alignment
Data augmentation
▪ 10° rotation, random cropping
▪ No horizontal or vertical flipping
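As a rough illustration of the conversion step, the sketch below applies a linear intensity window and then resizes to 299x299. The slides do not give the window values or the interpolation method, so the center/width used here and the nearest-neighbour resize are assumptions, and the function names are made up for this example. It is written without dependencies for clarity; a real pipeline would read pixel data with a DICOM library and use array operations.

```python
def apply_window(pixels, center, width):
    """Clip raw values to [center - width/2, center + width/2], rescale to [0, 1]."""
    low = center - width / 2.0
    high = center + width / 2.0
    return [[(min(max(v, low), high) - low) / (high - low) for v in row]
            for row in pixels]


def resize_nearest(pixels, size=299):
    """Nearest-neighbour resize of a 2-D list to size x size (method is assumed)."""
    rows, cols = len(pixels), len(pixels[0])
    return [[pixels[r * rows // size][c * cols // size] for c in range(size)]
            for r in range(size)]


# Toy 2x4 "image" with hypothetical window values (center=50, width=100).
raw = [[0, 50, 100, 200], [25, 75, 150, -10]]
windowed = apply_window(raw, center=50, width=100)
resized = resize_nearest(windowed, size=299)
```

Values outside the window saturate at 0 or 1, which is why no separate histogram normalization is needed to bound the input range.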
Dataset examples
Model Design
Latest advances in computer vision
▪ InceptionResnetV2 network
  ▪ Residual summation across layers allows for deeper DCNN networks
  ▪ 3.7% top-5 error on ImageNet
▪ Cosine Annealing
  ▪ Get closer to global minima by cycling the learning rate up and down instead of decreasing it monotonically and getting stuck at a saddle point
▪ Snapshot Ensembling
  ▪ Take the best N epochs of one training run and build a free ensemble from them, instead of training the network N times
Model Design
I. Loshchilov, F. Hutter. SGDR: Stochastic gradient descent with warm restarts. In proceedings of the International Conference on Learning Representations (ICLR), 2017.
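The schedule from this paper can be sketched with the training settings reported in these slides (100 epochs, starting LR = 0.003, 5 cosine annealing schedules). Equal-length cycles of 20 epochs and a minimum learning rate of 0 are assumptions, since the slides do not state them; `sgdr_learning_rate` is a name chosen for this example.

```python
import math


def sgdr_learning_rate(epoch, total_epochs=100, cycles=5, lr_max=0.003, lr_min=0.0):
    """Learning rate at a 0-based epoch under cosine annealing with warm restarts."""
    cycle_length = total_epochs // cycles   # 20 epochs per cycle (assumed equal)
    t_cur = epoch % cycle_length            # epochs since the last restart
    cosine = math.cos(math.pi * t_cur / cycle_length)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)


# One value per epoch; the rate restarts at lr_max at epochs 0, 20, 40, 60, 80.
schedule = [sgdr_learning_rate(e) for e in range(100)]
```

Within each cycle the rate falls from 0.003 toward 0 along a half cosine, then jumps back up at the restart.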
Model Design
G. Huang, Y. Li, G. Pleiss, Z. Liu, J. E. Hopcroft, and K. Q. Weinberger. Snapshot ensembles: Train 1, get m for free. In proceedings of the International Conference on Learning Representations (ICLR), 2017.
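A snapshot ensemble combines predictions from several checkpoints saved during one training run. The sketch below averages per-case probabilities across snapshots, which is the combination rule used in the cited paper; whether this talk used exactly that rule is not stated, and the function name is hypothetical.

```python
def ensemble_probabilities(snapshot_probs):
    """Average per-case probabilities across snapshots.

    snapshot_probs holds one inner list per snapshot, each with one
    predicted probability per case, in the same case order.
    """
    n_snapshots = len(snapshot_probs)
    return [sum(case) / n_snapshots for case in zip(*snapshot_probs)]


# Two snapshots scoring the same two cases (toy numbers).
averaged = ensemble_probabilities([[0.9, 0.2], [0.7, 0.4]])
```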
Model Design
▪ Weighted binary cross-entropy loss function
  ▪ A false negative is penalized 10.11x more than a false positive
▪ Splits: 70% training, 10% validation, 20% test
  ▪ No patient or study overlap
▪ 100 epochs
▪ ImageNet pre-training
▪ SGD + Nesterov momentum
▪ Starting LR = 0.003
▪ 5 cosine annealing schedules, 10 snapshots
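The weighted loss can be sketched as below: the positive-class term of binary cross-entropy is scaled by 10.11, so a missed positive costs 10.11 times as much as an equally confident false alarm. The weight is close to the negative-to-positive ratio of the full dataset (9783 / 968 ≈ 10.11), which may be how it was chosen, although the slides do not say. The function name and the clipping constant are assumptions.

```python
import math

POS_WEIGHT = 10.11  # false-negative penalty relative to a false positive (from the slides)


def weighted_bce(y_true, y_prob, pos_weight=POS_WEIGHT, eps=1e-7):
    """Mean binary cross-entropy with the positive-class term scaled by pos_weight."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Same confidence, opposite mistakes: a missed positive vs a false alarm.
missed_positive = weighted_bce([1], [0.1])
false_alarm = weighted_bce([0], [0.9])
```

Deep learning frameworks usually expose this weighting through a positive-class or per-class weight argument on their cross-entropy loss.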
Results
Clean subset
▪ 99.6% test accuracy
▪ 0.998 test AUC
Clean dataset:
            Accuracy  AUC
Validation  0.997     0.998
Test        0.996     0.998
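For reference, the two reported metrics can be computed from labels and model scores as in the sketch below. AUC is calculated as the fraction of positive/negative pairs ranked correctly (ties count as half); the 0.5 decision threshold for accuracy is an assumption, as the slides do not state the operating point.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(positives) * len(negatives))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases where thresholding the score reproduces the label."""
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return correct / len(labels)


# Toy example with four cases.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_acc = accuracy(toy_labels, toy_scores)
```

AUC is the more informative of the two on imbalanced data: with 9783 negatives out of 10,751 studies in the full dataset, a model that always predicts "negative" would already reach about 91% accuracy.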
Results
Full dataset
▪ 97.8% test accuracy
▪ 0.988 test AUC
▪ Snapshot ensembling bumps test AUC to 0.991
▪ 5-way voting ensembling (same data) bumps test AUC to 0.992
▪ Not a statistically significant difference (p > 0.05)
Full dataset (no ground truth checking):
            Accuracy  AUC
Validation  0.986     0.991
Test        0.978     0.988
Results
▪ Single network, full dataset
▪ Network shows excellent sensitivity and specificity for free air detection
Example: Severe
▪ Use saliency + class activation maps to understand what the network sees or focuses on
▪ Model successfully locates pneumoperitoneum under right hemidiaphragm
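A class activation map in its basic form is a weighted sum of the last convolutional feature maps, using the final dense layer's weights for the class of interest. The sketch below shows that computation on toy data. It assumes the original CAM formulation (global average pooling followed by a dense layer); the slides do not say which CAM variant was used, and the function names are invented for this example.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W) using K class weights."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for r in range(height):
            for c in range(width):
                cam[r][c] += weight * fmap[r][c]
    return cam


def peak_location(cam):
    """Return (row, col) of the strongest activation in the map."""
    return max(((r, c) for r in range(len(cam)) for c in range(len(cam[0]))),
               key=lambda rc: cam[rc[0]][rc[1]])


# Two toy 2x2 feature maps and hypothetical dense-layer weights for one class.
toy_maps = [[[1, 0], [0, 0]], [[0, 0], [0, 2]]]
toy_cam = class_activation_map(toy_maps, [0.5, 1.0])
toy_peak = peak_location(toy_cam)
```

In practice the low-resolution map is upsampled to the input size and overlaid on the X-ray to show which region drove the prediction.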
Example: Mild
▪ Training on free air cases ranging from tiny to severe lets the network generalize the free air finding across varying sizes
Conclusion
▪ DCNN architecture achieves 0.988 test AUC and 97.8% test accuracy on the full dataset, and localizes pneumoperitoneum
▪ Do not need large positive classes to create performant networks
▪ Neither conventional nor snapshot ensembling increased AUC in a statistically significant way
  ▪ Too high correlation between networks
  ▪ Ensembling only useful for broad, multi-label classification tasks
Conclusion
Further Investigations
▪ Validate network AUC against inter-human performance
▪ Generalize results to external, non-McGill studies
▪ Multi-view (frontal + lateral) semi-supervised learning
  ▪ Bypasses noisy label issue in radiology reports, can generate 100k+ cases from a small labeled bootstrap
▪ Make network output rough bounding boxes denoting regions of attention from gross labels only (unsupervised segmentation)
Thank You!