Improving Network Accuracy with Augmented Imagery Training...
Transcript of Improving Network Accuracy with Augmented Imagery Training...
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Improving Network Accuracy with Augmented Imagery Training Data
Ted Hromadka1, LCDR Niels Olson2 MD 1Integrity Applications Incorporated℠, 2US Navy NMCSD
Presented at GTC 2017
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Goals
• Determine effectiveness of various affine transforms for augmenting medical imagery training data
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Background – prostate cancer is a significant problem for DoD
• US military’s hospitals care for disproportionately more male patients • Prostate cancer is second-leading cause of cancer death in American men
– Approximately 220,000 new cases per year
• Early screening involves a blood test for prostate-specific antigen (PSA) or a digital rectal exam (DRE) – If those tests generate abnormal results, then a prostate biopsy may be
required
http://www.va.gov/vetdata/docs/quickfacts/Population_slideshow.pdf http://seer.cancer.gov/statfacts/html/prost.html http://www.cancer.org/cancer/prostatecancer/detailedguide/prostate-cancer-key-statistics
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Each biopsy procedure creates around 12 samples
• Prostate biopsy is conducted by taking “core samples” using a hollow needle • After processing, 5 micron sections of these samples are placed on glass slides,
stained, and manually interpreted by a pathologist under a microscope.
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Analysis is very labor-intensive
• Digital scans are opened with custom viewing software from the microscope vendor – Multiple zoom levels available up to
40x. This dataset was scanned at 20x.
• Pathologist will annotate cancerous regions with polygons drawn by hand with a mouse
• Process requires careful judgment and is susceptible to fatigue and stress factors. Polygons cannot be edited once drawn (e.g., at higher magnification).
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Augmented data
• Necessary when training data set is too small to generalize – Some data is more expensive to source – Medical, copyrighted, classified, etc
• Also necessary when doing object detection, not just image classification
– Sensitivity and accuracy
• Generally = supplementing original training data set with significant transforms
• Also = synthesizing new training data from models or GANs
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Using MNIST to demonstrate effect of rotation on CNN
0.0%
10.0%
20.0%
30.0%
40.0%
50.0%
60.0%
70.0%
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
DetectNet
• Released by NVIDIA with DIGITS4, last summer – Based on KITTI - self-driving car computer vision project
• Incorporates some affine transformations as a training option
detectnet_augmentation_param {
crop_prob: 1.0
shift_x: 32
shift_y: 32
scale_prob: 0.4
scale_min: 0.8
scale_max: 1.2
flip_prob: 0.5
rotation_prob: 0.0
max_rotate_degree: 5.0
hue_rotation_prob: 0.8
hue_rotation: 30.0
desaturation_prob: 0.8
desaturation_max: 0.8
}
https://devblogs.nvidia.com/parallelforall/deep-learning-object-detection-digits/ https://devblogs.nvidia.com/parallelforall/detectnet-deep-neural-network-object-detection-digits/
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Test Procedure
• Goal = controlling all other variables, determine the effect of each augmented data approach
• First evaluate DetectNet-supported parameters – Stride (shift) – Scale
• We have two variations on this: image zoom vs microscope zoom – Flip (horizontal, vertical) – Rotation, Hue, Desaturation – Not relevant: Crop
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Test Procedure
• Also evaluate factors beyond DetectNet – Oversaturation – Noise (random speckle) – Artifacts
• Ink • Scanning stripes
• Not evaluated: skew/shear, warping, squeeze mapping, non-affine transforms
• All tests use the same learning parameters and stock GoogLeNet on top of Caffe1 • Cross-validation (25%) to reduce overfitting • Each factor test run five times
– Once for each of 5 training/testing permutations (staggered 80%/20% cohorts) – Accuracy results were averaged together at the end
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Specifications
• System – NVIDIA DevBox 1 – 4x NVIDIA TITAN X GPU
• 12GB memory per GPU – 6-core Intel i7-5930K at 3.5 GHz with 64GB DDR4 – Ubuntu 14.04, DIGITS 4, CUDA/cuDNN/NVCaffe
• DIGITS 5 rumored to support new image segmentation features
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Specifications
• Imagery – Source: NMCSD samples
• Data later shared with “Cancer Moonshot”, Joint Pathology Center, and commercial partners
– Collected, scanned, and annotated by NMCSD pathologists – 202 annotated full-size color SVS images over 100K image chips @ 256px
• Average full size image ~ 845 MB
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
MATLAB image chipper prepared the images
• Split SVS into image chips of size 256x256 pixels at the 4:1 zoom level • Chipper labels each image chip based on XML annotation polygons (50%
inclusion rule for Caffe; more refined segmentation for TF) • Chipper 2.0 also used pixel averaging and histograms to determine if chip was a
“blank”. This could also be applied to ink smears.
• Original set of image chips were then augmented with a supplemental set of processed chips
http://caffe.berkeleyvision.org/ XML parser built on work by Andrew Janowczyk (http://www.andrewjanowczyk.com/)
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Need to handle pragmatic labeling, real-world data
• Training data suffers from some inaccuracies – Annotation was not meant for training
neural networks – Not pixel-perfect
• Artifacts due to the scanner or tissue
preparation – Striping – Ink
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Side note: how accurate is the measure of accuracy?
• Elmore et al – Breast Biopsy Concordance study found only 75% agreement between expert pathologists
– JAMA, 2015: http://jama.jamanetwork.com/article.aspx?articleid=2203798
• If an image chip is 50% cancer and 50% other tissue, how should it be labeled? – This issue already addressed in frameworks like TensorFlow (regions exactly
specified in JSON), Caffe+DetectNet (regions specified in labels .txt up to 400px), other frameworks
• Need protocol for the confidence levels
– What threshold to use when network gives it a substantial but minority chance of cancer?
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Original “Naïve” Model baseline
• Chip count – Cancer: 38,080 – Benign: 34,095
• MATLAB
– None
• Effect – Control set
Overall Accuracy 0.86 -
Precision 0.86 - Recall = Sensitivity 0.89 - Specificity 0.84 -
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Stride-varied Model
• Chip count – Cancer: 35,033 – Benign: 32,731
• MATLAB
– None
• Effect – Surprisingly similar to the control set
Overall Accuracy 0.86 - Precision 0.87 +0.01 Recall = Sensitivity 0.88 -0.01 Specificity 0.84 -
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Flip Models
• Chip count – Cancer: 76,160 – Benign: 68,190
• MATLAB
– B = fliplr(A); – B = flipud(A);
• Effect in FlipH
– Consistent 2 point improvement. FlipV was similar.
Overall Accuracy 0.88 +0.02 Precision 0.88 +0.02 Recall = Sensitivity 0.91 +0.02 Specificity 0.86 +0.02
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Rotate-90 Model
• Chip count – Cancer: 76,160 – Benign: 68,190
• MATLAB
– B = imrotate(A, 90);
• Effect
– Slightly improved from flipping, not sure why
Overall Accuracy 0.88 +0.02 Precision 0.88 +0.02 Recall = Sensitivity 0.90 +0.01 Specificity 0.87 +0.03
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Rotate-360 Model
• Chip count – Cancer: 152,320 – Benign: 136,380
• MATLAB
– B = imrotate(A, 90); – B = imrotate(A, 180); – B = imrotate(A, 270);
• Effect
– 3-point improvement, no overfitting indicated • Training loss rate decreased with validation loss
Overall Accuracy 0.89 +0.03 Precision 0.89 +0.03 Recall = Sensitivity 0.91 +0.02 Specificity 0.88 +0.04
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Hue Model
• Chip count – Cancer: 76,160 – Benign: 68,190
• MATLAB
– H = rgb2hsv(A);
– H(:,:,1) = H(:,:,1) * (rand / 50.0);
– B = hsv2rgb(H); • Effect
– Insignificant
Overall Accuracy 0.87 +0.01 Precision 0.88 +0.02 Recall = Sensitivity 0.88 -0.01 Specificity 0.86 +0.02
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Saturation Model
• Chip count – Cancer: 76,160 – Benign: 68,190
• MATLAB
– B(:,:,:) = A(:,:,:) * 2;
• Effect – Better at benign tissue – De-saturation = insignificant effect
Overall Accuracy 0.88 +0.02 Precision 0.89 +0.03 Recall = Sensitivity 0.88 -0.01 Specificity 0.88 +0.04
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Poisson Noise Model
• Chip count – Cancer: 76,160 – Benign: 68,190
• MATLAB
– B = imnoise(A, ‘poisson’);
• Effect
– Promising, needs more variations (salt & pepper, etc)
Overall Accuracy 0.90 +0.04 Precision 0.89 +0.03 Recall = Sensitivity 0.91 +0.02 Specificity 0.87 +0.03
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Inked Model
• Chip count – Cancer: 76,160 – Benign: 68,190
• MATLAB
– B = ink(A); % 100-line script
• Effect
– Insignificant – probably only helped with edge cases
Overall Accuracy 0.88 +0.02 Precision 0.87 +0.01 Recall = Sensitivity 0.90 +0.01 Specificity 0.85 +0.01
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Streaked Model
• Chip count – Cancer: 76,160 – Benign: 68,190
• Pseudo-MATLAB
– Choose random column – Starting from that col, add ++i to each col – Do for 80 cols
• Effect
– Similar to inked version – edge cases?
Overall Accuracy 0.88 +0.02 Precision 0.88 +0.02 Recall = Sensitivity 0.90 +0.01 Specificity 0.86 +0.02
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Conclusions
• For medical imagery – Didn’t help:
• Varying chip stride (a.k.a. translation) • Hue • Adding “ink” • Adding “scanner streaks”
– Not sure: • Horizontal and vertical flips
– Helpful: • Rotations (surprising – implies quantifiable orientation?) • Oversaturation helped with benign tissue true identification • Adding random noise was the most help
– Only tried Poisson, will explore other options
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Current incarnation uses TensorFlow
• Shapely, TensorFlow, GoogLeNet, TensorBox • Augmented with rotations + noise only • Struggles with arteries, seminal vesicles • Correctly labels benign tissue regions that
were incorrectly included in cancer annotations
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Next steps
• No signs of overfitting – but seeking more data • Adaptive refinement
– Detection @ low-magnification whole slide image initially – Classification @ high-magnification regions of interest
• Try 128px chips (or higher magnification 256px chips) to reduce chance of multiple
tissue types per image
• Ensemble classifiers for the edge cases
• Evaluating possible R&D deployment via HPCMP Portal on Hokule’a machine
> <
=
Integrity Applications Incorporated 15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com
Thank you
Ted Hromadka Integrity Applications Incorporated [email protected]