Using Multiple GPUs To Reconstruct The Brain From ... › gtc › 2015 › ...2 images with 1 GPU...
-
INSTITUTE OF NEUROSCIENCE AND MEDICINE (INM-1)
Using Multiple GPUs To Reconstruct
The Brain From Histological Images
Prof. Dr. Katrin Amunts
Dr. Markus Axer
Dr. Timo Dickscheid
Jiri Kraus
-
Requirements:
1. High Resolution Image Data
2. Accurate Reconstruction Algorithms
-
Part 1
Data & Algorithm
-
Preparation
Imaging
Analysis*
Registration
Blockface images (800 sections)
Histologies (150 sections)
* M. Axer, A novel approach to the human connectome (NeuroImage, 2011)
-
1. Geometric Transformation (B-Spline model)*
2. Metric function (Mutual Information, MI)**
3. Optimizer
* D. Rueckert, Nonrigid registration using free-form deformations (IEEE Trans Med Imaging, 1999)
** J. Pluim, Mutual information based registration of medical images (IEEE Trans Med Imaging, 2003)
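The mutual-information metric named on this slide can be illustrated with a minimal Python sketch. It assumes the joint histogram of two images is already available as a list of rows of counts; the function name and the choice of natural logarithms are illustrative and not taken from the talk.

```python
import math

def mutual_information(joint):
    """Mutual information (in nats) of a joint histogram given as rows of counts."""
    total = sum(sum(row) for row in joint)
    if total == 0:
        return 0.0
    row_sums = [sum(row) for row in joint]        # marginal histogram of image A
    col_sums = [sum(col) for col in zip(*joint)]  # marginal histogram of image B
    mi = 0.0
    for i, row in enumerate(joint):
        for j, count in enumerate(row):
            if count > 0:  # empty bins contribute nothing
                p_ab = count / total
                p_a = row_sums[i] / total
                p_b = col_sums[j] / total
                mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi

# Perfectly aligned two-level images share all information: MI = ln 2.
aligned = [[2, 0], [0, 2]]
# Statistically independent intensities share none: MI = 0.
independent = [[1, 1], [1, 1]]
```

A registration optimizer would prefer the displacement whose joint histogram yields the larger value, so `aligned` scores higher than `independent`.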
-
Histologies (b-spline):
Blockfaces:
-
Histologies (affine):
Blockfaces:
-
Assumptions:
• Layers: 1000
• Grid size: 10x10
200,000 Parameters
Solution:
• Efficient global metric
• Efficient optimizer (MRF)
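The parameter count on this slide is consistent with the following arithmetic, sketched in Python. The factor of two (one x and one y displacement per control point) is an assumption; the slide states only the layer count, the grid size, and the total.

```python
layers = 1000
grid_nodes = 10 * 10        # control points per layer
components_per_node = 2     # assumption: x and y displacement per control point

parameters = layers * grid_nodes * components_per_node
print(parameters)  # 200000
```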
-
Efficient global metric*:
1. Similarity between Histology and Blockface
(#Displ · #Nodes · #Images) Data Terms
40 · 100 · 1000 = 4,000,000
2. Similarity between consecutive Histologies
(#Displ² · #Nodes · #Gaps) Data Terms
40² · 100 · 999 ≈ 160,000,000
3. Optimizer (MRF*): Best node displacements
Refinement (a few hundred iterations)
* B. Glocker, Dense Image Registration through MRFs and efficient linear programming (Medical Image Analysis, 2008)
M. Feuerstein, Reconstruction of 3D Histology Images by Simultaneous Deformable Registration (MICCAI, 2011)
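The data-term counts on this slide can be reproduced with a short Python sketch. The function and variable names are illustrative; the formulas are the two products given on the slide, with the number of gaps taken as one less than the number of images.

```python
def data_term_counts(num_displacements, num_nodes, num_images):
    """Return (histology-to-blockface terms, histology-to-histology terms)."""
    horizontal = num_displacements * num_nodes * num_images
    gaps = num_images - 1  # one gap between each pair of consecutive sections
    vertical = num_displacements ** 2 * num_nodes * gaps
    return horizontal, vertical

horizontal, vertical = data_term_counts(40, 100, 1000)
print(horizontal)  # 4000000
print(vertical)    # 159840000, i.e. roughly 160 million
```

The quadratic factor arises because each pair of neighbouring sections has to be evaluated for every combination of the two candidate displacements.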
-
Simultaneous (global) Registration:
Section-wise Registration:
(150 CPUs, 14 hours)
-
Part 2
GPU-accelerated Implementation
-
1. Distribution of the image sections among multiple GPUs
2. Each GPU delivers the data terms depending on its assigned image sections
calcDataTerms_Horizontally( bf_image, histo_image, nodeDisp d )
• establishJointHistograms(d): 100 Joint Histograms
• establishMarginalHistograms(): 300 Histograms
• calculate_MIValues(): 100 MI values
calcDataTerms_Vertically( histo_1, histo_2, d1, d2 )
• establishJointHistograms(d1,d2): 100 Joint Histograms
• establishMarginalHistograms(): 300 Histograms
• calculate_MIValues(): 100 MI values
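One way to distribute the sections is sketched below in Python. The slides do not say how sections are assigned to GPUs, so the contiguous, near-equal blocks used here are an assumption, and both function names are hypothetical. Contiguous blocks keep most consecutive-section pairs on the same device; only the pair at each block boundary needs a section owned by the neighbouring GPU.

```python
def partition_sections(num_sections, num_gpus):
    """Split section indices 0..num_sections-1 into contiguous, near-equal blocks."""
    base, extra = divmod(num_sections, num_gpus)
    blocks, start = [], 0
    for gpu in range(num_gpus):
        size = base + (1 if gpu < extra else 0)  # first `extra` GPUs get one more
        blocks.append(range(start, start + size))
        start += size
    return blocks

def vertical_pairs(block, num_sections):
    """Consecutive pairs (i, i+1), assigned to the GPU that owns section i."""
    return [(i, i + 1) for i in block if i + 1 < num_sections]

blocks = partition_sections(1000, 8)
pairs_per_gpu = [len(vertical_pairs(b, 1000)) for b in blocks]
print([len(b) for b in blocks])  # [125, 125, 125, 125, 125, 125, 125, 125]
print(sum(pairs_per_gpu))        # 999
```

In this sketch each GPU would compute the horizontal data terms for its own sections and the vertical data terms for the pairs it owns.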
-
JuDGE (Westmere + Fermi) vs. PSG-Cluster (Ivy Bridge + Kepler)
-
2 images with 1 GPU (PSG-Cluster)
1. Multiple CUDA streams (Double Buffering)
40 sec vs. 33 sec (17.5 % on Kepler)
2. Incremental atomic operations (M2090 vs. K40)
Non-atomic: 92 sec vs. 39 sec (2.4 x)
Atomic: 342 sec vs. 33 sec (10.3 x)
[Chart: M2090 vs. K40, labels 81 % and 30 %]
Transfer additional load to the GPU!
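The ratios quoted on this slide can be checked from the timings with a small Python sketch. The helper names are illustrative. The atomic speedup works out to about 10.36, which the slide reports as 10.3.

```python
def speedup(old_seconds, new_seconds):
    """How many times faster the new run is."""
    return old_seconds / new_seconds

def relative_saving(old_seconds, new_seconds):
    """Fraction of the original runtime that was saved."""
    return (old_seconds - new_seconds) / old_seconds

print(round(relative_saving(40, 33) * 100, 1))  # 17.5 (percent, double buffering)
print(round(speedup(92, 39), 2))                # 2.36 (non-atomic)
print(round(speedup(342, 33), 2))               # 10.36 (atomic)
```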
-
• Multiple GPUs offer the power to solve a simultaneous registration within a reasonable time
• In the future: Optimization for microscopic images (memory limitations)
-
INM, Research Centre Jülich
Prof. Dr. Katrin Amunts
Dr. Markus Axer
Dr. Timo Dickscheid
David Graessel
Philipp Schlömer
Daniel Schmitz
Martin Schober
Nicole Schubert
JSC, Research Centre Jülich
Oliver Bücker
Andrew V. Adinetz
Nvidia Support
Jiri Kraus
Contact: Marcel Huysegoms, [email protected]
-
Appendix
-
Preparation
Imaging
Low Resolution:
Size: 3,000 × 3,000 pixels
Pixel size: 64 μm × 64 μm
File size: 10 MB (8 bit)
High Resolution:
Size: 100,000 × 100,000 pixels
Pixel size: 1.3 μm × 1.3 μm
File size: 10 GB (8 bit)
Analysis*
* M. Axer, A novel approach to the human connectome (NeuroImage, 2011)
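The file sizes on this slide follow from the image dimensions, as this Python sketch shows. It assumes uncompressed storage at one byte per pixel (8 bit) and decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes); the function name is illustrative.

```python
def raw_image_bytes(width, height, bits_per_pixel=8):
    """Size of an uncompressed single-channel image in bytes."""
    return width * height * bits_per_pixel // 8

low_res = raw_image_bytes(3_000, 3_000)
high_res = raw_image_bytes(100_000, 100_000)
print(low_res / 10**6)   # 9.0 MB, which the slide rounds to 10 MB
print(high_res / 10**9)  # 10.0 GB
```

At about 10 GB per section, a single high-resolution image is comparable to the total device memory of the GPUs named in the talk, which fits the memory limitations mentioned in the outlook.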
-
1. Distribution of the image sections among multiple GPUs
2. Each GPU delivers the data terms depending on its assigned image sections
calcDataTerms_Horizontally( bf_image, histo_image, nodeDisp d )
• establishJointHistograms(d): 100 Joint Histograms
• establishMarginalHistograms(): 300 Histograms
• calculate_MIValues(): 100 MI values
Executed 1000 × 40 times
calcDataTerms_Vertically( histo_1, histo_2, d1, d2 )
• establishJointHistograms(d1,d2): 100 Joint Histograms
• establishMarginalHistograms(): 300 Histograms
• calculate_MIValues(): 100 MI values
Executed 999 × 1600 times
-
JuDGE (Westmere + Fermi) vs. PSG-Cluster (Ivy Bridge + Kepler)