Using Multiple GPUs To Reconstruct The Brain From Histological Images (GTC 2015)


  • INSTITUTE OF NEUROSCIENCE AND MEDICINE (INM-1)

    Using Multiple GPUs To Reconstruct

    The Brain From Histological Images

    Prof. Dr. Katrin Amunts

    Dr. Markus Axer

    Dr. Timo Dickscheid

    Jiri Kraus

  • Requirements:

    1. High Resolution Image Data

    2. Accurate Reconstruction Algorithms

  • Part 1

    Data & Algorithm


  • Preparation

    Imaging

    Analysis*

    Registration

    Blockface images (800 sections)

    Histologies (150 sections)

    * M. Axer, A novel approach to the human connectome (NeuroImage, 2011)

  • 1. Geometric Transformation (B-Spline model)*

    2. Metric function (Mutual Information, MI)**

    3. Optimizer

    * D. Rueckert, Nonrigid registration using free-form deformations (IEEE Trans Med Imaging, 1999)

    ** J. Pluim, Mutual information based registration of medical images (IEEE Trans Med Imaging, 2003)
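The B-spline transformation model can be sketched as a 2D cubic B-spline free-form deformation in the style of Rueckert et al.: the displacement of a pixel is a weighted sum of the displacements of its 4 × 4 nearest control points, weighted by the uniform cubic B-spline basis functions. The function names, the uniform control-point spacing and the one-point padding convention of `phi` are illustrative assumptions, not the implementation used in the talk.

```python
import math
from typing import List, Tuple

Vec2 = Tuple[float, float]


def bspline_basis(t: float) -> List[float]:
    """Uniform cubic B-spline basis functions B0..B3 at local coordinate t in [0, 1)."""
    return [
        (1 - t) ** 3 / 6.0,
        (3 * t**3 - 6 * t**2 + 4) / 6.0,
        (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6.0,
        t**3 / 6.0,
    ]


def ffd_displacement(x: float, y: float, phi: List[List[Vec2]], spacing: float) -> Vec2:
    """Displacement of pixel (x, y) under a 2D cubic B-spline free-form deformation.

    Assumed (hypothetical) layout: phi[a][b] is the (dx, dy) displacement of
    control point (a - 1, b - 1), i.e. the control grid is padded by one point
    on the low side so the 4x4 support window stays inside the array.
    """
    fx, fy = x / spacing, y / spacing
    ix, iy = math.floor(fx), math.floor(fy)
    bu, bv = bspline_basis(fx - ix), bspline_basis(fy - iy)
    dx = dy = 0.0
    for l in range(4):
        for m in range(4):
            w = bu[l] * bv[m]              # tensor-product weight
            cx, cy = phi[ix + l][iy + m]   # control point (ix - 1 + l, iy - 1 + m)
            dx += w * cx
            dy += w * cy
    return dx, dy


# Usage: a single displaced control point on an otherwise zero 8x8 grid.
phi = [[(0.0, 0.0)] * 8 for _ in range(8)]
phi[4][2] = (6.0, 0.0)
dx, dy = ffd_displacement(30.0, 10.0, phi, spacing=10.0)
```

Because the four basis weights always sum to one, displacing every control point by the same vector moves every pixel by exactly that vector; a single control point only influences pixels within two grid spacings of it.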

  • [Figure: Histologies (b-spline) | Blockfaces]

  • [Figure: Histologies (affine) | Blockfaces]

  • Assumptions:

    • Layers: 1000

    • Grid size: 10x10

    → 200.000 Parameters

    Solution:

    • Efficient global metric

    • Efficient optimizer (MRF)
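The parameter count follows from these assumptions if each grid node carries one 2D displacement. The factor of two for the x and y components is our assumption; the slide only states the total.

```python
# Parameter count of the simultaneous registration (sketch).
layers = 1000                  # histological sections ("Layers")
nodes_per_layer = 10 * 10      # 10x10 control grid per section
components_per_node = 2        # assumed: one x and one y displacement per node

num_parameters = layers * nodes_per_layer * components_per_node
print(f"{num_parameters:,}")   # 200,000
```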


  • Efficient global metric*:

    1. Similarity between Histology and Blockface

       (#Displ · #Nodes · #Images) data terms: 40 · 100 · 1000 = 4.000.000

    2. Similarity between consecutive Histologies

       (#Displ² · #Nodes · #Gaps) data terms: 40² · 100 · 999 ≈ 160.000.000

    3. Optimizer (MRF*): Best node displacements

       Refinement (a few hundred iterations)

    * B. Glocker, Dense Image Registration through MRFs and efficient linear programming (Medical Image Analysis, 2008)

      M. Feuerstein, Reconstruction of 3D Histology Images by Simultaneous Deformable Registration (MICCAI, 2011)
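To illustrate what the optimizer does with these data terms, the sketch below minimizes an MRF energy for a single grid node followed through the image stack: the unary costs stand for the histology-to-blockface terms (one per image and displacement) and the pairwise costs for the consecutive-histology terms (one displacement × displacement table per gap). For such a chain, min-sum dynamic programming finds the exact optimum. This is a swapped-in technique for illustration, not the linear-programming MRF optimizer of Glocker et al., and it ignores any in-plane coupling between neighbouring nodes that the full model may contain; `chain_mrf_min_sum` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def chain_mrf_min_sum(
    unary: Sequence[Sequence[float]],
    pairwise: Sequence[Sequence[Sequence[float]]],
) -> Tuple[float, List[int]]:
    """Exactly minimize E(l) = sum_k unary[k][l_k] + sum_k pairwise[k][l_k][l_{k+1}].

    unary[k][a]        cost of displacement label a in image k
    pairwise[k][a][b]  cost of labels a and b in consecutive images k and k + 1
    """
    num_images, num_labels = len(unary), len(unary[0])
    cost = list(unary[0])          # best energy of the prefix ending in each label
    back: List[List[int]] = []     # best predecessor label, per image k >= 1
    for k in range(1, num_images):
        new_cost, arg = [], []
        for b in range(num_labels):
            best_a = min(range(num_labels), key=lambda a: cost[a] + pairwise[k - 1][a][b])
            new_cost.append(cost[best_a] + pairwise[k - 1][best_a][b] + unary[k][b])
            arg.append(best_a)
        cost = new_cost
        back.append(arg)
    # Trace the optimal labeling back from the cheapest final label.
    last = min(range(num_labels), key=lambda b: cost[b])
    labels = [last]
    for arg in reversed(back):
        labels.append(arg[labels[-1]])
    labels.reverse()
    return cost[last], labels


# Usage: 3 images, 2 candidate displacements, penalty 0.4 for a label change.
unary = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
change = [[0.0, 0.4], [0.4, 0.0]]
energy, labels = chain_mrf_min_sum(unary, [change, change])
```

With 40 candidate displacements each pairwise table has 40² = 1600 entries, which is where the #Displ² factor in the data-term count comes from.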

  • [Figure: Simultaneous (global) Registration | Section-wise Registration]

    (150 CPUs, 14 hours)

  • Part 2

    GPU-accelerated Implementation

  • 1. Distribution of the image sections among multiple GPUs

    2. Each GPU computes the data terms for its assigned image sections

    calcDataTerms_Horizontally( bf_image, histo_image, nodeDisp d )

    • establishJointHistograms(d): 100 joint histograms

    • establishMarginalHistograms(): 300 histograms

    • calculate_MIValues(): 100 MI values

    calcDataTerms_Vertically( histo_1, histo_2, d1, d2 )

    • establishJointHistograms(d1, d2): 100 joint histograms

    • establishMarginalHistograms(): 300 histograms

    • calculate_MIValues(): 100 MI values
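The three steps of each data-term call (joint histogram, marginal histograms, MI values) can be sketched for a single node's image patch as follows. The slide computes one such histogram per grid node (100 per call); the Python functions below are hypothetical stand-ins for `establishJointHistograms`, `establishMarginalHistograms` and `calculate_MIValues`, the patches are assumed to be already quantized to `bins` gray levels, and MI is reported in nats.

```python
import math
from typing import List, Sequence, Tuple

Histogram2D = List[List[int]]


def joint_histogram(a: Sequence[int], b: Sequence[int], bins: int) -> Histogram2D:
    """Count co-occurring gray levels of two equally sized, quantized patches."""
    h = [[0] * bins for _ in range(bins)]
    for p, q in zip(a, b):
        # In a parallel GPU version, threads may hit the same bin concurrently,
        # so this increment would need an atomic add.
        h[p][q] += 1
    return h


def marginal_histograms(h: Histogram2D) -> Tuple[List[int], List[int]]:
    """Row and column sums of the joint histogram."""
    return [sum(row) for row in h], [sum(col) for col in zip(*h)]


def mutual_information(h: Histogram2D) -> float:
    """MI = sum p(i,j) * log(p(i,j) / (p(i) * p(j))) over non-empty bins."""
    n = sum(map(sum, h))
    ha, hb = marginal_histograms(h)
    mi = 0.0
    for i, row in enumerate(h):
        for j, count in enumerate(row):
            if count:
                # p(i,j) / (p(i) p(j)) simplifies to count * n / (ha[i] * hb[j])
                mi += count / n * math.log(count * n / (ha[i] * hb[j]))
    return mi


# Usage: a perfectly dependent pair (MI = log 2) and an independent pair (MI = 0).
a = [0, 0, 1, 1]
mi_dependent = mutual_information(joint_histogram(a, [1, 1, 0, 0], bins=2))
mi_independent = mutual_information(joint_histogram(a, [0, 1, 0, 1], bins=2))
```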

  • JuDGE (Westmere + Fermi) vs. PSG-Cluster (Ivy Bridge + Kepler)

  • 2 images with 1 GPU (PSG-Cluster): M2090 vs. K40

    M2090: 81 %, K40: 30 %

    1. Multiple CUDA streams (double buffering)

       40 sec vs. 33 sec (17.5 % on Kepler)

    2. Incremental atomic operations

       Non-atomic: 92 sec vs. 39 sec (2.4 x)

       Atomic: 342 sec vs. 33 sec (10.3 x)

    Transfer additional load to the GPU!
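The gain from double buffering can be reasoned about with a simple two-stage pipeline model: with two buffers in separate CUDA streams, the transfer of the next batch overlaps with the computation on the current one, so after the first transfer each step costs only the slower of the two stages. The reported 40 sec vs. 33 sec corresponds to (40 − 33) / 40 = 17.5 %. The model below is an idealized sketch (perfect overlap, no launch or synchronization overhead), and its input times are made up, not measurements from the talk.

```python
def sequential_time(transfer: float, compute: float, batches: int) -> float:
    """Single stream: every batch is copied, then processed."""
    return batches * (transfer + compute)


def double_buffered_time(transfer: float, compute: float, batches: int) -> float:
    """Two buffers/streams: the copy of batch i + 1 overlaps the compute of batch i.

    Pipeline fill (first copy) + steady state limited by the slower stage
    + drain (last compute). Assumes copy and compute run fully in parallel.
    """
    return transfer + (batches - 1) * max(transfer, compute) + compute


# Usage with hypothetical per-batch times in seconds.
t_seq = sequential_time(transfer=1.0, compute=4.0, batches=10)
t_db = double_buffered_time(transfer=1.0, compute=4.0, batches=10)
saving = (t_seq - t_db) / t_seq
```

For many batches the relative saving approaches min(transfer, compute) / (transfer + compute), so overlapping helps most when both stages take comparable time.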

  • • Multiple GPUs make it possible to solve a simultaneous registration within a reasonable time

    • Future work: optimization for microscopic images (memory limitations)

  • INM, Research Centre Jülich

    Prof. Dr. Katrin Amunts

    Dr. Markus Axer

    Dr. Timo Dickscheid

    David Graessel

    Philipp Schlömer

    Daniel Schmitz

    Martin Schober

    Nicole Schubert

    JSC, Research Centre Jülich

    Oliver Bücker

    Andrew V. Adinetz

    NVIDIA Support

    Jiri Kraus

    Contact: Marcel Huysegoms

  • Appendix

  • Preparation

    Imaging

    Low Resolution: 3.000 × 3.000 pixels, pixel size 64 μm × 64 μm, file size 10 MB (8 bit)

    High Resolution: 100.000 × 100.000 pixels, pixel size 1.3 μm × 1.3 μm, file size 10 GB (8 bit)

    Analysis*

    * M. Axer, A novel approach to the human connectome (NeuroImage, 2011)
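The stated file sizes follow from the pixel counts at 8 bit (one byte) per pixel, assuming uncompressed single-channel storage; the slide's 10 MB is a rounded value.

```python
def image_bytes(width_px: int, height_px: int, bits_per_pixel: int = 8) -> int:
    """Uncompressed size of a single-channel image in bytes."""
    return width_px * height_px * bits_per_pixel // 8


low_res = image_bytes(3_000, 3_000)        # 9,000,000 bytes (about 10 MB)
high_res = image_bytes(100_000, 100_000)   # 10,000,000,000 bytes (10 GB)
ratio = high_res / low_res                 # roughly 1111 times more data per section
```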

  • 1. Distribution of the image sections among multiple GPUs

    2. Each GPU computes the data terms for its assigned image sections

    calcDataTerms_Horizontally( bf_image, histo_image, nodeDisp d ), called 1000 × 40 times

    • establishJointHistograms(d): 100 joint histograms

    • establishMarginalHistograms(): 300 histograms

    • calculate_MIValues(): 100 MI values

    calcDataTerms_Vertically( histo_1, histo_2, d1, d2 ), called 999 × 1600 times

    • establishJointHistograms(d1, d2): 100 joint histograms

    • establishMarginalHistograms(): 300 histograms

    • calculate_MIValues(): 100 MI values
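The call counts can be reproduced with a sketch that splits the image stack into contiguous blocks, one per GPU. The block partitioning, the rule that a gap (k, k + 1) belongs to the GPU owning image k, and the GPU count of 4 are assumptions for illustration; the talk does not state how sections are assigned. Multiplying the calls by the 100 MI values per call gives the data-term totals of 4.000.000 and roughly 160.000.000.

```python
from typing import Dict, List

NUM_IMAGES = 1000
NUM_DISPLACEMENTS = 40   # candidate displacements per node (#Displ)
NUM_NODES = 100          # 10x10 grid, one MI value per node and call


def partition_sections(num_images: int, num_gpus: int) -> List[range]:
    """Split the image stack into contiguous, nearly equal blocks (hypothetical scheme)."""
    base, extra = divmod(num_images, num_gpus)
    blocks, start = [], 0
    for g in range(num_gpus):
        size = base + (1 if g < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def calls_per_gpu(block: range, num_images: int) -> Dict[str, int]:
    """Data-term calls for one GPU.

    Horizontal: one call per owned image and displacement.
    Vertical: one call per owned gap (k, k + 1) and displacement pair; the gap is
    assigned to the owner of image k (assumption), so at a block boundary the
    GPU also needs image k + 1 from its neighbour.
    """
    horizontal = len(block) * NUM_DISPLACEMENTS
    gaps = sum(1 for k in block if k + 1 < num_images)
    vertical = gaps * NUM_DISPLACEMENTS ** 2
    return {"horizontal": horizontal, "vertical": vertical}


blocks = partition_sections(NUM_IMAGES, num_gpus=4)
per_gpu = [calls_per_gpu(b, NUM_IMAGES) for b in blocks]
total_h = sum(c["horizontal"] for c in per_gpu)
total_v = sum(c["vertical"] for c in per_gpu)
print(total_h * NUM_NODES, total_v * NUM_NODES)  # 4000000 159840000
```

A contiguous split keeps neighbouring sections on the same GPU, so only one image per block boundary has to be exchanged for the vertical terms.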

  • JuDGE (Westmere + Fermi) vs. PSG-Cluster (Ivy Bridge + Kepler)