Massively parallel implementation of Total-FETI DDM with application to medical image registration
![Page 1: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/1.jpg)
Massively parallel implementation of Total-FETI DDM with application to medical image registration
Michal Merta, Alena Vašatová, Václav Hapla, David Horák
DD21, Rennes, France
![Page 2: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/2.jpg)
Motivation
- solution of large-scale scientific and engineering problems
  - possibly hundreds of millions of DOFs
  - linear problems
  - non-linear problems
- non-overlapping FETI methods with up to tens of thousands of subdomains
- usage of PRACE Tier-1 and Tier-0 HPC systems
![Page 3: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/3.jpg)
PETSc (Portable, Extensible Toolkit for Scientific Computation)
- developed by Argonne National Laboratory
- data structures and routines for the scalable parallel solution of scientific applications modeled by PDEs
- coded primarily in C, with good Fortran support; can also be called from C++ and Python codes
- current version is 3.2, www.mcs.anl.gov/petsc
- petsc-dev (development branch) is evolving intensively
- code and mailing lists open to anybody
![Page 4: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/4.jpg)
PETSc components
[Figure: diagram of PETSc components, sequential / parallel]
![Page 5: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/5.jpg)
Trilinos
- developed by Sandia National Laboratories
- collection of relatively independent packages
- toolkit for basic linear algebra operations, direct and iterative solvers for linear systems, PDE discretization utilities, mesh generation tools, etc.
- object-oriented design, high modularity, use of modern C++ features (templating)
- mainly in C++ (Fortran and Python bindings)
- current version 10.10, trilinos.sandia.gov
![Page 6: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/6.jpg)
Trilinos components
![Page 7: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/7.jpg)
Both PETSc and Trilinos…
- are parallelized on the data level (vectors & matrices) using MPI
- use BLAS and LAPACK, the de facto standard for dense LA
- have their own implementation of sparse BLAS
- include robust preconditioners, linear solvers (direct and iterative) and nonlinear solvers
- can cooperate with many other external solvers and libraries (e.g. MATLAB, MUMPS, UMFPACK, …)
- support CUDA and hybrid parallelization
- are licensed as open-source
![Page 8: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/8.jpg)
Problem of elastostatics
$\Omega$ ... isotropic elastic body
$\Gamma_U$ ... boundary with prescribed displacements
$\Gamma_F$ ... boundary with prescribed surface traction
$f$ ... body loads
![Page 9: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/9.jpg)
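TFETI decomposition
$\Gamma_G^{pq}$ ... artificial boundaries between subdomains $\Omega^p$ and $\Omega^q$ with prescribed gluing conditions, enforced by Lagrange multipliers
[Figure: decomposition into four subdomains with interfaces $\Gamma_G^{12}$, $\Gamma_G^{13}$, $\Gamma_G^{24}$, $\Gamma_G^{34}$]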
![Page 10: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/10.jpg)
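Primal discretized formulation
The FEM discretization with a suitable numbering of nodes results in the QP problem:
$$\min_{u} \ \tfrac{1}{2} u^T K u - f^T u \quad \text{s.t.} \quad B u = c$$
- $K = \operatorname{diag}(K_1, \dots, K_{N_S}) \in \mathbb{R}^{n \times n}$ is a symmetric positive semidefinite (and so singular in general) block-diagonal global stiffness matrix
- $K_s \in \mathbb{R}^{n_s \times n_s}$ is the stiffness matrix of the subdomain $s$
- $B \in \mathbb{R}^{m \times n}$ is a full rank constraint matrix, $c \in \mathbb{R}^{m}$ is the constraint RHS
- $f \in \mathbb{R}^{n}$ is a load vector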
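A minimal sketch of this primal problem on a toy model may help fix the notation. It assumes a 1D unit bar with unit body load, cut into two floating subdomains (this model, the function names and the dense KKT solve are illustrative assumptions, not the benchmark from the talk). The Dirichlet condition and the gluing condition both go into $B$, so every $K_s$ stays singular.

```python
import numpy as np

def free_bar_stiffness(n_nodes, h):
    """Stiffness matrix of a 1D bar with both ends free (singular: kernel = constants)."""
    K = np.zeros((n_nodes, n_nodes))
    for el in range(n_nodes - 1):
        K[el:el + 2, el:el + 2] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    return K

def assemble_primal(n_sub=2, n_nodes=4):
    """Return K, B, f, c for a unit bar cut into n_sub floating subdomains."""
    h = 1.0 / (n_sub * (n_nodes - 1))
    n = n_sub * n_nodes
    K = np.zeros((n, n))
    f = np.zeros(n)
    for s in range(n_sub):
        idx = slice(s * n_nodes, (s + 1) * n_nodes)
        K[idx, idx] = free_bar_stiffness(n_nodes, h)    # K = diag(K_1, ..., K_NS)
        f[idx] = h                                       # unit body load
        f[s * n_nodes] = f[(s + 1) * n_nodes - 1] = h / 2
    B = np.zeros((n_sub, n))
    B[0, 0] = 1.0                                        # Dirichlet row: u(0) = 0
    for s in range(1, n_sub):                            # gluing rows: u_left - u_right = 0
        B[s, s * n_nodes - 1] = 1.0
        B[s, s * n_nodes] = -1.0
    c = np.zeros(n_sub)
    return K, B, f, c

K, B, f, c = assemble_primal()
n, m = K.shape[0], B.shape[0]

# Reference solution of the equality-constrained QP through its KKT system
kkt = np.block([[K, B.T], [B, np.zeros((m, m))]])
solution = np.linalg.solve(kkt, np.concatenate([f, c]))
u, lam = solution[:n], solution[n:]

print("rank of K:", np.linalg.matrix_rank(K), "of", n)
print("nodal displacements:", np.round(u, 4))
```

The rank deficiency of $K$ equals the number of subdomains here, which is why the problem is passed to the dual formulation rather than solved by inverting $K$ directly.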
![Page 11: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/11.jpg)
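Dual discretized formulation (homogenized)
$$\min_{\lambda} \ \tfrac{1}{2} \lambda^T A \lambda - \lambda^T b \quad \text{s.t.} \quad G \lambda = o$$
where
- $K^+$ is a generalized inverse of $K$: $K K^+ K = K$
- $R$ is a matrix whose columns span the kernel of $K$: $\operatorname{Im} R = \operatorname{Ker} K$
- $F = B K^+ B^T$, $G = R^T B^T$, $d = B K^+ f - c$, $e = R^T f$
- $Q = G^T (G G^T)^{-1} G$, $P = I - Q$, with $\operatorname{Im} Q = \operatorname{Im} G^T$ and $\operatorname{Im} P = \operatorname{Ker} G$
- $\lambda_0 = G^T (G G^T)^{-1} e$ (so that $G \lambda_0 = e$), $A = P F P$, $b = P (d - F \lambda_0)$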
QP problem again, but with lower dimension and simpler constraints
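The dual objects can be checked on a small dense example. The sketch below is an illustration under stated assumptions: two grid-Laplacian subdomains as a stand-in for elasticity, `numpy.linalg.pinv` as the generalized inverse $K^+$, a hand-written CG, and a nonzero constraint right-hand side $c$ that enters through $d = BK^+f - c$. It solves the homogenized dual problem, shifts back by $\lambda_0$, reconstructs $u = K^+(f - B^T\lambda) + R\alpha$ and compares with a direct KKT solve.

```python
import numpy as np

def path_laplacian(k):
    """Free-free 1D stiffness (graph Laplacian of a path); kernel = constants."""
    L = 2.0 * np.eye(k) - np.eye(k, k=1) - np.eye(k, k=-1)
    L[0, 0] = L[-1, -1] = 1.0
    return L

def cg(apply_A, b, tol=1e-10, max_it=200):
    """Plain conjugate gradients started from zero; returns (x, iterations)."""
    x, r = np.zeros_like(b), b.copy()
    p, rr = r.copy(), r @ r
    r0, it = np.sqrt(rr), 0
    while it < max_it and np.sqrt(rr) > tol * r0:
        Ap = apply_A(p)
        step = rr / (p @ Ap)
        x, r = x + step * p, r - step * Ap
        rr_new = r @ r
        p, rr, it = r + (rr_new / rr) * p, rr_new, it + 1
    return x, it

# Two floating nx-by-ny grid subdomains placed side by side (illustrative model)
nx, ny = 4, 3
ns = nx * ny
Ks = np.kron(path_laplacian(nx), np.eye(ny)) + np.kron(np.eye(nx), path_laplacian(ny))
K = np.kron(np.eye(2), Ks)                       # K = diag(K_1, K_2)
R = np.kron(np.eye(2), np.ones((ns, 1)))         # Im R = Ker K
left, right = np.arange(ny), np.arange((nx - 1) * ny, nx * ny)
B = np.zeros((2 * ny, 2 * ns))
B[np.arange(ny), left] = 1.0                     # Dirichlet rows on the left edge
B[ny + np.arange(ny), right] = 1.0               # gluing rows: u_1 - u_2 = 0
B[ny + np.arange(ny), ns + left] = -1.0
f = np.linspace(1.0, 2.0, 2 * ns)
c = np.concatenate([np.full(ny, 0.1), np.zeros(ny)])

# Dual objects
Kp = np.linalg.pinv(K, rcond=1e-10, hermitian=True)
F, G = B @ Kp @ B.T, R.T @ B.T
d, e = B @ Kp @ f - c, R.T @ f
GGt_inv = np.linalg.inv(G @ G.T)
P = np.eye(B.shape[0]) - G.T @ GGt_inv @ G
lam0 = G.T @ GGt_inv @ e
b = P @ (d - F @ lam0)

# Homogenized dual problem: CG on A = P F P stays inside Ker G
mu, iters = cg(lambda v: P @ (F @ (P @ v)), b)
lam = mu + lam0
alpha = GGt_inv @ G @ (F @ lam - d)
u = Kp @ (f - B.T @ lam) + R @ alpha

# Reference: direct solve of the primal KKT system
m, n = B.shape
ref = np.linalg.solve(np.block([[K, B.T], [B, np.zeros((m, m))]]), np.concatenate([f, c]))
print("CG iterations:", iters)
print("max |u - u_ref|:", np.abs(u - ref[:n]).max())
```

The dual dimension here is 6 against 24 primal unknowns, and the only constraint left is the linear equality handled by the projector, which is the point of the slide's closing remark.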
![Page 12: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/12.jpg)
Primal data distribution, F action
… straightforward matrix distribution, given by a decomposition
$F\lambda = B K^+ B^T \lambda$: $B$ is very sparse, $K^+$ is block diagonal, so the action is embarrassingly parallel
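Why the action of $F = BK^+B^T$ parallelizes so easily can be sketched serially: because $K^+$ is block diagonal, $F\lambda$ splits into one independent term per subdomain. The example below is an assumption-laden toy (random sparse column blocks of $B$, 1D free bars as subdomains, a Python loop standing in for MPI ranks), not the PETSc or Trilinos code from the talk.

```python
import numpy as np

def free_bar_stiffness(k):
    """Singular stiffness of a free 1D bar with k nodes."""
    K = 2.0 * np.eye(k) - np.eye(k, k=1) - np.eye(k, k=-1)
    K[0, 0] = K[-1, -1] = 1.0
    return K

rng = np.random.default_rng(0)
n_sub, n_loc, m = 4, 5, 6          # subdomains, unknowns per subdomain, dual dimension

# Each "rank" owns its K_s^+ and the columns of B that touch its unknowns
Kp_blocks = [np.linalg.pinv(free_bar_stiffness(n_loc), rcond=1e-10, hermitian=True)
             for _ in range(n_sub)]
B_blocks = [rng.choice([-1.0, 0.0, 1.0], size=(m, n_loc), p=[0.1, 0.8, 0.1])
            for _ in range(n_sub)]

def apply_F(lam):
    """F*lam as a sum of subdomain contributions B_s K_s^+ B_s^T lam."""
    # every term uses data of one subdomain only, so the terms are independent
    local_terms = [Bs @ (Kps @ (Bs.T @ lam)) for Bs, Kps in zip(B_blocks, Kp_blocks)]
    # the sum stands in for the communication step of a distributed code
    return np.sum(local_terms, axis=0)

lam = rng.standard_normal(m)
y = apply_F(lam)
print(np.round(y, 4))
```

In this form $F$ is never assembled; only subdomain-sized products and solves appear, which is what makes the step cheap to distribute.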
![Page 13: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/13.jpg)
Coarse projector action
$Q = G^T (G G^T)^{-1} G$, $P = I - Q$
… can easily take 85 % of computation time if not properly parallelized!
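The split into a one-off preprocessing step and a repeated action can be illustrated with a serial sketch. It assumes a random full-rank stand-in for $G$ and a Cholesky factorization of the small coarse matrix $GG^T$; it does not reproduce any of the parallelization strategies compared in the talk.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k = 12, 3                        # dual dimension, coarse (kernel) dimension
G = rng.standard_normal((k, m))     # stand-in for G = R^T B^T (full row rank)

# Preprocessing: factor the small k-by-k coarse matrix G G^T once
L = np.linalg.cholesky(G @ G.T)     # G G^T = L L^T, L lower triangular

def apply_Q(v):
    """Q v = G^T (G G^T)^{-1} G v using the stored factor."""
    w = G @ v                                        # restrict to the coarse space
    w = np.linalg.solve(L.T, np.linalg.solve(L, w))  # coarse problem solve
    return G.T @ w                                   # prolong back

def apply_P(v):
    """P v = v - Q v, the orthogonal projection onto Ker G."""
    return v - apply_Q(v)

v = rng.standard_normal(m)
print("||G P v|| =", np.linalg.norm(G @ apply_P(v)))
```

Each application costs two products with $G$ and one coarse solve. Since $P$ is applied in every CG iteration, the coarse solve and the communication around it are where a poorly parallelized code loses its time.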
![Page 14: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/14.jpg)
G preprocessing and action
[Figure: G preprocessing and action; panels: preprocessing, action]
![Page 15: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/15.jpg)
Coarse problem preprocessing and action
[Figure: coarse problem preprocessing and action; panels: preprocessing, action; labels 1, 2, 3]
Currently used variant: B2 (PPAM 2011)
![Page 16: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/16.jpg)
Coarse problem
![Page 17: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/17.jpg)
HECToR phase 3 (XE6)
- the UK's largest, fastest and most powerful supercomputer
- supplied by Cray Inc., operated by EPCC
- uses the latest AMD "Bulldozer" multicore processor architecture
- 704 compute blades
- each blade with 4 compute nodes, giving a total of 2816 compute nodes
- each node with two 16-core AMD Opteron 2.3 GHz Interlagos processors → 32 cores per node, total of 90 112 cores
- each 16-core processor shares 16 GB of memory, in total 90 TB
- theoretical peak performance over 800 Tflops
- www.hector.ac.uk
![Page 18: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/18.jpg)
Benchmark
- $K^+$ implemented as direct solve (LU) of regularized $K$
- built-in CG routine used (PETSc.KSP, Trilinos.Belos)
- E = 1e6, ν = 0.3, g = 9.81 m s^-2
- computed @ HECToR
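How a direct solve with a regularized $K$ can act as $K^+$ is sketched below. The regularization $K + \rho R R^T$ used here is an assumption chosen for brevity (the talk does not say which regularization was used), and `numpy.linalg.inv` stands in for a stored LU factorization.

```python
import numpy as np

def free_bar_stiffness(k):
    """Singular stiffness of a free 1D bar with k nodes; kernel = constants."""
    K = 2.0 * np.eye(k) - np.eye(k, k=1) - np.eye(k, k=-1)
    K[0, 0] = K[-1, -1] = 1.0
    return K

n = 6
K = free_bar_stiffness(n)
R = np.ones((n, 1)) / np.sqrt(n)      # orthonormal basis of Ker K
rho = 1.0                             # regularization weight (assumed value)

K_reg = K + rho * (R @ R.T)           # nonsingular, agrees with K on Im K
K_reg_inv = np.linalg.inv(K_reg)      # small dense stand-in for an LU factorization

# For a consistent right-hand side (R^T g = 0) the regularized solve solves K u = g
g = np.arange(n, dtype=float)
g -= g.mean()
u = K_reg_inv @ g

is_generalized_inverse = np.allclose(K @ K_reg_inv @ K, K)
solves_singular_system = np.allclose(K @ u, g)
print(is_generalized_inverse, solves_singular_system)
```

The regularized matrix is factored once per subdomain in preprocessing, after which every application of $K^+$ inside CG is a pair of triangular solves.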
![Page 19: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/19.jpg)
Results

| # subds = # cores | 1 | 4 | 16 | 64 | 256 | 1024 |
|---|---|---|---|---|---|---|
| Prim. dim. | 31 752 | 127 008 | 508 032 | 2 032 128 | 8 128 512 | 32 514 048 |
| Dual dim. | 252 | 1 512 | 7 056 | 30 240 | 124 992 | 508 032 |
| Solution time, Trilinos | 1.39 | 3.01 | 4.80 | 6.25 | 10.31 | 28.05 |
| Solution time, PETSc | 1.14 | 2.66 | 4.16 | 4.74 | 4.92 | 5.84 |
| # iterations, Trilinos | 34 | 63 | 96 | 105 | 105 | 102 |
| # iterations, PETSc | 33 | 68 | 94 | 105 | 105 | 102 |
| 1 iter. time, Trilinos | 4.48e-2 | 4.76e-2 | 5.00e-2 | 5.95e-2 | 9.81e-2 | 2.75e-1 |
| 1 iter. time, PETSc | 3.46e-2 | 3.92e-2 | 4.42e-2 | 4.52e-2 | 4.69e-2 | 5.73e-2 |

stopping criterion: ||r_k|| / ||r_0|| < 1e-5, without preconditioning
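The reported figures describe a weak-scaling experiment: the primal dimension grows in proportion to the number of cores, 31 752 unknowns per subdomain. A small script can turn the per-iteration times into a weak-scaling efficiency. The efficiency definition t(1)/t(p) is a common convention assumed here, not something stated in the talk.

```python
cores = [1, 4, 16, 64, 256, 1024]
prim_dim = [31752, 127008, 508032, 2032128, 8128512, 32514048]
iter_time = {
    "Trilinos": [4.48e-2, 4.76e-2, 5.00e-2, 5.95e-2, 9.81e-2, 2.75e-1],
    "PETSc":    [3.46e-2, 3.92e-2, 4.42e-2, 4.52e-2, 4.69e-2, 5.73e-2],
}

# Work per core is constant, so this is a weak-scaling test
constant_work_per_core = all(d == prim_dim[0] * p for d, p in zip(prim_dim, cores))

def weak_efficiency(times):
    """Weak-scaling efficiency of one iteration relative to the single-core run."""
    return [times[0] / t for t in times]

efficiency = {lib: weak_efficiency(t) for lib, t in iter_time.items()}

print("constant work per core:", constant_work_per_core)
for lib, eff in efficiency.items():
    print(lib, " ".join(f"{p}:{e:.2f}" for p, e in zip(cores, eff)))
```

By this measure the per-iteration cost of the PETSc implementation grows far more slowly than that of the Trilinos one between 256 and 1024 cores, while the iteration counts of both level off at about 100.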
![Page 20: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/20.jpg)
Application to image registration
- Process of integrating information from two (or more) different images
- Images from different sensors, different angles and/or times
![Page 21: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/21.jpg)
Application to image registration
In medicine:
- Monitoring of growth of a tumour
- Therapy evaluation
- Comparison of patient data with an anatomical atlas
- Data from magnetic resonance (MR), computed tomography (CT), positron emission tomography (PET)
![Page 22: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/22.jpg)
Elastic registration
The task is to minimize the distance between two images, $T$ and $R$: $T$ is deformed by the transformation $\varphi := x - u(x)$ so that it matches $R$.
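A toy 1D version shows what "distance between two images" can mean in practice. The sketch assumes the sum of squared differences (SSD) as the distance measure and two synthetic "images" given as functions; both are illustrative choices, and the elastic regularization term that makes the registration elastic is left out.

```python
import math

def T(x):
    """Synthetic 1D image T: a smooth bump centred at x = 0.5 (illustrative)."""
    return math.exp(-((x - 0.5) / 0.1) ** 2)

def R(x):
    """Synthetic 1D image R: the same bump centred at x = 0.6."""
    return T(x - 0.1)

def ssd(u, n=101):
    """SSD between T(phi(x)) and R(x) on a uniform grid, with phi(x) = x - u(x)."""
    h = 1.0 / (n - 1)
    return h * sum((T(i * h - u(i * h)) - R(i * h)) ** 2 for i in range(n))

no_displacement = ssd(lambda x: 0.0)      # images compared as they are
exact_displacement = ssd(lambda x: 0.1)   # displacement that aligns T with R

print("SSD without displacement:", round(no_displacement, 4))
print("SSD with u(x) = 0.1:", exact_displacement)
```

In the elastic setting the displacement $u$ is additionally penalized by the elastic energy of the deformation, which is how a stiffness matrix and a FETI-type solver enter the registration problem.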
![Page 23: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/23.jpg)
Elastic registration
Parallelization using TFETI method
![Page 24: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/24.jpg)
Results

| # of subdomains | 1 | 4 | 16 |
|---|---|---|---|
| Primal variables | 20402 | 81608 | 326432 |
| Dual variables | 903 | 2641 | 8254 |
| Solution time [s] | 41 | 34.54 | 57.44 |
| # of iterations | 2467 | 990 | 665 |
| Time/iteration [s] | 0.01 | 0.03 | 0.08 |

stopping criterion: ||r_k|| / ||r_0|| < 1e-5
![Page 25: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/25.jpg)
Solution
![Page 26: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/26.jpg)
Conclusion and future work
- To consolidate the PETSc & Trilinos TFETI implementations into the form of extensions or packages
- To further optimize the codes using core-hours on Tier-1/Tier-0 systems (PRACE DECI Initiative, HPC-Europa2)
- To extend image registration to 3D data
![Page 27: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/27.jpg)
References
- Kozubek, T. et al. Total FETI domain decomposition method and its massively parallel implementation. Accepted for publication in Advances in Engineering Software.
- Horák, D.; Hapla, V. TFETI coarse space projectors parallelization strategies. Accepted for publication in the proceedings of PPAM 2011, Springer LNCS, 2012.
- Zitová, B.; Flusser, J. Image registration methods: a survey. Image and Vision Computing, Vol. 21, No. 11, 2003, pp. 977-1000.
![Page 28: Massively parallel implementation of Total-FETI DDM with application to medical image registration](https://reader035.fdocuments.net/reader035/viewer/2022062814/56816711550346895ddb7a9b/html5/thumbnails/28.jpg)
Thank you for your attention!