Cluster of Workstation Based Non-rigid Image Registration Using Free-Form Deformation

Xiaofen Zheng, Jayaram Udupa, Xinjian Chen

Medical Image Processing Group, Department of Radiology, University of Pennsylvania

Feb 10, 2008 (4:30-4:50 pm)

Outline

3D nonrigid registration method and its parallelization

Large image data sets

Parallel computing: cluster of workstations (COW)

Results

Time analysis: sequential vs. parallel

As we all know, 3D nonrigid registration involves complex and expensive computation, so the process can be very slow, which limits its clinical usability. To address this problem, especially on large image data, we present a methodology for parallelizing such algorithms. A COW is commonly available, it is cheaper, it is easy to extend, and it has a vast amount of memory. Considering these advantages, we chose a COW for parallelizing registration. In this talk I'll also show some experimental results and a time analysis of the sequential implementation versus parallel computing.

Registration Algorithm

B-spline coefficients

Image pyramid

Optimization

Output computing

Image pyramid: successive 1-D filtering and reduction [Unser1993]

The registration algorithm includes a few steps. These steps are executed sequentially, but parallelization within each step can reduce computation time dramatically. In order to achieve model flexibility, reduce time cost, and avoid local minima, we implemented hierarchical multiresolution optimization. The first step is computing image pyramids from fine to coarse, level by level. We implemented cubic B-spline pyramids.
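As a rough illustration of "successive 1-D filtering and reduction", the sketch below builds a pyramid by filtering and halving one axis at a time. It is only a sketch under stated assumptions: the binomial kernel is a stand-in and is not the spline pyramid filter of [Unser1993], and the names `reduce_axis`, `reduce_volume`, and `build_pyramid` are hypothetical, not taken from the talk or from CAVASS.

```python
import numpy as np

# Stand-in smoothing kernel (binomial). The spline pyramid filters of
# [Unser1993] are different and are not reproduced here.
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def reduce_axis(vol, axis):
    """Filter along one axis with KERNEL, then keep every second sample."""
    pad = [(0, 0)] * vol.ndim
    pad[axis] = (2, 2)
    padded = np.pad(vol, pad, mode="edge")
    n = vol.shape[axis]
    out = np.zeros(vol.shape, dtype=float)
    for k, w in enumerate(KERNEL):
        # Shifted copy of the padded volume, weighted by one kernel tap.
        out += w * np.take(padded, np.arange(k, k + n), axis=axis)
    return np.take(out, np.arange(0, n, 2), axis=axis)


def reduce_volume(vol):
    """One pyramid step for a (Z, Y, X) array: reduce X, then Y, then Z."""
    for axis in (2, 1, 0):
        vol = reduce_axis(vol, axis)
    return vol


def build_pyramid(vol, levels):
    """Return [finest, ..., coarsest], each level half the size of the last."""
    pyramid = [np.asarray(vol, dtype=float)]
    for _ in range(levels - 1):
        pyramid.append(reduce_volume(pyramid[-1]))
    return pyramid
```

Because the X and Y passes never mix slices, a slab of consecutive Z slices can be reduced along X and Y on its own and gives the same values as the corresponding slices of the full volume, which is what makes a partition along Z natural for those two passes.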

Taking advantage of the separable property of B-splines, a 3-D image pyramid is obtained by successive 1-D filtering and reduction along the rows, columns, and slices of the image. We parallelize this step by first breaking the image into equally sized partitions along Z. Each cluster node is assigned one of these chunks and filters and decimates it along X first and then along Y. The results are then partitioned equally along Y for filtering along Z. Each node can keep these chunks for the next coarser level, where it starts by filtering and decimating along Z and then along X and Y, without communication.

Registration Algorithm

On these output image pyramids, the deformation field is optimized level by level, from the coarse level to the fine level.

Registration Algorithm

B-spline image representation and coefficients using 1-D recursive filters [Unser1991]

Thévenaz and Unser's image model via cubic B-spline [Thévenaz 2000]

At each level, we employ Thévenaz and Unser's image model via cubic B-splines. The 3-D image B-spline coefficients are obtained by 1-D recursive filtering applied successively along the three directions of the image. The parallelization of this step is similar to that of the 3-D image pyramid computation.

Registration Algorithm

Analytic method of computing gradient of MI [Thévenaz 2000]

Stochastic gradient descent optimization [Klein 2007]

We chose mutual information (MI) as the similarity measure. Using a gradient descent method, the transformation parameters are optimized by calculating the gradient of the mutual information. Since there are so many parameters, the optimization takes a very long time. With Thévenaz and Unser's image model, both the deformation and the mutual information are differentiable, and they developed an analytic method of computing the gradient of mutual information based on Parzen windows.

Optimization

Derivative of Mutual Information (MI) [Thévenaz 2000]

The derivative of MI is computed from the joint probability distribution of the two images, the gradient of the joint probability distribution, and the marginal probability distribution of the test image. The joint distribution and its gradient are summations over the sample set, so in the parallel algorithm each node can compute the part of the joint probability contributed by the samples in its chunk. The master machine collects all the partial results, combines them, and then sends the result to each computer.
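The "partial sums per node, combined by the master" idea can be sketched as follows. This is a minimal illustration under assumptions: it uses a hard-binned histogram from NumPy instead of the Parzen-window estimate used in the talk, it runs the "nodes" as a plain loop rather than on a cluster, and the function names are hypothetical.

```python
import numpy as np


def partial_joint_hist(ref_chunk, test_chunk, bins=8, value_range=(0.0, 1.0)):
    """Joint histogram counts for the samples of one chunk (one node's work)."""
    counts, _, _ = np.histogram2d(
        ref_chunk.ravel(),
        test_chunk.ravel(),
        bins=bins,
        range=[value_range, value_range],
    )
    return counts


def combine(partials):
    """Master step: the joint histogram is a sum, so partial counts just add."""
    return np.sum(partials, axis=0)


def mutual_information(joint_counts):
    """MI from joint counts, using the marginals of the normalized histogram."""
    p = joint_counts / joint_counts.sum()
    p_ref = p.sum(axis=1, keepdims=True)
    p_test = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (p_ref @ p_test)[nz])))
```

For example, splitting both volumes into the same chunks with `np.array_split`, calling `partial_joint_hist` on each pair, and passing the list to `combine` yields the same counts as one call on the whole volumes, since every sample lands in exactly one chunk.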

Only the 64 control points nearest a voxel contribute to the sums, so each cluster node computes its part of dS; the master machine then collects these results and updates the control points. Because of the deformation, each node receives an extra part of the image beyond its own partition. In the figure, gray is the main part of each partition and white is the overlapping extra part.
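The figure of 64 comes from the 4 x 4 x 4 support of a cubic B-spline free-form deformation. The sketch below evaluates the displacement at one point from its 64 neighboring control points. It is an illustration only: the grid layout (control point index i placed at position (i - 1) times the spacing), the axis order, and the function names are assumptions made for this example, not details given in the talk.

```python
import numpy as np


def bspline_weights(u):
    """The four cubic B-spline basis weights for a fractional offset u in [0, 1)."""
    return np.array([
        (1 - u) ** 3 / 6.0,
        (3 * u**3 - 6 * u**2 + 4) / 6.0,
        (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6.0,
        u**3 / 6.0,
    ])


def ffd_displacement(point, grid, spacing):
    """Displacement at `point` (z, y, x) from a control grid of shape (nz, ny, nx, 3).

    Assumed layout: control point index i sits at (i - 1) * spacing, so a point
    in cell i is supported by indices i .. i + 3 along each axis.
    """
    starts, weights = [], []
    for coord, h in zip(point, spacing):
        t = coord / h
        i = int(np.floor(t))
        starts.append(i)
        weights.append(bspline_weights(t - i))
    z, y, x = starts
    block = grid[z:z + 4, y:y + 4, x:x + 4]  # the 4 * 4 * 4 = 64 control points
    # Tensor-product sum over the 64 neighbors, per displacement component.
    return np.einsum("i,j,k,ijkc->c", weights[0], weights[1], weights[2], block)
```

Since only this 4 x 4 x 4 block is read for a given voxel, a node holding a chunk of voxels needs just the control points around that chunk, plus enough extra image to cover where the deformation sends its voxels.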

Otherwise we would need to calculate MI on each move of the control points, and to calculate the deformed image against the original image. The analytic method calculates the derivative of MI and skips that calculation, but for a large 3-D image it is still slow. Therefore we used a stochastic gradient descent method for optimization, and dMI is computed on 2048 random voxels in each iteration.

Registration Algorithm

Control points refinement between two levels [Maurer 2000]

In the multiresolution application, the B-spline deformation can be refined by dividing the control point spacing in every dimension by two. Starting from the coarsest level, the control point parameters are optimized iteratively at each level and then refined for the next finer level.

Registration Algorithm

Cubic B-spline Deformation [Mattes 2003]

Thévenaz and Unser's image model via cubic B-spline [Thévenaz 2000]

In the last step, the image is deformed and interpolated using the optimized deformation parameters. To compute the output image, parallelization is achieved easily by computing on the previously distributed test image chunk on each node. Each node computes only the voxels in the main part of its image chunk; the extra part of the chunk is used for the deformation and interpolation.

Experiment

10 workstations (each with a Pentium D 3.4 GHz CPU and 4 GB of main memory) connected through a 1 GB/s switch

Large CT image

Size: 512 × 512 × 459, voxel: 0.68 × 0.68 × 1.5 mm^3

Control mesh: 27 × 27 × 52 (113,724)

100 iterations of optimization at each level

Regular brain MRI image

Size: 256 × 256 × 46, voxel: 0.98 × 0.98 × 3 mm^3

Control mesh: 27 × 27 × 15 (10,935)

100 iterations of optimization at each level

We tested the sequential algorithm and the parallel algorithm on a cluster of 10 workstations connected through a 1 GB/s switch. Each workstation has a Pentium D 3.4 GHz CPU and 4 GB of main memory. In each experiment, the reference image is the original image, and the test image was created by deforming the reference image with a known deformation field. Two image data sets were used. The large image is 512 by 512 by 459, and the control point spacing is set to 13.6 mm on the finest level, which gives about 114 thousand parameters. The regular-size image is 256 by 256 by 46; on the finest level there are about 11 thousand parameters to optimize. The time analysis is based on the large CT image.
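The figures quoted for the two control meshes can be reproduced with a few lines of arithmetic. The helper name below is hypothetical, and the factor of three (one displacement component per axis at each control point) is an assumption about how the counts were obtained.

```python
def control_points(mesh):
    """Number of control points in a 3-D mesh given as (nx, ny, nz)."""
    nx, ny, nz = mesh
    return nx * ny * nz


large_points = control_points((27, 27, 52))    # 37,908 control points
regular_points = control_points((27, 27, 15))  # 10,935 control points

# With three displacement components per control point:
large_params = 3 * large_points      # 113,724
regular_params = 3 * regular_points  # 32,805
```

Under this reading, the 113,724 quoted for the large CT image matches three parameters per control point, while the 10,935 quoted for the brain MRI image matches the number of control points itself.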

Time analysis (sequential vs. parallel)

Scaled time comparison for sequential and parallel computing for each step on each level.

This is the time comparison for each step at each level. The time axis is scaled by the cube root to show the differences at all levels. From left to right, the levels run from coarse to fine. The parallel computing time is far less than the sequential computing time in all the processes except the optimization on the finest level. If the computer has enough memory to handle the sequential optimization, sequential optimization is preferred for this step. But the parallel method can be useful when memory is too low for the computation.

Cumulative time cost of sequential, parallel, and combined solutions in each step.

This figure shows the cumulative time comparison. From left to right, it follows the algorithm step by step. The first step is computing the image pyramids from fine to coarse, level by level. Then, from coarse to fine, the deformation parameters are optimized. The last step is to compute the output image. The cumulative sequential time is normalized to 1, and the cumulative parallel times for clusters of 2, 5, and 10 nodes are shown by these curves. The combined curve uses sequential optimization on the finest level and parallel computing on 10 nodes for all the rest of the processing. Our experiments with 10 cluster machines indicate that a speedup of 0.5 times the number of computers is feasible.

Results (large image)

Figure panels: reference image (original CT image); test image (known deformed image); overlay of test image with reference image; output image; overlay of output image with reference image.

Result on large

Results (regular image)

Figure panels: reference image (original brain MRI image); test image (deformed image); overlay of reference image with test image; output image; overlay of reference image with output image.

Result on regular

Conclusion

Important to tackle time-critical clinical applications

A general parallel strategy

Complex interplay

Implemented in CAVASS software

Nonrigid registration is very time consuming. Parallel registration approaches are important for tackling many time-critical clinical applications. We also examined an orchestration of a general parallel strategy for nonrigid image registration by utilizing the various efficient techniques that have been reported for the component steps. The complex interplay of the various steps involved in nonrigid image registration is analyzed from the point of view of parallelizing registration. This general parallel scheme is based on distributed computing on a cluster of PCs, and its implementation in the CAVASS software via distributed computing on a COW is portable.

Reference

[Klein 2007] Stefan Klein, Marius Staring, Josien P.W. Pluim, Evaluation of Optimization Methods for Nonrigid Medical Image Registration using Mutual Information and B-splines, IEEE Transactions on Image Processing, vol. 16, pp. 2879-2890, 2007.

[Thévenaz 2000] Philippe Thévenaz, Michael Unser, Optimization of Mutual Information for Multiresolution Image Registration, IEEE Transactions on Image Processing, vol. 9, no. 12, pp. 2083-2099, December 2000.

[Unser1993] Michael Unser, Akram Aldroubi, Murray Eden, The L2 Polynomial Spline Pyramid, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 4, pp. 364-379, April 1993.

[Unser1991] Michael Unser, Akram Aldroubi, Murray Eden, Fast B-Spline Transforms for Continuous Image Representation and Interpolation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 3, pp. 277-285, March 1991.

[Maurer2003] Torsten Rohlfing, Calvin R. Maurer, Nonrigid Image Registration in Shared-Memory Multiprocessor Environments with Application to Brains, Breasts, and Bees, IEEE Transactions on Information Technology in Biomedicine, vol. 7, no. 1, pp. 16-25, March 2003.

[Rohlfing2001] Torsten Rohlfing, Calvin R. Maurer, Walter G. O'Dell, Jianhui Zhong, Modeling liver motion and deformation during the respiratory cycle using intensity-based free-form registration of gated MR images, SPIE Medical Imaging Conference Proceedings, vol. 4319, pp. 337-348, 2001.

[Mattes 2003] D. Mattes, D. R. Haynor, H. Vesselle, T. K. Lewellen, W. Eubank, PET-CT image registration in the chest using free-form deformations, IEEE Transactions on Medical Imaging, vol. 22, no. 1, pp. 120-128, 2003.

[Maurer 2001] T. Rohlfing, C. R. Maurer, W. G. O'Dell, J. Zhong, Modeling liver motion and deformation during the respiratory cycle using intensity-based free-form registration of gated MR images, Medical Imaging, Proc. SPIE 4319, pp. 337-348, 2001.
