Bayesian image reconstruction for emission tomography


Proc. Natl. Acad. Sci. USA, Vol. 88, pp. 3223-3227, April 1991. Medical Sciences

Bayesian image reconstruction for emission tomography incorporating Good's roughness prior on massively parallel processors

(penalized likelihood/expectation-maximization algorithm/parallel computers)

MICHAEL I. MILLER* AND BADRINATH ROYSAM†

*Department of Electrical Engineering and Institute for Biomedical Computing, Washington University, St. Louis, MO 63130; and †Department of Electrical, Computer and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180

Communicated by William D. Phillips, December 28, 1990

ABSTRACT Since the introduction by Shepp and Vardi [Shepp, L. A. & Vardi, Y. (1982) IEEE Trans. Med. Imaging 1, 113-121] of the expectation-maximization algorithm for the generation of maximum-likelihood images in emission tomography, a number of investigators have applied the maximum-likelihood method to imaging problems. Though this approach is promising, it is now well known that the unconstrained maximum-likelihood approach has two major drawbacks: (i) the algorithm is computationally demanding, resulting in reconstruction times that are not acceptable for routine clinical application, and (ii) the unconstrained maximum-likelihood estimator has a fundamental noise artifact that worsens as the iterative algorithm climbs the likelihood hill. In this paper the computation issue is addressed by proposing an implementation on the class of massively parallel single-instruction, multiple-data architectures. By restructuring the superposition integrals required for the expectation-maximization algorithm as the solutions of partial differential equations, the local data passage required for efficient computation on this class of machines is satisfied. For dealing with the "noise artifact" a Markov random field prior determined by Good's rotationally invariant roughness penalty is incorporated. These methods are demonstrated on the single-instruction multiple-data class of parallel processors, with the computation times compared with those on conventional and hypercube architectures.

1. Introduction

The image reconstruction problems addressed here arise in a variety of contexts in emission tomography where λ(x), the image to be reconstructed, is an unknown positive intensity. Because of the physics of the detectors used in emission tomography, the measurements are fundamentally different in two ways from the conventional line-projection measurements of transmission tomography. The first is that in positron-emission tomography (PET) (1) and single-photon-emission tomography (SPET) (2) the blurring due to collimator geometry and detector electronics results in Gaussian-shaped projection functions through the image. Second, the measurements are photon limited and well modeled as Poisson counting processes with means given by the line-projections through the underlying tracer concentration. Since the measurements are a random process, any attempt to simply invert the line-projection operator (as is done by means of the Radon transform in transmission tomography) may yield inconsistent estimates.

The reconstruction approach adopted here is based on the maximum-likelihood (ML) method, which produces estimates that maximize the Poisson likelihood subject to the constraint λ(x) ≥ 0. This is a difficult nonlinear optimization problem, and until the application of the iterative expectation-maximization (EM) algorithm by Shepp and Vardi in 1982 (3) there was no method with known convergence properties for generating the ML solution. Following its introduction, a host of investigators have applied the ML method (4-11). The ML method is promising for clinical applications, as it has the potential to yield higher resolution and lower noise reconstructions than conventional methods. For example, our group has shown decreases in variances in brain phantom images of upwards of a factor of 4 for constrained versions of the ML approach when compared to conventional linear approaches (10). This method has significant potential for physiological studies requiring low radioactive tracer dosages [see Wagner (12)]. Though the ML method has great promise, it suffers from two major limitations. (i) The kernel computation in the iterative algorithm requires regeneration of the linear superposition integrals corresponding to the generalized-projection measurements, twice per view angle per iteration, requiring several hundred superposition integrals per iteration of the EM algorithm, a formidable computational hurdle. (ii) It is now well known that application of unconstrained nonparametric ML estimation results in images that exhibit noise-like artifacts in the form of highly concentrated peaks and valleys that worsen as the EM algorithm climbs the "likelihood-hill" toward the ML solution (6).

The objectives of this paper are to demonstrate an efficient implementation of the EM reconstruction algorithm and to implement Good's roughness penalty for stabilizing the ML reconstructions. With the recent advent of massively parallel processors based on mesh-connected arrays of bit-serial processor elements on single integrated circuits, it is now possible, by proper restructuring of the computations, to implement imaging algorithms with computation times that are several orders of magnitude lower than those obtained with conventional processors and with scaling rules that are independent of the number of pixels in the image. By reformulating the Gaussian-weighted superposition integrals as solutions of partial differential diffusion equations (PDEs), the speed-up required for clinical application may be attained. The key to the speed-up resides in the fact that the PDE solution exploits the most important requirement of mesh-connected processors, local data passage. The noise artifact associated with the unconstrained ML estimator is addressed by means of the introduction of Good's rotationally invariant roughness prior (13) into the likelihood functional, which leads to a maximum a posteriori (MAP) estimator that solves a set of nonlinear

Abbreviations: PET, positron-emission tomography; SPET, single-photon-emission tomography; ML, maximum-likelihood; EM, expectation-maximization; PDE, partial differential diffusion equation; TOF, time-of-flight; 2D, two-dimensional; 1D, one-dimensional; MRF, Markov random field; MAP, maximum a posteriori.


The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.


differential equations. These are solved using local Jacobi-like iterations that are directly incorporated into the EM algorithm with negligible increase in computation.

2. Unconstrained ML Reconstruction

For emission tomography, the set of generalized projections is determined by the physics of the data-collection scheme, with each imaging device having its own point-spread function p_θ(u|x) determining the positions in the image that contribute to the measurement. In PET the external measurements are of the line-of-flight of annihilation photons. Due to the finite detector size, the measurements do not locate the emissions along perfect lines, but rather correspond to Gaussian-weighted surfaces through the intensity. For the newer PET systems, the differential propagation times of the annihilation photons are also measured [time-of-flight (TOF)-PET] (1), resulting in the measurement having an inherent depth-resolving feature. The point-spread functions are cigar-shaped asymmetric Gaussians with standard deviations of 3.19 cm in the TOF direction and 0.489 cm in the transverse direction.

The resulting intensity μ_θ(u) of the measurement process is

determined by the point-spread function p_θ(u|x) and given by μ_θ(u) = ∫ p_θ(u|x) λ(x) dx. Here, x denotes the two-dimensional (2D) coordinates in the underlying emission space over which the radioactive tracer density λ is defined, and u is the 2D coordinate in the space over which the measurements are made. Physically, p_θ(u|x)‖δu‖ is the fraction of points measured in voxel [u, u + δu) emanating from location x in direction θ. The statistical model adopted describes the measurements at a particular angle θ, denoted as M_θ(du), as Poisson with mean intensity μ_θ(u)du. The estimation problem becomes one of maximizing the Poisson likelihood of the set of measurements {M_θᵢ; 1 ≤ i ≤ N_θ}, subject to the constraint that λ(x) ≥ 0. The log-likelihood is given by

$$\sum_{i=1}^{N_\theta} \int M_{\theta_i}(du)\,\log\left[\int p_{\theta_i}(u|x)\,\lambda(x)\,dx\right] - \int \lambda(x)\,dx. \quad [2.1]$$

As proven by Vardi et al. (14) for the discrete version of the problem, the ML estimate is the convergence point of the sequence of iterates given by

$$\lambda^{k+1}(x) = \lambda^{k}(x)\,\sum_{i=1}^{N_\theta} \int \frac{p_{\theta_i}(u|x)\,M_{\theta_i}(du)}{\int p_{\theta_i}(u|z)\,\lambda^{k}(z)\,dz}. \quad [2.2]$$
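The multiplicative structure of Eq. 2.2 can be sketched for a fully discretized system. This is a minimal illustration, not the mesh implementation described below; the system matrix P and the toy sizes are hypothetical, with P[i, j] the probability that an emission in pixel j is recorded in measurement bin i.

```python
import numpy as np

def em_update(lam, P, counts):
    """One Shepp-Vardi EM iteration for Poisson emission data (cf. Eq. 2.2)."""
    forward = P @ lam                        # predicted bin intensities
    ratio = np.where(forward > 0, counts / forward, 0.0)
    back = P.T @ ratio                       # backprojected correction
    sens = P.sum(axis=0)                     # per-pixel detection probability
    return lam * back / np.where(sens > 0, sens, 1.0)

# Toy system: 2 pixels, 3 measurement bins; columns of P sum to 1.
P = np.array([[0.5, 0.1],
              [0.3, 0.4],
              [0.2, 0.5]])
counts = np.array([60.0, 50.0, 40.0])
lam = np.ones(2)                             # positive starting image
for _ in range(200):
    lam = em_update(lam, P, counts)
# The iterates stay nonnegative and preserve the total count.
```

Note the two properties the text relies on: the update is multiplicative, so positivity is maintained, and when the columns of P sum to one each iterate preserves the total number of counts.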

That the iteration defined by Eq. 2.2 is reasonable may be seen by imagining the reconstruction problem without noise. Then the true image λ is a fixed point of Eq. 2.2, and if the point-spread operator has a zero null-space the iteration converges to the true λ while maintaining the positivity constraint.

The core computations for the iteration of Eq. 2.2 are the

computation of the 2D integrals at each angle θ. For a typical 100-angle system for which 50 iterations are required, 10⁴ such integrations must be computed. The most direct approach for parallel implementation is to allocate a single processor for each of the point-spread function superposition integrals, for which the hypercube class of machines is an ideal candidate. Although this was the first approach we took, for which results are shown in the ensuing Table 1, the parallelism is determined by the number of view angles measured. For applications where there are fewer angles but the image remains large, this offers little parallelism without breaking the image into smaller subblocks and then recombining at the boundaries of the blocks. More important, for imaging applications where random field priors such as Good's roughness measure are introduced as constraints, the processing required does not divide up well across a small number of powerful processors.

An alternative approach relies on the allocation of a single processor to each image element. The superposition integrals required at each angle are computed serially, with the entire array dedicated at any one time to the computation of a single integral. For this approach, the processor configuration must be consistent with the image topology so that at the completion of the computation of the integral each processor has the pixel value required for the superposition sum across angles. This implementation strategy removes the communication required for the superposition across angles but forces a restructuring of the computations so that the array of processors is involved in parallel in the computation of each of the superposition integrals.

The method of choice for this approach is the now well-known systolic convolution method. The key to the "classic" systolic implementation is the generation of the superposition integrals by means of a parallel, pipelined sequence of multiplications and additions. Though this is well suited for our application, we have preferred to implement the superposition integrals as solutions of PDEs, as they map very naturally onto mesh-connected processor arrays by means of finite-difference approximations (15). Since the computations required for a differential equation description involve nearest neighbors, this makes such an approach ideal for locally connected mesh architectures. The core computations required for the iteration are as follows. The integral at 0° to the grid requires two one-dimensional (1D) Gaussian convolutions with standard deviations of 3.19 and 0.489 cm for the TOF and transverse directions. That the 1D diffusion

equation ∂p_t(x)/∂t = (1/2) ∂²p_t(x)/∂x² with initial condition p₀(x) = i(x) performs the convolution follows since the Green's function of the diffusion equation is p_t(x) = (1/√(2πt)) exp(−x²/2t), and by linear superposition its solution at time t with initial condition p₀(x) = i(x) becomes ∫ p_t(u − x) i(x) dx. This is solved by means of finite differences according to

$$p_{n+1}(j) = \tfrac{1}{4}\,p_n(j+1) + \tfrac{1}{2}\,p_n(j) + \tfrac{1}{4}\,p_n(j-1), \quad [2.3]$$

with each iteration of the difference equation "diffusing" the initial condition. The initial condition is set according to p₀(j) = i(j), where j is the discrete space variable denoting processor location, with n iterations of Eq. 2.3 resulting in a convolution with a Gaussian kernel of variance n/2.

The 2D convolutions lying along the coordinate axes (0°,

90°) are separated as two 1D convolutions. For arbitrary-angle convolutions for which the 2D density is not separable on the fixed axes of the array, we have previously demonstrated (15) that they may be obtained by successive 1D Gaussian convolutions along a set of predefined axes, leading to a more general method of performing the systolic convolutions of densities at arbitrary angles. As an example, a TOF-PET point-spread function at 30° to the mesh is generated by convolving three 1D Gaussian densities with axes (1, 0), (1, 1), and (2, 1) to the grid and variances of 4, 19, and 26 pixels, respectively (1 pixel = 0.35 cm). In ref. 15, the method is derived for generating all 96 kernels for TOF-PET.

The PDE approach has been compared with implementation on conventional and hypercube architectures. Shown in Table 1 are the results of a collaborative study in our group in which the computation times of the EM algorithm implementation were compared on a number of different architectures, including a SUN 4/280 and the InMOS Transputer, in which the algorithm was parallelized with one processor per view angle. Table 1 also shows implementations on the single-instruction multiple-data machines, the National Cash


Table 1. Analysis of computation times for the EM algorithm on various processors

                      EM algorithm time
  Processor      Iteration, sec   Convergence, min
  SUN 4/280         117               195
  InMOS T800          1.59              2.65
  NCR-GAPP            0.50              0.83
  AMT-DAP             0.65              1.10

Register geometric arithmetic parallel processor (GAPP) and the Active Memory Technology distributed array processor (DAP) architectures, for which the parallelism scales linearly with the number of pixels in the image (see ref. 15 for a breakdown of the shift, multiply, and add times). These times correspond to 96 view-angle reconstructions on a 256 × 256 image array, with convergence time for 100 iterations.
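The variance bookkeeping behind the diffusion implementation of Eq. 2.3 is easy to check numerically: n passes of the (1/4, 1/2, 1/4) stencil turn an impulse into a kernel of variance n/2. This sketch is purely illustrative; the grid size, iteration count, and zero-padded boundaries are arbitrary choices.

```python
import numpy as np

# Iterate the three-point stencil of Eq. 2.3 on an impulse; each pass
# "diffuses" the profile, adding 1/2 to its variance.
def diffuse(p, n_steps):
    for _ in range(n_steps):
        left = np.roll(p, 1);   left[0] = 0.0    # p_n(j-1), zero past the edge
        right = np.roll(p, -1); right[-1] = 0.0  # p_n(j+1), zero past the edge
        p = 0.25 * right + 0.5 * p + 0.25 * left
    return p

p = np.zeros(201)
p[100] = 1.0                  # unit impulse at the center of the grid
p = diffuse(p, 40)            # 40 iterations -> Gaussian-like, variance 40/2 = 20

j = np.arange(p.size)
mean = (j * p).sum()
var = ((j - mean) ** 2 * p).sum()
```

Because each pass is a convolution with a kernel of variance 1/2, the variances add exactly as long as the profile stays inside the grid, which is the property the text uses to build Gaussian kernels of prescribed width.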

3. Good's Roughness Prior for Constrained ML

Having addressed the computations required for the EM algorithm on the mesh processor, we proceed to the second issue, the "noise" artifacts that are fundamental to any algorithm generating unconstrained ML estimates. To illustrate, assume the measurements are from a 1D Poisson process with intensity λ(x). Then maximizing the Poisson log-likelihood ∫ N(dx) log λ(x) with respect to the image intensity λ(x) with a count-preservation constraint yields a set of Dirac delta functions centered at the points of the observations, an extremely rough estimate of λ. For the actual TOF-PET imaging problem this results in the reconstructions of Fig. 1. Fig. 1 Upper Left shows the 128 × 128 phantom, with Fig. 1 Center and Right showing the Poisson data from 2 view angles at 79° (Center) and 34° (Right) out of a total of 16 views with 100,000 counts. Fig. 1 Lower depicts the evolution of the algorithm for 500 (Left), 1000 (Center), and 3000 iterations. After 3000 iterations (Fig. 1 Lower Right), the reconstruction is continuing to increase in likelihood as well as roughness. These were implemented on the DAP architecture using the fully parallel method of implementation previously described.

The simple Poisson model illustrates the fundamental difficulty. Grenander (16) notes that in ML problems such as these, the parameter space (positive, finite measurable functions) is too large. He proposes maximizing the likelihood over a constrained subspace; we describe the constraint space by means of the class of Markov random field (MRF) priors induced by Good's roughness penalty.

FIG. 1. (Upper) Two of 16 angular views. (Lower) A series of reconstructions at 500, 1000, and 3000 iterations of the EM algorithm, with roughness values plotted below.

As suggested by Good (13), the roughness of a distribution may be measured by determining the difficulty of discriminating it from a shifted version of itself and is quantified in terms of the amount of information associated with the process of discrimination. Following Kullback (17), the information divergence J(x, x + ε) between a distribution λ(x) and its shifted version λ(x + ε) is given by

$$J(x, x+\epsilon) = \int \big(\lambda(x) - \lambda(x+\epsilon)\big)\,\log\frac{\lambda(x)}{\lambda(x+\epsilon)}\,dx \;\propto\; \int \frac{\lambda'(x)^2}{\lambda(x)}\,dx = 4\int |\gamma'(x)|^2\,dx, \quad [3.1]$$

where γ(x) = √λ(x) and the right side is the first term in a Taylor series expansion of the divergence. Now for two dimensions define the shift vector ε_φ = [ε_x, ε_y] as the vector of length |ε_φ| at angle φ to the gradient vector ∇γ(x, y) = [∂γ/∂x, ∂γ/∂y]. Then the divergence J(x, x + ε_φ) at point x = [x, y] for shift ε_φ is proportional to

$$\iint \left[\epsilon_x^2\left(\frac{\partial\gamma}{\partial x}\right)^2 + 2\,\epsilon_x\epsilon_y\,\frac{\partial\gamma}{\partial x}\frac{\partial\gamma}{\partial y} + \epsilon_y^2\left(\frac{\partial\gamma}{\partial y}\right)^2\right] dx\,dy = \iint |\epsilon_\phi|^2\,|\nabla\gamma|^2\cos^2\!\phi\;dx\,dy. \quad [3.2]$$

For the 2D imaging application, for which there are no preferred directions, the reconstruction should not depend on the orientation of the coordinate system, requiring the processing to be rotationally invariant. Applying this to the divergence measure J(x, x + ε_φ) amounts to averaging over ε_φ, with |ε_φ| = 1 for all φ ∈ [0, 2π), yielding the following roughness measure:

$$\iint |\nabla\gamma|^2\,dx\,dy = \iint \left[\left(\frac{\partial\gamma}{\partial x}\right)^2 + \left(\frac{\partial\gamma}{\partial y}\right)^2\right] dx\,dy. \quad [3.3]$$
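The 1D identity underlying these measures, ∫ λ'²/λ dx = 4∫ γ'² dx with γ = √λ, is easy to check numerically. The grid and the strictly positive test intensity below are arbitrary choices for the check, not quantities from the paper.

```python
import numpy as np

# Discretized check that the Fisher-information form of Good's roughness
# equals four times the integral of the squared derivative of sqrt(lambda).
x = np.linspace(0.0, 1.0, 2001)
h = x[1] - x[0]
lam = 2.0 + np.sin(2.0 * np.pi * x)        # a strictly positive intensity
gamma = np.sqrt(lam)

dlam = np.gradient(lam, x)                  # central differences
dgamma = np.gradient(gamma, x)

r1 = np.sum(dlam**2 / lam) * h              # integral of lambda'^2 / lambda
r2 = 4.0 * np.sum(dgamma**2) * h            # 4 * integral of gamma'^2
# r1 and r2 agree up to discretization error
```

The pointwise identity λ'²/λ = (2γγ')²/γ² = 4γ'² is what makes the square-root substitution λ = γ² used later turn the penalty into a simple quadratic in γ.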

This is the 2D penalty used throughout. The roughness measure is sensitive to discontinuities in λ on sets of large measure and scales inversely with the size of the intensity. An alternative view of the roughness measure may be formulated by examining the detection limit of the position of a 1D waveform of known shape with unknown position. Assuming that the position of the pulse is uniformly distributed over the observation interval, the Cramer-Rao mean-square error bound on the position of the waveform given Poisson data is the inverse of the Fisher information, which is precisely Good's 1D roughness measure of Eq. 3.1 (see p. 72 of ref. 18, for example). Viewing the positioning uncertainty as a measure of resolution, the introduction of Good's penalty becomes a direct tradeoff between smoothness and resolution in the reconstruction.

The roughness measure is introduced into the estimation as a penalty that is added to the Poisson log-likelihood according to

$$\int N(dx)\,\log\lambda(x) - \alpha_2\int\lambda(x)\,dx - \alpha_1\int\frac{\lambda'(x)^2}{\lambda(x)}\,dx, \quad [3.4]$$

with N(dx) the measured point process and the Lagrange multipliers α₁, α₂ chosen to yield a given total roughness and integral, respectively. For two dimensions the rotationally invariant penalty of Eq. 3.3 is used. On the discrete lattice of the mesh this induces a class of MRF priors on the image λ. Combining the roughness with the log-likelihood according to expression 3.4 and maximizing yields the MAP estimator. Since the MAP estimator maximizes expression 3.4 subject to


the nonnegativity constraint λ ≥ 0, the constraint is enforced by setting λ = γ², solving for an optimal γ̂, and choosing λ̂ = γ̂² as the solution. In ref. 6 we showed that the solution of the 1D continuous differential equation is a nonlinear exponential spline of the input data.

For the mesh implementation the solution is generated by substituting in terms of γ = √λ and discretizing the log-posterior. The 2D posterior (see ref. 6 for the 1D analogue) becomes

$$\sum_{i,j} n_{i,j}\,\log\gamma_{i,j}^2 \;-\; \alpha_2\sum_{i,j}\gamma_{i,j}^2 \;-\; \alpha_1\left(\sum_{i,j}|\gamma_{i+1,j}-\gamma_{i,j}|^2 + \sum_{i,j}|\gamma_{i,j+1}-\gamma_{i,j}|^2\right), \quad [3.5]$$

with n_{i,j} the number of measured photons in voxel i, j. The necessary conditions for the MAP solution are the following set of simultaneous nonlinear equations:

$$\frac{n_{i,j}}{\gamma_{i,j}} + \alpha_1\big(\gamma_{i+1,j} + \gamma_{i-1,j} + \gamma_{i,j+1} + \gamma_{i,j-1} - 4\gamma_{i,j}\big) - \alpha_2\,\gamma_{i,j} = 0. \quad [3.6]$$

For their solution the 2D nonlinear version of the Jacobi iteration method for linear partial differential equations is used:

$$\big(\gamma_{i,j}^{k+1}\big)^2 = \frac{1}{4\alpha_1 + \alpha_2}\Big(n_{i,j} + \alpha_1\,\gamma_{i,j}^{k}\big(\gamma_{i+1,j}^{k} + \gamma_{i-1,j}^{k} + \gamma_{i,j+1}^{k} + \gamma_{i,j-1}^{k}\big)\Big). \quad [3.7]$$

The appropriateness of the finite-difference solution of the nonlinear difference equation is clear in that it has the local data-passage properties required for the mesh-connected architecture. For generating the result of the k + 1st iteration of Eq. 3.7 on the mesh processor, we assume that n_{i,j} and the previous result γᵏ_{i,j} are resident at processor (i, j) of the array. Every processor in parallel performs the operations of Eq. 3.7 on its neighboring values to generate the results γᵏ⁺¹_{i,j}. The iteration is continued until γ stops changing.

A second equation arises coupling α₁ and α₂ by means of constraints on the solution having total integral equal to the number of measurement points. We use straightforward gradient descent for setting the second Lagrange multiplier α₂. In practice, however, the algorithm is found to be weakly sensitive to α₂, and in all simulations that follow it has been preset to α₂ = 1, delivering a normalization within 5% of the desired value over a broad range of stimulus conditions. This removes the need for a second iteration step.
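The parallel update of Eq. 3.7 can be sketched serially with array operations. Zero values past the grid edge and the toy counts image below are assumptions of this sketch, not choices from the paper.

```python
import numpy as np

def jacobi_step(gamma, n, a1, a2):
    """One sweep of the nonlinear Jacobi update of Eq. 3.7 on gamma = sqrt(lambda)."""
    nb = np.zeros_like(gamma)
    nb[1:, :] += gamma[:-1, :]    # gamma(i-1, j)
    nb[:-1, :] += gamma[1:, :]    # gamma(i+1, j)
    nb[:, 1:] += gamma[:, :-1]    # gamma(i, j-1)
    nb[:, :-1] += gamma[:, 1:]    # gamma(i, j+1)
    return np.sqrt((n + a1 * gamma * nb) / (4.0 * a1 + a2))

# Toy Poisson counts image; iterate until gamma stops changing.
rng = np.random.default_rng(0)
n = rng.poisson(50.0, size=(16, 16)).astype(float)
gamma = np.sqrt(n + 1.0)                   # any positive starting point
for _ in range(300):
    gamma = jacobi_step(gamma, n, a1=1.0, a2=1.0)
lam = gamma**2                             # smoothed intensity estimate
```

Every grid point is updated simultaneously from its four neighbors, which is exactly the local data passage the mesh architecture provides; the square root keeps each iterate positive.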

4. Tomography Results

The noise artifact of the unconstrained ML solution in the emission tomography problem is illustrated by the constantly increasing roughness, of over 500% from 500 to 3000 iterations, plotted below each panel in Fig. 1. This is addressed by means of Good's prior. In one dimension, the MAP solution with the prior is given by the intensity λ that maximizes the log-likelihood of expression 2.1 with the prior added:

$$\sum_{i=1}^{N_\theta}\int M_{\theta_i}(du)\,\log\left[\int p_{\theta_i}(u|x)\,\lambda(x)\,dx\right] - \alpha_2\int\lambda(x)\,dx - \alpha_1\int\frac{\lambda'(x)^2}{\lambda(x)}\,dx. \quad [4.1]$$

FIG. 2. MAP solutions with two different roughness parameters.

The 2D version is identical, with the rotationally invariant version of Good's prior substituted for the last term above. The EM algorithm may be combined with a prior for deriving the MAP estimator by simply adding the prior to the maximization at each stage of the algorithm. The k + 1st iterate maximizes the following complete-data log-posterior:

$$\int \bar N^{k}(dx)\,\log\lambda(x) - \alpha_2\int\lambda(x)\,dx - \alpha_1\int\frac{\lambda'(x)^2}{\lambda(x)}\,dx, \quad [4.2]$$

where N̄ᵏ(dx) is the conditional mean of the number of emissions in pixel [x, x + dx), given the projection measurements {M_{θᵢ}(du); 1 ≤ i ≤ N_θ}, and is given by

$$\bar N^{k}(dx) = \lambda^{k}(x)\,\|dx\|\sum_{i=1}^{N_\theta}\int\frac{p_{\theta_i}(u|x)\,M_{\theta_i}(du)}{\int p_{\theta_i}(u|z)\,\lambda^{k}(z)\,dz}. \quad [4.3]$$

The 2D results are identical except that the 2D roughness must be substituted into expression 4.2.

Notice the similarity of expression 4.2 to expression 3.4, implying that the discrete solution at each iteration of the constrained EM algorithm is none other than the solution of the nonlinear difference Eq. 3.6 at each iteration, with the driving function for the right side of the nonlinear difference equation given by the conditional mean of Eq. 4.3. Since the conditional mean of Eq. 4.3 is the unconstrained solution of the EM algorithm corresponding directly to Eq. 2.2, the addition of Good's prior corresponds to smoothing the unconstrained EM solution at each iteration by running the Jacobi-like iteration of Eq. 3.7.

To demonstrate, shown in Fig. 2 are reconstructions from the identical PET data of Fig. 1 with Good's roughness added. Fig. 2 Left and Right shows the results of generating the MAP estimate by applying the EM algorithm with Good's roughness prior at each iteration; plotted below each panel is the roughness.
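The constrained iteration just described, an EM step (Eq. 4.3) followed by Jacobi smoothing sweeps (Eq. 3.7), can be sketched for a fully discretized toy system. The system matrix P, grid size, and multiplier values here are hypothetical stand-ins, with the columns of P assumed to sum to one.

```python
import numpy as np

def map_em_iteration(lam, P, counts, a1, a2, n_smooth=20):
    """One constrained EM iteration: EM update, then smoothing of sqrt(lambda)."""
    # EM step: conditional-mean image driving Eq. 3.6 (cf. Eqs. 2.2 and 4.3).
    forward = P @ lam
    n_bar = lam * (P.T @ (counts / np.maximum(forward, 1e-12)))
    # Smoothing step: Jacobi sweeps of Eq. 3.7 with n_bar as the driving term.
    side = int(round(np.sqrt(lam.size)))
    n = n_bar.reshape(side, side)
    gamma = np.sqrt(np.maximum(n, 1e-12))
    for _ in range(n_smooth):
        nb = np.zeros_like(gamma)
        nb[1:, :] += gamma[:-1, :]; nb[:-1, :] += gamma[1:, :]
        nb[:, 1:] += gamma[:, :-1]; nb[:, :-1] += gamma[:, 1:]
        gamma = np.sqrt((n + a1 * gamma * nb) / (4.0 * a1 + a2))
    return (gamma**2).ravel()

# Toy 4x4 image observed through a random normalized system matrix.
rng = np.random.default_rng(1)
P = rng.random((24, 16)); P /= P.sum(axis=0)
counts = rng.poisson(30.0, size=24).astype(float)
lam = np.ones(16)
for _ in range(50):
    lam = map_em_iteration(lam, P, counts, a1=0.1, a2=1.0)
```

This mirrors the observation in the text: the prior enters only through a few local smoothing sweeps appended to each EM iteration, so the added cost on a mesh architecture is small.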

Figs. 3 and 4 show a variance study on a six-slice pie phantom comparing a postfiltering of the unconstrained ML solution and the MAP estimator. Fig. 3 shows the tracer distribution for the pie phantom studied. Fig. 4 Left shows the 3000th iteration of the unconstrained ML algorithm applied to 16 view-angle data having 460,000 (Fig. 4 Upper) and 1,200,000 (Fig. 4 Lower) total counts. Fig. 4 Center shows the result of postfiltering the final ML result (Left) with Good's roughness, with the MAP solution on the Right. The MAP solution was generated by adding Good's penalty to each

FIG. 3. Tracer distribution for the pie phantom.



FIG. 4. (Left) Unconstrained ML solution. (Center) Result of postfiltering with Good's roughness. (Right) MAP solution, with sample variances superimposed, multiplied by a factor of 10⁻³. Note: top row, middle column, pie wedge at 6 o'clock should read 1.41, not 14.10.

iteration of the EM algorithm according to Eq. 3.6. The parameters were chosen so that Fig. 4 Center and Right have identical roughness. Superimposed inside of each of the pie wedges are the sample variances scaled by a factor of 10⁻³. These results show that for the same resolution the MAP solution shows variability that is lower by a factor of 2 than the postfilter approach.

Shown in Fig. 5 is a comparison of cuts through the 460,000 count reconstructions of the pie phantom shown in Fig. 4. The dashed line is the cut through the original pie, with the solid line the postfilter solution, and the bold line the MAP solution. Fig. 5 demonstrates quite convincingly that the postfilter solution has much larger oscillations around the true distribution than the MAP estimator.
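The roughness values used to match the panels above are discrete versions of Good's first-derivative measure, Eq. 4.1. A minimal sketch follows; the forward-difference gradient, unit grid spacing, and the small floor `eps` are illustrative assumptions. It also exhibits numerically the identity with the gradient energy of the square root of the image, since (√A)′ = A′/(2√A) gives ∫|A′|²/A = 4∫|(√A)′|².

```python
import numpy as np

def good_roughness(lam, eps=1e-12):
    """Discrete Good's roughness: sum of (forward difference)^2 / intensity."""
    d = np.diff(lam)
    return float(np.sum(d**2 / (lam[:-1] + eps)))

def sqrt_gradient_energy(lam):
    """4x the gradient energy of sqrt(lam); for smooth, strictly positive
    images this approximately equals good_roughness."""
    d = np.diff(np.sqrt(lam))
    return float(4.0 * np.sum(d**2))

x = np.linspace(0.0, 1.0, 256)
lam = 1.0 + np.sin(2 * np.pi * x) ** 2   # smooth, strictly positive test image
g = good_roughness(lam)
s = sqrt_gradient_energy(lam)
```

On a fine grid the two forms agree to within the discretization error, and a flat image has exactly zero roughness, which is the sense in which the measure penalizes only variation.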

Politte and Snyder (10) have shown that regularization methods similar to that shown in the center panel for the pie reconstruction deliver as much as a fourfold decrease in variance over the confidence-weighted linear approach used in time-of-flight (TOF) systems. The results shown here would predict an even greater increase in performance by the MAP solution. We conclude by summarizing results on the computation time for the reconstructions of Figs. 1-4 on our 32 × 32 DAP array, with and without Good's smoothing. The EM algorithm alone for 16 view angles on the 128 × 128 images required 12.2 msec per iteration in 32-bit real arithmetic and 1.87 msec per iteration in 32-bit integer arithmetic. Good's smoothing with 20 updates of the smoothing algorithm applied to each EM iteration required 0.03 msec, comprising <5% of the total computation due to the fact that the architecture yields parallelism that scales with the number of pixels.
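The locality that makes these timings possible comes from posing the Gaussian convolutions as solutions of PDEs: running the heat equation for time t = σ²/2 with a nearest-neighbor stencil reproduces a Gaussian blur of standard deviation σ without any long-range data movement. A minimal 1-D sketch, where the periodic boundaries, explicit Euler stepping, and the step size dt are illustrative assumptions:

```python
import numpy as np

def heat_blur(u, sigma, dt=0.25):
    """Approximate Gaussian blur of std `sigma` by explicit Euler steps of
    the 1-D heat equation u_t = u_xx. Each step reads only nearest
    neighbors (periodic boundaries), so all communication is local.
    After n steps the effective kernel variance is 2*n*dt = sigma**2."""
    n_steps = int(round(sigma**2 / (2 * dt)))
    for _ in range(n_steps):
        u = u + dt * (np.roll(u, 1) + np.roll(u, -1) - 2 * u)
    return u

N, sigma = 128, 3.0
impulse = np.zeros(N); impulse[N // 2] = 1.0
blurred = heat_blur(impulse, sigma)

# Reference: normalized Gaussian kernel centered on the impulse.
d = np.arange(N) - N // 2
gauss = np.exp(-d**2 / (2 * sigma**2))
gauss /= gauss.sum()
```

The stencil update conserves total mass exactly, and on a SIMD array each pixel's update is one synchronous nearest-neighbor exchange, which is why the parallelism scales with the number of processors.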

FIG. 5. Cuts through the pie (dotted), postfilter (solid), and MAP solution (bold).

5. Conclusions

One of the major objectives of this paper has been to provide a systematic method for removing the noise artifact that arises in unconstrained ML tomographic reconstructions. We first proposed the incorporation of Good's prior into the estimation procedure (6) as the smoothness properties of the class of objects being reconstructed were not incorporated in the unconstrained algorithm. The rotationally invariant roughness measure is closely related to the work of Poggio (19) on regularization of images in Gaussian noise, where a bound is placed on the integral of the derivative squared. The smoothness constraint on the square root of the image is precisely this surface interpolation function. Good and Gaskins (20) have also proposed the incorporation of a second-derivative curvature constraint on the intensity profile, which can be straightforwardly added to the MAP solution. The second derivative would require communication of pixel values across two nearest neighbors.

The second result has been to demonstrate that with the advent of the massively parallel processor class of architectures it is now possible to implement algorithms requiring large numbers of superposition integrals so that communications and computations are performed synchronously, with parallelism that grows as the number of processors. The pivotal property of such a restructuring is locality of communication. By posing the Gaussian convolutions and Good's prior as the solution of PDEs, the locality of data movement is ensured. For implementations such as in non-TOF-PET, where the line integrals are 1-D in nature and the kernels may not be the solutions of differential equations, a more natural approach would be based on combining the systolic method of Kung with the difference equations induced by the random field priors.

We are indebted to Anders McCarthy, who formulated the PDE

mapping of the convolutions; to Donald L. Snyder, who has played a central role in the algorithm development; to David Politte and Kurt Smith for contributions during the preparation of the manuscript; and to Jay Shrauner for DAP programming assistance. This research was supported by National Science Foundation Grant PYIA ECE-8552518 to M.I.M. and by National Institutes of Health Grant RR01380.

1. Snyder, D. L., Thomas, L. J., Jr., & Ter-Pogossian, M. M. (1981) IEEE Trans. Nucl. Sci. 28, 3575-3583.
2. Rollo, F. D. (1977) Nuclear Medicine Physics, Instrumentation, and Agents (Mosby, St. Louis).
3. Shepp, L. A. & Vardi, Y. (1982) IEEE Trans. Med. Imaging 1, 113-121.
4. Snyder, D. L. & Politte, D. G. (1983) IEEE Trans. Nucl. Sci. 30, 1843-1849.
5. Lange, K. & Carson, R. (1984) J. Comput. Assist. Tomography 2, 306-316.
6. Snyder, D. L. & Miller, M. I. (1985) IEEE Trans. Nucl. Sci. 32, 3864-3872.
7. Miller, M. I., Snyder, D. L. & Miller, T. (1985) IEEE Trans. Nucl. Sci. 32, 769-778.
8. Floyd, C. E., Jr., Jaszczak, R. J., Greer, K. L. & Coleman, R. E. (1986) J. Nucl. Med. 27, 1577-1585.
9. Veklerov, E. & Llacer, J. (1987) IEEE Trans. Med. Imaging 6.
10. Politte, D. G. & Snyder, D. L. (1988) IEEE Trans. Nucl. Sci. 35, 608-610.
11. Chornoboy, E. S., Chen, C. J., Miller, M. I., Miller, T. R. & Snyder, D. L. (1990) IEEE Trans. Med. Imaging 38, 99.
12. Wagner, H. N., Jr. (1986) J. Nucl. Med. 27, 1227-1231.
13. Good, I. J. (1971) Nature (London) 229, 29-30.
14. Vardi, Y., Shepp, L. A. & Kaufman, L. (1985) J. Am. Stat. Assoc. 80, 8-35.
15. McCarthy, A. W., Barrett, R. C. & Miller, M. I. (1988) in Proceedings of the 22nd Conference on Information Sciences (Princeton Univ.), pp. 373-374.
16. Grenander, U. (1981) Abstract Inference (Wiley, New York).
17. Kullback, S. (1968) Information Theory and Statistics (Dover, New York).
18. Van Trees, H. L. (1971) Detection, Estimation, and Modulation Theory (Wiley, New York), Vol. 1.
19. Poggio, T., Torre, V. & Koch, C. (1985) Nature (London) 317, 314-319.
20. Good, I. J. & Gaskins, R. A. (1980) J. Am. Stat. Assoc. 75, 42-73.
