
Signal Processing 82 (2002) 1519–1543. www.elsevier.com/locate/sigpro

New multiscale transforms, minimum total variation synthesis: applications to edge-preserving image reconstruction

Emmanuel J. Candès*, Franck Guo
Applied and Computational Mathematics, California Institute of Technology, Pasadena, CA 91125, USA
* Corresponding author. E-mail address: [email protected] (E.J. Candès).

Received 1 March 2002

Abstract

This paper describes newly invented multiscale transforms known under the name of the ridgelet (Appl. Comput. Harmonic Anal. 6 (1999) 197) and the curvelet transforms (In: Cohen et al. (Eds.), Curves and Surfaces, Vanderbilt University Press, Nashville, TN, 2000, pp. 105–120; Candès and Donoho, available at http://www-stat.stanford.edu/donoho/Reports/1998/curvelets.zip, 1999). These systems combine ideas of multiscale analysis and geometry. Inspired by some recent work on digital Radon transforms (Averbuch et al., available at http://www-stat.stanford.edu/~donoho/Reports/index.html), we then present very effective and accurate numerical implementations with computational complexities of at most N log N. In the second part of the paper, we propose to combine these new expansions with the Total Variation minimization principle for the reconstruction of an object whose curvelet coefficients are known only approximately: quantized, thresholded, noisy coefficients, etc. We set up a convex optimization problem and seek a reconstruction that has minimum Total Variation under the constraint that its coefficients do not exhibit a large discrepancy from the data available on the coefficients of the unknown object. We will present a series of numerical experiments which clearly demonstrate the remarkable potential of this new methodology for image compression, image reconstruction and image 'de-noising'.
© 2002 Elsevier Science B.V. All rights reserved.

Keywords: Ridgelets; Curvelets; Pseudo-polar FFT; Radon transform; Minimum total-variation; Edges

1. Introduction

1.1. The multiscale/multiresolution revolution

The title of this special issue is "Image and Video Coding Beyond Standards". Claiming that the 'standards', at least in image coding, have evolved over the last few years is quite an understatement. In fact, the last two decades or so witnessed an extraordinary revolution in the way we process, store, visualize and transmit digital data; a revolution one might call the triumph of the multiscale approach.

The pioneering work of Burt and Adelson [5] on Laplacian pyramids, of Galand [19] on quadrature mirror filters, and of Grossman and Morlet [20] paved the way to the powerful concept of multiresolution, which today is, perhaps, best known under the generic name of wavelets. Multiresolution provides us with tools to describe mathematical objects such as functions, signals and, more generally, datasets at different levels of resolution.


The celebrated orthogonal wavelet pyramids of Mallat and Meyer [24] and Daubechies' construction of compactly supported wavelets [14] are among the greatest accomplishments of this generation of researchers who moved the field away from classical Fourier analysis.

Today, multiscale/multiresolution ideas permeate many fields of contemporary science and technology. In signal processing, they led to convenient tools to navigate through large datasets, to transmit compressed data rapidly, to remove noise from signals and images, and to identify crucial transient features in such datasets. By now, wavelets have made their way into everyday life, as they were included in JPEG 2000, the new still-picture compression standard, and are being considered for future generations of standard video coders such as the 'MPEG family'.

1.2. New multiscale geometric transforms

In this paper, we will describe some ideas which take the multiscale approach in a whole new direction, perhaps justifying the title "Beyond Standards". Indeed, Candès and Donoho [10,9,7] argue that the concept of multiscale is much broader than what classical multiresolution ideas or wavelets imply. They developed new systems of representation which are very different from wavelet-like systems. These systems combine ideas of multiscale analysis with ideas of geometric features and structures.

For instance, a newly invented representation is the ridgelet system [6], which allows the representation of arbitrary bivariate functions by superpositions of elements of the form

$\psi_{a,\theta,b}(x_1, x_2) = a^{-1/2}\, \psi\big((x_1 \cos\theta + x_2 \sin\theta - b)/a\big),$  (1.1)

where ψ is an oscillatory univariate function (a one-dimensional wavelet if you wish), a > 0 is a scale parameter, θ is an orientation parameter, and b is a location (scalar) parameter. Thus, ridgelets occur at all scales, locations, and orientations. Unlike wavelets, ridgelets are nonlocal as they are infinitely elongated in the codirection θ. In the context of image processing, where one would restrict objects to 'live' on the unit square, ridgelet functions are roughly of unit length and have arbitrary width, so that one would have all possible aspect ratios. If we think of wavelets as 'fat' points, then we may view ridgelets as 'fat' lines.

Motivated by the problem of finding efficient representations of objects with discontinuities along curves and of compressing image data, Candès and Donoho introduced yet another representation system, the curvelet transform. Like ridgelets, curvelets occur at all scales, locations, and orientations. However, whereas ridgelets have unit length, curvelets occur at all dyadic lengths and exhibit an anisotropy increasing with decreasing scale like a power law. Section 3 will introduce the curvelet transform. In short, curvelets obey a scaling relation which says that the width of a curvelet element is about the square of its length: width ∼ length². Conceptually, we may think of the curvelet transform as a multiscale pyramid with many directions and positions at each length scale, and needle-shaped elements (or 'fat' segments) at fine scales. Candès and Donoho [8,9] build tight frames of curvelets (γ_μ) obeying

$f = \sum_{\mu} \langle f, \gamma_\mu \rangle\, \gamma_\mu$  (1.2)

and

$\|f\|^2_{L^2(\mathbb{R}^2)} = \sum_{\mu} |\langle f, \gamma_\mu \rangle|^2.$  (1.3)
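To make the parametrization in (1.1) concrete, the short sketch below samples a single ridgelet on a Cartesian grid. The Mexican-hat profile and the particular parameter values are illustrative assumptions of this sketch, not choices made in the paper.

```python
import numpy as np

def ridgelet_atom(x1, x2, a, theta, b, psi=None):
    """Evaluate the ridgelet of Eq. (1.1): a^(-1/2) * psi((x1*cos(theta) + x2*sin(theta) - b)/a).
    The univariate profile psi defaults to a Mexican-hat wavelet (an illustrative choice)."""
    if psi is None:
        psi = lambda t: (1.0 - t**2) * np.exp(-t**2 / 2.0)
    t = (x1 * np.cos(theta) + x2 * np.sin(theta) - b) / a
    return psi(t) / np.sqrt(a)

# Example: one ridgelet on the unit square, oriented at 30 degrees.
xx, yy = np.meshgrid(np.linspace(0, 1, 256), np.linspace(0, 1, 256), indexing="ij")
atom = ridgelet_atom(xx, yy, a=0.05, theta=np.pi / 6, b=0.4)
```

Plotting `atom` shows the 'fat line' behaviour described above: the element oscillates across the ridge and is constant along it.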

1.3. Curvelets and image coding

New transforms may be very significant for practical concerns. For instance, the potential for sparsity of wavelet expansions paved the way to very successful applications in areas such as signal/image compression or denoising and feature extraction/recognition. The numerical examples presented in this paper will make clear that the curvelet transform is very relevant for image processing applications. Beyond those examples, there is now a substantial amount of evidence supporting our claim:

• Sparse representations by curvelets: The curvelet representation is far more effective for representing objects with edges than wavelets or more traditional representations. In fact, Candès and Donoho [8] prove that curvelets provide optimally sparse representations of C² objects with C² edges. In general, improved sparsity leads to improved compression performance, at least at high compression rates. Hence, transform coders based on the quantization of curvelet coefficients may benefit from provably superior asymptotic properties.

• Sparse component analysis: In computer vision, there has been an interesting series of experiments whose aim is to describe the 'sparse components' of images. Of special interest is the work of Field and Olshausen [27], who set up a computer experiment for empirically discovering the basis that best represents a database of 16 by 16 image patches. Although this experiment is limited in scale, they discovered that the best basis is a collection of needle-shaped filters occurring at various scales, locations and orientations. The reader will find a stimulating discussion about the connection between data analysis and harmonic analysis in [17]. We will not detail this connection and simply wonder, with Donoho, at the striking resemblance between curvelets, which derive from mathematical analysis, and these empirical basis elements arising from data analysis.

• Numerical experiments: Donoho and Huo [21] study sparse decompositions of images in a dictionary of waveforms including wavelet bases and multiscale ridgelets. They apply Basis Pursuit (BP) in this setting and obtain sparse syntheses by solving

$\min \|a\|_{\ell^1}$  subject to  $f = \sum_m a_m \varphi_m,$

where $(\varphi_m)$ is our collection of waveforms. BP gives an 'equal' chance to every member of the dictionary and yet, Donoho and Huo observe that BP preferentially selects multiscale ridgelets over wavelets, except possibly for the very coarse scales and the finest scale. This experiment seems to indicate that multiscale ridgelets are better for representing image data than pre-existing mathematical representations. There is a natural connection with our previous point, which showed that empirically selected bases do not look like wavelets. (A toy version of this ℓ¹ synthesis is sketched below.)
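For readers who want to experiment with the Basis Pursuit synthesis described above, here is a minimal sketch that casts min ‖a‖ℓ¹ subject to f = Σ_m a_m φ_m as the standard linear program over the split a = u − v. The explicit dictionary matrix Phi and the use of scipy's linprog are assumptions made here for illustration; they are not the setup used by Donoho and Huo.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, f):
    """min ||a||_1  subject to  Phi @ a = f, via the LP reformulation a = u - v, u, v >= 0."""
    n_atoms = Phi.shape[1]
    c = np.ones(2 * n_atoms)                 # objective: sum(u) + sum(v) = ||a||_1
    A_eq = np.hstack([Phi, -Phi])            # equality constraints: Phi (u - v) = f
    res = linprog(c, A_eq=A_eq, b_eq=f, bounds=(0, None), method="highs")
    u, v = res.x[:n_atoms], res.x[n_atoms:]
    return u - v
```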

1.4. New partial synthesis rules: total-variation minimization

Let T be a linear transformation such as the Fourier, wavelet, ridgelet or curvelet transform, for example, or the amalgamation of a few of these. Suppose that one is given partial information about f in the sense that only a subset of the coefficients Tf are known or approximately known. To make things concrete, we might consider a 'compression scenario' where the partial information consists in the value of the m largest coefficients, possibly subject to quantization error. The classical approach to partial synthesis consists in setting the other coefficients to zero and inverting the transform T.

There are, of course, more sophisticated approaches. For instance, in the case where T is the wavelet transform, Buccigrossi and Simoncelli [4] develop a wavelet embedded predictive image coder; the idea is to use statistical properties of natural scenes to predict the absolute value of neighboring wavelet coefficients. In a somewhat similar spirit, Crouse et al. [13] attempt to model the joint probability density of the wavelet coefficients using Hidden Markov Models and exploit this model for efficient partial synthesis.

In this paper, we introduce a very different idea based on the minimization of a complexity penalty. Of special interest is the Total Variation norm ‖f‖_BV, which is roughly equal to the integral of the Euclidean norm of the gradient. We introduce some notation. We let (Tf)_{μ∈M} be the coefficient sequence of an object f and let M′ denote a subset of the coefficient index set M. Suppose we are given possibly approximate values α̃_μ of (Tf)_μ, for μ ∈ M′. We propose the following rule for partial synthesis:

$\min \|g\|_{BV}$  subject to  $|\alpha_\mu(g) - \tilde{\alpha}_\mu| \le e_\mu, \quad \mu \in M'.$  (1.4)

In a nutshell, given a vector of tolerances (e_μ) (some of the e_μ's may be zero), we seek a solution f* with minimum Total Variation norm whose coefficients α_μ(f*) are within e_μ of α̃_μ. In passing, note that Eq. (1.4) may be seen as a very interesting special case of the more general rule

$\min J(g)$  subject to  $Tg \in C_f,$  (1.5)

where J(g) is a functional measuring the complexity of the fit, e.g. ‖g‖_BV, and C_f is a convex set of constraints depending on the available information about f, e.g. |Tg − α̃| ≤ e.

Reconstructions based on (1.4) were briefly suggested in [29] and have been independently introduced by Durand and Froment [18] in the context of wavelet-based signal denoising. This paper will deploy (1.4) with our newly constructed geometric multiscale systems, namely ridgelets and curvelets. We will present a series of preliminary experiments which clearly demonstrate the potential for image compression, image reconstruction and 'de-noising'.


1.5. Contents

Sections 2 and 3 will review the basic components of the ridgelet and curvelet constructions. In Section 4, we will present a strategy for developing approximate digital transforms which can be used effectively on n by n Cartesian arrays. Section 5 will discuss the new partial synthesis rules introduced in Section 1.4. In Section 6, we will present applications in the area of image reconstruction, and specifically edge-preserving tomography. Finally, we will close with some remarks and future directions for research.

2. The Ridgelet transform

There are several ridgelet transforms. For instance, Candès introduced both a continuous and a discrete ridgelet transform. Letting ψ_{a,θ,b}(x) be a ridgelet as in (1.1), he defines a ridgelet coefficient by

$\mathcal{R}_f(a,\theta,b) = \int f(x_1, x_2)\, \psi_{a,\theta,b}(x_1, x_2)\, dx_1\, dx_2.$

Then, provided that ψ is oscillatory, Candès [6] shows that we have a stable and concrete reproducing formula, just as we have a stable and concrete reproducing formula for the wavelet transform. He then develops a discretization of the continuous transform and constructs ridgelet frames. We give an overview of the main components of the discrete transform:
• dyadic scale: a = 2^{-j}, j ≥ 0;
• dyadic location: b = k · 2^{-j}, k ∈ Z;
• angular resolution proportional to scale: θ = 2π · ℓ · 2^{-j}, ℓ = 0, 1, 2, ..., θ ≤ 2π.

In an image processing context, this work suggests taking line-like measurements on image data and shows how to combine these measurements to reconstruct an image.

In dimension 2, Donoho [16] introduced a new orthonormal basis whose elements he called 'orthonormal ridgelets'. We quote from [10]: "Such a system can be defined as follows: let (ψ_{j,k}(t) : j ∈ Z, k ∈ Z) be an orthonormal basis of Meyer wavelets for L²(R) [22], and let (w⁰_{i₀,ℓ}(θ), ℓ = 0, ..., 2^{i₀} − 1; w¹_{i,ℓ}(θ), i ≥ i₀, ℓ = 0, ..., 2^i − 1) be an orthonormal basis for L²[0, 2π) made of periodized Lemarié scaling functions w⁰_{i₀,ℓ} at level i₀ and periodized Meyer wavelets w¹_{i,ℓ} at level i ≥ i₀. (We suppose a particular normalization of these functions.) Let ψ̂_{j,k}(ω) denote the Fourier transform of ψ_{j,k}(t), and define ridgelets ρ_λ(x), λ = (j, k; i, ℓ; ε), as functions of x ∈ R² using the frequency-domain definition

$\hat{\rho}_\lambda(\xi) = |\xi|^{-1/2} \big( \hat{\psi}_{j,k}(|\xi|)\, w^{\varepsilon}_{i,\ell}(\theta) + \hat{\psi}_{j,k}(-|\xi|)\, w^{\varepsilon}_{i,\ell}(\theta + \pi) \big) / 2.$  (2.1)

Here the indices run as follows: j, k ∈ Z; ℓ = 0, ..., 2^{i−1} − 1; i ≥ i₀, i ≥ j. Note the restrictions on the range of ℓ and on i. Let Λ denote the set of all such indices λ. It turns out that (ρ_λ)_{λ∈Λ} is a complete orthonormal system for L²(R²)."

Fig. 1 represents an object and the ridgelet which, at a given scale, best correlates with this object. Observe that the ridgelet is aligned with the discontinuity.

3. The curvelet transform

We now briefly discuss the curvelet frame; for more details, see [8]. The construction combines several ingredients, which we briefly review:
• Ridgelets, a method of analysis very suitable for objects which are discontinuous across straight lines.
• Multiscale Ridgelets, a pyramid of analyzing elements which consists of ridgelets renormalized and transported to a wide range of scales and locations.
• Bandpass Filtering, a method of separating an object out into a series of disjoint scales.

The last section introduced ridgelets; we first briefly discuss multiscale ridgelets and bandpass filtering. We then describe the combination of these three components. There is a difference between this construction and the one given in [8] at large scales.

3.1. Multiscale ridgelets

Think of orthoridgelets as objects which have a "length" of about 1 and a "width" which can be arbitrarily fine. The multiscale ridgelet system renormalizes and transports such objects, so that one has a system of elements at all lengths and all finer widths.

The construction begins with a smooth partition of energy function w(x₁, x₂) ≥ 0, w ∈ C₀^∞([−1, 1]²), obeying $\sum_{k_1,k_2} w^2(x_1 - k_1, x_2 - k_2) \equiv 1.$


Fig. 1. A Ridgelet.

Define a transport operator so that, with index Q indicating a dyadic square Q = (s, k₁, k₂) of the form [k₁/2ˢ, (k₁+1)/2ˢ) × [k₂/2ˢ, (k₂+1)/2ˢ), we have (T_Q f)(x₁, x₂) = f(2ˢx₁ − k₁, 2ˢx₂ − k₂). The Multiscale Ridgelet with index μ = (Q, λ) is then

$\psi_\mu = 2^s\, T_Q (w\, \rho_\lambda).$

In short, one transports the normalized, windowed orthoridgelet.

Letting Q_s denote the dyadic squares of side 2^{-s}, we can define the subcollection of Monoscale Ridgelets at scale s:

$\mathcal{M}_s = \{(Q, \lambda) : Q \in \mathcal{Q}_s,\ \lambda \in \Lambda\}.$

It is immediate from the orthonormality of the ridgelets that each system of monoscale ridgelets makes a tight frame, in particular obeying the Parseval relation

$\sum_{\mu \in \mathcal{M}_s} \langle \psi_\mu, f \rangle^2 = \|f\|^2_{L^2}.$

It follows that the dictionary of multiscale ridgelets at all scales, indexed by

$\mathcal{M} = \bigcup_{s \ge 1} \mathcal{M}_s,$

is not frameable, as we have energy blow-up:

$\sum_{\mu \in \mathcal{M}} \langle \psi_\mu, f \rangle^2 = \infty.$  (3.1)

The Multiscale Ridgelets dictionary is simply too massive to form a good analyzing set. It lacks inter-scale orthogonality: ψ_{(Q,λ)} is not typically orthogonal to ψ_{(Q′,λ′)} if Q and Q′ are squares at different scales and overlapping locations. In analyzing a function using this dictionary, the repeated interactions with all different scales cause the energy blow-up (3.1).

The construction of curvelets solves this problem by, in effect, disallowing the full richness of the Multiscale Ridgelets dictionary. Instead of allowing all different combinations of 'lengths' and 'widths', we allow only those where width ≈ length².

3.2. Subband filtering

Our remedy to the 'energy blow-up' (3.1) is to decompose f into subbands using standard filter-bank ideas. Then we assign one specific monoscale dictionary M_s to analyze one specific (and specially chosen) subband.

We define coronae of frequencies |ξ| ∈ [2^{2s}, 2^{2s+2}], and subband filters Δ_s extracting components of f in the indicated subbands; a filter P₀ deals with frequencies |ξ| ≤ 1. The filters decompose the energy exactly into subbands:

$\|f\|_2^2 = \|P_0 f\|_2^2 + \sum_s \|\Delta_s f\|_2^2.$


The construction of such operators is standard [32]; the coronization oriented around powers 2^{2s} is nonstandard, and essential for us. Explicitly, we build a sequence of filters Φ₀ and Ψ_{2s} = 2^{4s} Ψ(2^{2s}·), s = 0, 1, 2, ..., with the following properties: Φ₀ is a low-pass filter concentrated near frequencies |ξ| ≤ 1; Ψ_{2s} is bandpass, concentrated near |ξ| ∈ [2^{2s}, 2^{2s+2}]; and we have

$|\hat{\Phi}_0(\xi)|^2 + \sum_{s \ge 0} |\hat{\Psi}(2^{-2s}\xi)|^2 = 1, \qquad \forall \xi.$

Hence, Δ_s is simply the convolution operator Δ_s f = Ψ_{2s} ∗ f.
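The following sketch builds one concrete instance of such a filter bank: squared radial windows that telescope to a partition of unity, with passbands near |ξ| ∈ [2^{2s}, 2^{2s+2}]. The raised-cosine profile is an assumption made here for illustration; the paper only requires the properties listed above.

```python
import numpy as np

def lowpass_sq(r, s):
    """Squared low-pass response: 1 for |xi| <= 2^(2s), decaying smoothly to 0 by 2^(2s+2)."""
    lo, hi = 2.0 ** (2 * s), 2.0 ** (2 * s + 2)
    t = np.clip((r - lo) / (hi - lo), 0.0, 1.0)
    return np.cos(0.5 * np.pi * t) ** 2

def subband_decompose(f, n_scales):
    """Return [P0 f, Delta_0 f, ..., Delta_{n_scales-1} f]; the squared windows telescope,
    so |Phi_0|^2 + sum_s |Psi_{2s}|^2 = 1 on the frequencies covered by the last scale."""
    F = np.fft.fft2(f)
    ky = np.fft.fftfreq(f.shape[0]) * f.shape[0]
    kx = np.fft.fftfreq(f.shape[1]) * f.shape[1]
    KY, KX = np.meshgrid(ky, kx, indexing="ij")
    r = np.hypot(KY, KX)
    prev = lowpass_sq(r, 0)
    bands = [np.real(np.fft.ifft2(F * np.sqrt(prev)))]                  # P0 f
    for s in range(n_scales):
        cur = lowpass_sq(r, s + 1)
        bands.append(np.real(np.fft.ifft2(F * np.sqrt(cur - prev))))    # Delta_s f
        prev = cur
    return bands
```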

3.3. Definition of the curvelet transform

Assembling the above ingredients, we are able to sketch the definition of the Curvelet transform. We let M′ consist of M merged with the collection of integral triples (s, k₁, k₂, ε), where s ≤ 0, ε ∈ {0, 1}, indexing all dyadic squares in the plane of side 2^{-s} ≥ 1.

The curvelet transform is a map L²(R²) → ℓ²(M′), yielding Curvelet coefficients (α_μ : μ ∈ M′). These come in two types.

At coarse scales we have wavelet coefficients,

$\alpha_\mu = \langle W_{s,k_1,k_2,\varepsilon},\, P_0 f \rangle, \qquad \mu = (s, k_1, k_2, \varepsilon) \in M' \setminus \mathcal{M},$

where each W_{s,k₁,k₂,ε} is a Meyer wavelet, while at fine scales we have Multiscale Ridgelet coefficients of the bandpass filtered object:

$\alpha_\mu = \langle \Delta_s f,\, \psi_\mu \rangle, \qquad \mu \in \mathcal{M}_s,\ s = 1, 2, \ldots.$

Note well that for s > 0, each coefficient associated to scale 2^{-s} derives from the subband filtered version of f, namely Δ_s f, and not from f itself. Several properties are immediate:

• Tight frame:

$\|f\|_2^2 = \sum_{\mu \in M'} |\alpha_\mu|^2.$

• Existence of coefficient representers (frame elements): there are γ_μ ∈ L²(R²) so that

$\alpha_\mu \equiv \langle f, \gamma_\mu \rangle.$

• L² reconstruction formula:

$f = \sum_{\mu \in M'} \langle f, \gamma_\mu \rangle\, \gamma_\mu.$

• Formula for frame elements: for s ≤ 0, γ_μ = P₀ W_{s,k₁,k₂,ε}, while for s > 0,

$\gamma_\mu = \Delta_s \psi_\mu, \qquad \mu \in \mathcal{M}_s.$  (3.2)

In short, fine-scale curvelets are obtained by bandpass filtering of Multiscale Ridgelets, where the passband is rigidly linked to the scale of spatial localization.

• Anisotropy scaling law: Linking the filter passband |ξ| ≈ 2^{2s} to the scale of spatial localization 2^{-s} imposes that (1) most curvelets are negligible in norm (most multiscale ridgelets do not survive the bandpass filtering Δ_s); and (2) the nonnegligible curvelets obey length ≈ 2^{-s} while width ≈ 2^{-2s}. In short, the system obeys approximately the scaling relationship

width ≈ length².

Note: it is at this last step that our 2^{2s} coronization scheme comes fully into play.

• Oscillatory nature: both for s > 0 and s ≤ 0, each frame element has a Fourier transform supported in an annulus away from 0.

4. Digital transforms

4.1. Ridgelet and Radon transforms

There is an intimate relationship between the Radon transform and the ridgelet transform. In some sense, ridgelet analysis is a kind of wavelet analysis in the Radon domain.

Let ψ_{a,θ,b} be a ridgelet,

$\psi_{a,\theta,b}(x_1, x_2) = a^{-1/2}\, \psi\big((-x_1 \sin\theta + x_2 \cos\theta - b)/a\big),$  (4.1)

and define a ridgelet coefficient by

$\mathcal{R}_f(a,\theta,b) = \int f(x_1, x_2)\, \psi_{a,\theta,b}(x_1, x_2)\, dx_1\, dx_2.$  (4.2)

A simple change of variables shows that

$\mathcal{R}_f(a,\theta,b) = \int Rf(t,\theta)\, a^{-1/2}\, \psi((t - b)/a)\, dt,$

where Rf is the Radon transform [15] defined by

$Rf(t,\theta) = \int f(x_1, x_2)\, \delta(-x_1 \sin\theta + x_2 \cos\theta - t)\, dx_1\, dx_2.$  (4.3)


In short, the ridgelet transform is precisely the application of a one-dimensional wavelet transform to the slices of the Radon transform where the angular variable θ is constant and t is varying.

A similar connection exists with the orthonormal ridgelet transform as well. Letting Δ₊ be the fractional derivative operator

$(\Delta_+ h)(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} |\omega|^{1/2}\, \hat{h}(\omega)\, e^{i\omega t}\, d\omega,$  (4.4)

we introduce the fractionally differentiated Radon transform defined as

$R_+ = (\Delta_+ \otimes I)\, R,$  (4.5)

where R is the Radon transform (4.3), and with Δ₊ acting on t and the identity I on θ. It is well known [15] that R₊ is an isometry,

$[R_+ f,\, R_+ g] = \langle f, g \rangle,$  (4.6)

where the notation [·, ·] denotes the normalized inner product in L²(dt dθ),

$[F, G] = \frac{1}{4\pi} \int F(t,\theta)\, G(t,\theta)\, dt\, d\theta.$

Let (ρ_λ) be an orthobasis of ridgelets (2.1) and let W_λ be the tensor wavelet basis

$W_\lambda(t,\theta) = (\psi_{j,k} \otimes w^{\varepsilon}_{i,\ell})(t,\theta), \qquad \lambda = (j, k; i, \ell; \varepsilon),$  (4.7)

where the range of λ is as in (2.1). With these notations, it is an easy calculation to show that

$R_+ \rho_\lambda = P_R W_\lambda,$

where P_R W_λ is the antipodally-symmetrized version of W_λ,

$(P_R W_\lambda)(t,\theta) = \big( W_\lambda(t,\theta) + W_\lambda(-t, \theta + \pi) \big)/2.$

Then

$\alpha_\lambda := \langle f, \rho_\lambda \rangle = [R_+ f,\, P_R W_\lambda].$  (4.8)

In addition, we may pass the operator (Δ₊ ⊗ I) to P_R W_λ and obtain

$\alpha_\lambda = [Rf,\, (\Delta_+ \otimes I) P_R W_\lambda].$

This gives

$\alpha_\lambda = [Rf,\, (\Delta_+ \otimes I) P_R W_\lambda] = [Rf,\, P_R W_\lambda^+],$  (4.9)

with

$W_\lambda^+ = \psi^+_{j,k} \otimes w^{\varepsilon}_{i,\ell}, \qquad \psi^+_{j,k} = \Delta_+ \psi_{j,k}.$  (4.10)

Identities (4.8) and (4.9) also justify our interpretation of ridgelet analysis as a kind of wavelet analysis in the Radon domain.

4.2. Strategy

Following upon the previous section, a sensible strategy for a digital ridgelet transform would involve the development of digital wavelet and Radon transforms. Once we have a digital ridgelet transform, digital curvelet transforms follow rather easily, see [30]. Digital wavelet transforms are now routine. However, developing a highly accurate and fast digital Radon transform is challenging, as making sense of geometrical concepts such as line integrals on discrete Cartesian arrays is not straightforward. Indeed, these difficulties, together with the importance of the Radon transform in many problems of scientific interest, spurred an extensive literature on this subject.

Many of the proposed methods rely on the Projection Slice Theorem, which states that the Radon transform may be obtained by inverting the two-dimensional Fourier transform along radial lines:

$\hat{f}(-\lambda \sin\theta,\ \lambda \cos\theta) = \int Rf(t,\theta)\, e^{-i\lambda t}\, dt.$

Continuing on this line of thinking, one would need to introduce a collection Ξ of 'polar samples' and evaluate the Fourier transform

$F(\xi) = \sum_{n_1, n_2} f(n_1, n_2)\, e^{-i(\xi_1 n_1 + \xi_2 n_2)}$

at each point ξ ∈ Ξ. For a polar grid of size N, a naive evaluation would require of the order of N² operations, which is prohibitive for large values of N. Another issue is that image data are often given on Cartesian grids, and classical polar grids may not be well adapted to this data structure.

In this paper, we follow a recent approach developed in [1] which goes by the name of the Pseudo-Polar FFT. The Pseudo-Polar FFT defines a Rectopolar grid which is adapted to Cartesian grids and introduces an interpolation scheme allowing a fast evaluation of the Fourier transform on that grid.


Fig. 2. Rectopolar lines.

4.3. Pseudo-Polar FFT

The approach of [1] starts with a Rectopolar grid, also known under the name of the Concentric Squares grid. Given a Cartesian array (k₁, k₂), −n/2 ≤ k₁, k₂ < n/2, we define 2n radial lines joining pairs of symmetric points on the boundary of the square. Fig. 2 shows radial lines in Rectopolar coordinates; we will refer to these as Rectopolar lines, and it will be convenient to distinguish between basically vertical and basically horizontal lines, as illustrated in our figure.

We then introduce the Rectopolar grid ξ^p_{ℓ,m}, where ℓ indexes Rectopolar lines, m the position on that line, and p is a gender token equal to 1 if the Rectopolar line is basically vertical, and 2 otherwise. Define
• ξ¹_{ℓ,m} as the intersection of the ℓth 'vertical' Rectopolar line with the mth horizontal line (see Fig. 2(a)), and
• ξ²_{ℓ,m} as the intersection of the ℓth 'horizontal' Rectopolar line with the mth vertical line (see Fig. 2(b)).

The meaning of the index ℓ is as follows: the coordinates of the points ξ^p_{ℓ,m} are given by

$\xi^1_{\ell,m} = (2\ell m/n,\ m), \qquad \xi^2_{\ell,m} = (m,\ 2\ell m/n), \qquad -n/2 \le \ell, m < n/2.$  (4.11)

From now on, we will let Ξ denote the full Rectopolar grid. This grid is pictured in Fig. 3.

Fig. 3. Rectopolar grid.

Obviously, the sampled angular directions are not equispaced; the sampling is a little finer near the 'corners'. To be specific, let f̂(ξ) be the Fourier transform of f(x) and put

$\hat{f}^p_{\ell,m} = \hat{f}(2\pi \xi^p_{\ell,m}/n).$

With these notations and for p = 1, say, f̂¹ are sampled values of f̂(−λ tan θ, λ) on the Cartesian grid

$\lambda = 2\pi m/n, \qquad \tan\theta = \ell/n, \qquad -n/2 \le \ell, m < n/2.$

A similar statement applies to f̂². The sampling is irregular in θ but equispaced in the slope tan θ.
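A few lines of code suffice to generate the grid (4.11); the function below returns the two 'gender' batches as arrays of coordinate pairs (ξ₁, ξ₂). It is a direct transcription of the formulas and is only meant to make the indexing explicit.

```python
import numpy as np

def rectopolar_grid(n):
    """Rectopolar (concentric-squares) grid of Eq. (4.11):
    xi1[l, m] = (2*l*m/n, m)  ('basically vertical' lines),
    xi2[l, m] = (m, 2*l*m/n)  ('basically horizontal' lines),  -n/2 <= l, m < n/2."""
    idx = np.arange(-n // 2, n // 2)
    L, M = np.meshgrid(idx, idx, indexing="ij")       # L indexes lines, M positions on a line
    xi1 = np.stack([2.0 * L * M / n, M.astype(float)], axis=-1)
    xi2 = np.stack([M.astype(float), 2.0 * L * M / n], axis=-1)
    return xi1, xi2
```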


4.4. Interpolation

In this section, we work with a Cartesian array f(n₁, n₂), 0 ≤ n₁, n₂ ≤ n − 1. The FFT of (f(n₁, n₂)) gives Fourier samples f̂(k₁, k₂) on the Cartesian grid k = (k₁, k₂), −n/2 ≤ k₁, k₂ < n/2, and we now need to address the problem of resampling the Fourier transform on the Pseudo-Polar grid Ξ. Interpolating the Fourier transform is a delicate matter, as it is a very oscillatory object.

We view f̂(k₁, k₂) as samples from the trigonometric polynomial F defined by

$F(\omega_1, \omega_2) = \sum_{n_1=0}^{n-1} \sum_{n_2=0}^{n-1} f(n_1, n_2)\, e^{-i(\omega_1 n_1 + \omega_2 n_2)}.$  (4.12)

Then

$\hat{f}(k_1, k_2) = \sum_{n_1=0}^{n-1} \sum_{n_2=0}^{n-1} f(n_1, n_2)\, e^{-i 2\pi (k_1 n_1 + k_2 n_2)/n} = F(2\pi k_1/n,\ 2\pi k_2/n).$  (4.13)

Consider the restriction of this two-dimensional trigonometric polynomial to the line ω₂ = 2π · m/n:

$P_m(\omega) := F\Big(\omega,\ \frac{2\pi m}{n}\Big).$  (4.14)

Of course, P_m is a trigonometric polynomial of degree n which might be expressed as

$P_m(\omega) = \sum_{u=0}^{n-1} c_u\, e^{-iu\omega}.$

Now, because P_m is the restriction of F, its coefficients are, of course, given by

$c_{n_1} = \sum_{n_2=0}^{n-1} f(n_1, n_2)\, e^{-i(2\pi m/n) n_2}, \qquad 0 \le n_1 \le n - 1.$

We then need to evaluate this polynomial at the points

$\omega_\ell = \delta \cdot \ell, \qquad \delta = (2\pi/n) \cdot 2m/n, \qquad -n/2 \le \ell < n/2;$

see (4.11). This gives

$\hat{f}^1_{\ell,m} = P_m(\delta \cdot \ell) = \sum_{u=0}^{n-1} c_u\, e^{-i\delta u \ell}.$

In short, our interpolation calls for the resampling of a one-dimensional trigonometric polynomial at a sampling rate which is below the Nyquist rate, except at the 'borders', for which m = −n/2.

Introduce the Fractional Fourier Transform (FrFT) of a vector x,

$\hat{x}_k = (F_\delta x)_k = \sum_{j=0}^{n-1} x_j\, e^{-i\delta j k}, \qquad 0 \le k \le n - 1.$

For fixed p and m, the sample values of (f̂¹_{ℓ,m})_ℓ may naturally be interpreted as an FrFT. Fix m; then

$\hat{f}^1_{k - n/2,\, m} = (F_\delta d)_k, \qquad 0 \le k \le n - 1,$

where d is the vector

$d_u = c_u\, e^{i\delta u n/2}, \qquad 0 \le u \le n - 1.$

Bailey and Swartztrauber [2] have proposed an algorithm to compute the FrFT of a vector of length n in O(n log n) multiplications and additions. Therefore, for each horizontal or vertical line m, the evaluation of f̂^p_{ℓ,m} requires a number of multiplications and additions of the order of n log n. There are a total of 2n such lines and, therefore, the complexity of the Pseudo-Polar FFT is N log N for a Cartesian array of size N = n².

This approach is the one taken in [1] and we reiterate its main property: the Pseudo-Polar FFT gives an exact evaluation of F (4.12) at each ξ ∈ Ξ.
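As a concrete illustration of the FrFT used above, the sketch below evaluates x̂_k = Σ_j x_j e^{−iδjk} in O(n log n) operations with Bluestein's chirp factorization, which is the same idea underlying the Bailey–Swartztrauber algorithm; the implementation details (padding length, kernel layout) are this sketch's own choices, not those of [2].

```python
import numpy as np

def frft(x, delta):
    """Fractional Fourier transform  X_k = sum_j x_j * exp(-1j*delta*j*k),  k = 0..n-1,
    computed in O(n log n) via the chirp identity  jk = (j^2 + k^2 - (k - j)^2) / 2."""
    x = np.asarray(x, dtype=complex)
    n = x.size
    j = np.arange(n)
    chirp = np.exp(-1j * delta * j**2 / 2.0)
    a = x * chirp                                    # pre-chirped input
    m = np.arange(-(n - 1), n)
    b = np.exp(1j * delta * m**2 / 2.0)              # symmetric chirp kernel b[m], |m| < n
    nfft = 1 << int(np.ceil(np.log2(2 * n - 1)))     # FFT length for a linear convolution
    b_pad = np.concatenate([b[n - 1:], np.zeros(nfft - (2 * n - 1)), b[:n - 1]])
    conv = np.fft.ifft(np.fft.fft(a, nfft) * np.fft.fft(b_pad))[:n]
    return chirp * conv                              # post-chirp

# Sanity check: delta = 2*pi/n reduces the FrFT to the ordinary DFT.
x = np.random.randn(64)
assert np.allclose(frft(x, 2 * np.pi / 64), np.fft.fft(x))
```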

4.5. Interpretation: Slant Stack transform

Define the Slant Stack transform of an object f as follows:

$(Sf)(t, \theta) = \int f(u,\ t + u\tan\theta)\, du, \qquad -\pi/4 \le \theta \le \pi/4.$  (4.15)

That is, the Slant Stack is a line integral along L_{t,θ} = {(u, t + u tan θ) : u ∈ R}. There is a similar definition for θ in the range [π/4, 3π/4], namely

$(Sf)(t, \theta) = \int f(u\cot\theta + t,\ u)\, du, \qquad \pi/4 \le \theta \le 3\pi/4.$  (4.16)


Not surprisingly, there is a relationship between the one-dimensional Fourier transform of Sf and the two-dimensional Fourier transform of f. Indeed, the following equality holds for θ ∈ [−π/4, π/4]:

$(Sf)(t, \theta) = \frac{1}{2\pi} \int \hat{f}(-\lambda\tan\theta,\ \lambda)\, e^{i\lambda t}\, d\lambda.$  (4.17)

In other words, inverting the one-dimensional Fourier transform along the Rectopolar line (−λ tan θ, λ), λ ∈ R, gives the Slant Stack. Of course, the Slant Stack transform is also directly related to the Radon transform, as

$Sf(t, \theta) = \cos\theta \cdot Rf(t\cos\theta,\ \theta).$

We let T denote the Pseudo-Polar FFT, which maps a Cartesian array (f(n₁, n₂)) into f̂^p_{ℓ,m} = F(ξ^p_{ℓ,m}) (4.12), and F₁⁻¹ be the one-dimensional inverse Fourier transform applied to each of the 2n Rectopolar lines. The identity (4.17) suggests that the composition F₁⁻¹ T is a discrete analog of the Slant Stack transform for Cartesian arrays. In fact, it is possible to define a discrete Slant Stack transform Sₙ with this property:

$S_n = F_1^{-1}\, T.$  (4.18)

4.6. Discrete slant stack

Put

$(S_n f)(t, \theta) = \sum_u \tilde{f}(u,\ t + u\tan\theta),$

where for a fixed u ∈ {−n/2, ..., n/2 − 1}, the function f̃(u, t), t ∈ R, interpolates the discrete vector f(u, v), v = −n/2, ..., n/2 − 1. Let u be fixed and let g(−n/2), g(−n/2 + 1), ..., g(n/2 − 1) be the sample values of f(u, v). We pad this vector with two zero-vectors of size n/2 each, thereby defining g(v) = 0 for v ∈ {−n, ..., −n/2 − 1} and v ∈ {n/2, ..., n − 1}. Let G(t), −n ≤ t < n, be the trigonometric polynomial of degree 2n − 1 such that

$G(v) = g(v), \qquad v = -n, \ldots, n - 1.$

Then G is given by

$G(t) = \sum_{k=-n}^{n-1} \hat{g}_k\, e^{i\pi k t/n}, \qquad \hat{g}_k = \frac{1}{2n} \sum_v g(v)\, e^{-i\pi k v/n},$

or equivalently,

$G(t) = \sum_v g(v)\, K_{2n}(t - v),$

where K_{2n} is the Dirichlet kernel

$K_{2n}(t) = e^{-i\pi t/2n}\, \frac{\sin \pi t}{2n \sin(\pi t/2n)}.$

Then [1] shows that the identity (4.18) holds with this special choice of trigonometric interpolant.

Note that we used zero-padding to avoid periodicity effects. Had we ignored this, our trigonometric interpolation would wrap lines around the square, as it assumes periodicity. This point is illustrated in Fig. 4(a).

The padding gives a rectangular array of size n by 2n or 2n by n, depending on whether we consider integrals on lines whose angle θ with the horizontal axis is in the range θ ∈ [−π/4, π/4] or θ ∈ [π/4, 3π/4]. Define the discrete Slant Stack grid as follows:

$t_k = k, \quad -n \le k < n; \qquad \tan\theta_\ell = \ell/n, \quad -n/2 \le \ell < n/2.$  (4.19)

Then, letting Sₙ be the Discrete Slant Stack which maps

$f(n_1, n_2) \mapsto (S_n f)(t_k, \theta_\ell),$

[1] proves that Sₙ obeys (4.18). Fig. 4 provides a graphical representation of the identity (4.18): for each θ_ℓ, taking the inverse FFT along the Rectopolar lines (we have 2n equispaced points) gives the Slant Stack for t = −n, ..., n − 1, and the corresponding slope. In short, the discrete Slant Stack Sₙ takes a Cartesian array of size n by n and returns an array of size 2n by 2n, in two batches of size 2n by n each.

In Fig. 4, we chose to display 'long' rectangles merely reflecting the size of the zero-padded Cartesian arrays. We hope that this does not introduce confusion in the reader's mind, as this figure seems to indicate that we only consider slopes in the range [−1/2, 1/2] and (−∞, −2) ∪ (2, ∞). The correct geometrical interpretation of the Pseudo-Polar FFT is that we oversample the Rectopolar grid by a factor 2; namely, we evaluate F(ω₁, ω₂) at the samples

$(\omega_1, \omega_2) = \Big( \frac{\pi}{n} \cdot \frac{\ell m'}{n},\ \frac{\pi}{n} \cdot m' \Big), \qquad (\omega_1, \omega_2) = \Big( \frac{\pi}{n} \cdot m',\ \frac{\pi}{n} \cdot \frac{\ell m'}{n} \Big), \qquad -n/2 \le \ell < n/2, \quad -n \le m' < n.$  (4.20)


Fig. 4. Slant Stack and polar FFT.

In comparison, the sampling suggested by Fig. 3 is of the type (ω₁, ω₂) = (2π/n · ℓm/n, 2π/n · m), where −n/2 ≤ ℓ, m < n/2.

4.7. Wavelet transforms

If one wishes to calculate a table of discrete ridgelet coefficients as in (4.2), we simply apply a discrete wavelet transform to the columns of Sₙf, which gives the array of coefficients α = (α^p_{j,k}(ℓ)); this is an array of size 2n by 2n, in two batches of size 2n by n each.

The discrete implementation of the orthonormal ridgelet transform proceeds a little differently. With the notations of the last section, we compute the 'fractionally' differentiated Slant Stack

$S_n^+ f = F_1^{-1}\, D\, T f,$

where D and T are, respectively, the density weighting and Pseudo-Polar FFT matrices. We then apply discrete wavelet transforms both to the rows (t-variable) and columns (θ-variable) of S_n^+ f. The result is an array of coefficients α = (α^p_{j₁,k₁,j₂,k₂}), with (j₁, k₁) and (j₂, k₂) denoting the scale and location variables of the wavelet transforms in the t-variable and θ-variable, respectively.
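In code, the first of these two recipes is just a loop over the columns of the Slant Stack array; the sketch below uses PyWavelets for the 1-D transform. The wavelet family, the decomposition depth, and the use of pywt are assumptions of this illustration, not choices made in the paper.

```python
import pywt  # PyWavelets, assumed available

def digital_ridgelet_coeffs(slant_stack, wavelet="db4", level=4):
    """Discrete ridgelet coefficients as in Section 4.7: a 1-D discrete wavelet transform
    applied to each column (one rectopolar direction) of the discrete Slant Stack."""
    return [pywt.wavedec(col, wavelet, level=level) for col in slant_stack.T]
```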

4.8. Pseudo-Ridgelets

By substituting the Slant Stack for the Radon transform, we slightly changed the definition of the ridgelet transform. The discrete transform computes a discrete analog of the pseudo-ridgelet transform defined as follows:

$\mathcal{R}_f(a, \theta, b) = \cos\theta \int f(x_1, x_2)\, a^{-1/2}\, \psi\Big( \frac{-x_1\tan\theta + x_2 - b}{a} \Big)\, dx_1\, dx_2 = \int Sf(t, \theta)\, a^{-1/2}\, \psi((t - b)/a)\, dt$

for θ ∈ (−π/4, π/4), and likewise for θ ∈ (π/4, 3π/4). In other words, with ψ_{a,θ,b} as in (4.1), we have

$\mathcal{R}_f(a, \theta, b) = \sqrt{\cos\theta}\; \langle f, \psi_{a_\theta, \theta, b} \rangle, \qquad a_\theta = a/\cos\theta.$

We close this section with an important point. The discrete 'pseudo-ridgelets', i.e. the Riesz representers of the discrete transform, are not ridge functions: they are not of the form ρ(a₁n₁ + a₂n₂).


4.9. Inversion

We now briefly discuss the strategy for inverting the Pseudo-Polar FFT. A naive approach would consist in interpolating the sample values of P_m(δ · ℓ) back to P_m(ℓ); that is, one would determine the coefficients of P_m and apply the one-dimensional FFT to find P_m(ℓ). The problem with this approach is that it is ill-posed. In fact, this is not really an interpolation but rather an extrapolation problem. For small values of δ, tiny perturbations of the samples P_m(δℓ) near the origin may cause large variations away from the origin.

Let L be the Cartesian-to-polar conversion which maps f̂(k₁, k₂) into f̂^p_{ℓ,m}, so that with the notations of this section, T = L F₂, where F₂ is the two-dimensional FFT. The linear mapping L corresponds to a change of coordinates and has eigenvalues decaying like a power law. Indeed, the continuous analog maps f̂ into f̂(r, θ) = f̂(r cos θ, r sin θ). Multiplying f̂ with the square root of the Jacobian, i.e. r^{1/2}, gives an isometry, and we then introduce a density weighting matrix D which mimics this renormalization. Recall that for samples f̂^p(ℓ, m), m may be thought of as a radial index and ℓ as an angular or slope index. Put

$\tilde{f}^p(\ell, m) = \hat{f}^p(\ell, m)\, \sqrt{w^p_{\ell,m}}, \qquad w^p_{\ell,m} = \begin{cases} |m|/(2n^2), & m \ne 0, \\ 1/(8n^2), & m = 0. \end{cases}$  (4.21)

We refer the reader to [1] for a discussion of these weights. In the vocabulary of numerical linear algebra, D is a preconditioning matrix, so that DL is nearly an isometry.

We then invert the Cartesian-to-polar conversion y = Lx using a classical iterative numerical algorithm. Starting with x⁽⁰⁾ = 0, we inductively define

$r^{(n)} = D(y - Lx^{(n-1)}), \qquad x^{(n)} = \lambda^{(n)} L^* r^{(n)} + x^{(n-1)}.$

The matrix DL is well conditioned: one can show that C₁‖x‖ ≤ ‖DLx‖ ≤ C₂‖x‖ with C₂/C₁ ≤ 1.1; see the discussion in [1]. In our experiments, and for arrays with dimensions ranging from 32 by 32 to 1024 by 1024, we need about 4 iterations to achieve a relative error of order 10⁻⁴.
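The iteration above translates almost verbatim into code. In the sketch below, the forward map L, its adjoint Lt and the diagonal weighting D of (4.21) are assumed to be supplied as callables, and a fixed relaxation parameter stands in for λ⁽ⁿ⁾.

```python
import numpy as np

def invert_cartesian_to_polar(y, L, Lt, D, n_iter=4, relax=1.0):
    """Preconditioned iterative inversion of y = L x (Section 4.9):
    r = D(y - L x),  x <- x + relax * L* r,  starting from x = 0."""
    x = relax * Lt(D(y))                 # first update, equivalent to one step from x = 0
    for _ in range(n_iter - 1):
        r = D(y - L(x))                  # density-weighted residual
        x = x + relax * Lt(r)
    return x
```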

5. New nonlinear synthesis rules

5.1. Osher’s criticism

Roughly speaking, there are two main lines of thinking in the field of mathematical image processing. The first is inspired by recent developments in Computational Harmonic Analysis (CHA) such as wavelets, wavelet packets, local cosine bases, etc. The second school has a very different viewpoint, namely that of partial differential equations (PDEs). In recent years, following upon the work of Mumford and Shah, the latter school pioneered new concepts and processing methods which became widely popular. We single out anisotropic diffusion, mean curvature motion, and Total-Variation denoising to name just a few of these ideas. Morel and Solimini [26] give an excellent review of the main results and developments in this field. Our intention is not to oppose the PDE and CHA approaches. Rather, we merely observe that there exist two distinct sources of inspiration.

Despite their great achievements, image processing methods based on CHA have thus far been received with skepticism by the latter group. This is especially true in signal or image denoising. In fact, we have often heard that Total-Variation approaches are vastly superior to CHA methods for recovering edges from noisy data. For instance, Osher and his colleagues tend to oppose CHA methods, arguing that, to put it mildly, everything that has to do with computational harmonic analysis tends to produce artificial oscillations near discontinuities even though the original signal/image may be flat on both sides of the discontinuity; they would refer to this phenomenon as a 'pseudo-Gibbs phenomenon'. The terminology 'side-band effect' seems more appropriate.

We will not argue with this objection. Rather, we will develop nonlinear rules for partial synthesis which are not subject to the above phenomenon. These nonlinear rules combine CHA ideas with functionals in common use in the PDE community.

5.2. Minimum total variation synthesis

Roughly speaking, the Total Variation of a function is the integral of the Euclidean norm of the gradient:

$\|f\|_{BV} = \int |\nabla f(x)|\, dx.$  (5.1)


For discrete data f_{i,j}, 1 ≤ i, j ≤ n, this takes the form

$\|f\|_{BV} = \sum_{i,j} \sqrt{ |(\delta_1 f)_{i,j}|^2 + |(\delta_2 f)_{i,j}|^2 },$

where δ₁ is the finite difference (δ₁f)_{i,j} = f_{i,j} − f_{i−1,j}, and similarly for δ₂.

Letting T be a linear transform, we propose solving

$\min_g \|g\|_{BV} \quad \text{subject to} \quad |(Tg - b)_\mu| \le e_\mu.$  (5.2)
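For reference, the discrete TV norm just defined is a one-liner; the boundary convention (zero differences on the first row and column) is an assumption of this sketch.

```python
import numpy as np

def tv_norm(f):
    """Discrete total variation: sum over pixels of sqrt(|d1 f|^2 + |d2 f|^2),
    with first-order backward differences and zero differences on the first row/column."""
    d1 = np.diff(f, axis=0, prepend=f[:1, :])    # (delta_1 f)_{i,j} = f_{i,j} - f_{i-1,j}
    d2 = np.diff(f, axis=1, prepend=f[:, :1])    # (delta_2 f)_{i,j} = f_{i,j} - f_{i,j-1}
    return float(np.sum(np.sqrt(d1**2 + d2**2)))
```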

From now on, we will take T to be the curvelet transform, so as to make the discussion more concrete.

As suggested in the introduction, we may deploy this synthesis rule in a variety of contexts. We give two examples:
• Compression: Here, we think of b as being quantized values of Tf. We then use (5.2) to synthesize a 'decompressed' image. In this case, the vector of tolerances would depend on the properties of the quantizer. For instance, in the case of a scalar quantizer with quantization step q, we may want to use something like e = q/2, or q.
• Noise removal: We wish to recover an object f from noisy data

$y_{i,j} = f_{i,j} + \sigma z_{i,j}, \qquad z_{i,j}\ \text{i.i.d.}\ N(0, 1).$  (5.3)

Here we think of b as the noisy coefficients of f: b = Ty, with b_μ ∼ N(α_μ, σ_μ²). We may use a thresholding rule to identify a set M′ of significant coefficients, |b_μ| ≥ λσ_μ, say, and set a vector of tolerances e_μ = η_μ · σ_μ, where η_μ may take different values depending on whether or not the index μ corresponds to a significant coefficient; e.g. η_μ = 1 for μ ∈ M′ and η_μ = λ for μ ∉ M′.

In [28], the authors proposed a minimum Total Variation approach for denoising image data. Given data of the form (5.3), they suggest solving

$\min_g \|g\|_{BV} \quad \text{subject to} \quad \|g - y\|_{\ell^2}^2 / n^2 = \sigma^2.$  (5.4)

The approach (5.2) is very different and combines thresholding rules with minimum Total Variation ideas. Unlike classical thresholding rules, it does not set the nonsignificant coordinates to zero. Instead, (5.2) will typically input small values to cancel oscillations near the discontinuities and eliminate ripples, while preserving the strong features of the image such as edges.

The formulation (5.2) was suggested in [29]. Other researchers have proposed to combine Total Variation with the wavelet transform, see [18] and more recently [23]. For related ideas, although in a very different direction, see [12].

5.3. A tantalizing perspective

Consider the following model of images with edges: F is the class of C² objects which may be discontinuous along C² edges. Then, the thresholding of curvelet coefficients is provably optimal for estimating objects of that class, as this procedure yields near-minimax rates of convergence over F; see [11] for details.

Conjecture 1. The combined curvelet and minimum Total Variation approach (5.2) is asymptotically nearly optimal over F.

We view the challenge of proving or disproving Conjecture 1 as intellectually significant. Indeed, if Conjecture 1 were true, this would establish the existence of a procedure enjoying nearly optimal statistical properties and, at the same time, giving visually pleasing practical results.

In a different direction, however, we have accumulated evidence pointing to the fact that, unlike curvelet-based procedures, TV-denoising is far from being optimal in such a setting, which leads to a second conjecture.

Conjecture 2. The Minimum Total Variation approach (5.4) of [28] achieves markedly suboptimal asymptotic rates in that setting. These rates are comparable with those attainable by wavelet shrinkage procedures.

5.4. Current implementation strategy

Although the minimization problem (5.2) is convex, one must frankly admit that it looks daunting. The issue is the dimension of the problem; (5.2) is an optimization in R^{n²} with of the order of n² log n constraints. When n = 512, say, these dimensions are indeed very large!


A common approach for solving problems of the type (5.2) is the so-called gradient descent method with projection; one iteratively alternates unconstrained minimization problems and orthogonal projections onto the set of feasible constraints. This approach is realistic whenever the transform T is orthogonal, so that the projection may be easily determined, see [18]. When T is non-orthogonal, this projection will, in general, be the solution of a Quadratic Program (QP), and each iteration may be very expensive. More generally, we will also have to rule out many other iterative methods, such as Interior Point methods, because of the sheer size of the problem and the cost of evaluating the constraints Tg.

5.5. Dual problem

Our approach for solving (5.2), or better said, approximately solving (5.2), will rely on duality theory. Let L be the Lagrangian

$\mathcal{L}(f, \lambda) = \|f\|_{BV} + \lambda^T K f,$  (5.5)

where λ is a vector of Lagrange multipliers,

$\lambda = (\lambda_\pm) \ge 0, \qquad Kf = (K_+ f, K_- f), \qquad K_\pm f = \pm(Tf - b) - e.$

The dual problem is of the form

$\sup_\lambda G(\lambda),$  (5.6)

where

$G(\lambda) = \inf_f \mathcal{L}(f, \lambda).$  (5.7)

For strictly feasible constraints, strong duality holds, and one way to go about the primal problem is to find a saddle point (f*, λ*) of the Lagrangian L(f, λ):

$\inf_f \sup_\lambda \mathcal{L}(f, \lambda) = \mathcal{L}(f^*, \lambda^*) = \sup_\lambda \inf_f \mathcal{L}(f, \lambda) = G(\lambda^*).$  (5.8)

The Uzawa algorithm approaches this problem with an iterative procedure. From λ⁰ ≥ 0, inductively define

$\mathcal{L}(f^n, \lambda^n) = \min_f \mathcal{L}(f, \lambda^n),$  (5.9)

$\lambda^{n+1} = P(\lambda^n + \delta\, K f^n),$  (5.10)

where P is the projection onto the positive cone, Pλ = max(λ, 0), and δ is a positive parameter.

The Uzawa algorithm is, in fact, a gradient 'descent' with projection algorithm applied to the dual problem. The word 'descent' is to be taken cautiously, as one seeks to maximize, and not minimize, the dual function G. In our case, this procedure is relatively simple, as our optimization problem reduces to a sequence of unconstrained optimization problems (5.9). Indeed, unconstrained Total-Variation minimization problems such as (5.9) are now classical. Motivated by stability issues, however, we will in fact apply the Uzawa algorithm to a modified version of the Lagrangian, as we shall see below.

From our viewpoint, the dual approach presents several advantages: first, the projection step is straightforward; second, it only requires one evaluation of the curvelet transform per iteration; and third, we never need to invert the curvelet transform.
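A schematic implementation of the Uzawa loop (5.9)–(5.10) is sketched below; it already incorporates the small quadratic penalty introduced in Section 5.6.1 below. The operators T, Tt and grad_tv, the step sizes, and the inner gradient solver are placeholders chosen for illustration, not the authors' actual numerical scheme. Note that each outer iteration uses one application of T and one of its adjoint, in line with the advantages listed above.

```python
import numpy as np

def uzawa_tv_synthesis(b, e, T, Tt, grad_tv, f0, n_outer=5, n_inner=30,
                       step_primal=0.2, step_dual=1.0, gamma=1e-3):
    """Schematic Uzawa iteration for  min ||f||_BV  s.t.  |Tf - b| <= e  (Eq. (5.2)).
    T, Tt (transform and adjoint), grad_tv (a subgradient of the discrete TV norm) and the
    starting guess f0 are assumed given; gamma is the quadratic penalty of Eq. (5.11)."""
    f = f0.astype(float).copy()
    lam_p = np.zeros_like(b, dtype=float)   # multipliers for  (Tf - b) - e <= 0
    lam_m = np.zeros_like(b, dtype=float)   # multipliers for -(Tf - b) - e <= 0
    for _ in range(n_outer):
        # The linear term of the Lagrangian has constant gradient Tt(lam_p - lam_m) in f,
        # so a single adjoint evaluation per outer iteration suffices.
        c = Tt(lam_p - lam_m)
        for _ in range(n_inner):            # approximate primal minimization (5.9)/(5.11)
            f -= step_primal * (grad_tv(f) + c + 2.0 * gamma * f)
        r = T(f) - b                        # one transform evaluation per outer iteration
        lam_p = np.maximum(lam_p + step_dual * (r - e), 0.0)   # dual ascent + projection (5.10)
        lam_m = np.maximum(lam_m + step_dual * (-r - e), 0.0)
    return f
```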

5.6. Practical issues

5.6.1. Stability and convergence
Various theorems are known about the convergence of the Uzawa algorithm to the saddle point of the Lagrangian. Most of these results, however, assume 1-convexity of the functional to be minimized. The Total Variation norm is not 1-convex, as it grows only linearly at infinity.

In addition to convergence issues, we also need to address stability. Indeed, the unconstrained minimization problem

$\min_f \mathcal{L}(f, \lambda) = \min_f \|f\|_{BV} + \lambda^T K f$

may be unstable in the following sense: there might be directions along which the decay of the linear constraint term is faster than the growth of the Total-Variation functional!

To get around these difficulties, we propose to penalize the Lagrangian with a small quadratic term. We substitute the sequence of unconstrained problems (5.7) with a sequence of penalized minimization problems

$\min_f \mathcal{L}(f, \lambda^n) + \gamma_n \|f\|^2,$  (5.11)

where γ_n is a sequence of positive parameters decaying to 0. The penalty restores 1-convexity. Under suitable conditions on the sequence γ_n, the Uzawa algorithm with (5.11) in place of (5.7) can be shown to converge


to a saddle point (f*, λ*) of the Lagrangian (5.5); see [3].

5.6.2. Burn-in
The speed of convergence depends, of course, upon the initial choice of parameters λ⁰: the closer λ⁰ is to λ*, the better. Finding good guesses of the initial parameter values is more an art than anything else, although there are some guiding principles.

Let J(f) be the Total Variation norm ‖f‖_BV. At the saddle point, the derivative of the Lagrangian with respect to f vanishes,

$(\nabla_f \mathcal{L})(f^*, \lambda^*) = 0,$

and, therefore,

$-(\nabla J)(f^*) = K^* \lambda^*.$  (5.12)

Let f₀ be an initial guess of the solution, e.g. obtained after applying the inverse transform to b in (5.2). In the applications considered, f₀ might be the object obtained after inversion of the quantized coefficients, or the reconstruction obtained via classical thresholding. The optimum f* is in some sense close to f₀, and (5.12) suggests finding λ such that

$-(\nabla J)(f_0) = K^* \lambda = T^* \lambda_+ - T^* \lambda_-,$

as this would, hopefully, give a parameter value 'close' to λ*. We use the near-orthogonality of T and set

$\lambda^+_0 - \lambda^-_0 = -T (\nabla J)(f_0), \qquad \lambda^+_0 = \max(-T(\nabla J)(f_0),\ 0).$  (5.13)

In our numerical experiments, the choice (5.13) turns out to be a good guess, although it is possible to refine it further by using a priori knowledge about the side-band effects we wish to eliminate and by analyzing the curvelet transform of (∇J)(f₀). Because of space limitations, we do not detail these refinements.
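In code, the warm start (5.13) is only a couple of lines; the transform T, the subgradient grad_J of the TV norm, and the first guess f0 are assumed to be supplied.

```python
import numpy as np

def burn_in_multipliers(T, grad_J, f0):
    """Warm start of Eq. (5.13): lambda+_0 - lambda-_0 = -T(grad J(f0)), both parts nonnegative."""
    g0 = T(grad_J(f0))
    return np.maximum(-g0, 0.0), np.maximum(g0, 0.0)
```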

5.6.3. Practice
From a practical viewpoint, only four or five iterations are truly needed if one selects a reasonable initial guess. In our numerical experiments, we did not observe much change in the sequence (fⁿ) after the first five iterations. After only one or two iterations, most of the wrinkles (side-band effects) near the discontinuities are removed while the sharpness of edges is preserved.

6. Numerical experiments

6.1. Ridgelet partial reconstruction

The first experiment compares the quality of wavelet, ridgelet and TV-post-processed ridgelet partial reconstructions. In this experiment, the original figure is a picture of a 512 by 512 sliced Gaussian. We use only the 100 largest coefficients for each reconstruction. Unlike the wavelet reconstruction, the ridgelet reconstruction of the discontinuity is near-perfect. In the ridgelet picture, however, one can still distinguish oscillations near the edge. The minimum Total Variation ridgelet reconstruction erases those side-bands and enhances the 'decompressed' object. The ridgelet-TV reconstruction and the original figure are nearly indistinguishable.

We may term the remaining approximation error 'pixelization error'. This is a kind of error which does not have anything to do with real physical phenomena but rather occurs because of issues such as discretization and pixelization (Figs. 5 and 6).

6.2. Noisy Radon inversion

The second experiment is about reconstructing an image from noisy Radon data. This is a problem of considerable interest in the literature of medical imaging (tomography).

The data for this example are simulated (we do not have access to real datasets). We start with the famous Logan–Shepp phantom and calculate the sinogram, i.e. the discrete Radon transform of the phantom. We use the tools introduced in Section 4 to compute this sinogram. The sinogram is then contaminated with Gaussian white noise. Inverting the discrete Radon transform gives the noisy picture shown in Fig. 7(a). The noise is colored; that is, noisy pixel values are correlated and the noise level increases with frequency.

We then apply level-dependent thresholding to the wavelet and curvelet coefficients. The noise level in each coordinate is estimated by Monte-Carlo simulations. The choice of the threshold parameter is 3 times the noise level, except for the last dyadic subband where it is 4 times the noise level. We also applied cycle-spinning to remove some artifacts which are well known to occur with wavelet thresholding procedures.


Fig. 5. Partial reconstructions of a sliced-Gaussian.

Cycle-spinning is a kind of translation-invariant thresholding rule; this technique computes several individual reconstructions by applying shifts to the noisy data and averages them out, after applying the reverse shifts. We applied cycle-spinning in both cases. (To be exact, the wavelet reconstruction is an average over 64 shifts, while we used only 16 shifts for the curvelet reconstruction.) The results are presented in Figs. 7, 8 and 9. Finally, these figures also display the combined curvelet and minimum TV reconstruction; this reconstruction does not involve any cycle-spins. With the notations of Section 5, we used a vector of tolerances with e_μ = σ_μ for the significant coefficients and e_μ = 4σ_μ for the others. We report here a PSNR of 35.2 because it is that of the results presented in the various figures. We would like to point out that for other iteration counts and initial choices of parameters, we have been able to obtain PSNRs as high as 37.


Fig. 6. Reconstruction details.

Details are shown in Figs. 8 and 9. Edges are verysharp in both the curvelet and the curvelet-TV recon-structions. The curvelet-TV reconstruction is free ofartifacts and side-bands are removed as evidenced inFig. 10, which shows the middle horizontal scanlineof the phantom. The noisy and reconstructed scanlinesare plotted in Figs. 10(a)–10(c). Fig. 10(d) shows asuperposition of both the true and the curvelet-TV re-constructed scanlines. The reconstructed line is veryclose to the truth.Figs. 7–9 show some residual plots. These pictures

These pictures clearly demonstrate that curvelets are able to extract meaningful signal at very high frequencies. This is possible because, unlike wavelets, curvelets correlate very well with signal structures (edges) at high frequency, yielding an improved signal-to-noise ratio.

Finally, we also report the PSNR for each method, which we interpret as an ‘objective’ measure of performance. The values are given in Table 1. Curvelet reconstructions exhibit a higher PSNR.
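For reference, the PSNR reported here is the usual peak signal-to-noise ratio; a small helper (with the peak value defaulting to the reference image's dynamic range, an assumption since the exact normalization is not restated) is:

```python
import numpy as np

def psnr(reference, estimate, peak=None):
    """Peak signal-to-noise ratio in decibels; 'peak' defaults to the
    dynamic range of the reference image."""
    peak = np.ptp(reference) if peak is None else peak
    mse = np.mean((np.asarray(reference, float) - estimate) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Example: a unit-amplitude image corrupted by noise of standard deviation 0.05.
rng = np.random.default_rng(0)
clean = np.zeros((64, 64))
clean[16:48, 16:48] = 1.0
print(round(psnr(clean, clean + 0.05 * rng.standard_normal(clean.shape)), 1))
```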

6.3. Denoising

After these encouraging results on synthetic examples, the last experiment is about ‘denoising’ a real photograph. The original image is contaminated with Gaussian white noise and the processing steps are nearly the same as in the previous experiments (same parameter values). We do not apply cycle-spinning. The results and details are shown in Figs. 11 and 12, respectively.

In some sense, this numerical example downplays the objection raised near the beginning of Section 5.1. It is certainly true that on sharp geometrical objects such as a white square on a black background, one might

Table 1
Table of PSNR values for the reconstruction of the Logan–Shepp phantom

          Wavelets    Curvelets    Curvelets and TV
PSNR      32.23       34.46        35.2


Fig. 7. Reconstruction of the Logan–Shepp phantom.

perceive spurious oscillations when applying CHA (computational harmonic analysis) methods. However, on more complicated and realistic imagery such as photographs with textures, these artifacts are often barely visible.

7. Perspective

7.1. Second generation of curvelets

We believe that there is a simpler way to construct tight frames of curvelets. This construction does not use ridgelets.

7.1.1. Tight frames
For each $j \ge 0$, consider a family of orthogonal and compactly supported windows $G_{j,\ell}$ obeying

$$|G_0(\xi)|^2 + \sum_{j,\ell} |G_{j,\ell}(\xi)|^2 = 1. \qquad (7.1)$$

We will use the windows $G_{j,\ell}$ to localize the Fourier transform near wedges of length $2^{2j}$ and width $2^j$. In polar coordinates we let

$$G_{j,\ell}(\xi) = w(2^{-2j}|\xi|)\, H(2^j\theta - \pi\ell)$$


Fig. 8. Details.

and put $G_0(\xi) = w_0(|\xi|)$. Then, decomposition (7.1) would hold provided

$$|w_0(|\xi|)|^2 + \sum_{j} |w(2^{-2j}|\xi|)|^2 = 1 \qquad (7.2)$$

and

$$\sum_{\ell=0}^{2^{j+1}-1} |H(2^j\theta - \pi\ell)|^2 = 1. \qquad (7.3)$$

As for Meyer wavelets [25], we will assume that $w$ is supported on $[2\pi/3,\ 8\pi/3]$ and $H$ on $[-5\pi/6,\ 5\pi/6]$, say. The window $G_{j,0}$ is then localized near the horizontal wedge

$$[\pi 2^{2j} \le |\xi| \le \pi 2^{2(j+1)}] \times [-\pi/2 \cdot 2^{-j} \le \theta \le \pi/2 \cdot 2^{-j}]. \qquad (7.4)$$
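One concrete way to realize such windows numerically is sketched below: a smooth step built from the standard Meyer auxiliary polynomial yields an angular window H supported on [−5π/6, 5π/6] whose π-translates satisfy (7.3) by telescoping, and a radial window w built the same way satisfies (7.2) together with a coarse-scale window w0. This is only an illustrative construction (in particular the radial transition below is wider than the support [2π/3, 8π/3] quoted above), not necessarily the windows the authors intend.

```python
import numpy as np

def meyer_step(t):
    """Smooth step: 0 for t <= 0, 1 for t >= 1 (Meyer auxiliary polynomial)."""
    t = np.clip(t, 0.0, 1.0)
    return t**4 * (35 - 84 * t + 70 * t**2 - 20 * t**3)

def H_sq(u):
    """|H(u)|^2, supported on [-5pi/6, 5pi/6]; its pi-translates sum to one."""
    step = lambda v: meyer_step((v + np.pi / 3) / (2 * np.pi / 3))
    return step(u + np.pi / 2) - step(u - np.pi / 2)

def w_sq(t, a=2 * np.pi / 3):
    """|w(t)|^2 obtained by telescoping beta(t) - beta(t/4), beta a smooth step
    on [a, 4a]. (Support wider than [2pi/3, 8pi/3]; illustration only.)"""
    beta = lambda s: meyer_step((s - a) / (3 * a))
    return beta(t) - beta(t / 4)

def w0_sq(t, a=2 * np.pi / 3):
    """Coarse-scale window completing the radial partition of unity."""
    return 1.0 - meyer_step((t - a) / (3 * a))

# Angular identity: the pi-translates of |H|^2 sum to one; restricted to
# l = 0, ..., 2^{j+1}-1 this is (7.3) once 2^j * theta is read modulo 2*pi.
u = np.linspace(-10.0, 10.0, 4001)
angular = sum(H_sq(u - np.pi * ell) for ell in range(-8, 9))
print("angular partition of unity:", np.allclose(angular, 1.0))

# Radial identity (7.2) on the frequencies covered by the scales j = 0, ..., 6.
t = np.linspace(0.0, 1.0e3, 4000)
radial = w0_sq(t) + sum(w_sq(t / 4.0**j) for j in range(7))
print("radial partition of unity:", np.allclose(radial, 1.0))
```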

For each $\ell$, $G_{j,\ell}$ is obtained from $G_{j,0}$ by applying a rotation. It is not difficult to check that for each $j \ge 1$, the support of $G_{j,0}$ is contained in a rectangle of length


Fig. 9. Details.


Fig. 10. Scanline.

$2\pi\delta_1 2^{2j}$ and width $2\pi\delta_2 2^{j}$, with $\delta_1 = 1 + O(2^{-j})$ and $\delta_2 = 10\pi/9$. This type of tiling of the frequency plane was

introduced by Fefferman and used very successfully thereafter to study properties of Fourier Integral Operators; see [31, Chapter 9] and references therein.

For the special pair $(j, 0)$, we introduce the Cartesian lattice $k = (k_1 2^{-2j},\ \delta_2\, k_2 2^{-j})$ with $\delta_2$ as before. Then, for each pair $J = (j, \ell)$, we let $k_J$ be the lattice rotated by an angle equal to $\theta_J = \pi \cdot \ell \cdot 2^{-j}$. Formally,

$$k_J = R_J\, k, \qquad R_J = \begin{pmatrix} \cos\theta_J & -\sin\theta_J \\ \sin\theta_J & \cos\theta_J \end{pmatrix}.$$

Observe that with these notations, $G_{j,\ell}(\xi) = G_{j,0}(R_J^{*}\xi)$.

Define

$$\hat\gamma_\mu(\xi) = \frac{1}{\sqrt{\delta_1\delta_2}}\, 2^{-3j/2}\, e^{-i\langle k_J,\, \xi\rangle}\, G_{j,\ell}(\xi), \qquad \mu = (j, \ell, k). \qquad (7.5)$$

With the same notation as in Section 3, we also define coarse scale curvelets $\gamma_{\mu_0}$ as being proportional to the modulated blobs $e^{-i\langle k, \xi\rangle} G_0(\xi)$. Then, with obvious notations, $(\gamma_{\mu'})_{\mu' \in M'}$ is a tight frame:

$$\sum_{\mu'} |\langle f, \gamma_{\mu'}\rangle|^2 = \|f\|^2_{L^2(\mathbb{R}^2)}, \qquad f = \sum_{\mu'} \langle f, \gamma_{\mu'}\rangle\, \gamma_{\mu'}. \qquad (7.6)$$


Fig. 11. Denoising Barbara.

This merely follows from the fact that $\frac{1}{2\pi\sqrt{\delta_1\delta_2}}\, e^{i\langle k_J, \xi\rangle}$ is an orthonormal basis of the square integrable functions defined over a rectangle containing the support of $G_{j,\ell}$. Then,

$$\sum_{k_J} |\langle f, \gamma_\mu\rangle|^2 = \frac{1}{(2\pi)^4} \sum_{k_J} |\langle \hat f, \hat\gamma_\mu\rangle|^2 = \frac{1}{(2\pi)^2} \int |\hat f(\xi)|^2\, |G_{j,\ell}(\xi)|^2\, d\xi,$$

and (7.6) follows from (7.1).
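The same mechanism is easy to mimic discretely: whenever a family of frequency-domain windows satisfies Σ|G(ξ)|² = 1 pointwise, splitting the FFT of an image with these windows conserves energy, by Plancherel. The toy sketch below uses normalized radial bumps rather than the curvelet windows above, purely to illustrate that step of the argument.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 128
image = rng.standard_normal((n, n))

# Radial frequency coordinate of the n x n FFT grid.
fx = np.fft.fftfreq(n)
rho = np.hypot(*np.meshgrid(fx, fx))

# A few overlapping radial bumps, normalized so that their squares sum to one
# at every frequency (the discrete analogue of (7.1)).
centers = np.linspace(0.0, 0.5, 6)
bumps = [np.exp(-((rho - c) / 0.08) ** 2) for c in centers]
norm = np.sqrt(sum(b**2 for b in bumps))
windows = [b / norm for b in bumps]

f_hat = np.fft.fft2(image)
pieces = [np.fft.ifft2(G * f_hat) for G in windows]

# Plancherel: the energies of the windowed pieces add up to the image energy.
energy_split = sum(np.sum(np.abs(p) ** 2) for p in pieces)
print(np.isclose(energy_split, np.sum(image**2)))
```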

In a more refined construction, one may want to merge symmetric wedges, namely $G_{j,\ell}$ and $G_{j,\ell+2^j}$, thanks to the symmetry $G_{j,\ell}(-\xi) = G_{j,\ell+2^j}(\xi)$. Details will appear in a forthcoming paper.

7.1.2. Space-side picture
Next, we let $\psi_{j,\ell}$ be the inverse Fourier transform of $\frac{1}{\sqrt{\delta_1\delta_2}}\, 2^{-3j} G_{j,\ell}(\xi)$. Then

$$\gamma_\mu(x) = 2^{3j/2}\, \psi_{j,\ell}(x - k_J). \qquad (7.7)$$


Fig. 12. Details of Barbara.

The relationship $G_{j,\ell}(\xi) = G_{j,0}(R_J^{*}\xi)$ gives

$$\gamma_\mu(x) = 2^{3j/2}\, \psi_{j,0}(R_J^{*} x - k).$$

Now, the envelope of $\psi_{j,0}$ is concentrated near a vertical ridge of length about $2^{-j}$ and width $2^{-2j}$. Define $\psi^{(j)}$ by

$$\psi_{j,0}(x) = \psi^{(j)}(D_j x),$$

where $D_j$ is the diagonal matrix

$$D_j = \begin{pmatrix} 2^{2j} & 0 \\ 0 & 2^{j} \end{pmatrix}. \qquad (7.8)$$

In other words, the envelope $\psi^{(j)}$ is supported near a disk of radius about one, and owing to the fact that $G_{j,0}$ is supported away from the axis $\xi_1 = 0$, $\psi^{(j)}$ oscillates along the horizontal direction. In short, $\psi^{(j)}$ resembles a 2-dimensional wavelet of the form $\psi(x_1)\varphi(x_2)$, where $\psi$ and $\varphi$ are respectively mother- and father-gendered wavelets. Equation (7.7) becomes

$$\gamma_\mu(x) = 2^{3j/2}\, \psi^{(j)}\bigl(D_j(R_J^{*} x - k)\bigr). \qquad (7.9)$$
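To visualize what (7.9) produces, one can take any profile that oscillates across one variable and is smooth in the other as a stand-in for $\psi^{(j)}$ (the Gabor-like Gaussian-times-cosine below is an assumption, not the actual inverse transform of $G_{j,0}$) and apply the parabolic dilation $D_j$, the rotation $R_J$ and a translation:

```python
import numpy as np

def psi_profile(u1, u2):
    """Stand-in for psi^(j): oscillating across u1, smooth bump in u2,
    essentially supported near the unit disk."""
    return np.cos(6.0 * u1) * np.exp(-(u1**2 + u2**2))

def curvelet_atom(x1, x2, j, ell, k=(0.0, 0.0)):
    """gamma_mu(x) = 2^{3j/2} psi^(j)( D_j (R_J^* x - k) ), cf. (7.9), with
    D_j = diag(2^{2j}, 2^j) and theta_J = pi * ell * 2^{-j}."""
    theta = np.pi * ell * 2.0**-j
    c, s = np.cos(theta), np.sin(theta)
    y1 = c * x1 + s * x2 - k[0]      # first component of R_J^* x - k
    y2 = -s * x1 + c * x2 - k[1]     # second component
    return 2.0 ** (1.5 * j) * psi_profile(2.0 ** (2 * j) * y1, 2.0**j * y2)

# A needle of length ~ 2^{-j} and width ~ 2^{-2j}, oriented at angle theta_J,
# oscillating across the ridge.
x1, x2 = np.meshgrid(*(np.linspace(-0.5, 0.5, 512),) * 2)
atom = curvelet_atom(x1, x2, j=3, ell=2)
print(atom.shape, float(np.abs(atom).max()))
```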

Hence, we have defined a tight frame of elements which are obtained by anisotropic dilations, rotations and translations of a single function. We list a few of their properties:
• The parabolic scaling (7.8) yields an anisotropic scaling law: the system is well localized in space and approximately obeys the relationships

$$\text{length} \approx 2^{-j}, \qquad \text{width} \approx 2^{-2j}$$

and, therefore,

$$\text{width} \approx \text{length}^2.$$

• Directional sensitivity: the elements are oriented in the co-direction $\theta_J = \pi \cdot \ell \cdot 2^{-j}$. Identifying the curvelet length $2^{-j}$ with the scale, we see that there are about $2^j$ directions at scale $2^{-j}$.

• Oscillatory nature: curvelet elements display oscillatory components across the ‘ridge’.


Fig. 13. Split every other two.

In short, this system exhibits all the geometrical and multiscale features of the curvelet transform. We believe that this new system is an alternative to the curvelet construction reviewed in Section 3. This leads to a last conjecture.

Conjecture 3. This new system and ‘classical’ curvelets enjoy nearly the same optimality properties for representing $C^2$ images with $C^2$ edges [8,9].

We will conclude with a brief summary of the main points of the construction of second generation curvelets:
• Decompose the frequency domain into dyadic annuli $|\xi| \in [2^j, 2^{j+1})$;

• decompose each annulus into wedges $\theta = \pi\ell \cdot 2^{-j/2}$; that is, divide every other scale, as shown in Fig. 13;

• use oriented local Fourier bases on each wedge.

7.2. Ridgelet packets?

The previous section paves the way to a wealth of new multiscale systems. By arbitrary segmentation of dyadic coronae, we can design tight frames with arbitrary aspect ratios at arbitrary scales. Splitting at every scale would essentially give tight frames of ridgelets; splitting at every other scale, tight frames of curvelets; and no splitting essentially yields wavelet frames.

The reader familiar with the literature on wavelet packets, for example, may see that there is an opportunity to develop trees of tight frames by applying Recursive Dyadic Partitioning ideas to each frequency corona. Along this line of research, it would be of interest to develop fast algorithms à la Coifman and Wickerhauser for searching for sparse decompositions in these trees.
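The Coifman–Wickerhauser idea carries over directly: organize the candidate segmentations in a binary tree and, bottom-up, split a node only when the sum of its children's best costs beats the cost of keeping the node. The sketch below is a one-dimensional wavelet-packet toy standing in for the frequency-corona trees discussed here; the Haar split, the above-threshold cost and the toy signal are illustrative assumptions.

```python
import numpy as np

def haar_split(c):
    """One orthonormal Haar step: split a band into smooth and detail halves."""
    even, odd = c[0::2], c[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def cost(c, thresh=0.1):
    """Additive cost functional: number of coefficients above a threshold."""
    return int(np.sum(np.abs(c) > thresh))

def best_basis(c, max_depth=4):
    """Coifman-Wickerhauser search over a binary tree: keep this band, or
    split it further, whichever representation is cheaper."""
    keep_cost = cost(c)
    if max_depth == 0 or len(c) < 4:
        return keep_cost, ("leaf", c)
    low, high = haar_split(c)
    cost_low, tree_low = best_basis(low, max_depth - 1)
    cost_high, tree_high = best_basis(high, max_depth - 1)
    if cost_low + cost_high < keep_cost:
        return cost_low + cost_high, ("split", tree_low, tree_high)
    return keep_cost, ("leaf", c)

rng = np.random.default_rng(0)
t = np.arange(256)
signal = np.sin(2 * np.pi * t / 64) + 0.02 * rng.standard_normal(256)
total_cost, tree = best_basis(signal)
print("cost without splitting:", cost(signal))
print("best-basis cost:       ", total_cost)
```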

7.3. Three dimensions

This paper focused on new multiscale representations of bivariate functions and their potential for practical applications in image processing. However, most of the ideas, tools and algorithms we presented in this paper may easily be extended to higher dimensions, and especially to the three-dimensional case. We may develop 3D ridgelets, 3D curvelets and 3D digital transforms. This may lead to the development of convenient tools to represent, analyze, store and transmit three-dimensional datasets. The situation where the third dimension is time, as in digital video, would be of special interest.

7.4. Epilogue

We have shown some of our numerical results to prominent researchers in the PDE-based image processing community. These results have been received with a lot of enthusiasm, and we take this as a sign of encouragement.

Acknowledgements

We would like to acknowledge fruitful conversations with David L. Donoho. This work was partially supported by an Alfred P. Sloan Fellowship.

References

[1] A. Averbuch, D.L. Donoho, R.R. Coifman, M. Israeli, J. Walden, Fast slant stack: a notion of Radon transform for data on a Cartesian grid which is rapidly computable, algebraically exact, geometrically faithful, and invertible, http://www-stat.stanford.edu/∼donoho/Reports/index.html.

[2] D.H. Bailey, P.N. Swartztrauber, The fractional Fourier transform and applications, SIAM Rev. 33 (1991) 389–404.


[3] D.P. Bertsekas, Nonlinear Programming, Athena Scientific, Belmont, MA, 1999.

[4] R.W. Buccigrossi, E.P. Simoncelli, Image compression via joint statistical characterization in the wavelet domain, IEEE Trans. Image Process. 8 (12) (1999) 1688–1701.

[5] P.J. Burt, E.H. Adelson, The Laplacian pyramid as a compact image code, IEEE Trans. Comm. 31 (1983) 532–540.

[6] E.J. Candès, Harmonic analysis of neural networks, Appl. Comput. Harmonic Anal. 6 (1999) 197–218.

[7] E.J. Candès, D.L. Donoho, Curvelets, Manuscript, http://www-stat.stanford.edu/donoho/Reports/1998/curvelets.zip, 1999.

[8] E.J. Candès, D.L. Donoho, Curvelets – a surprisingly effective nonadaptive representation for objects with edges, in: A. Cohen, C. Rabut, L.L. Schumaker (Eds.), Curves and Surfaces, Vanderbilt University Press, Nashville, TN, 2000, pp. 105–120.

[9] E.J. Candès, D.L. Donoho, Ridgelets: a key to higher-dimensional intermittency?, Phil. Trans. Roy. Soc. London A 357 (1999) 2495–2509.

[10] E.J. Candès, D.L. Donoho, Curvelets, multiresolution representation, and scaling laws, in: SPIE Conference on Signal and Image Processing: Wavelet Applications in Signal and Image Processing, Vol. VIII, San Diego, 2000.

[11] E.J. Candès, D.L. Donoho, Recovering edges in ill-posed inverse problems: optimality of curvelet frames, Technical Report, Department of Statistics, Stanford University, 2000; Ann. Statist. 30 (3) (2002).

[12] T.F. Chan, H.M. Zhou, Optimal construction of wavelet coefficients using total variation regularization in image compression, CAM Technical Report, UCLA, 2000.

[13] M. Crouse, R. Nowak, R. Baraniuk, Wavelet-based statistical signal processing using hidden Markov models, IEEE Trans. Signal Process. 46 (1998) 886–902.

[14] I. Daubechies, Orthonormal bases of compactly supported wavelets, Comm. Pure Appl. Math. 41 (1988) 909–996.

[15] S.R. Deans, The Radon Transform and Some of its Applications, Wiley, New York, 1983.

[16] D.L. Donoho, Orthonormal ridgelets and linear singularities, Technical Report, Department of Statistics, Stanford University, 1998, submitted for publication.

[17] D.L. Donoho, Sparse components analysis and optimal atomic decomposition, Technical Report, Department of Statistics, Stanford University, 1998.

[18] S. Durand, J. Froment, Artifact free signal denoising with wavelets, in: Proceedings of ICASSP 2001, IEEE 26th International Conference on Acoustics, Speech, and Signal Processing, 2001.

[19] D. Esteban, C. Galand, Application of quadrature mirror filter to split-band voice coding schemes, in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1977, pp. 191–195.

[20] A. Grossmann, J. Morlet, Decomposition of Hardy functions into square integrable wavelets of constant shape, SIAM J. Math. Anal. 15 (1984) 723–736.

[21] X. Huo, Sparse image representation via combined transforms, Ph.D. Thesis, Stanford University, August 1999.

[22] P.G. Lemarié, Y. Meyer, Ondelettes et bases Hilbertiennes, Rev. Mat. Iberoamericana 2 (1986) 1–18.

[23] F. Malgouyres, A unified framework for image restoration, CAM Technical Report, UCLA, 2001.

[24] S.G. Mallat, A theory for multiresolution signal decomposition: the wavelet representation, IEEE Trans. Pattern Anal. Mach. Intell. 11 (1989) 674–693.

[25] Y. Meyer, Wavelets: Algorithms and Applications, SIAM, Philadelphia, 1993.

[26] J.M. Morel, S. Solimini, Variational Methods in Image Segmentation, Progress in Nonlinear Differential Equations and Their Applications, Vol. 14, Birkhäuser, Boston, 1995.

[27] B.A. Olshausen, D.J. Field, Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Nature 381 (1996) 607–609.

[28] L.I. Rudin, S. Osher, E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D 60 (1992) 259–268.

[29] J.L. Starck, E.J. Candès, D.L. Donoho, Very high quality image restoration, in: M.A. Unser, A. Aldroubi, A.F. Laine (Eds.), Wavelet Applications in Signal and Image Processing, Vol. IX, Proceedings of the SPIE 4478, 2001.

[30] J.L. Starck, E. Candès, D.L. Donoho, The curvelet transform for image denoising, IEEE Trans. Image Process. 11 (6) (2002) 670–684.

[31] E.M. Stein, Harmonic Analysis: Real-Variable Methods, Orthogonality, and Oscillatory Integrals, Vol. 30, Princeton University Press, Princeton, NJ, 1993.

[32] M. Vetterli, J. Kovacevic, Wavelets and Subband Coding, Prentice-Hall, Englewood Cliffs, NJ, 1995.