Transcript of ChangYV00a: Wavelet Image Denoising
1532 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 9, NO. 9, SEPTEMBER 2000
Adaptive Wavelet Thresholding for Image Denoising and Compression
S. Grace Chang, Student Member, IEEE, Bin Yu, Senior Member, IEEE, and Martin Vetterli, Fellow, IEEE
Abstract—The first part of this paper proposes an adaptive, data-driven threshold for image denoising via wavelet soft-thresholding. The threshold is derived in a Bayesian framework, and the prior used on the wavelet coefficients is the generalized Gaussian distribution (GGD) widely used in image processing applications. The proposed threshold is simple and closed-form, and it is adaptive to each subband because it depends on data-driven estimates of the parameters. Experimental results show that the proposed method, called BayesShrink, is typically within 5% of the MSE of the best soft-thresholding benchmark with the image assumed known. It also outperforms Donoho and Johnstone's SureShrink most of the time.
The second part of the paper attempts to further validate recent claims that lossy compression can be used for denoising. The BayesShrink threshold can aid in the parameter selection of a coder designed with the intention of denoising, and thus achieving simultaneous denoising and compression. Specifically, the zero-zone in the quantization step of compression is analogous to the threshold value in the thresholding function. The remaining coder design parameters are chosen based on a criterion derived from Rissanen's minimum description length (MDL) principle. Experiments show that this compression method does indeed remove noise significantly, especially for large noise power. However, it introduces quantization noise and should be used only if bitrate were an additional concern to denoising.
Index Terms—Adaptive method, image compression, image denoising, image restoration, wavelet thresholding.
I. INTRODUCTION
AN IMAGE is often corrupted by noise in its acquisition or transmission. The goal of denoising is to remove the noise while retaining as much as possible the important signal features. Traditionally, this is achieved by linear processing such as Wiener filtering. A vast literature has emerged recently on
Manuscript received January 22, 1998; revised April 7, 2000. This work was supported in part by the NSF Graduate Fellowship and the University of California Dissertation Fellowship to S. G. Chang; ARO Grant DAAH04-94-G-0232 and NSF Grant DMS-9322817 to B. Yu; and NSF Grant MIP-93-213002 and Swiss NSF Grant 20-52347.97 to M. Vetterli. Part of this work was presented at the IEEE International Conference on Image Processing, Santa Barbara, CA, October 1997. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Patrick L. Combettes.
S. G. Chang was with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720 USA. She is now with Hewlett-Packard Company, Grenoble, France (e-mail: [email protected]).
B. Yu is with the Department of Statistics, University of California, Berkeley, CA 94720 USA (e-mail: [email protected]).
M. Vetterli is with the Laboratory of Audiovisual Communications, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland and also with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720 USA.
Publisher Item Identifier S 1057-7149(00)06914-1.
signal denoising using nonlinear techniques, in the setting of additive white Gaussian noise. The seminal work of Donoho and Johnstone on signal denoising via wavelet thresholding or shrinkage ([13]–[16]) has shown that various wavelet thresholding schemes for denoising have near-optimal properties in the minimax sense and perform well in simulation studies of one-dimensional curve estimation. Thresholding has been shown to have better rates of convergence than linear methods for approximating functions in Besov spaces ([13], [14]). Thresholding is a nonlinear technique, yet it is very simple because it operates on one wavelet coefficient at a time. Alternative approaches to nonlinear wavelet-based denoising can be found in, for example, [1], [4], [8]–[10], [12], [18], [19], [24], [27]–[29], [32], [33], [35], and references therein.
On a seemingly unrelated front, lossy compression has been proposed for denoising in several works [5], [6], [21], [25], [28]. Concerns regarding the compression rate were explicitly addressed. This is important because any practical coder must assume a limited resource (such as bits) at its disposal for representing the data. Other works [4], [12]–[16] also addressed the connection between compression and denoising, especially with nonlinear algorithms such as wavelet thresholding in a mathematical framework. However, these latter works were not concerned with quantization and bitrates: compression results from a reduced number of nonzero wavelet coefficients, and not from an explicit design of a coder.
The intuition behind using lossy compression for denoising may be explained as follows. A signal typically has structural correlations that a good coder can exploit to yield a concise representation. White noise, however, does not have structural redundancies and thus is not easily compressible. Hence, a good compression method can provide a suitable model for distinguishing between signal and noise. The discussion will be restricted to wavelet-based coders, though these insights can be extended to other transform-domain coders as well. A concrete connection between lossy compression and denoising can easily be seen when one examines the similarity between thresholding and quantization, the latter of which is a necessary step in a practical lossy coder. That is, the quantization of wavelet coefficients with a zero-zone is an approximation to the thresholding function (see Fig. 1). Thus, provided that the quantization outside of the zero-zone does not introduce significant distortion, it follows that wavelet-based lossy compression achieves denoising. With this connection in mind, this paper is about wavelet thresholding for image denoising and also for lossy compression. The threshold choice aids the lossy coder in choosing its zero-zone, and the resulting coder achieves simultaneous denoising and compression if such a property is desired.
1057-7149/00$10.00 © 2000 IEEE
CHANG et al.: ADAPTIVE WAVELET THRESHOLDING FOR IMAGE DENOISING AND COMPRESSION 1533
Fig. 1. Thresholding function can be approximated by quantization with a zero-zone.
The theoretical formalization of filtering additive iid Gaussian noise (of zero mean and standard deviation σ) via thresholding wavelet coefficients was pioneered by Donoho and Johnstone [14]. A wavelet coefficient is compared to a given threshold and is set to zero if its magnitude is less than the threshold; otherwise, it is kept or modified (depending on the thresholding rule). The threshold acts as an oracle which distinguishes between the insignificant coefficients likely due to noise, and the significant coefficients consisting of important signal structures. Thresholding rules are especially effective for signals with sparse or near-sparse representations where only a small subset of the coefficients represents all or most of the signal energy. Thresholding essentially creates a region around zero where the coefficients are considered negligible. Outside of this region, the thresholded coefficients are kept to full precision (that is, without quantization). Their most well-known thresholding methods include VisuShrink [14] and SureShrink [15]. These threshold choices enjoy asymptotic minimax optimalities over function spaces
such as Besov spaces. For image denoising, however, VisuShrink is known to yield overly smoothed images. This is because its threshold choice, σ√(2 log M) (called the universal threshold, where σ² is the noise variance and M is the number of samples), can be unwarrantedly large due to its dependence on the number of samples, M, which is 512² ≈ 2.6 × 10⁵ for a typical test image of size 512 × 512. SureShrink uses a hybrid of the universal threshold and the SURE threshold, derived from minimizing Stein's unbiased risk estimator [30], and has been shown to perform well. SureShrink will be the main comparison to the method proposed here, and, as will be seen later in this paper, our proposed threshold often yields better results.
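The size of the gap between the two threshold choices is easy to see numerically. In this sketch the noise and signal standard deviations are illustrative values, not figures from the paper's tables:

```python
import math

sigma = 20.0     # noise standard deviation (illustrative)
sigma_x = 30.0   # signal standard deviation in a subband (illustrative)
M = 512 * 512    # number of samples in a 512 x 512 image

# The universal (VisuShrink) threshold grows with the sample count M.
t_universal = sigma * math.sqrt(2.0 * math.log(M))

# The BayesShrink threshold depends only on the noise and signal variances.
t_bayes = sigma ** 2 / sigma_x

print(f"universal threshold   = {t_universal:.1f}")  # ~99.9
print(f"BayesShrink threshold = {t_bayes:.1f}")      # ~13.3
```

With these (made-up but plausible) parameters the universal threshold is roughly the dynamic range of the coefficients themselves, which is why it tends to oversmooth.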
Since the works of Donoho and Johnstone, there has been much research on finding thresholds for nonparametric estimation in statistics. However, few are specifically tailored for images. In this paper, we propose a framework and a near-optimal threshold in this framework more suitable for image denoising. This approach can be formally described as Bayesian, but this only describes our mathematical formulation, not our philosophy. The formulation is grounded on the empirical observation that the wavelet coefficients in a subband of a natural image can be summarized adequately by a generalized Gaussian distribution (GGD). This observation is well-accepted in the image processing community (for example, see [20], [22], [23], [29], [34], [36]) and is used for state-of-the-art image coders in [20], [22], [36]. It follows from this observation that the average MSE (in a subband) can be approximated by the corresponding Bayesian squared error risk with the GGD as the prior applied to each coefficient in an iid fashion. That is, a sum is approximated by an integral. We emphasize that this is an analytical approximation and our framework is broader than assuming wavelet coefficients are iid draws from a GGD. The goal is to find the soft-threshold that minimizes this Bayesian risk, and we call our method BayesShrink.
The proposed Bayesian risk minimization is subband-dependent. Given the signal being generalized Gaussian distributed and the noise being Gaussian, via numerical calculation a nearly optimal threshold for soft-thresholding is found to be T_B = σ²/σ_X (where σ² is the noise variance and σ_X² is the signal variance). This threshold gives a risk within 5% of the minimal risk over a broad range of parameters in the GGD family. To make this threshold data-driven, the parameters σ² and σ_X are estimated from the observed data, one set for each subband.
To achieve simultaneous denoising and compression, the nonzero thresholded wavelet coefficients need to be quantized. A uniform quantizer with centroid reconstruction is used on the GGD. The design parameters of the coder, such as the number of quantization levels and binwidths, are decided based on a criterion derived from Rissanen's minimum description length (MDL) principle [26]. This criterion balances the tradeoff between the compression rate and distortion, and yields a nice interpretation of operating at a fixed slope on the rate-distortion curve.

The paper is organized as follows. In Section II, the wavelet
thresholding idea is introduced. Section II-A explains the derivation of the BayesShrink threshold by minimizing a Bayesian risk with squared error. The lossy compression based on the MDL criterion is explained in Section III. Experimental results on several test images are shown in Section IV and compared with SureShrink. To benchmark against the best possible performance of a threshold estimate, the comparisons also include OracleShrink, the best soft-thresholding estimate obtainable assuming the original image known, and OracleThresh, the best hard-thresholding counterpart. The BayesShrink method often comes to within 5% of the MSEs of OracleShrink, and is better than SureShrink by up to 8% most of the time, or is within 1% if it is worse. Furthermore, the BayesShrink threshold is very easy to compute. BayesShrink with the additional MDL-based compression, as expected, introduces quantization noise to the image. This distortion may negate the denoising achieved by thresholding, especially when σ is small. However, for larger values of σ, the MSE due to the lossy compression is still significantly lower than that of the noisy image, while fewer bits are used to code the image, thus achieving both denoising and compression.
II. WAVELET THRESHOLDING AND THRESHOLD SELECTION
Let the signal be f = {f_ij, i, j = 1, …, N}, where N is some integer power of 2. It has been corrupted by additive noise and one observes

    g_ij = f_ij + ε_ij    (1)

where the ε_ij are independent and identically distributed (iid) as normal N(0, σ²) and independent of the signal. The goal is to remove the noise, or denoise the observation g, and to obtain an estimate f̂ of f which minimizes the mean squared error (MSE),

    MSE(f̂) = (1/N²) Σ_{i,j=1}^{N} (f̂_ij − f_ij)².    (2)
Fig. 2. Subbands of the 2-D orthogonal wavelet transform.
Let f = {f_ij}, g = {g_ij}, and ε = {ε_ij}; that is, the boldfaced letters will denote the matrix representation of the signals under consideration. Let Y = Wg denote the matrix of wavelet coefficients of g, where W is the two-dimensional dyadic orthogonal wavelet transform operator, and similarly X = Wf and V = Wε. The readers are referred to references such as [23], [31] for details of the two-dimensional orthogonal wavelet transform. It is convenient to label the subbands of the transform as in Fig. 2. The subbands HH_k, HL_k, LH_k, k = 1, …, J, are called the details, where k is the scale, with J being the largest (or coarsest) scale in the decomposition, and a subband at scale k has size N/2^k × N/2^k. The subband LL_J is the low resolution residual, and J is typically chosen large enough such that N/2^J ≪ N. Note that since the transform is orthogonal, the {V_ij} are also iid N(0, σ²).

The wavelet-thresholding denoising method filters each coefficient Y_ij from the detail subbands with a threshold function (to be explained shortly) to obtain X̂_ij. The denoised estimate is then f̂ = W⁻¹X̂, where W⁻¹ is the inverse wavelet transform operator.
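The transform-threshold-invert recipe can be sketched end to end. This illustration substitutes a single-level 2-D Haar transform and a simple fixed threshold for the multiscale decomposition and data-driven threshold used in the paper, and the piecewise-constant test image is made up:

```python
import numpy as np

def haar2d(x):
    """One level of the 2-D orthogonal Haar transform: returns LL, LH, HL, HH."""
    a = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)   # row lowpass
    d = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)   # row highpass
    ll = (a[0::2, :] + a[1::2, :]) / np.sqrt(2)
    lh = (a[0::2, :] - a[1::2, :]) / np.sqrt(2)
    hl = (d[0::2, :] + d[1::2, :]) / np.sqrt(2)
    hh = (d[0::2, :] - d[1::2, :]) / np.sqrt(2)
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Inverse of haar2d (exact, since the transform is orthogonal)."""
    n = ll.shape[0] * 2
    a = np.empty((n, ll.shape[1]))
    d = np.empty((n, ll.shape[1]))
    a[0::2, :] = (ll + lh) / np.sqrt(2)
    a[1::2, :] = (ll - lh) / np.sqrt(2)
    d[0::2, :] = (hl + hh) / np.sqrt(2)
    d[1::2, :] = (hl - hh) / np.sqrt(2)
    x = np.empty((n, n))
    x[:, 0::2] = (a + d) / np.sqrt(2)
    x[:, 1::2] = (a - d) / np.sqrt(2)
    return x

def soft(y, t):
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

rng = np.random.default_rng(0)
n, sigma = 64, 10.0
f = np.full((n, n), 50.0)
f[16:48, 16:48] = 150.0                       # piecewise-constant test image
g = f + sigma * rng.normal(size=(n, n))

ll, lh, hl, hh = haar2d(g)
t = 3.0 * sigma                               # fixed threshold, for the sketch only
f_hat = ihaar2d(ll, soft(lh, t), soft(hl, t), soft(hh, t))

print("noisy MSE   :", np.mean((g - f) ** 2))
print("denoised MSE:", np.mean((f_hat - f) ** 2))
```

Only the detail subbands are thresholded; the low resolution residual LL passes through unaltered, exactly as the procedure above prescribes.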
There are two frequently used thresholding methods. The soft-threshold function (also called the shrinkage function)

    η_T(x) = sgn(x) · max(|x| − T, 0)    (3)

takes the argument x and shrinks it toward zero by the threshold T. The other popular alternative is the hard-threshold function

    η_T(x) = x · 1{|x| > T}    (4)

which keeps the input if it is larger than the threshold T; otherwise, it is set to zero. The wavelet thresholding procedure removes noise by thresholding only the wavelet coefficients of the detail subbands, while keeping the low resolution coefficients unaltered.
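The rules in (3) and (4) translate directly into vectorized code; the function names here are illustrative:

```python
import numpy as np

def soft_threshold(x, t):
    """Eq. (3): shrink x toward zero by t; values inside [-t, t] become zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def hard_threshold(x, t):
    """Eq. (4): keep x untouched when |x| > t, otherwise set it to zero."""
    return np.where(np.abs(x) > t, x, 0.0)

y = np.array([-3.0, -1.0, 0.5, 2.0, 4.0])
print(soft_threshold(y, 1.5))  # [-1.5  0.   0.   0.5  2.5]
print(hard_threshold(y, 1.5))  # [-3.  0.  0.  2.  4. ]
```

The soft rule is continuous in x, while the hard rule jumps at |x| = T; this discontinuity is what produces the abrupt artifacts discussed below.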
The soft-thresholding rule is chosen over hard-thresholding for several reasons. First, soft-thresholding has been shown to achieve near-optimal minimax rates over a large range of Besov spaces [12], [14]. Second, for the generalized Gaussian prior assumed in this work, the optimal soft-thresholding estimator yields a smaller risk than the optimal hard-thresholding estimator (to be shown later in this section). Lastly, in practice, the soft-thresholding method yields more visually pleasing images than hard-thresholding, because the latter is discontinuous and yields abrupt artifacts in the recovered images, especially when the noise energy is significant. In what follows, soft-thresholding will be the primary focus.
While the idea of thresholding is simple and effective, finding a good threshold is not an easy task. For a one-dimensional (1-D) deterministic signal of length M, Donoho and Johnstone [14] proposed for VisuShrink the universal threshold, σ√(2 log M), which results in an estimate asymptotically optimal in the minimax sense (minimizing the maximum error over all possible M-sample signals). One other notable threshold is the SURE threshold [15], derived from minimizing Stein's unbiased risk estimate [30] when soft-thresholding is used. The SureShrink method is a hybrid of the universal and the SURE threshold, with the choice being dependent on the energy of the particular subband [15]. The SURE threshold is data-driven, does not depend on M explicitly, and SureShrink estimates it in a subband-adaptive manner. Moreover, SureShrink has yielded good image denoising performance and comes close to the true minimum MSE of the optimal soft-threshold estimator (cf. [4], [12]), and thus will be the main comparison to our proposed method.
In the statistical Bayesian literature, many works have concentrated on deriving the best threshold (or shrinkage factor) based on priors such as the Laplacian and a mixture of Gaussians (cf. [1], [8], [9], [18], [24], [27], [29], [32], [35]). With an integral approximation to the pixel-wise MSE distortion measure as discussed earlier, the formulation here is also Bayesian for finding the best soft-thresholding rule under the generalized Gaussian prior. A related work is [27], where the hard-thresholding rule is investigated for signals with Laplacian and Gaussian distributions.
The GGD has been used in many subband or wavelet-based image processing applications [2], [20], [22], [23], [29], [34], [36]. In [29], it was observed that a GGD with the shape parameter β ranging from 0.5 to 1 [see (5)] can adequately describe the wavelet coefficients of a large set of natural images. Our experience with images supports the same conclusion. Fig. 3 shows the histograms of the wavelet coefficients of the images shown in Fig. 9, against the generalized Gaussian curve, with the parameters labeled (the estimation of the parameters will be explained later in the text). A heuristic can be put forward to explain why there are a large number of small coefficients but relatively few large coefficients, as the GGD suggests: the small ones correspond to smooth regions in a natural image and the large ones to edges or textures.
A. Adaptive Threshold for BayesShrink

The GGD, following [20], is

    GG_{σ_X, β}(x) = C(σ_X, β) exp{ −[α(σ_X, β) |x|]^β }    (5)

for −∞ < x < ∞, σ_X > 0, β > 0, where

    α(σ_X, β) = σ_X^{−1} [Γ(3/β) / Γ(1/β)]^{1/2}

and

    C(σ_X, β) = β · α(σ_X, β) / (2 Γ(1/β))
Fig. 3. Histogram of the wavelet coefficients of four test images. For each image, from top to bottom it is fine to coarse scales; from left to right, they are the HH, HL, and LH subbands, respectively.
and Γ(·) is the gamma function. The parameter σ_X is the standard deviation and β is the shape parameter. For a given set of parameters, the objective is to find a soft-threshold T which minimizes the Bayes risk,

    r(T) = E(X̂ − X)²    (6)

where X̂ = η_T(Y), Y = X + V, X ~ GG_{σ_X, β}, and V ~ N(0, σ²). Denote the optimal threshold by T*(σ_X, β),

    T*(σ_X, β) = arg min_T r(T)    (7)

which is a function of the parameters σ_X and β. To our knowledge, there is no closed form solution for T* for this chosen prior, thus numerical calculation is used to find its value.¹
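As a sketch of such a numerical calculation, the Bayes risk can be estimated by Monte Carlo over a grid of thresholds. The GGD sampler (via a gamma variate) and all parameter values below are illustrative choices, not the procedure of the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_ggd(sigma_x, beta, size):
    """Draw GGD samples: |alpha*X|^beta is Gamma(1/beta); rescale to std sigma_x."""
    g = rng.gamma(shape=1.0 / beta, scale=1.0, size=size)
    sign = np.where(rng.random(size) < 0.5, -1.0, 1.0)
    x = sign * g ** (1.0 / beta)
    return x * sigma_x / x.std()          # normalize to the target std

def soft(y, t):
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

sigma, sigma_x, beta = 1.0, 1.0, 1.0      # Laplacian case, illustrative values
x = sample_ggd(sigma_x, beta, 200_000)
y = x + sigma * rng.normal(size=x.size)

# Monte-Carlo estimate of r(T) = E(soft(Y,T) - X)^2 on a threshold grid.
grid = np.linspace(0.0, 4.0, 81)
risk = np.array([np.mean((soft(y, t) - x) ** 2) for t in grid])
t_star = grid[np.argmin(risk)]

t_bayes = sigma ** 2 / sigma_x            # closed-form BayesShrink threshold
print(f"numerical T* ~ {t_star:.2f}, T_B = {t_bayes:.2f}")
print(f"risk at T*: {risk.min():.3f}, risk at T_B: "
      f"{np.mean((soft(y, t_bayes) - x) ** 2):.3f}")
```

For the Laplacian case the risk at T_B should land close to the numerically found minimum, consistent with the within-5% behavior described below.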
Before examining the general case, it is insightful to consider two special cases of the GGD: the Gaussian (β = 2) and the
¹It was observed that for the numerical calculation, it is more robust to obtain the value of T* from locating the zero-crossing of the derivative r′(T) than from minimizing r(T) directly. In the Laplacian case, a recent work [17] derives an analytical equation that T* satisfies and calculates T* by solving that equation.
Laplacian (β = 1) distributions. The Laplacian case is particularly interesting, because it is analytically more tractable and is often used in image processing applications.

Case 1: β = 2 (Gaussian), with X ~ N(0, σ_X²). It is straightforward to verify that
(8)
(9)
where

(10)

with the standard normal density function φ(x) = (2π)^{−1/2} e^{−x²/2} and the survival function of the standard normal, Φ̄(x) = ∫_x^∞ φ(t) dt.
Fig. 5. Thresholding for the Laplacian prior, with σ = 1. (a) Compare the optimal soft-threshold T*(σ_X, 1) (—), the BayesShrink threshold T_B(σ_X) (· · ·), the optimal hard-threshold (- - -), and the threshold T_h (- · - ·) against the standard deviation σ_X on the horizontal axis. (b) Their corresponding risks.
With these insights from the special cases, the discussion now returns to the general case of the GGD.

Case 3: (Generalized Gaussian) The proposed threshold T_B(σ_X) in (12) has been found to work well for the general case. Let σ = 1. In Fig. 6(a), each dotted line (· · ·) is the optimal threshold T*(σ_X, β) for a given fixed β, plotted against σ_X on the horizontal axis. The values β = 0.6, 1, 2, 3, 4 are shown. The threshold T_B(σ_X) is plotted with the solid line (—). The curve of the optimal threshold that lies closest to T_B is for β = 1, the Laplacian case, while the other curves deviate from T_B as β moves away from 1. Fig. 6(b) shows the corresponding risks. The deviation between the optimal risk and r(T_B) grows as β moves away from 1, but the error is still within 5% of the optimal for the curves shown in Fig. 6(b). Because the threshold T_B depends
Fig. 6. Thresholding for the generalized Gaussian prior, with σ = 1. (a) Compare the approximation T_B(σ_X) = σ²/σ_X (—) with the optimal threshold T*(σ_X, β) for β = 0.6, 1, 2, 3, 4 (· · ·). The horizontal axis is the standard deviation σ_X. (b) The optimal risks are in (· · ·), and the approximation in (—).
only on the standard deviation and not on the shape parameter β, it may not yield a good approximation for values of β outside the range tested here, and the threshold may need to be modified to incorporate β. However, since the typical values of β for wavelet coefficients fall in the range [0.5, 1], this simple form of the threshold is appropriate for our purpose. For a fixed set of parameters, the curve of the risk (as a function of the threshold T) is very flat near the optimal threshold T*, implying that the error is not sensitive to a slight perturbation near T*.
B. Parameter Estimation for Data-Driven Adaptive Threshold
This section focuses on the estimation of the parameters σ² and σ_X, which in turn yields a data-driven estimate of the threshold T_B that is adaptive to different subband characteristics.
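A hedged sketch of the data-driven estimation commonly associated with BayesShrink follows (the exact estimators here are assumptions, since the details of this subsection are not legible in this transcript): a robust median estimate of σ from a finest-scale diagonal subband, and σ_X from the subband second moment.

```python
import numpy as np

def estimate_noise_sigma(hh1):
    """Robust median estimator of the noise std from the finest diagonal subband."""
    return np.median(np.abs(hh1)) / 0.6745

def bayes_shrink_threshold(subband, sigma):
    """Data-driven T_B = sigma^2 / sigma_X_hat for one detail subband."""
    sigma_y2 = np.mean(subband ** 2)           # E[Y^2] = sigma_X^2 + sigma^2
    sigma_x = np.sqrt(max(sigma_y2 - sigma ** 2, 0.0))
    if sigma_x == 0.0:                         # noise dominates: remove everything
        return np.max(np.abs(subband))
    return sigma ** 2 / sigma_x

rng = np.random.default_rng(2)
sigma_true, sigma_x_true = 10.0, 25.0
# Synthetic "subband": Laplacian signal plus Gaussian noise (illustrative).
x = rng.laplace(scale=sigma_x_true / np.sqrt(2), size=(128, 128))
y = x + sigma_true * rng.normal(size=x.shape)

# Pure-noise field standing in for the HH1 subband of a smooth image.
sigma_hat = estimate_noise_sigma(sigma_true * rng.normal(size=(128, 128)))
t = bayes_shrink_threshold(y, sigma_hat)
print(f"sigma_hat = {sigma_hat:.2f}, T_B = {t:.2f} "
      f"(ideal {sigma_true ** 2 / sigma_x_true:.2f})")
```

The 0.6745 constant is the median of |N(0, 1)|, so the estimator is unbiased for Gaussian noise and insensitive to the sparse signal outliers in a detail subband.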
Now we state the model selection criterion, MDLQ:

with (31)

To find the best model, (32) is minimized over the coder parameters to find the corresponding set of quantized coefficients. In the implementation, which is by no means optimized, this is done by searching over the number of quantization levels and, for each such number, minimizing (32) over a reasonable range of binwidths. Typically, once a minimum (in the binwidth) has been reached, the MDLQ cost increases monotonically, thus the search can terminate soon after a minimum has been detected.
This MDLQ compression with BayesShrink zero-zone selection is applied to each subband independently. The steps discussed in this section are summarized as follows.

1) Estimate the noise variance σ² and the GGD standard deviation σ_X.
2) Calculate the threshold T_B, and soft-threshold the wavelet coefficients.
3) To quantize the nonzero coefficients, minimize (32) over the coder parameters to find the corresponding quantized coefficients X̂, which is the compressed, denoised estimate of X.
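A minimal sketch of the zero-zone quantization step, assuming uniform bins with midpoint reconstruction (rather than the paper's centroid reconstruction under the GGD) and a first-order entropy estimate for the rate; the threshold and binwidth below are fixed by hand instead of by the MDLQ search:

```python
import numpy as np

def zero_zone_quantize(coeffs, t, binwidth):
    """Zero-zone [-t, t] maps to 0; outside it, uniform bins of the given width."""
    mag = np.abs(coeffs)
    idx = np.where(mag <= t, 0.0, np.ceil((mag - t) / binwidth))
    return np.sign(coeffs) * idx            # signed bin index, 0 = zero-zone

def zero_zone_reconstruct(indices, t, binwidth):
    """Midpoint reconstruction of each nonzero bin."""
    mag = t + (np.abs(indices) - 0.5) * binwidth
    return np.where(indices == 0, 0.0, np.sign(indices) * mag)

def first_order_entropy(indices):
    """Bits per symbol of the quantized indices (a loose rate estimate)."""
    _, counts = np.unique(indices, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(3)
c = rng.laplace(scale=10.0, size=10_000)    # stand-in for subband coefficients
t, w = 4.0, 5.0                             # illustrative threshold and binwidth
q = zero_zone_quantize(c, t, w)
c_hat = zero_zone_reconstruct(q, t, w)

print("rate (bits/coeff):", first_order_entropy(q))
print("distortion (MSE) :", np.mean((c - c_hat) ** 2))
```

Shrinking the binwidth lowers the distortion but raises the entropy of the indices, which is exactly the rate-distortion tradeoff the MDLQ criterion is designed to balance.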
The coarsest subband LL_J is quantized differently in that it is not thresholded, and the quantization with (32) assumes the uniform distribution. The LL_J coefficients are essentially local averages of the image, and are not characterized by distributions with a peak at zero; thus the uniform distribution is used for generality. With the mean subtracted, the uniform distribution is assumed to be symmetric about zero. Every quantization bin (including the zero-zone) is of the same width, and the reconstruction values are the midpoints of the intervals.
The MDLQ criterion in (32) has the additional interpretation of operating at a specified point on the rate-distortion (R-D) curve, as also pointed out by Liu and Moulin [21]. For a given coder, one can obtain a set of operational rate-distortion points (R, D). When there is a rate or distortion constraint, the constrained problem can be formulated as a minimization problem with a Lagrange multiplier λ. In this case, (32) can be interpreted as operating at a particular fixed slope on the R-D curve. Natarajan [25] and Liu and Moulin [21] both proposed to use compression for denoising. The former operates at a constrained distortion, and the latter operates at a fixed slope on the R-D curve. Both works recommend the use of any reasonable coder, while our coder is designed specifically with the purpose of denoising.
IV. EXPERIMENTAL RESULTS AND DISCUSSION

The grayscale images goldhill, lena, barbara and baboon are used as test images with different noise levels σ. The original images are shown in Fig. 9.

Fig. 9. Original images. From top left, clockwise: goldhill, lena, barbara and baboon.

The wavelet transform employs Daubechies' least asymmetric compactly-supported wavelet with eight vanishing moments [11], with four scales of orthogonal decomposition.
To assess the performance of BayesShrink, it is compared with SureShrink [15]. The 1-D implementation of SureShrink can be obtained from the WaveLab toolkit [3], and the 2-D extension is straightforward. To gauge the best possible performance of a soft-threshold estimator, these methods are also benchmarked against what we call OracleShrink, which is the truly optimal soft-thresholding estimator assuming the original image is known. The threshold of OracleShrink in each subband is

    arg min_T Σ_{i,j} (η_T(Y_ij) − X_ij)²    (32)

with X known. To further justify the choice of soft-thresholding over hard-thresholding, another benchmark, OracleThresh, is also computed. OracleThresh is the best possible performance of a hard-threshold estimator, with subband-adaptive thresholds, each of which is defined as

    arg min_T Σ_{i,j} (Y_ij · 1{|Y_ij| > T} − X_ij)²    (33)
with X known. The MSEs from the various methods are compared in Table I, and the data are collected from an average of five runs. The columns refer to, respectively, OracleShrink, SureShrink, BayesShrink, BayesShrink with MDLQ-based compression, OracleThresh, Wiener filtering, and the bitrate (in bpp, or bits per pixel) of the MDLQ-compressed image. Since the main benchmark is against SureShrink, the better one of SureShrink and BayesShrink is highlighted in bold font for each test set. The MSEs resulting from BayesShrink come to within
TABLE I. FOR VARIOUS TEST IMAGES AND σ VALUES, LISTS THE MSE OF (1) OracleShrink, (2) SureShrink, (3) BayesShrink, (4) BayesShrink WITH MDLQ-COMPRESSION, (5) OracleThresh, AND (6) WIENER FILTERING. THE LAST COLUMN SHOWS THE BITRATE (BITS PER PIXEL) OF THE COMPRESSED IMAGE OF (4). AVERAGED OVER FIVE RUNS.
5% of OracleShrink for the smoother images goldhill and lena, and are most of the time within 6% for highly detailed images such as barbara and baboon (though it may suffer up to 20% for small σ). BayesShrink outperforms SureShrink most of the time, by up to approximately 8%. We observed in the experiments that using solely the SURE threshold yields excellent performance (sometimes yielding even lower MSE than BayesShrink, by up to 12%). However, the hybrid method of SureShrink results at times in the choice of the universal threshold, which can be too large. As illustrated in Table I, all three soft-thresholding methods significantly outperform the best hard-thresholding rule, OracleThresh.
It is not surprising that the SURE threshold and the BayesShrink threshold yield similar performances. The SURE threshold can also be viewed as an approximately optimal soft-threshold in terms of MSE. For a particular subband of size M, with the coefficients normalized to unit noise variance, following [15],

    SURE(T; Y) = M − 2 · #{i : |Y_i| ≤ T} + Σ_{i=1}^{M} (|Y_i| ∧ T)²    (34)

where a ∧ b denotes min(a, b), and the SURE threshold is defined to be the value of T minimizing SURE(T; Y).

Recall Y = X + V and the V_i's are iid N(0, 1). Conditioning on X, by Stein's result,

    E_V SURE(T; Y) = E_V ‖η_T(Y) − X‖²    (35)

Moreover, as we have done before, if the distribution of the X_i's is approximated by a GGD, then the distribution of the Y_i's is approximated by the mixture distribution of the GGD and N(0, 1); or Y = X + V, where X follows a GGD and V is independent of X. By the Law of Large Numbers,

(36)

Taking expectation with respect to the GGD on both sides of (35), the risk can be written as

(37)
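A sketch of the SURE computation in (34) and its minimization, assuming coefficients normalized to unit noise variance; since SURE changes only at the observed magnitudes, it suffices to search over the values |Y_i| (the sparse test signal below is made up):

```python
import numpy as np

def sure(t, y):
    """Stein's unbiased risk estimate for soft-thresholding at t (unit noise)."""
    m = y.size
    return (m
            - 2.0 * np.sum(np.abs(y) <= t)
            + np.sum(np.minimum(np.abs(y), t) ** 2))

def sure_threshold(y):
    """Minimize SURE over the candidate thresholds {0} U {|Y_i|}."""
    candidates = np.concatenate(([0.0], np.sort(np.abs(y))))
    risks = np.array([sure(t, y) for t in candidates])
    return candidates[np.argmin(risks)]

rng = np.random.default_rng(4)
# Sparse signal: ~10% active coefficients, plus unit-variance noise.
x = np.where(rng.random(1000) < 0.1, rng.normal(0.0, 5.0, 1000), 0.0)
y = x + rng.normal(size=1000)

t = sure_threshold(y)
print(f"SURE threshold = {t:.2f}")
print(f"risk estimate at T: {sure(t, y):.1f} vs at T = 0: {sure(0.0, y):.1f}")
```

At T = 0 the estimate equals M, i.e., the risk of not thresholding at all (E‖Y − X‖² = Mσ² with σ = 1), so any useful threshold must drive the estimate below M.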
Fig. 10. Comparing the performance of the various methods on goldhill with σ = 20. (a) Original. (b) Noisy image, σ = 20. (c) OracleShrink. (d) SureShrink. (e) BayesShrink. (f) BayesShrink followed by MDLQ compression.
Comparing (37) with (36), one can conclude that SURE(T; Y) is a data-based approximation to the risk, and the SURE threshold, which minimizes SURE(T; Y), is an alternative to BayesShrink for minimizing the Bayesian risk.

We have also made comparisons with the Wiener filter, the best linear filter possible. The version used is the adaptive filter, wiener2, in the MATLAB image processing toolbox, using the default settings (3 × 3 local window size, and the unknown noise power is estimated). The MSE results are shown in Table I, and they are considerably worse than those of the nonlinear thresholding methods, especially when σ is large. The image quality is also not as good as that of the thresholding methods.
The MDLQ-based compression step introduces quantization
noise which is quite visible. As shown in the last column of
Table I, the coder achieves a lower bitrate, but at the expense
of increasing the MSE. The MSE can be even worse than that of the noisy observation for small values of σ, especially for the highly detailed images. This is because the quantization noise is significant compared to the additive Gaussian noise. For larger σ, the compressed images can achieve noise reduction of up to approximately 75% in terms of MSE. Furthermore, the bitrates are significantly less than the original 8 bpp for grayscale images. Thus, compression does achieve denoising, and the proposed MDLQ-based compression can be used if simultaneous denoising and compression is a desired feature. If only the best denoising performance were the goal, obviously using solely BayesShrink is preferred.
Note that the first-order entropy estimate used for the bitrate of the quantized coefficients is a rather loose one. With more sophisticated coding methods (e.g., predictive coding, pixel classification), the same bitrate could yield a higher number of quantization levels, thus resulting in a lower MSE and enhancing the performance of the MDLQ-based compression-denoising.
A fair assessment of the MDLQ scheme for quantization after thresholding is the R-D curve used in Hansen and Yu [17] (see http://cm.bell-labs.com/stat/binyu/publications.html). This R-D curve is calculated using noiseless coefficients, and yields the best possible R-D tradeoff when the quantization is restricted to equal binwidths. It thus gives an idea of how effective MDLQ is in choosing the tradeoff with respect to the optimal. The closeness of the MDLQ point to this R-D lower-bound curve indicates that MDLQ chooses a good R-D tradeoff without the knowledge of the noiseless coefficients required in deriving this R-D curve.
Fig. 10 shows the resulting images of each denoising method
for goldhill with sigma = 20 (a zoomed-in section of the image
is displayed in order to show the details). Table II compares
the threshold values for each subband chosen by OracleShrink,
SureShrink, and BayesShrink, averaged over five runs. It is clear
that the BayesShrink threshold selection is comparable to the
SURE threshold and to the true optimal threshold. Some
of the unexpectedly large threshold values in SureShrink come
from the universal threshold, not the SURE threshold, and these
are placed in parentheses in the table. Table II(c) lists the thresholds
of BayesShrink; the thresholds in parentheses correspond
to the case in which the estimated signal standard deviation is
zero and all coefficients have been set to zero. Table III tabulates
the values of m chosen by the MDLQ criterion for each subband of
the goldhill image with sigma = 20, averaged over five runs. The MDLQ
criterion allocates more quantization levels to the coarser, more
important scales, as would be the case in a
practical subband coding situation. A value of m = 0 indicates
that the coefficients have already been thresholded to zero, and
there is nothing to code.
The results for lena with sigma = 10 are also shown. Fig. 11
shows the same sequence of images for a zoomed-in portion of
lena with noise sigma = 10. The corresponding threshold
selections and MDLQ parameters for lena with
noise sigma = 10 are listed in Tables IV and V. Interested
readers can obtain a better view of the images at the website,
http://www-wavelet.eecs.berkeley.edu/~grchang/compressDenoise/.
TABLE II: THE THRESHOLD VALUES OF OracleShrink, SureShrink, AND BayesShrink (AVERAGED OVER FIVE RUNS) FOR THE DIFFERENT SUBBANDS OF goldhill, WITH NOISE STRENGTH sigma = 20

TABLE III: THE VALUE OF m (AVERAGED OVER FIVE RUNS) FOR THE DIFFERENT SUBBANDS OF goldhill, WITH NOISE STRENGTH sigma = 20
V. CONCLUSION
Two main issues regarding image denoising were addressed
in this paper. First, an adaptive threshold for wavelet thresholding
of images was proposed, based on GGD modeling of the
subband coefficients, and test results showed excellent performance.
Second, a coder was designed specifically for simultaneous
compression and denoising. The proposed BayesShrink
threshold specifies the zero-zone of the coder's quantization step,
and this zero-zone is the main agent in the coder that
removes the noise. Although the setting in this paper was in the
Fig. 11. Comparing the performance of the various methods on lena with sigma = 10. (a) Original. (b) Noisy image, sigma = 10. (c) OracleShrink. (d) SureShrink. (e) BayesShrink. (f) BayesShrink followed by MDLQ compression.
wavelet domain, the idea can be extended to other transform
domains such as the DCT, which also rely on energy compaction
and sparse representation to achieve good compression.
There are several interesting directions worth pursuing. The
current compression scheme selects the threshold (i.e., the zero-zone size)
and the quantization bin size in a two-stage process. In
typical image coders, however, the zero-zone is chosen to be
either the same size as, or twice the size of, the other bins. Thus it would
be interesting to select these two values jointly and analyze their
dependence on each other. Furthermore, a more sophisticated
coder is likely to produce better compressed images than the
current scheme, which uses the first-order entropy to code the
bin indices. With an improved coder, an increase in the number
TABLE IV: THE THRESHOLD VALUES OF OracleShrink, SureShrink, AND BayesShrink (AVERAGED OVER FIVE RUNS) FOR THE DIFFERENT SUBBANDS OF lena, WITH NOISE STRENGTH sigma = 10

TABLE V: THE VALUE OF m (AVERAGED OVER FIVE RUNS) FOR THE DIFFERENT SUBBANDS OF lena, WITH NOISE STRENGTH sigma = 10
of quantization bins would not increase the bitrate penalty by
much, and thus the coefficients would be quantized at a finer
resolution than the current method. Lastly, the model family
could be expanded. For example, one could use a collection of
wavelet bases for the wavelet decomposition, rather than using
just one chosen wavelet, to allow possibly better representations
of the signals.
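The zero-zone quantizer structure discussed above can be sketched as a simple dead-zone quantizer: the threshold acts as the zero-zone half-width, and magnitudes beyond it are uniformly quantized. This is an illustration of that structure, not the paper's exact MDLQ coder:

```python
import numpy as np

def zero_zone_quantize(coeffs, threshold, binwidth):
    # Dead-zone quantizer: |x| <= threshold maps to bin 0 (the zero-zone);
    # magnitudes beyond the threshold fall into uniform bins of size binwidth.
    mag = np.abs(coeffs)
    outside = mag > threshold
    idx = np.zeros(coeffs.shape, dtype=int)
    steps = np.floor((mag[outside] - threshold) / binwidth) + 1
    idx[outside] = (np.sign(coeffs[outside]) * steps).astype(int)
    return idx

def zero_zone_dequantize(idx, threshold, binwidth):
    # Reconstruct each nonzero bin at its midpoint; bin 0 reconstructs to 0,
    # which is exactly the thresholding (denoising) action of the coder.
    out = np.zeros(idx.shape, dtype=float)
    nz = idx != 0
    out[nz] = np.sign(idx[nz]) * (threshold + (np.abs(idx[nz]) - 0.5) * binwidth)
    return out

coeffs = np.array([0.5, -1.9, 2.6, -5.0, 7.3])
idx = zero_zone_quantize(coeffs, threshold=2.0, binwidth=1.0)
rec = zero_zone_dequantize(idx, threshold=2.0, binwidth=1.0)
print(idx)   # -> [ 0  0  1 -4  6]
print(rec)   # -> [ 0.   0.   2.5 -5.5  7.5]
```

Joint selection of the threshold and binwidth, as suggested above, would amount to optimizing both parameters of this quantizer together rather than fixing the zero-zone first.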
In our other work [7], it was demonstrated that spatially adaptive
thresholds greatly improve the denoising performance over
uniform thresholds; that is, the threshold value changes for each
coefficient. The threshold selection uses the context-modeling
idea prevalent in coding methods, so it would be interesting
to extend this spatially adaptive threshold to the compression
framework without incurring too much overhead. This would
likely improve the denoising performance.
REFERENCES
[1] F. Abramovich, T. Sapatinas, and B. W. Silverman, "Wavelet thresholding via a Bayesian approach," J. R. Statist. Soc. B, vol. 60, pp. 725-749, 1998.
[2] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, "Image coding using wavelet transform," IEEE Trans. Image Processing, vol. 1, no. 2, pp. 205-220, 1992.
[3] J. Buckheit, S. Chen, D. Donoho, I. Johnstone, and J. Scargle, WaveLab Toolkit, http://www-stat.stanford.edu:80/~wavelab/.
[4] A. Chambolle, R. A. DeVore, N. Lee, and B. J. Lucier, "Nonlinear wavelet image processing: Variational problems, compression, and noise removal through wavelet shrinkage," IEEE Trans. Image Processing, vol. 7, pp. 319-335, 1998.
[5] S. G. Chang, B. Yu, and M. Vetterli, "Bridging compression to wavelet thresholding as a denoising method," in Proc. Conf. Information Sciences and Systems, Baltimore, MD, Mar. 1997, pp. 568-573.
[6] S. G. Chang, B. Yu, and M. Vetterli, "Image denoising via lossy compression and wavelet thresholding," in Proc. IEEE Int. Conf. Image Processing, vol. 1, Santa Barbara, CA, Nov. 1997, pp. 604-607.
[7] S. G. Chang, B. Yu, and M. Vetterli, "Spatially adaptive wavelet thresholding with context modeling for image denoising," IEEE Trans. Image Processing, vol. 9, pp. 1522-1531, Sept. 2000.
[8] H. Chipman, E. Kolaczyk, and R. McCulloch, "Adaptive Bayesian wavelet shrinkage," J. Amer. Statist. Assoc., vol. 92, no. 440, pp. 1413-1421, 1997.
[9] M. Clyde, G. Parmigiani, and B. Vidakovic, "Multiple shrinkage and subset selection in wavelets," Biometrika, vol. 85, pp. 391-402, 1998.
[10] M. S. Crouse, R. D. Nowak, and R. G. Baraniuk, "Wavelet-based statistical signal processing using hidden Markov models," IEEE Trans. Signal Processing, vol. 46, pp. 886-902, Apr. 1998.
[11] I. Daubechies, Ten Lectures on Wavelets, vol. 61 of CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia, PA: SIAM, 1992.
[12] R. A. DeVore and B. J. Lucier, "Fast wavelet techniques for near-optimal image processing," in IEEE Military Communications Conf. Rec., San Diego, CA, Oct. 11-14, 1992, vol. 3, pp. 1129-1135.
[13] D. L. Donoho, "De-noising by soft-thresholding," IEEE Trans. Inform. Theory, vol. 41, pp. 613-627, May 1995.
[14] D. L. Donoho and I. M. Johnstone, "Ideal spatial adaptation via wavelet shrinkage," Biometrika, vol. 81, pp. 425-455, 1994.
[15] D. L. Donoho and I. M. Johnstone, "Adapting to unknown smoothness via wavelet shrinkage," J. Amer. Statist. Assoc., vol. 90, no. 432, pp. 1200-1224, Dec. 1995.
[16] D. L. Donoho and I. M. Johnstone, "Wavelet shrinkage: Asymptopia?," J. R. Statist. Soc. B, vol. 57, no. 2, pp. 301-369, 1995.
[17] M. Hansen and B. Yu, "Wavelet thresholding via MDL: Simultaneous denoising and compression," 1999, submitted for publication.
[18] M. Jansen, M. Malfait, and A. Bultheel, "Generalized cross validation for wavelet thresholding," Signal Process., vol. 56, pp. 33-44, Jan. 1997.
[19] I. M. Johnstone and B. W. Silverman, "Wavelet threshold estimators for data with correlated noise," J. R. Statist. Soc., vol. 59, 1997.
[20] R. L. Joshi, V. J. Crump, and T. R. Fisher, "Image subband coding using arithmetic and trellis coded quantization," IEEE Trans. Circuits Syst. Video Technol., vol. 5, pp. 515-523, Dec. 1995.
[21] J. Liu and P. Moulin, "Complexity-regularized image denoising," in Proc. IEEE Int. Conf. Image Processing, vol. 2, pp. 370-373, Oct. 1997.
[22] S. M. LoPresto, K. Ramchandran, and M. T. Orchard, "Image coding based on mixture modeling of wavelet coefficients and a fast estimation-quantization framework," in Proc. Data Compression Conf., Snowbird, UT, Mar. 1997, pp. 221-230.
[23] S. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation," IEEE Trans. Pattern Anal. Machine Intell., vol. 11, pp. 674-693, July 1989.
[24] G. Nason, "Choice of the threshold parameter in wavelet function estimation," in Wavelets and Statistics, A. Antoniadis and G. Oppenheim, Eds. Berlin, Germany: Springer-Verlag, 1995.
[25] B. K. Natarajan, "Filtering random noise from deterministic signals via data compression," IEEE Trans. Signal Processing, vol. 43, pp. 2595-2605, Nov. 1995.
[26] J. Rissanen, Stochastic Complexity in Statistical Inquiry. Singapore: World Scientific, 1989.
[27] F. Ruggeri and B. Vidakovic, "A Bayesian decision theoretic approach to wavelet thresholding," Statist. Sinica, vol. 9, no. 1, pp. 183-197, 1999.
[28] N. Saito, "Simultaneous noise suppression and signal compression using a library of orthonormal bases and the minimum description length criterion," in Wavelets in Geophysics, E. Foufoula-Georgiou and P. Kumar, Eds. New York: Academic, 1994, pp. 299-324.
[29] E. Simoncelli and E. Adelson, "Noise removal via Bayesian wavelet coring," in Proc. IEEE Int. Conf. Image Processing, vol. 1, pp. 379-382, Sept. 1996.
[30] C. M. Stein, "Estimation of the mean of a multivariate normal distribution," Ann. Statist., vol. 9, no. 6, pp. 1135-1151, 1981.
[31] M. Vetterli and J. Kovacevic, Wavelets and Subband Coding. Englewood Cliffs, NJ: Prentice-Hall, 1995.
[32] B. Vidakovic, "Nonlinear wavelet shrinkage with Bayes rules and Bayes factors," J. Amer. Statist. Assoc., vol. 93, no. 441, pp. 173-179, 1998.
[33] Y. Wang, "Function estimation via wavelet shrinkage for long-memory data," Ann. Statist., vol. 24, pp. 466-484, 1996.
[34] P. H. Westerink, J. Biemond, and D. E. Boekee, "An optimal bit allocation algorithm for sub-band coding," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Dallas, TX, Apr. 1987, pp. 1378-1381.
[35] N. Weyrich and G. T. Warhola, "De-noising using wavelets and cross-validation," Dept. of Mathematics and Statistics, Air Force Inst. of Technol., AFIT/ENC, OH, Tech. Rep. AFIT/EN/TR/94-01, 1994.
[36] Y. Yoo, A. Ortega, and B. Yu, "Image subband coding using context-based classification and adaptive quantization," IEEE Trans. Image Processing, vol. 8, pp. 1702-1715, Dec. 1999.
S. Grace Chang (S'95) received the B.S. degree from the Massachusetts Institute of Technology, Cambridge, in 1993, and the M.S. and Ph.D. degrees from the University of California, Berkeley, in 1995 and 1998, respectively, all in electrical engineering.
She was with Hewlett-Packard Laboratories, Palo Alto, CA, and is now with Hewlett-Packard Co., Grenoble, France. Her research interests include image enhancement and compression, Internet applications and content delivery, and telecommunication systems.
Dr. Chang was a recipient of the National Science Foundation Graduate Fellowship and a University of California Dissertation Fellowship.
Bin Yu (A'92-SM'97) received the B.S. degree in mathematics from Peking University, China, in 1984, and the M.S. and Ph.D. degrees in statistics from the University of California, Berkeley, in 1987 and 1990, respectively.
She is an Associate Professor of Statistics at the University of California, Berkeley. Her research interests include statistical inference, information theory, signal compression and denoising, bioinformatics, and remote sensing. She has published over 30 technical papers in journals such as the IEEE TRANSACTIONS ON INFORMATION THEORY, IEEE TRANSACTIONS ON IMAGE PROCESSING, IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, The Annals of Statistics, Annals of Probability, Journal of the American Statistical Association, and Genomics. She has held faculty positions at the University of Wisconsin, Madison, and Yale University, New Haven, CT, and was a Postdoctoral Fellow with the Mathematical Sciences Research Institute (MSRI), University of California, Berkeley.
Dr. Yu was in the S. S. Chern Mathematics Exchange Program between China and the U.S. in 1985. She is a Fellow of the Institute of Mathematical Statistics (IMS) and a member of the American Statistical Association (ASA). She is serving on the Board of Governors of the IEEE Information Theory Society, and as an Associate Editor for The Annals of Statistics and for Statistica Sinica.
Martin Vetterli (S'86-M'86-SM'90-F'95) received the Dipl. El.-Ing. degree from ETH Zürich (ETHZ), Zürich, Switzerland, in 1981, the M.S. degree from Stanford University, Stanford, CA, in 1982, and the Dr.Sci. degree from EPF Lausanne (EPFL), Lausanne, Switzerland, in 1986.
He was a Research Assistant at Stanford University and EPFL, and was with Siemens and AT&T Bell Laboratories. In 1986, he joined Columbia University, New York, where he was an Associate Professor of electrical engineering and Co-Director of the Image and Advanced Television Laboratory. In 1993, he joined the University of California, Berkeley, where he was a Professor in the Department of Electrical Engineering and Computer Sciences until 1997, and now holds an Adjunct Professor position. Since 1995, he has been a Professor of communication systems at EPFL, where he chaired the Communications Systems Division (1996-1997), and heads the Audio-Visual Communications Laboratory. He held visiting positions at ETHZ in 1990 and Stanford University in 1998. He is on the editorial boards of Annals of Telecommunications, Applied and Computational Harmonic Analysis, and the Journal of Fourier Analysis and Applications. He is the co-author, with J. Kovacevic, of the book Wavelets and Subband Coding (Englewood Cliffs, NJ: Prentice-Hall, 1995). He has published about 75 journal papers on a variety of topics in signal and image processing and holds five patents. His research interests include wavelets, multirate signal processing, computational complexity, signal processing for telecommunications, digital video processing, and compression and wireless video communications.
Dr. Vetterli is a member of SIAM and was the Area Editor for Speech, Image, Video, and Signal Processing of the IEEE TRANSACTIONS ON COMMUNICATIONS. He received the Best Paper Award of EURASIP in 1984 for his paper on multidimensional subband coding, the Research Prize of the Brown Boveri Corporation, Switzerland, in 1986 for his doctoral thesis, and the IEEE Signal Processing Society's Senior Awards in 1991 and 1996 (for papers with D. LeGall and K. Ramchandran, respectively). He was an IEEE Signal Processing Society Distinguished Lecturer in 1999. He received the Swiss National Latsis Prize in 1996 and the SPIE Presidential Award in 1999. He has been a plenary speaker at various conferences, including the 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.