Transcript of ChangYV00a: Wavelet Image Denoising
1532 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 9, NO. 9, SEPTEMBER 2000
Adaptive Wavelet Thresholding for Image Denoising and Compression
S. Grace Chang, Student Member, IEEE, Bin Yu, Senior Member, IEEE, and Martin Vetterli, Fellow, IEEE
Abstract—The first part of this paper proposes an adaptive, data-driven threshold for image denoising via wavelet soft-thresholding. The threshold is derived in a Bayesian framework, and the prior used on the wavelet coefficients is the generalized Gaussian distribution (GGD) widely used in image processing applications. The proposed threshold is simple and closed-form, and it is adaptive to each subband because it depends on data-driven estimates of the parameters. Experimental results show that the proposed method, called BayesShrink, is typically within 5% of the MSE of the best soft-thresholding benchmark with the image assumed known. It also outperforms Donoho and Johnstone's SureShrink most of the time.
The second part of the paper attempts to further validate recent claims that lossy compression can be used for denoising. The BayesShrink threshold can aid in the parameter selection of a coder designed with the intention of denoising, and thus achieving simultaneous denoising and compression. Specifically, the zero-zone in the quantization step of compression is analogous to the threshold value in the thresholding function. The remaining coder design parameters are chosen based on a criterion derived from Rissanen's minimum description length (MDL) principle. Experiments show that this compression method does indeed remove noise significantly, especially for large noise power. However, it introduces quantization noise and should be used only if bitrate were an additional concern to denoising.
Index Terms—Adaptive method, image compression, image denoising, image restoration, wavelet thresholding.
I. INTRODUCTION
AN IMAGE is often corrupted by noise in its acquisition or transmission. The goal of denoising is to remove the noise while retaining as much as possible the important signal features. Traditionally, this is achieved by linear processing such as Wiener filtering. A vast literature has emerged recently on
Manuscript received January 22, 1998; revised April 7, 2000. This work was supported in part by the NSF Graduate Fellowship and the University of California Dissertation Fellowship to S. G. Chang; ARO Grant DAAH04-94-G-0232 and NSF Grant DMS-9322817 to B. Yu; and NSF Grant MIP-93-213002 and Swiss NSF Grant 20-52347.97 to M. Vetterli. Part of this work was presented at the IEEE International Conference on Image Processing, Santa Barbara, CA, October 1997. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Patrick L. Combettes.
S. G. Chang was with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720 USA. She is now with Hewlett-Packard Company, Grenoble, France (e-mail: [email protected]).
B. Yu is with the Department of Statistics, University of California, Berkeley, CA 94720 USA (e-mail: [email protected]).
M. Vetterli is with the Laboratory of Audiovisual Communications, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland and also with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720 USA.
Publisher Item Identifier S 1057-7149(00)06914-1.
signal denoising using nonlinear techniques, in the setting of additive white Gaussian noise. The seminal work of Donoho and Johnstone on signal denoising via wavelet thresholding or shrinkage ([13]–[16]) has shown that various wavelet thresholding schemes for denoising have near-optimal properties in the minimax sense and perform well in simulation studies of one-dimensional curve estimation. Thresholding has been shown to have better rates of convergence than linear methods for approximating functions in Besov spaces ([13], [14]). Thresholding is a nonlinear technique, yet it is very simple because it operates on one wavelet coefficient at a time. Alternative approaches to nonlinear wavelet-based denoising can be found in, for example, [1], [4], [8]–[10], [12], [18], [19], [24], [27]–[29], [32], [33], [35], and references therein.
On a seemingly unrelated front, lossy compression has been proposed for denoising in several works [5], [6], [21], [25], [28]. Concerns regarding the compression rate were explicitly addressed. This is important because any practical coder must assume a limited resource (such as bits) at its disposal for representing the data. Other works [4], [12]–[16] also addressed the connection between compression and denoising, especially with nonlinear algorithms such as wavelet thresholding in a mathematical framework. However, these latter works were not concerned with quantization and bitrates: compression results from a reduced number of nonzero wavelet coefficients, and not from an explicit design of a coder.
The intuition behind using lossy compression for denoising may be explained as follows. A signal typically has structural correlations that a good coder can exploit to yield a concise representation. White noise, however, does not have structural redundancies and thus is not easily compressible. Hence, a good compression method can provide a suitable model for distinguishing between signal and noise. The discussion will be restricted to wavelet-based coders, though these insights can be extended to other transform-domain coders as well. A concrete connection between lossy compression and denoising can easily be seen when one examines the similarity between thresholding and quantization, the latter of which is a necessary step in a practical lossy coder. That is, the quantization of wavelet coefficients with a zero-zone is an approximation to the thresholding function (see Fig. 1). Thus, provided that the quantization outside of the zero-zone does not introduce significant distortion, it follows that wavelet-based lossy compression achieves denoising. With this connection in mind, this paper is about wavelet thresholding for image denoising and also for lossy compression. The threshold choice aids the lossy coder in choosing its zero-zone, and the resulting coder achieves simultaneous denoising and compression if such a property is desired.
1057-7149/00$10.00 © 2000 IEEE
CHANG et al.: ADAPTIVE WAVELET THRESHOLDING FOR IMAGE DENOISING AND COMPRESSION 1533
Fig. 1. Thresholding function can be approximated by quantization with a zero-zone.
The theoretical formalization of filtering additive iid Gaussian noise (of zero mean and standard deviation σ) via thresholding wavelet coefficients was pioneered by Donoho and Johnstone [14]. A wavelet coefficient is compared to a given threshold and is set to zero if its magnitude is less than the threshold; otherwise, it is kept or modified (depending on the thresholding rule). The threshold acts as an oracle which distinguishes between the insignificant coefficients likely due to noise, and the significant coefficients consisting of important signal structures. Thresholding rules are especially effective for signals with sparse or near-sparse representations where only a small subset of the coefficients represents all or most of the signal energy. Thresholding essentially creates a region around zero where the coefficients are considered negligible. Outside of this region, the thresholded coefficients are kept to full precision (that is, without quantization). Their most well-known thresholding methods include VisuShrink [14] and SureShrink [15]. These threshold choices enjoy asymptotic minimax optimalities over function spaces
such as Besov spaces. For image denoising, however, VisuShrink is known to yield overly smoothed images. This is because its threshold choice, σ√(2 log M) (called the universal threshold, where σ² is the noise variance and M is the number of samples), can be unwarrantedly large due to its dependence on the number of samples, M, which is 512² ≈ 2.6 × 10⁵ for a typical test image of size 512 × 512. SureShrink uses a hybrid of the universal threshold and the SURE threshold, derived from minimizing Stein's unbiased risk estimator [30], and has been shown to perform well. SureShrink will be the main comparison to the method proposed here, and, as will be seen later in this paper, our proposed threshold often yields better results.
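The size of the gap between the two threshold choices is easy to see numerically. In this sketch the noise and signal standard deviations are illustrative values, not figures from the paper's tables:

```python
import math

sigma = 20.0     # noise standard deviation (illustrative)
sigma_x = 30.0   # signal standard deviation in a subband (illustrative)
M = 512 * 512    # number of samples in a 512 x 512 image

# The universal (VisuShrink) threshold grows with the sample count M.
t_universal = sigma * math.sqrt(2.0 * math.log(M))

# The BayesShrink threshold depends only on the noise and signal variances.
t_bayes = sigma ** 2 / sigma_x

print(f"universal threshold   = {t_universal:.1f}")  # ~99.9
print(f"BayesShrink threshold = {t_bayes:.1f}")      # ~13.3
```

With these (made-up but plausible) parameters the universal threshold is roughly the dynamic range of the coefficients themselves, which is why it tends to oversmooth.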
Since the works of Donoho and Johnstone, there has been much research on finding thresholds for nonparametric estimation in statistics. However, few are specifically tailored for images. In this paper, we propose a framework and a near-optimal threshold in this framework more suitable for image denoising. This approach can be formally described as Bayesian, but this only describes our mathematical formulation, not our philosophy. The formulation is grounded on the empirical observation that the wavelet coefficients in a subband of a natural image can be summarized adequately by a generalized Gaussian distribution (GGD). This observation is well-accepted in the image processing community (for example, see [20], [22], [23], [29], [34], [36]) and is used for state-of-the-art image coders in [20], [22], [36]. It follows from this observation that the average MSE (in a subband) can be approximated by the corresponding Bayesian squared error risk with the GGD as the prior applied to each coefficient in an iid fashion. That is, a sum is approximated by an integral. We emphasize that this is an analytical approximation and our framework is broader than assuming wavelet coefficients are iid draws from a GGD. The goal is to find the soft-threshold that minimizes this Bayesian risk, and we call our method BayesShrink.
The proposed Bayesian risk minimization is subband-dependent. Given the signal being generalized Gaussian distributed and the noise being Gaussian, via numerical calculation a nearly optimal threshold for soft-thresholding is found to be T_B = σ²/σ_X (where σ² is the noise variance and σ_X² is the signal variance). This threshold gives a risk within 5% of the minimal risk over a broad range of parameters in the GGD family. To make this threshold data-driven, the parameters σ² and σ_X are estimated from the observed data, one set for each subband.
To achieve simultaneous denoising and compression, the nonzero thresholded wavelet coefficients need to be quantized. A uniform quantizer with centroid reconstruction is used on the GGD. The design parameters of the coder, such as the number of quantization levels and binwidths, are decided based on a criterion derived from Rissanen's minimum description length (MDL) principle [26]. This criterion balances the tradeoff between the compression rate and distortion, and yields a nice interpretation of operating at a fixed slope on the rate-distortion curve.

The paper is organized as follows. In Section II, the wavelet
thresholding idea is introduced. Section II-A explains the derivation of the BayesShrink threshold by minimizing a Bayesian risk with squared error. The lossy compression based on the MDL criterion is explained in Section III. Experimental results on several test images are shown in Section IV and compared with SureShrink. To benchmark against the best possible performance of a threshold estimate, the comparisons also include OracleShrink, the best soft-thresholding estimate obtainable assuming the original image known, and OracleThresh, the best hard-thresholding counterpart. The BayesShrink method often comes to within 5% of the MSEs of OracleShrink, and is better than SureShrink by up to 8% most of the time, or is within 1% if it is worse. Furthermore, the BayesShrink threshold is very easy to compute. BayesShrink with the additional MDL-based compression, as expected, introduces quantization noise to the image. This distortion may negate the denoising achieved by thresholding, especially when σ is small. However, for larger values of σ, the MSE due to the lossy compression is still significantly lower than that of the noisy image, while fewer bits are used to code the image, thus achieving both denoising and compression.
II. WAVELET THRESHOLDING AND THRESHOLD SELECTION
Let the signal be f = {f_ij, i, j = 1, …, N}, where N is some integer power of 2. It has been corrupted by additive noise and one observes

    g_ij = f_ij + ε_ij    (1)

where the ε_ij are independent and identically distributed (iid) as normal N(0, σ²) and independent of the signal. The goal is to remove the noise, or denoise the observation g, and to obtain an estimate f̂ of f which minimizes the mean squared error (MSE),

    MSE(f̂) = (1/N²) Σ_{i,j=1}^{N} (f̂_ij − f_ij)².    (2)
Fig. 2. Subbands of the 2-D orthogonal wavelet transform.
Let f = {f_ij}, g = {g_ij}, and ε = {ε_ij}; that is, the boldfaced letters will denote the matrix representation of the signals under consideration. Let Y = Wg denote the matrix of wavelet coefficients of g, where W is the two-dimensional dyadic orthogonal wavelet transform operator, and similarly X = Wf and V = Wε. The readers are referred to references such as [23], [31] for details of the two-dimensional orthogonal wavelet transform. It is convenient to label the subbands of the transform as in Fig. 2. The subbands HH_k, HL_k, LH_k, k = 1, …, J, are called the details, where k is the scale, with J being the largest (or coarsest) scale in the decomposition, and a subband at scale k has size N/2^k × N/2^k. The subband LL_J is the low resolution residual, and J is typically chosen large enough such that N/2^J ≪ N. Note that since the transform is orthogonal, the {V_ij} are also iid N(0, σ²).

The wavelet-thresholding denoising method filters each coefficient Y_ij from the detail subbands with a threshold function (to be explained shortly) to obtain X̂_ij. The denoised estimate is then f̂ = W⁻¹X̂, where W⁻¹ is the inverse wavelet transform operator.
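The transform-threshold-invert recipe can be sketched end to end. This illustration substitutes a single-level 2-D Haar transform and a simple fixed threshold for the multiscale decomposition and data-driven threshold used in the paper, and the piecewise-constant test image is made up:

```python
import numpy as np

def haar2d(x):
    """One level of the 2-D orthogonal Haar transform: returns LL, LH, HL, HH."""
    a = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)   # row lowpass
    d = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)   # row highpass
    ll = (a[0::2, :] + a[1::2, :]) / np.sqrt(2)
    lh = (a[0::2, :] - a[1::2, :]) / np.sqrt(2)
    hl = (d[0::2, :] + d[1::2, :]) / np.sqrt(2)
    hh = (d[0::2, :] - d[1::2, :]) / np.sqrt(2)
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Inverse of haar2d (exact, since the transform is orthogonal)."""
    n = ll.shape[0] * 2
    a = np.empty((n, ll.shape[1]))
    d = np.empty((n, ll.shape[1]))
    a[0::2, :] = (ll + lh) / np.sqrt(2)
    a[1::2, :] = (ll - lh) / np.sqrt(2)
    d[0::2, :] = (hl + hh) / np.sqrt(2)
    d[1::2, :] = (hl - hh) / np.sqrt(2)
    x = np.empty((n, n))
    x[:, 0::2] = (a + d) / np.sqrt(2)
    x[:, 1::2] = (a - d) / np.sqrt(2)
    return x

def soft(y, t):
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

rng = np.random.default_rng(0)
n, sigma = 64, 10.0
f = np.full((n, n), 50.0)
f[16:48, 16:48] = 150.0                       # piecewise-constant test image
g = f + sigma * rng.normal(size=(n, n))

ll, lh, hl, hh = haar2d(g)
t = 3.0 * sigma                               # fixed threshold, for the sketch only
f_hat = ihaar2d(ll, soft(lh, t), soft(hl, t), soft(hh, t))

print("noisy MSE   :", np.mean((g - f) ** 2))
print("denoised MSE:", np.mean((f_hat - f) ** 2))
```

Only the detail subbands are thresholded; the low resolution residual LL passes through unaltered, exactly as the procedure above prescribes.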
There are two frequently used thresholding methods. The soft-threshold function (also called the shrinkage function)

    η_T(x) = sgn(x) · max(|x| − T, 0)    (3)

takes the argument x and shrinks it toward zero by the threshold T. The other popular alternative is the hard-threshold function

    η_T(x) = x · 1{|x| > T}    (4)

which keeps the input if it is larger than the threshold T; otherwise, it is set to zero. The wavelet thresholding procedure removes noise by thresholding only the wavelet coefficients of the detail subbands, while keeping the low resolution coefficients unaltered.
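The rules in (3) and (4) translate directly into vectorized code; the function names here are illustrative:

```python
import numpy as np

def soft_threshold(x, t):
    """Eq. (3): shrink x toward zero by t; values inside [-t, t] become zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def hard_threshold(x, t):
    """Eq. (4): keep x untouched when |x| > t, otherwise set it to zero."""
    return np.where(np.abs(x) > t, x, 0.0)

y = np.array([-3.0, -1.0, 0.5, 2.0, 4.0])
print(soft_threshold(y, 1.5))  # [-1.5  0.   0.   0.5  2.5]
print(hard_threshold(y, 1.5))  # [-3.  0.  0.  2.  4. ]
```

The soft rule is continuous in x, while the hard rule jumps at |x| = T; this discontinuity is what produces the abrupt artifacts discussed below.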
The soft-thresholding rule is chosen over hard-thresholding for several reasons. First, soft-thresholding has been shown to achieve near-optimal minimax rates over a large range of Besov spaces [12], [14]. Second, for the generalized Gaussian prior assumed in this work, the optimal soft-thresholding estimator yields a smaller risk than the optimal hard-thresholding estimator (to be shown later in this section). Lastly, in practice, the soft-thresholding method yields more visually pleasing images than hard-thresholding, because the latter is discontinuous and yields abrupt artifacts in the recovered images, especially when the noise energy is significant. In what follows, soft-thresholding will be the primary focus.
While the idea of thresholding is simple and effective, finding a good threshold is not an easy task. For a one-dimensional (1-D) deterministic signal of length M, Donoho and Johnstone [14] proposed for VisuShrink the universal threshold, σ√(2 log M), which results in an estimate asymptotically optimal in the minimax sense (minimizing the maximum error over all possible M-sample signals). One other notable threshold is the SURE threshold [15], derived from minimizing Stein's unbiased risk estimate [30] when soft-thresholding is used. The SureShrink method is a hybrid of the universal and the SURE threshold, with the choice being dependent on the energy of the particular subband [15]. The SURE threshold is data-driven, does not depend on M explicitly, and SureShrink estimates it in a subband-adaptive manner. Moreover, SureShrink has yielded good image denoising performance and comes close to the true minimum MSE of the optimal soft-threshold estimator (cf. [4], [12]), and thus will be the main comparison to our proposed method.
In the statistical Bayesian literature, many works have concentrated on deriving the best threshold (or shrinkage factor) based on priors such as the Laplacian and a mixture of Gaussians (cf. [1], [8], [9], [18], [24], [27], [29], [32], [35]). With an integral approximation to the pixel-wise MSE distortion measure as discussed earlier, the formulation here is also Bayesian for finding the best soft-thresholding rule under the generalized Gaussian prior. A related work is [27], where the hard-thresholding rule is investigated for signals with Laplacian and Gaussian distributions.
The GGD has been used in many subband or wavelet-based image processing applications [2], [20], [22], [23], [29], [34], [36]. In [29], it was observed that a GGD with the shape parameter β ranging from 0.5 to 1 [see (5)] can adequately describe the wavelet coefficients of a large set of natural images. Our experience with images supports the same conclusion. Fig. 3 shows the histograms of the wavelet coefficients of the images shown in Fig. 9, against the generalized Gaussian curve, with the parameters labeled (the estimation of the parameters will be explained later in the text). A heuristic can be put forward to explain why there are a large number of small coefficients but relatively few large coefficients, as the GGD suggests: the small ones correspond to smooth regions in a natural image and the large ones to edges or textures.
A. Adaptive Threshold for BayesShrink

The GGD, following [20], is

    GG_{σ_X, β}(x) = C(σ_X, β) exp{ −[α(σ_X, β) |x|]^β }    (5)

for −∞ < x < ∞, σ_X > 0, β > 0, where

    α(σ_X, β) = σ_X^{−1} [Γ(3/β) / Γ(1/β)]^{1/2}

and

    C(σ_X, β) = β · α(σ_X, β) / (2 Γ(1/β))
Fig. 3. Histogram of the wavelet coefficients of four test images. For each image, from top to bottom it is fine to coarse scales; from left to right, they are the HH, HL, and LH subbands, respectively.
and Γ(·) is the gamma function. The parameter σ_X is the standard deviation and β is the shape parameter. For a given set of parameters, the objective is to find a soft-threshold T which minimizes the Bayes risk,

    r(T) = E(X̂ − X)²    (6)

where X̂ = η_T(Y), Y = X + V, X ~ GG_{σ_X, β}, and V ~ N(0, σ²). Denote the optimal threshold by T*(σ_X, β),

    T*(σ_X, β) = arg min_T r(T)    (7)

which is a function of the parameters σ_X and β. To our knowledge, there is no closed form solution for T* for this chosen prior, thus numerical calculation is used to find its value.¹
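As a sketch of such a numerical calculation, the Bayes risk can be estimated by Monte Carlo over a grid of thresholds. The GGD sampler (via a gamma variate) and all parameter values below are illustrative choices, not the procedure of the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_ggd(sigma_x, beta, size):
    """Draw GGD samples: |alpha*X|^beta is Gamma(1/beta); rescale to std sigma_x."""
    g = rng.gamma(shape=1.0 / beta, scale=1.0, size=size)
    sign = np.where(rng.random(size) < 0.5, -1.0, 1.0)
    x = sign * g ** (1.0 / beta)
    return x * sigma_x / x.std()          # normalize to the target std

def soft(y, t):
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

sigma, sigma_x, beta = 1.0, 1.0, 1.0      # Laplacian case, illustrative values
x = sample_ggd(sigma_x, beta, 200_000)
y = x + sigma * rng.normal(size=x.size)

# Monte-Carlo estimate of r(T) = E(soft(Y,T) - X)^2 on a threshold grid.
grid = np.linspace(0.0, 4.0, 81)
risk = np.array([np.mean((soft(y, t) - x) ** 2) for t in grid])
t_star = grid[np.argmin(risk)]

t_bayes = sigma ** 2 / sigma_x            # closed-form BayesShrink threshold
print(f"numerical T* ~ {t_star:.2f}, T_B = {t_bayes:.2f}")
print(f"risk at T*: {risk.min():.3f}, risk at T_B: "
      f"{np.mean((soft(y, t_bayes) - x) ** 2):.3f}")
```

For the Laplacian case the risk at T_B should land close to the numerically found minimum, consistent with the within-5% behavior described below.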
Before examining the general case, it is insightful to consider two special cases of the GGD: the Gaussian (β = 2) and the
¹It was observed that for the numerical calculation, it is more robust to obtain the value of T* from locating the zero-crossing of the derivative r′(T) than from minimizing r(T) directly. In the Laplacian case, a recent work [17] derives an analytical equation that T* satisfies and calculates T* by solving that equation.
Laplacian (β = 1) distributions. The Laplacian case is particularly interesting, because it is analytically more tractable and is often used in image processing applications.

Case 1: β = 2 (Gaussian), with X ~ N(0, σ_X²). It is straightforward to verify that
(8)
(9)
where

(10)

with the standard normal density function φ(x) = (2π)^{−1/2} e^{−x²/2} and the survival function of the standard normal, Φ̄(x) = ∫_x^∞ φ(t) dt.
Fig. 5. Thresholding for the Laplacian prior, with σ = 1. (a) Compare the optimal soft-threshold T*(σ_X, 1) (—), the BayesShrink threshold T_B(σ_X) (· · ·), the optimal hard-threshold (- - -), and the threshold T_h (- · - ·) against the standard deviation σ_X on the horizontal axis. (b) Their corresponding risks.
With these insights from the special cases, the discussion now returns to the general case of the GGD.

Case 3: (Generalized Gaussian) The proposed threshold T_B(σ_X) in (12) has been found to work well for the general case. Let σ = 1. In Fig. 6(a), each dotted line (· · ·) is the optimal threshold T*(σ_X, β) for a given fixed β, plotted against σ_X on the horizontal axis. The values β = 0.6, 1, 2, 3, 4 are shown. The threshold T_B(σ_X) is plotted with the solid line (—). The curve of the optimal threshold that lies closest to T_B is for β = 1, the Laplacian case, while the other curves deviate from T_B as β moves away from 1. Fig. 6(b) shows the corresponding risks. The deviation between the optimal risk and r(T_B) grows as β moves away from 1, but the error is still within 5% of the optimal for the curves shown in Fig. 6(b). Because the threshold T_B depends
Fig. 6. Thresholding for the generalized Gaussian prior, with σ = 1. (a) Compare the approximation T_B(σ_X) = σ²/σ_X (—) with the optimal threshold T*(σ_X, β) for β = 0.6, 1, 2, 3, 4 (· · ·). The horizontal axis is the standard deviation σ_X. (b) The optimal risks are in (· · ·), and the approximation in (—).
only on the standard deviation and not on the shape parameter β, it may not yield a good approximation for values of β outside the range tested here, and the threshold may need to be modified to incorporate β. However, since the typical values of β for wavelet coefficients fall in the range [0.5, 1], this simple form of the threshold is appropriate for our purpose. For a fixed set of parameters, the curve of the risk (as a function of the threshold T) is very flat near the optimal threshold T*, implying that the error is not sensitive to a slight perturbation near T*.
B. Parameter Estimation for Data-Driven Adaptive Threshold
This section focuses on the estimation of the parameters σ² and σ_X, which in turn yields a data-driven estimate of the threshold T_B that is adaptive to different subband characteristics.
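A hedged sketch of the data-driven estimation commonly associated with BayesShrink follows (the exact estimators here are assumptions, since the details of this subsection are not legible in this transcript): a robust median estimate of σ from a finest-scale diagonal subband, and σ_X from the subband second moment.

```python
import numpy as np

def estimate_noise_sigma(hh1):
    """Robust median estimator of the noise std from the finest diagonal subband."""
    return np.median(np.abs(hh1)) / 0.6745

def bayes_shrink_threshold(subband, sigma):
    """Data-driven T_B = sigma^2 / sigma_X_hat for one detail subband."""
    sigma_y2 = np.mean(subband ** 2)           # E[Y^2] = sigma_X^2 + sigma^2
    sigma_x = np.sqrt(max(sigma_y2 - sigma ** 2, 0.0))
    if sigma_x == 0.0:                         # noise dominates: remove everything
        return np.max(np.abs(subband))
    return sigma ** 2 / sigma_x

rng = np.random.default_rng(2)
sigma_true, sigma_x_true = 10.0, 25.0
# Synthetic "subband": Laplacian signal plus Gaussian noise (illustrative).
x = rng.laplace(scale=sigma_x_true / np.sqrt(2), size=(128, 128))
y = x + sigma_true * rng.normal(size=x.shape)

# Pure-noise field standing in for the HH1 subband of a smooth image.
sigma_hat = estimate_noise_sigma(sigma_true * rng.normal(size=(128, 128)))
t = bayes_shrink_threshold(y, sigma_hat)
print(f"sigma_hat = {sigma_hat:.2f}, T_B = {t:.2f} "
      f"(ideal {sigma_true ** 2 / sigma_x_true:.2f})")
```

The 0.6745 constant is the median of |N(0, 1)|, so the estimator is unbiased for Gaussian noise and insensitive to the sparse signal outliers in a detail subband.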
Now we state the model selection criterion, MDLQ:

with (31)

To find the best model, (32) is minimized over the coder parameters to find the corresponding set of quantized coefficients. In the implementation, which is by no means optimized, this is done by searching over the number of quantization levels and, for each such number, minimizing (32) over a reasonable range of binwidths. Typically, once a minimum (in the binwidth) has been reached, the MDLQ cost increases monotonically, thus the search can terminate soon after a minimum has been detected.
This MDLQ compression with BayesShrink zero-zone selection is applied to each subband independently. The steps discussed in this section are summarized as follows.

1) Estimate the noise variance σ² and the GGD standard deviation σ_X.
2) Calculate the threshold T_B, and soft-threshold the wavelet coefficients.
3) To quantize the nonzero coefficients, minimize (32) over the coder parameters to find the corresponding quantized coefficients X̂, which is the compressed, denoised estimate of X.
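A minimal sketch of the zero-zone quantization step, assuming uniform bins with midpoint reconstruction (rather than the paper's centroid reconstruction under the GGD) and a first-order entropy estimate for the rate; the threshold and binwidth below are fixed by hand instead of by the MDLQ search:

```python
import numpy as np

def zero_zone_quantize(coeffs, t, binwidth):
    """Zero-zone [-t, t] maps to 0; outside it, uniform bins of the given width."""
    mag = np.abs(coeffs)
    idx = np.where(mag <= t, 0.0, np.ceil((mag - t) / binwidth))
    return np.sign(coeffs) * idx            # signed bin index, 0 = zero-zone

def zero_zone_reconstruct(indices, t, binwidth):
    """Midpoint reconstruction of each nonzero bin."""
    mag = t + (np.abs(indices) - 0.5) * binwidth
    return np.where(indices == 0, 0.0, np.sign(indices) * mag)

def first_order_entropy(indices):
    """Bits per symbol of the quantized indices (a loose rate estimate)."""
    _, counts = np.unique(indices, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(3)
c = rng.laplace(scale=10.0, size=10_000)    # stand-in for subband coefficients
t, w = 4.0, 5.0                             # illustrative threshold and binwidth
q = zero_zone_quantize(c, t, w)
c_hat = zero_zone_reconstruct(q, t, w)

print("rate (bits/coeff):", first_order_entropy(q))
print("distortion (MSE) :", np.mean((c - c_hat) ** 2))
```

Shrinking the binwidth lowers the distortion but raises the entropy of the indices, which is exactly the rate-distortion tradeoff the MDLQ criterion is designed to balance.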
The coarsest subband LL_J is quantized differently in that it is not thresholded, and the quantization with (32) assumes the uniform distribution. The LL_J coefficients are essentially local averages of the image, and are not characterized by distributions with a peak at zero; thus the uniform distribution is used for generality. With the mean subtracted, the uniform distribution is assumed to be symmetric about zero. Every quantization bin (including the zero-zone) is of the same width, and the reconstruction values are the midpoints of the intervals.
The MDLQ criterion in (32) has the additional interpretation of operating at a specified point on the rate-distortion (R-D) curve, as also pointed out by Liu and Moulin [21]. For a given coder, one can obtain a set of operational rate-distortion points (R, D). When there is a rate or distortion constraint, the constrained problem can be formulated as a minimization problem with a Lagrange multiplier λ. In this case, (32) can be interpreted as operating at a particular fixed slope on the R-D curve. Natarajan [25] and Liu and Moulin [21] both proposed to use compression for denoising. The former operates at a constrained distortion, and the latter operates at a fixed slope on the R-D curve. Both works recommend the use of any reasonable coder, while our coder is designed specifically with the purpose of denoising.
IV. EXPERIMENTAL RESULTS AND DISCUSSION

The grayscale images goldhill, lena, barbara and baboon are used as test images with different noise levels σ. The original images are shown in Fig. 9.

Fig. 9. Original images. From top left, clockwise: goldhill, lena, barbara and baboon.

The wavelet transform employs Daubechies' least asymmetric compactly-supported wavelet with eight vanishing moments [11], with four scales of orthogonal decomposition.
To assess the performance of BayesShrink, it is compared with SureShrink [15]. The 1-D implementation of SureShrink can be obtained from the WaveLab toolkit [3], and the 2-D extension is straightforward. To gauge the best possible performance of a soft-threshold estimator, these methods are also benchmarked against what we call OracleShrink, which is the truly optimal soft-thresholding estimator assuming the original image is known. The threshold of OracleShrink in each subband is

    arg min_T Σ_{i,j} (η_T(Y_ij) − X_ij)²    (32)

with X known. To further justify the choice of soft-thresholding over hard-thresholding, another benchmark, OracleThresh, is also computed. OracleThresh is the best possible performance of a hard-threshold estimator, with subband-adaptive thresholds, each of which is defined as

    arg min_T Σ_{i,j} (Y_ij · 1{|Y_ij| > T} − X_ij)²    (33)
with X known. The MSEs from the various methods are compared in Table I, and the data are collected from an average of five runs. The columns refer to, respectively, OracleShrink, SureShrink, BayesShrink, BayesShrink with MDLQ-based compression, OracleThresh, Wiener filtering, and the bitrate (in bpp, or bits per pixel) of the MDLQ-compressed image. Since the main benchmark is against SureShrink, the better one of SureShrink and BayesShrink is highlighted in bold font for each test set. The MSEs resulting from BayesShrink come to within
TABLE I. FOR VARIOUS TEST IMAGES AND σ VALUES, LISTS THE MSE OF (1) OracleShrink, (2) SureShrink, (3) BayesShrink, (4) BayesShrink WITH MDLQ-COMPRESSION, (5) OracleThresh, AND (6) WIENER FILTERING. THE LAST COLUMN SHOWS THE BITRATE (BITS PER PIXEL) OF THE COMPRESSED IMAGE OF (4). AVERAGED OVER FIVE RUNS.
5% of OracleShrink for the smoother images goldhill and lena, and are most of the time within 6% for highly detailed images such as barbara and baboon (though it may suffer up to 20% for small σ). BayesShrink outperforms SureShrink most of the time, by up to approximately 8%. We observed in the experiments that using solely the SURE threshold yields excellent performance (sometimes yielding even lower MSE than BayesShrink, by up to 12%). However, the hybrid method of SureShrink results at times in the choice of the universal threshold, which can be too large. As illustrated in Table I, all three soft-thresholding methods significantly outperform the best hard-thresholding rule, OracleThresh.
It is not surprising that the SURE threshold and the BayesShrink threshold yield similar performances. The SURE threshold can also be viewed as an approximately optimal soft-threshold in terms of MSE. For a particular subband of size M, with the coefficients normalized to unit noise variance, following [15],

    SURE(T; Y) = M − 2 · #{i : |Y_i| ≤ T} + Σ_{i=1}^{M} (|Y_i| ∧ T)²    (34)

where a ∧ b denotes min(a, b), and the SURE threshold is defined to be the value of T minimizing SURE(T; Y).

Recall Y = X + V and the V_i's are iid N(0, 1). Conditioning on X, by Stein's result,

    E_V SURE(T; Y) = E_V ‖η_T(Y) − X‖²    (35)

Moreover, as we have done before, if the distribution of the X_i's is approximated by a GGD, then the distribution of the Y_i's is approximated by the mixture distribution of the GGD and N(0, 1); or Y = X + V, where X follows a GGD and V is independent of X. By the Law of Large Numbers,

(36)

Taking expectation with respect to the GGD on both sides of (35), the risk can be written as

(37)
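A sketch of the SURE computation in (34) and its minimization, assuming coefficients normalized to unit noise variance; since SURE changes only at the observed magnitudes, it suffices to search over the values |Y_i| (the sparse test signal below is made up):

```python
import numpy as np

def sure(t, y):
    """Stein's unbiased risk estimate for soft-thresholding at t (unit noise)."""
    m = y.size
    return (m
            - 2.0 * np.sum(np.abs(y) <= t)
            + np.sum(np.minimum(np.abs(y), t) ** 2))

def sure_threshold(y):
    """Minimize SURE over the candidate thresholds {0} U {|Y_i|}."""
    candidates = np.concatenate(([0.0], np.sort(np.abs(y))))
    risks = np.array([sure(t, y) for t in candidates])
    return candidates[np.argmin(risks)]

rng = np.random.default_rng(4)
# Sparse signal: ~10% active coefficients, plus unit-variance noise.
x = np.where(rng.random(1000) < 0.1, rng.normal(0.0, 5.0, 1000), 0.0)
y = x + rng.normal(size=1000)

t = sure_threshold(y)
print(f"SURE threshold = {t:.2f}")
print(f"risk estimate at T: {sure(t, y):.1f} vs at T = 0: {sure(0.0, y):.1f}")
```

At T = 0 the estimate equals M, i.e., the risk of not thresholding at all (E‖Y − X‖² = Mσ² with σ = 1), so any useful threshold must drive the estimate below M.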
Fig. 10. Comparing the performance of the various methods on goldhill with σ = 20. (a) Original. (b) Noisy image, σ = 20. (c) OracleShrink. (d) SureShrink. (e) BayesShrink. (f) BayesShrink followed by MDLQ compression.
Comparing (37) with (36), one can conclude that SURE(T; Y) is a data-based approximation to the risk, and the SURE threshold, which minimizes SURE(T; Y), is an alternative to BayesShrink for minimizing the Bayesian risk.

We have also made comparisons with the Wiener filter, the best linear filter possible. The version used is the adaptive filter, wiener2, in the MATLAB image processing toolbox, using the default settings (3 × 3 local window size, and the unknown noise power is estimated). The MSE results are shown in Table I, and they are considerably worse than those of the nonlinear thresholding methods, especially when σ is large. The image quality is also not as good as that of the thresholding methods.
The MDLQ-based compression step introduces quantization
noise which is quite visible. As shown in the last column of
Table I, the coder achieves a lower bitrate, but at the expense
of increasing the MSE. The MSE can be even worse than that of the noisy observation for small values of σ, especially for the highly detailed images. This is because the quantization noise is significant compared to the additive Gaussian noise. For larger σ, the compressed images can achieve noise reduction of up to approximately 75% in terms of MSE. Furthermore, the bitrates are significantly less than the original 8 bpp for grayscale images. Thus, compression does achieve denoising, and the proposed MDLQ-based compression can be used if simultaneous denoising and compression is a desired feature. If only the best denoising performance were the goal, obviously using solely BayesShrink is preferred.
Note that the first-order entropy estimate used for the bitrate of the quantized coefficients is a rather loose one. With more sophisticated coding methods (e.g., predictive coding, pixel classification), the same bitrate could yield a higher number of quantization levels, thus resulting in a lower MSE and enhancing the performance of the MDLQ-based compression-denoising.
A fair assessment of the MDLQ scheme for quantization after thresholding is the R-D curve used in Hansen and Yu [17] (see http://cm.bell-labs.com/stat/binyu/publications.html). This R-D curve is calculated using noiseless coefficients, and yields the best possible R-D tradeoff when the quantization is restricted to equal binwidths. It thus gives an idea of how effective MDLQ is in choosing the tradeoff with respect to the optimal. The closeness of the MDLQ point to this R-D lower-bound curve indicates that MDLQ chooses a good R-D tradeoff without the knowledge of the noiseless coefficients required in deriving this R-D curve.
Fig. 10 shows the resulting images of each denoising method
for goldhill with sigma = 20 (a zoomed-in section of the image
is displayed in order to show the details). Table II compares
the threshold values for each subband chosen by OracleShrink,
SureShrink, and BayesShrink, averaged over five runs. It is clear
that the BayesShrink threshold selection is comparable to the
SURE threshold and to the true optimal threshold. Some
of the unexpectedly large threshold values in SureShrink come
from the universal threshold, not the SURE threshold, and these
are placed in parentheses in the table. Table II(c) lists the thresholds
of BayesShrink; the thresholds in parentheses correspond
to the case in which the estimated signal standard deviation is
zero and all coefficients have been set to zero. Table III tabulates
the values of m chosen by the MDLQ criterion for each subband of
the goldhill image with sigma = 20, averaged over five runs. The MDLQ
criterion allocates more quantization levels to the coarser, more
important scales, as would be the case in a
practical subband coding situation. A value of m = 0 indicates
that the coefficients have already been thresholded to zero, and
there is nothing to code.
The results for lena with sigma = 10 are also shown. Fig. 11
shows the same sequence of images for a zoomed-in portion of
lena with noise sigma = 10. The corresponding threshold
selections and MDLQ parameters for lena with
noise sigma = 10 are listed in Tables IV and V. Interested
readers can obtain a better view of the images at the website,
http://www-wavelet.eecs.berkeley.edu/~grchang/compressDenoise/.
TABLE II: THE THRESHOLD VALUES OF OracleShrink, SureShrink, AND BayesShrink (AVERAGED OVER FIVE RUNS) FOR THE DIFFERENT SUBBANDS OF goldhill, WITH NOISE STRENGTH sigma = 20

TABLE III: THE VALUE OF m (AVERAGED OVER FIVE RUNS) FOR THE DIFFERENT SUBBANDS OF goldhill, WITH NOISE STRENGTH sigma = 20
V. CONCLUSION
Two main issues regarding image denoising were addressed
in this paper. First, an adaptive threshold for wavelet thresholding
of images was proposed, based on GGD modeling of the
subband coefficients, and test results showed excellent performance.
Second, a coder was designed specifically for simultaneous
compression and denoising. The proposed BayesShrink
threshold specifies the zero-zone of the coder's quantization step,
and this zero-zone is the main agent in the coder that
removes the noise. Although the setting in this paper was in the
Fig. 11. Comparing the performance of the various methods on lena with sigma = 10. (a) Original. (b) Noisy image, sigma = 10. (c) OracleShrink. (d) SureShrink. (e) BayesShrink. (f) BayesShrink followed by MDLQ compression.
wavelet domain, the idea can be extended to other transform
domains such as the DCT, which also rely on energy compaction
and sparse representation to achieve good compression.
There are several interesting directions worth pursuing. The
current compression scheme selects the threshold (i.e., the zero-zone size)
and the quantization bin size in a two-stage process. In
typical image coders, however, the zero-zone is chosen to be
either the same size as, or twice the size of, the other bins. Thus it would
be interesting to select these two values jointly and analyze their
dependence on each other. Furthermore, a more sophisticated
coder is likely to produce better compressed images than the
current scheme, which uses the first-order entropy to code the
bin indices. With an improved coder, an increase in the number
TABLE IV: THE THRESHOLD VALUES OF OracleShrink, SureShrink, AND BayesShrink (AVERAGED OVER FIVE RUNS) FOR THE DIFFERENT SUBBANDS OF lena, WITH NOISE STRENGTH sigma = 10

TABLE V: THE VALUE OF m (AVERAGED OVER FIVE RUNS) FOR THE DIFFERENT SUBBANDS OF lena, WITH NOISE STRENGTH sigma = 10
of quantization bins would not increase the bitrate penalty by
much, and thus the coefficients would be quantized at a finer
resolution than the current method. Lastly, the model family
could be expanded. For example, one could use a collection of
wavelet bases for the wavelet decomposition, rather than using
just one chosen wavelet, to allow possibly better representations
of the signals.
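The zero-zone quantizer structure discussed above can be sketched as a simple dead-zone quantizer: the threshold acts as the zero-zone half-width, and magnitudes beyond it are uniformly quantized. This is an illustration of that structure, not the paper's exact MDLQ coder:

```python
import numpy as np

def zero_zone_quantize(coeffs, threshold, binwidth):
    # Dead-zone quantizer: |x| <= threshold maps to bin 0 (the zero-zone);
    # magnitudes beyond the threshold fall into uniform bins of size binwidth.
    mag = np.abs(coeffs)
    outside = mag > threshold
    idx = np.zeros(coeffs.shape, dtype=int)
    steps = np.floor((mag[outside] - threshold) / binwidth) + 1
    idx[outside] = (np.sign(coeffs[outside]) * steps).astype(int)
    return idx

def zero_zone_dequantize(idx, threshold, binwidth):
    # Reconstruct each nonzero bin at its midpoint; bin 0 reconstructs to 0,
    # which is exactly the thresholding (denoising) action of the coder.
    out = np.zeros(idx.shape, dtype=float)
    nz = idx != 0
    out[nz] = np.sign(idx[nz]) * (threshold + (np.abs(idx[nz]) - 0.5) * binwidth)
    return out

coeffs = np.array([0.5, -1.9, 2.6, -5.0, 7.3])
idx = zero_zone_quantize(coeffs, threshold=2.0, binwidth=1.0)
rec = zero_zone_dequantize(idx, threshold=2.0, binwidth=1.0)
print(idx)   # -> [ 0  0  1 -4  6]
print(rec)   # -> [ 0.   0.   2.5 -5.5  7.5]
```

Joint selection of the threshold and binwidth, as suggested above, would amount to optimizing both parameters of this quantizer together rather than fixing the zero-zone first.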
In our other work [7], it was demonstrated that spatially adaptive
thresholds greatly improve the denoising performance over
uniform thresholds; that is, the threshold value changes for each
coefficient. The threshold selection uses the context-modeling
idea prevalent in coding methods, so it would be interesting
to extend this spatially adaptive threshold to the compression
framework without incurring too much overhead. This would
likely improve the denoising performance.
REFERENCES
[1] F. Abramovich, T. Sapatinas, and B. W. Silverman, "Wavelet thresholding via a Bayesian approach," J. R. Statist. Soc. B, vol. 60, pp. 725-749, 1998.
[2] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, "Image coding using wavelet transform," IEEE Trans. Image Processing, vol. 1, no. 2, pp. 205-220, 1992.
[3] J. Buckheit, S. Chen, D. Donoho, I. Johnstone, and J. Scargle, WaveLab Toolkit, http://www-stat.stanford.edu:80/~wavelab/.
[4] A. Chambolle, R. A. DeVore, N. Lee, and B. J. Lucier, "Nonlinear wavelet image processing: Variational problems, compression, and noise removal through wavelet shrinkage," IEEE Trans. Image Processing, vol. 7, pp. 319-335, 1998.
[5] S. G. Chang, B. Yu, and M. Vetterli, "Bridging compression to wavelet thresholding as a denoising method," in Proc. Conf. Information Sciences and Systems, Baltimore, MD, Mar. 1997, pp. 568-573.
[6] S. G. Chang, B. Yu, and M. Vetterli, "Image denoising via lossy compression and wavelet thresholding," in Proc. IEEE Int. Conf. Image Processing, vol. 1, Santa Barbara, CA, Nov. 1997, pp. 604-607.
[7] S. G. Chang, B. Yu, and M. Vetterli, "Spatially adaptive wavelet thresholding with context modeling for image denoising," IEEE Trans. Image Processing, vol. 9, pp. 1522-1531, Sept. 2000.
[8] H. Chipman, E. Kolaczyk, and R. McCulloch, "Adaptive Bayesian wavelet shrinkage," J. Amer. Statist. Assoc., vol. 92, no. 440, pp. 1413-1421, 1997.
[9] M. Clyde, G. Parmigiani, and B. Vidakovic, "Multiple shrinkage and subset selection in wavelets," Biometrika, vol. 85, pp. 391-402, 1998.
[10] M. S. Crouse, R. D. Nowak, and R. G. Baraniuk, "Wavelet-based statistical signal processing using hidden Markov models," IEEE Trans. Signal Processing, vol. 46, pp. 886-902, Apr. 1998.
[11] I. Daubechies, Ten Lectures on Wavelets, vol. 61 of CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia, PA: SIAM, 1992.
[12] R. A. DeVore and B. J. Lucier, "Fast wavelet techniques for near-optimal image processing," in IEEE Military Communications Conf. Rec., San Diego, CA, Oct. 11-14, 1992, vol. 3, pp. 1129-1135.
[13] D. L. Donoho, "De-noising by soft-thresholding," IEEE Trans. Inform. Theory, vol. 41, pp. 613-627, May 1995.
[14] D. L. Donoho and I. M. Johnstone, "Ideal spatial adaptation via wavelet shrinkage," Biometrika, vol. 81, pp. 425-455, 1994.
[15] D. L. Donoho and I. M. Johnstone, "Adapting to unknown smoothness via wavelet shrinkage," J. Amer. Statist. Assoc., vol. 90, no. 432, pp. 1200-1224, Dec. 1995.
[16] D. L. Donoho and I. M. Johnstone, "Wavelet shrinkage: Asymptopia?," J. R. Statist. Soc. B, vol. 57, no. 2, pp. 301-369, 1995.
[17] M. Hansen and B. Yu, "Wavelet thresholding via MDL: Simultaneous denoising and compression," 1999, submitted for publication.
[18] M. Jansen, M. Malfait, and A. Bultheel, "Generalized cross validation for wavelet thresholding," Signal Process., vol. 56, pp. 33-44, Jan. 1997.
[19] I. M. Johnstone and B. W. Silverman, "Wavelet threshold estimators for data with correlated noise," J. R. Statist. Soc., vol. 59, 1997.
[20] R. L. Joshi, V. J. Crump, and T. R. Fisher, "Image subband coding using arithmetic and trellis coded quantization," IEEE Trans. Circuits Syst. Video Technol., vol. 5, pp. 515-523, Dec. 1995.
[21] J. Liu and P. Moulin, "Complexity-regularized image denoising," in Proc. IEEE Int. Conf. Image Processing, vol. 2, pp. 370-373, Oct. 1997.
[22] S. M. LoPresto, K. Ramchandran, and M. T. Orchard, "Image coding based on mixture modeling of wavelet coefficients and a fast estimation-quantization framework," in Proc. Data Compression Conf., Snowbird, UT, Mar. 1997, pp. 221-230.
[23] S. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation," IEEE Trans. Pattern Anal. Machine Intell., vol. 11, pp. 674-693, July 1989.
[24] G. Nason, "Choice of the threshold parameter in wavelet function estimation," in Wavelets and Statistics, A. Antoniadis and G. Oppenheim, Eds. Berlin, Germany: Springer-Verlag, 1995.
[25] B. K. Natarajan, "Filtering random noise from deterministic signals via data compression," IEEE Trans. Signal Processing, vol. 43, pp. 2595-2605, Nov. 1995.
[26] J. Rissanen, Stochastic Complexity in Statistical Inquiry. Singapore: World Scientific, 1989.
[27] F. Ruggeri and B. Vidakovic, "A Bayesian decision theoretic approach to wavelet thresholding," Statist. Sinica, vol. 9, no. 1, pp. 183-197, 1999.
[28] N. Saito, "Simultaneous noise suppression and signal compression using a library of orthonormal bases and the minimum description length criterion," in Wavelets in Geophysics, E. Foufoula-Georgiou and P. Kumar, Eds. New York: Academic, 1994, pp. 299-324.
[29] E. Simoncelli and E. Adelson, "Noise removal via Bayesian wavelet coring," in Proc. IEEE Int. Conf. Image Processing, vol. 1, pp. 379-382, Sept. 1996.
[30] C. M. Stein, "Estimation of the mean of a multivariate normal distribution," Ann. Statist., vol. 9, no. 6, pp. 1135-1151, 1981.
[31] M. Vetterli and J. Kovacevic, Wavelets and Subband Coding. Englewood Cliffs, NJ: Prentice-Hall, 1995.
[32] B. Vidakovic, "Nonlinear wavelet shrinkage with Bayes rules and Bayes factors," J. Amer. Statist. Assoc., vol. 93, no. 441, pp. 173-179, 1998.
[33] Y. Wang, "Function estimation via wavelet shrinkage for long-memory data," Ann. Statist., vol. 24, pp. 466-484, 1996.
[34] P. H. Westerink, J. Biemond, and D. E. Boekee, "An optimal bit allocation algorithm for sub-band coding," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Dallas, TX, Apr. 1987, pp. 1378-1381.
[35] N. Weyrich and G. T. Warhola, "De-noising using wavelets and cross-validation," Dept. of Mathematics and Statistics, Air Force Inst. of Technol., AFIT/ENC, OH, Tech. Rep. AFIT/EN/TR/94-01, 1994.
[36] Y. Yoo, A. Ortega, and B. Yu, "Image subband coding using context-based classification and adaptive quantization," IEEE Trans. Image Processing, vol. 8, pp. 1702-1715, Dec. 1999.
S. Grace Chang (S'95) received the B.S. degree from the Massachusetts Institute of Technology, Cambridge, in 1993, and the M.S. and Ph.D. degrees from the University of California, Berkeley, in 1995 and 1998, respectively, all in electrical engineering.
She was with Hewlett-Packard Laboratories, Palo Alto, CA, and is now with Hewlett-Packard Co., Grenoble, France. Her research interests include image enhancement and compression, Internet applications and content delivery, and telecommunication systems.
Dr. Chang was a recipient of the National Science Foundation Graduate Fellowship and a University of California Dissertation Fellowship.
Bin Yu (A'92-SM'97) received the B.S. degree in mathematics from Peking University, China, in 1984, and the M.S. and Ph.D. degrees in statistics from the University of California, Berkeley, in 1987 and 1990, respectively.
She is an Associate Professor of Statistics at the University of California, Berkeley. Her research interests include statistical inference, information theory, signal compression and denoising, bioinformatics, and remote sensing. She has published over 30 technical papers in journals such as the IEEE TRANSACTIONS ON INFORMATION THEORY, IEEE TRANSACTIONS ON IMAGE PROCESSING, IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, The Annals of Statistics, Annals of Probability, Journal of the American Statistical Association, and Genomics. She has held faculty positions at the University of Wisconsin, Madison, and Yale University, New Haven, CT, and was a Postdoctoral Fellow with the Mathematical Sciences Research Institute (MSRI), University of California, Berkeley.
Dr. Yu was in the S. S. Chern Mathematics Exchange Program between China and the U.S. in 1985. She is a Fellow of the Institute of Mathematical Statistics (IMS) and a member of the American Statistical Association (ASA). She is serving on the Board of Governors of the IEEE Information Theory Society, and as an Associate Editor for The Annals of Statistics and for Statistica Sinica.
Martin Vetterli (S'86-M'86-SM'90-F'95) received the Dipl. El.-Ing. degree from ETH Zürich (ETHZ), Zürich, Switzerland, in 1981, the M.S. degree from Stanford University, Stanford, CA, in 1982, and the Dr.Sci. degree from EPF Lausanne (EPFL), Lausanne, Switzerland, in 1986.
He was a Research Assistant at Stanford University and EPFL, and was with Siemens and AT&T Bell Laboratories. In 1986, he joined Columbia University, New York, where he was an Associate Professor of electrical engineering and Co-Director of the Image and Advanced Television Laboratory. In 1993, he joined the University of California, Berkeley, where he was a Professor in the Department of Electrical Engineering and Computer Sciences until 1997, and now holds an Adjunct Professor position. Since 1995, he has been a Professor of communication systems at EPFL, where he chaired the Communications Systems Division (1996-1997), and heads the Audio-Visual Communications Laboratory. He held visiting positions at ETHZ in 1990 and Stanford University in 1998. He is on the editorial boards of Annals of Telecommunications, Applied and Computational Harmonic Analysis, and the Journal of Fourier Analysis and Applications. He is the co-author, with J. Kovacevic, of the book Wavelets and Subband Coding (Englewood Cliffs, NJ: Prentice-Hall, 1995). He has published about 75 journal papers on a variety of topics in signal and image processing and holds five patents. His research interests include wavelets, multirate signal processing, computational complexity, signal processing for telecommunications, digital video processing, and compression and wireless video communications.
Dr. Vetterli is a member of SIAM and was the Area Editor for Speech, Image, Video, and Signal Processing of the IEEE TRANSACTIONS ON COMMUNICATIONS. He received the Best Paper Award of EURASIP in 1984 for his paper on multidimensional subband coding, the Research Prize of the Brown Boveri Corporation, Switzerland, in 1986 for his doctoral thesis, and the IEEE Signal Processing Society's Senior Awards in 1991 and 1996 (for papers with D. LeGall and K. Ramchandran, respectively). He was an IEEE Signal Processing Society Distinguished Lecturer in 1999. He received the Swiss National Latsis Prize in 1996 and the SPIE Presidential Award in 1999. He has been a plenary speaker at various conferences, including the 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.