A Content-Aware Image Prior

Taeg Sang Cho†, Neel Joshi‡, C. Lawrence Zitnick‡, Sing Bing Kang‡, Richard Szeliski‡, William T. Freeman†
† Massachusetts Institute of Technology, ‡ Microsoft Research

Abstract

In image restoration tasks, a heavy-tailed gradient distribution of natural images has been extensively exploited as an image prior. Most image restoration algorithms impose a sparse gradient prior on the whole image, reconstructing an image with piecewise smooth characteristics. While the sparse gradient prior removes ringing and noise artifacts, it also tends to remove mid-frequency textures, degrading the visual quality. We can attribute such degradations to imposing an incorrect image prior. The gradient profile in fractal-like textures, such as trees, is close to a Gaussian distribution, and small gradients from such regions are severely penalized by the sparse gradient prior.

To address this issue, we introduce an image restoration algorithm that adapts the image prior to the underlying texture. We adapt the prior to both low-level local structures and mid-level textural characteristics. Improvements in visual quality are demonstrated on deconvolution and denoising tasks.

1. Introduction

Image enhancement algorithms resort to image priors to hallucinate information lost during image capture. In recent years, image priors based on image gradient statistics have received much attention. Natural images often consist of smooth regions with abrupt edges, leading to a heavy-tailed gradient profile. We can parameterize heavy-tailed gradient statistics with a generalized Gaussian distribution or a mixture of Gaussians. Prior works hand-select parameters for the model distribution and fix them for the entire image, imposing the same image prior everywhere [10, 17, 23]. Unfortunately, different textures have different gradient statistics even within a single image; imposing a single image prior on the entire image is therefore inappropriate (Figure 1).

We introduce an algorithm that adapts the image prior to both low-level local structures and mid-level texture cues, thereby imposing the correct prior for each texture.¹

Adapting the image prior to the image content improves the quality of restored images.

¹ Strictly speaking, an estimate of image statistics made after examining the image is no longer a “prior” probability. But the fitted gradient distributions play the same role as an image prior in the image reconstruction equations, and we keep that terminology.

[Figure 1 plots: log-density vs. gradient magnitude (0–0.6) for “Aligned gradients” and “Orthogonal gradients”, one curve per colored region.]

Figure 1. The colored gradient profiles correspond to the region with the same color mask. The steered gradient profile is spatially variant in most natural images. Therefore, the image prior should adapt to the image content. Insets illustrate how steered gradients adapt to local structures.

2. Related work

Image prior research revolves around finding a good image transform or basis functions such that the transformed image exhibits characteristics distinct from unnatural images. Transforms derived from signal processing have been exploited in the past, including the Fourier transform [12], the wavelet transform [28], the curvelet transform [6], and the contourlet transform [8].

Basis functions learned from natural images have also been introduced. Most techniques learn filters that lie in the null-space of the natural image manifold [20, 30, 31, 32]. Aharon et al. [1] learn a vocabulary from which a natural image is composed. However, none of these techniques adapt the basis functions to the image under analysis.

Edge-preserving smoothing operators do adapt to local structures while reconstructing images. The anisotropic diffusion operator [5] detects edges, and smooths along edges but not across them. A similar idea appeared in a probabilistic framework called a Gaussian conditional random field [26]. A bilateral filter [27] is also closely related to anisotropic operators. Elad [9] and Barash [2] discuss relationships between edge-preserving operators.

Some image models adapt to edge orientations as well as magnitudes. Hammond et al. [13] present a Gaussian scale mixture model that captures the statistics of gradients adaptively steered in the dominant orientation in image patches. Roth et al. [21] present a random field model that adapts to oriented structures. Bennett et al. [3] and Joshi et al. [14] exploit a prior on colors: an observation that there are two dominant colors in a small window.

Adapting the image prior to textural characteristics was investigated for gray-scale images consisting of a single texture [23]. Bishop et al. [4] present a variational image restoration framework that breaks an image into square blocks and adapts the image prior to each block independently (i.e., the image prior is fixed within each block). However, Bishop et al. [4] do not address issues with estimating the image prior at texture boundaries.

3. Image characteristics

We analyze the statistics of gradients adaptively steered in the dominant local orientation of an image x. Roth et al. [21] observe that the profile of orthogonal gradients ∇_o x is typically of higher variance than that of aligned gradients ∇_a x, and propose imposing different priors on ∇_o x and ∇_a x. We show that different textures within the same image also have distinct gradient profiles.

We parameterize the gradient profile using a generalized Gaussian distribution:

\[
p(\nabla x) = \frac{\gamma\,\lambda^{1/\gamma}}{2\,\Gamma(1/\gamma)}\,\exp\!\left(-\lambda \lVert \nabla x \rVert^{\gamma}\right) \tag{1}
\]

where Γ is the Gamma function, and γ, λ are the shape parameters: γ determines the peakiness and λ determines the width of the distribution. We assume that ∇_o x and ∇_a x are independent: p(∇_o x, ∇_a x) = p(∇_o x) p(∇_a x).
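As a concrete check of the normalization in Eq. 1, the density can be sketched in a few lines of Python (the function name `gg_pdf` and the sample parameter values below are our illustrative choices, not from the paper):

```python
import numpy as np
from scipy.special import gamma as Gamma  # Euler Gamma function

def gg_pdf(x, gam, lam):
    """Generalized Gaussian density of Eq. 1:
    p(x) = gam * lam^(1/gam) / (2 * Gamma(1/gam)) * exp(-lam * |x|^gam)."""
    norm = gam * lam ** (1.0 / gam) / (2.0 * Gamma(1.0 / gam))
    return norm * np.exp(-lam * np.abs(x) ** gam)

# gam < 1 gives a sparse (peaky, heavy-tailed) prior; gam = 2 is Gaussian.
x = np.linspace(-20.0, 20.0, 200001)
dx = x[1] - x[0]
for gam, lam in [(0.5, 7.0), (1.0, 3.0), (2.0, 1.0)]:
    area = gg_pdf(x, gam, lam).sum() * dx
    assert abs(area - 1.0) < 1e-3  # the density integrates to 1
```

For γ = 2 and λ = 1 this reduces to the Gaussian (1/√π) exp(−x²), which is a quick way to validate the normalizing constant.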

3.1. Spatially variant gradient statistics

The local gradient statistics can differ from the global gradient statistics. Figure 1 shows the gradient statistics of the colored regions. Two phenomena are responsible for the spatially variant gradient statistics: the material and the viewing distance. For example, a building is noticeably more piecewise smooth than a gravel path due to material properties, whereas the same gravel path can exhibit different gradient statistics depending on the viewing distance. To account for the spatially variant gradient statistics, we propose adjusting the image prior locally.

[Figure 2 plots: density of sampled (γ, ln(λ)) pairs, shown separately for the orthogonal and aligned gradients; horizontal axis γ (0–3.5), vertical axis ln(λ), density scale ×10⁻⁴.]

Figure 2. The distribution of γ, ln(λ) for ∇_o x, ∇_a x in natural images. While the density is highest around γ = 0.5, it tails off slowly, with significant density around γ = 2. We show, as insets, some patches from Figure 1 that are representative of different γ, ln(λ).

3.2. The distribution of γ, λ in natural images

Different textures give rise to different gradient profiles and thus different γ, λ. We study the distribution of the shape parameters γ, λ in natural images. We sample ~110,000 patches of size 41 × 41 from 500 high-quality natural images. We fit the gradient profile from each patch to a generalized Gaussian distribution to associate each patch with γ, λ. We fit the distribution by searching for the γ, λ that minimize the Kullback-Leibler (KL) divergence between the empirical gradient distribution and the model distribution p, which is equivalent to minimizing the negative log-likelihood of the model distribution evaluated over the gradient samples:

\[
[\gamma, \lambda] = \arg\min_{\gamma, \lambda} \left\{ -\frac{1}{N} \sum_{i=1}^{N} \ln p(\nabla x_i) \right\} \tag{2}
\]

Figure 2 shows the Parzen-window fit to the sampled γ, ln(λ) for ∇_o x, ∇_a x. For orthogonal gradients ∇_o x, there exists a large cluster near γ = 0.5, ln(λ) = 2. This cluster corresponds to patches from a smooth region with abrupt edges or a region near texture boundaries. This observation supports the dead leaves image model – an image is a collage of overlapping instances [16, 19]. However, we also observe significant density even when γ is greater than 1. Samples near γ = 2 with large λ correspond to flat regions such as sky, and samples near γ = 2 with small λ correspond to fractal-like textures such as tree leaves or grass. We observe similar characteristics for aligned gradients ∇_a x as well. The distribution of shape parameters suggests that a significant portion of natural images is not piecewise smooth, which justifies adapting the image prior to the image content.
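The fit of Eq. 2 can be sketched as a plain grid search over the negative log-likelihood. This is a hedged toy version: the grid ranges, resolution, and the `fit_gg` name are our choices, and the paper does not specify its optimizer at this level of detail.

```python
import numpy as np
from scipy.special import gamma as Gamma

def fit_gg(grads, gammas=None, lambdas=None):
    """Grid-search fit of (gamma, lambda) minimizing the mean negative
    log-likelihood of the generalized Gaussian (Eq. 2)."""
    gammas = np.linspace(0.3, 3.0, 28) if gammas is None else gammas
    lambdas = np.exp(np.linspace(-1.0, 7.0, 81)) if lambdas is None else lambdas
    a = np.abs(grads).ravel()
    best, best_nll = (None, None), np.inf
    for g in gammas:
        m = np.mean(a ** g)  # mean |grad|^gamma, reused across all lambdas
        for l in lambdas:
            # -1/N sum ln p = -ln(normalizer) + lambda * mean(|grad|^gamma)
            nll = -np.log(g * l ** (1.0 / g) / (2.0 * Gamma(1.0 / g))) + l * m
            if nll < best_nll:
                best, best_nll = (g, l), nll
    return best

# sanity check on synthetic Laplacian samples: scale 0.5 means gamma = 1, lambda = 2
rng = np.random.default_rng(0)
g_hat, l_hat = fit_gg(rng.laplace(scale=0.5, size=50000))
```

With 50,000 samples the recovered parameters land on the grid points nearest the true (γ = 1, λ = 2), up to the grid resolution.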

4. Adapting the image prior

The goal of this paper is to identify the correct image prior (i.e., the shape parameters of a generalized Gaussian distribution) for each pixel in the image. One way to identify the image prior is to fit gradients from every sliding window to a generalized Gaussian distribution (Eq. 2), but the required computation would be overwhelming. We introduce a regression-based method to estimate the image prior.

4.1. Image model

Let y be an observed degraded image, k be a blur kernel (a point-spread function, or PSF), and x be a latent image. Image degradation is modeled as a convolution process:

\[
y = k \otimes x + n \tag{3}
\]

where ⊗ is the convolution operator, and n is observation noise. The goal of a (non-blind) image restoration problem is to recover a clean image x from a degraded observation y given a blur kernel k and a standard deviation of noise η, both of which can be estimated through other techniques [10, 18].
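The observation model of Eq. 3 is straightforward to simulate; the box kernel, image size, and noise level below are illustrative stand-ins, not the paper's test conditions:

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(1)
x = rng.random((64, 64))                # toy latent image
k = np.ones((5, 5)) / 25.0              # box-blur PSF, normalized to sum to 1
eta = 0.05                              # noise standard deviation ("5% noise")
n = eta * rng.standard_normal(x.shape)
y = fftconvolve(x, k, mode="same") + n  # Eq. 3: y = k (convolved with) x + n
```

Blurring averages neighboring pixels, so the blurred-plus-noise observation has noticeably lower variance than the latent image.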

We introduce a conditional random field (CRF) model to incorporate texture variations within the image restoration framework. Typically, a CRF restoration model can be expressed as follows:

\[
p(x \mid y, k, \eta) = \frac{1}{M} \prod_{i} \phi_y(x; y_i, k, \eta)\, \phi_x(x) \tag{4}
\]

where M is a partition function and i is a pixel index. φ_y is derived from the observation process Eq. 3; φ_x from the assumed image prior:

\[
\phi_y(x; y_i, k, \eta) \propto \exp\!\left(-\frac{(y_i - (k \otimes x)_i)^2}{2\eta^2}\right) \tag{5}
\]

\[
\phi_x(x) \propto \exp\!\left(-\lambda \lVert \nabla x \rVert^{\gamma}\right) \tag{6}
\]

To model the spatially variant gradient statistics, we introduce an additional hidden variable z, called texture, to the conventional CRF model. z controls the shape parameters of the image prior:

\[
p(x, z \mid y, k, \eta) = \frac{1}{M} \prod_{i} \phi_y(x; y_i, k, \eta)\, \phi_{x,z}(x, z) \tag{7}
\]

where φ_{x,z}(x, z) ∝ exp(−λ(z)‖∇x‖^{γ(z)}). We model z as a continuous variable since the distribution of [γ, λ] is heavy-tailed and does not form tight clusters (Figure 2).

We maximize p(x | y, k, η) to estimate a clean image x. To do so, we use a Laplace approximation around the mode ẑ to approximate p(x | y, k, η):

\[
p(x \mid y, k, \eta) = \int_{z} p(x, z \mid y, k, \eta)\, dz \approx p(x, \hat{z} \mid y, k, \eta) \tag{8}
\]

Section 4.2 discusses how we estimate z for each pixel.

4.2. Estimating the texture z

A notable characteristic of a zero-mean generalized Gaussian distribution is that the variance v and the fourth moment f completely determine the shape parameters [γ, λ] [24]:

[Figure 3 scatter plots: local variance of gradients (left) and fourth moment of gradients (right), down-sampled deconvolved image vs. down-sampled original image.]

Figure 3. The local variance and fourth moments of gradients computed from the deconvolved, down-sampled image of Figure 1 are closely correlated with those of the down-sampled original image.

\[
v = \frac{\Gamma(3/\gamma)}{\lambda^{2/\gamma}\, \Gamma(1/\gamma)}, \qquad f = \frac{\Gamma(5/\gamma)}{\lambda^{4/\gamma}\, \Gamma(1/\gamma)} \tag{9}
\]
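Eq. 9 can be verified directly: for γ = 2 the distribution reduces to a Gaussian exp(−λx²) with σ² = 1/(2λ), so v = σ² and f = 3σ⁴. A small sanity-check sketch (the function name `gg_moments` is ours):

```python
import numpy as np
from scipy.special import gamma as Gamma

def gg_moments(gam, lam):
    """Variance and fourth moment of the zero-mean generalized Gaussian (Eq. 9)."""
    v = Gamma(3.0 / gam) / (lam ** (2.0 / gam) * Gamma(1.0 / gam))
    f = Gamma(5.0 / gam) / (lam ** (4.0 / gam) * Gamma(1.0 / gam))
    return v, f

# gamma = 2, lambda = 0.5: Gaussian with sigma^2 = 1/(2 * 0.5) = 1
v, f = gg_moments(2.0, 0.5)
assert abs(v - 1.0) < 1e-12   # v = sigma^2
assert abs(f - 3.0) < 1e-12   # f = 3 * sigma^4
```

For γ = 1, λ = 1 (a Laplacian) the same formulas give v = 2 and f = 24, i.e. a kurtosis of 6, as expected for a heavy-tailed profile.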

To take advantage of these relationships, we define the local texture z_i around a pixel i as a two-dimensional vector. The first dimension is the variance v_i of gradients in the neighborhood of pixel i, and the second dimension is the fourth moment f_i of gradients in the neighborhood of pixel i:

\[
z_i = [v_i(\nabla x),\, f_i(\nabla x)] \tag{10}
\]

Qualitatively, the variance of gradients v_i(∇x) encodes the width of the distribution, and the fourth moment f_i(∇x) encodes its peakiness. Note that we can easily compute v_i, f_i by convolving the gradient image with a window that defines the neighborhood. We use a Gaussian window with a 4-pixel standard deviation.
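Computing z_i by convolution can be sketched as follows, using scipy's Gaussian filter as a stand-in for the paper's Gaussian window (the 4-pixel standard deviation matches the text; the `local_texture` name is ours):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_texture(grad, sigma=4.0):
    """z_i = [v_i, f_i] (Eq. 10): local variance and fourth moment of a
    zero-mean gradient image, via Gaussian-window averaging of grad^2 and grad^4."""
    v = gaussian_filter(grad ** 2, sigma)
    f = gaussian_filter(grad ** 4, sigma)
    return v, f

# on i.i.d. noise of std 0.1, the local variance should hover near 0.01
rng = np.random.default_rng(0)
v, f = local_texture(0.1 * rng.standard_normal((128, 128)))
```

Because the window is a normalized low-pass filter, v and f are per-pixel maps with the same resolution as the gradient image.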

Estimating the texture z from the observation y. The texture z should be estimated from the sharp image x we wish to reconstruct, but x is not available when estimating z. We address this issue by estimating the texture z from an image reconstructed using a spatially invariant image prior. We hand-select the spatially invariant prior with a weak gradient penalty so that textures are reasonably restored: [γ_o = 0.8, λ_o = 6.5], [γ_a = 0.6, λ_a = 6.5]. A caveat is that the fixed-prior deconvolution may contaminate the gradient profile of the reconstructed image, which may induce texture estimation error. To reduce such deconvolution noise, we down-sample the deconvolved image by a factor of 2 in both dimensions before estimating the texture z. The gradient profile of natural images is often scale invariant due to fractal properties of textures and piecewise smooth properties of surfaces [16, 19], whereas that of the deconvolution noise tends to be scale variant. Therefore, the texture z estimated from the down-sampled deconvolved image better resembles the texture of the original sharp image.

4.3. Estimating the shape parameters γ, λ from z

We could numerically invert Eq. 9 to directly compute the shape parameters [γ, λ] from the variance and fourth-moment estimates [24]. However, a numerical inversion is computationally expensive and is sensitive to noise. We instead


[Figure 4 panels: (a) a 1-D image slice with an edge; (b) log-variance and log-fourth-moment profiles computed with 2-pixel and 4-pixel std. Gaussian windows; (c) γ from regression, GCRF observation noise, and regularized γ along the slice.]

Figure 4. We regularize the estimated shape parameters using a GCRF such that the texture transition mostly occurs at the texture boundary. We model the observation noise in the GCRF as the variance of the variance and fourth moments estimated from two Gaussian windows with different standard deviations – 2-pixel and 4-pixel – as shown in (b). This reduces the shape parameter estimation error at texture boundaries, as shown in (c) (compare green and red curves).

use a kernel regressor that maps the log of the texture, ln(z), to the shape parameters [γ, ln(λ)].

The regressor should learn the mapping from the texture z of the down-sampled deconvolved image to the shape parameters in order to account for any residual deconvolution noise in the estimated texture z. Since the deconvolved image, and thus z, depends on the blur kernel and the noise level, we would have to train regressors discriminatively for each degradation scenario, which is intractable. However, we empirically observe in Figure 3 that the variance and fourth moment of the deconvolved, down-sampled image are similar to those of the down-sampled original image. Therefore, we learn a single regressor that maps the variance and fourth moment of the down-sampled original image to the shape parameters, and use it to estimate the shape parameters from the down-sampled deconvolved image.

To learn the regression function, we sample 125,000 patches of size 17 × 17 pixels from 500 high-quality natural images. We fit the gradient profile of each patch to a generalized Gaussian distribution, and associate each fit with the variance and fourth moment of gradients in the down-sampled version of each patch (9 × 9 pixels). We use the collected data to learn the mapping from [ln(v), ln(f)] to [γ, ln(λ)] using LibSVM [7]. We use 10-fold cross-validation to avoid over-fitting.

[Figure 5 maps: γ and ln(λ) for orthogonal gradients, estimated from the original image and from the deconvolved image; color scales γ ∈ [0, 3] and ln(λ) ∈ [−2, 7].]

Figure 5. The shape parameters for orthogonal gradients, estimated from the down-sampled deconvolved image of Figure 1 (reconstructed from an image degraded with 5% noise and the blur in Figure 7) and the down-sampled original image. The shape parameters estimated from the deconvolved image are similar to those of the original image.

4.4. Handling texture boundaries

If multiple textures appear within a single window, the estimated shape prior can be inaccurate. Suppose we want to estimate the image prior for a 1-dimensional slice of an image (Figure 4(a)). Ideally, we should recover two regions with distinct shape parameters that abut each other via a thin band of shape parameters corresponding to an edge. However, the estimated image prior becomes “sparse” (i.e., small γ) near the texture boundary even if pixels do not correspond to an edge (the green curve in Figure 4(c)). The use of a finite-size window for computing v and f causes this issue.

To recover shape parameters near texture boundaries, we regularize the estimated shape parameters using a Gaussian conditional random field (GCRF) [26]. Essentially, we want to smooth shape parameters only near texture boundaries. A notable characteristic at texture boundaries is that the z's estimated from two different window sizes tend to differ: while a small window spans a homogeneous texture, a larger window could span two different textures, generating different z's. We leverage this characteristic to smooth only near texture boundaries by defining the observation noise level in the GCRF as the mean variance of the variance v and of the fourth moment f estimated from windows of two different sizes (Gaussian windows with 2-pixel and 4-pixel standard deviations). If the variance of these estimates is large, the central pixel is likely to be near a texture boundary, so we smooth the shape parameter at the central pixel. Appendix A discusses the GCRF model in detail. Figure 4(c) shows the estimated shape parameters before and after regularization, along with the estimated GCRF observation noise. After regularization, two textures are separated by a small band of sparse image prior corresponding to an edge.
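In one dimension, this kind of GCRF regularization amounts to weighted least-squares smoothing with a per-pixel observation variance. The quadratic-penalty form and the weight β below are our simplification of the model detailed in the paper's Appendix A:

```python
import numpy as np

def gcrf_smooth(obs, obs_var, beta=5.0):
    """Solve s = argmin sum_i (s_i - obs_i)^2 / obs_var_i + beta * sum_i (s_{i+1} - s_i)^2.
    Pixels with large observation variance (i.e., near texture boundaries)
    trust their neighbors more than their own noisy estimate."""
    n = obs.size
    D = np.diff(np.eye(n), axis=0)                 # first-difference operator
    A = np.diag(1.0 / obs_var) + beta * (D.T @ D)  # normal equations
    return np.linalg.solve(A, obs / obs_var)

# noisy estimates of a piecewise-constant parameter profile
rng = np.random.default_rng(0)
clean = np.where(np.arange(100) < 50, 0.5, 2.0)
obs = clean + 0.3 * rng.standard_normal(100)
s = gcrf_smooth(obs, np.full(100, 0.09))
```

Raising obs_var only in a suspected boundary band reproduces the selective smoothing described above: flat regions stay close to their observations while boundary pixels are pulled toward their neighbors.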

Figure 5 shows the estimated shape parameters for orthogonal gradients of Figure 1. In the top row of Figure 5, the parameters are estimated from the image reconstructed from 5% noise and the blur in Figure 7. We observe that the estimated prior in the tree region is close to Gaussian (i.e., γ between 2 and 3), whereas the estimated prior in the building region is sparse (i.e., γ < 1). The estimated shape parameters are similar to the parameters estimated from the down-sampled original image (the bottom row of Figure 5). This supports the claim that shape parameters estimated from a degraded input image are reasonably accurate.

4.5. Implementation details

We minimize the negative log-posterior to reconstruct a clean image x:

\[
\hat{x} = \arg\min_x \left\{ \frac{(y - k \otimes x)^2}{2\eta^2} + w \sum_{i=1}^{N} \left( \lambda_o(z_i) \lVert \nabla_o x(i) \rVert^{\gamma_o(z_i)} + \lambda_a(z_i) \lVert \nabla_a x(i) \rVert^{\gamma_a(z_i)} \right) \right\} \tag{11}
\]

where [γ_o, λ_o], [γ_a, λ_a] are the estimated parameters for orthogonal and aligned gradients, respectively, and w is a weighting term that controls the gradient penalty; w = 0.025 in all examples. We minimize Eq. 11 using an iteratively reweighted least squares algorithm [17, 25].

5. Experimental results

We evaluate the performance of the content-aware image prior on deblurring and denoising tasks. We compare our results to those reconstructed using a sparse unsteered gradient prior [17] and a sparse steered gradient prior [21], using peak signal-to-noise ratio (PSNR) and gray-scale structural similarity (SSIM) [29] as quality metrics. We augmented the steerable random fields of [21], which introduced denoising and image inpainting as applications, to perform deconvolution with the sparse steered gradient prior. In all experiments, we use first-order and second-order gradient filters [11]. These algorithms could be augmented with higher-order gradient filters to improve reconstruction quality, but we do not consider that in this work. The test set consists of 21 high-quality images downloaded from LabelMe [22], each with sufficient texture variation.

Non-blind deconvolution. The goal of non-blind deconvolution is to reconstruct a sharp image from a blurred, noisy image given a blur kernel and a noise level. We generate our test set by blurring images with the kernel shown in Figure 7, and adding 5% noise to the blurred images. Figure 6 shows the measured PSNR and SSIM for different deconvolution methods. The content-aware prior deconvolution method performs favorably compared to the competing methods, both in terms of PSNR and SSIM. The benefit of using a spatially variant prior is more pronounced for images with large textured regions. If the image consists primarily

[Figure 6 plots: PSNR (dB, 22–30) and SSIM (0.88–1) per image number (1–21) for the unsteered gradient prior, the steered gradient prior, and the content-aware prior.]

Figure 6. Image deconvolution results: PSNR and SSIM. Mean PSNR: unsteered gradient prior – 26.45 dB, steered gradient prior – 26.33 dB, content-aware prior – 27.11 dB. Mean SSIM: unsteered gradient prior – 0.937, steered gradient prior – 0.940, content-aware prior – 0.951.

of piecewise smooth objects such as buildings, the difference between the content-aware image prior and the others is minor. Figure 7 compares the visual quality of images reconstructed using different priors. We observe that textured regions are best reconstructed using the content-aware image prior, illustrating the benefit of adapting the image prior to textures.

Denoising. The goal of denoising is to reconstruct a sharp image from a noisy observation given a noise level. We consider reconstructing clean images from degraded images at two noise levels: 5% and 10%. Figure 8 shows the measured PSNR and SSIM for the denoising task. When the noise level is low (5%), the content-aware prior reconstructs images with lower PSNR than the competing methods. One explanation is that the content-aware prior may not remove all the noise in textured regions (such as trees) because the gradient statistics of the noise are similar to those of the underlying texture. Such noise, however, does not disturb the visual quality of textured regions. The SSIM measure, which is better correlated with perceptual quality [29], shows that the content-aware image prior performs comparably to, or only slightly worse than, the other methods at a 5% noise level. It is worth noting that when the noise level is low, the observation term is strong, so reconstructed images do not depend heavily on the image prior. The top row of Figure 9 shows that at a 5% noise level, the reconstructed images are visually similar.

When the noise level is high (10%), SSIM clearly favors images reconstructed using the content-aware prior. In this case, the observation term is weak, so the image prior plays an important role in the quality of reconstructed im-


[Figure 7 panels: original image, blurry input, and reconstructions with the unsteered, steered, and content-aware priors (PSNR/SSIM: 25.59 dB/0.960, 25.48 dB/0.962, 26.45 dB/0.970).]

Figure 7. Adapting the image prior to textures leads to better reconstructions. The red box denotes the cropped region.

[Figure 8 plots: PSNR (dB) and SSIM per image number (1–21) at σ = 5% and σ = 10% for the unsteered gradient prior, the steered gradient prior, and the content-aware prior.]

Figure 8. Image denoising results: PSNR and SSIM. At 5% noise – Mean PSNR: unsteered gradient prior – 32.53 dB, steered gradient prior – 32.74 dB, content-aware prior – 31.42 dB. Mean SSIM: unsteered gradient prior – 0.984, steered gradient prior – 0.984, content-aware prior – 0.982. At 10% noise – Mean PSNR: unsteered gradient prior – 28.54 dB, steered gradient prior – 28.43 dB, content-aware prior – 28.52 dB. Mean SSIM: unsteered gradient prior – 0.950, steered gradient prior – 0.953, content-aware prior – 0.959.

ages. Therefore, the performance benefit from using the content-aware prior is more pronounced. The bottom row of Figure 9 shows denoising results at a 10% noise level, supporting our claim that the content-aware image prior generates more visually pleasing textures.

Figure 10 shows the result of deconvolving a real blurry image captured with a handheld camera. We estimate the blur kernel using the algorithm of Fergus et al. [10]. Again, textured regions are better reconstructed using our method.

5.1. User study

We conducted a user study on Amazon Mechanical Turk to compare the visual quality of the reconstructed images. We evaluated 5 randomly selected images for each degradation scenario considered above. Each user views two images: one reconstructed using the content-aware prior and another reconstructed using either the unsteered gradient prior or the steered gradient prior. The user selects the more visually pleasing of the two, or selects a “There is no difference” option. We cropped each test image to 500 × 350 pixels to ensure that users view both images without scrolling. We did not refer to restored images while select-

[Figure 11 bar charts: fraction of user preferences (0–1) per degradation scenario (blur kernel with 5% and 10% noise; noise-only at 5%), for the content-aware gradient prior vs. the sparse unsteered gradient prior and vs. the sparse steered gradient prior; bars split into competing prior, no difference, and content-aware prior.]

Figure 11. User study results. The blue region corresponds to the fraction of users that favored our reconstructions in each degradation scenario. At a low degradation level, users do not prefer one method over another, but as the level of degradation increases, users clearly favor the content-aware image prior.

ing crop regions in order to minimize bias.

We gathered about 20 user opinions for each comparison. In Figure 11, we show the average user preference in each degradation scenario. Users did not have a particular preference when the degradation was small (e.g., 5% noise), but at a high image degradation level users clearly favored the content-aware image prior over the others.


[Figure 9 panels: original image, noisy image, and reconstructions at 5% noise (PSNR/SSIM: 30.74 dB/0.995, 30.85 dB/0.995, 29.27 dB/0.995) and at 10% noise (29.49 dB/0.957, 29.33 dB/0.960, 29.41 dB/0.965).]

Figure 9. The visual comparisons of denoised images. The redbox denotes the cropped region. At5% noise level, while the PSNR ofour result is lower than those of competing algorithms, visually the difference is imperceptible. At10% noise level, content-aware prioroutperforms others in terms of both the PSNR and the SSIM, andis more visually pleasing.
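The PSNR values reported in Figure 9 follow the standard peak-signal-to-noise-ratio definition. A minimal sketch, assuming 8-bit images with peak value 255 (the function name and signature are illustrative, not from the paper; SSIM [29] requires a windowed computation and is omitted here):

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images.

    PSNR = 10 * log10(peak^2 / MSE), where MSE is the mean
    squared error between the reference and the estimate.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

For example, a uniform error of 8 gray levels on an 8-bit image gives a PSNR of roughly 30 dB, the ballpark of the values in Figure 9.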

[Figure 10: deconvolution comparison. Panels: blurry input image, sparse unsteered gradient prior, sparse steered gradient prior, content-aware prior.]

Figure 10. The deconvolution of a blurred image taken with a hand-held camera. We estimate the blur kernel using Fergus et al. [10]. The red box denotes the cropped region. The textured region is better reconstructed using the content-aware image prior.

5.2. Discussions

A limitation of our algorithm, shared with algorithms using a conditional random field model with hidden variables [14, 21, 26], is that the hidden variables, such as the magnitude and/or orientation of an edge, or the textureness of a region, are estimated from the degraded input image or from an image restored through other means. Any error in this preprocessing step induces error in the final result.

Another way to estimate a spatially variant prior is to segment the image into regions and assume a single prior within each segment. Unless we segment the image into many pieces, the estimated prior can be inaccurate. Also, the segmentation may inadvertently generate artificial boundaries in the reconstructed images. Therefore, we estimate a distinct image prior for each pixel in the image.

6. Conclusion

We have explored the problem of estimating spatially variant gradient statistics in natural images, and exploited the estimated gradient statistics to adaptively restore different textural characteristics in image restoration tasks. We show that the content-aware image prior can restore piecewise smooth regions without over-smoothing textured regions, improving the visual quality of reconstructed images, as verified through user studies. Adapting to textural characteristics is especially important when the image degradation is significant.

Appendix A

Gaussian CRF model for $[\gamma, \lambda]$ regularization

We regularize the regressor outputs $[\gamma, \lambda]$ using a Gaussian Conditional Random Field (GCRF). We maximize the following probability to estimate the regularized $\gamma$ from the raw regressor output $\hat{\gamma}$:

$$P(\gamma; \hat{\gamma}) \propto \prod_{i,\, j \in N(i)} \psi(\gamma_i \mid \hat{\gamma}_i)\, \Psi(\gamma_i, \gamma_j) \tag{12}$$

where $N(i)$ denotes the neighborhood of $i$, $\psi$ is the observation model, and $\Psi$ is the neighborhood potential:


$$\psi(\gamma_i \mid \hat{\gamma}_i) \propto \exp\!\left(-\frac{(\gamma_i - \hat{\gamma}_i)^2}{2\sigma_l^2}\right), \qquad \Psi(\gamma_i, \gamma_j) \propto \exp\!\left(-\frac{(\gamma_i - \gamma_j)^2}{2\sigma_n^2(i, j)}\right) \tag{13}$$

We set $\sigma_l$ and $\sigma_n$ adaptively. We set the variance $\sigma_n^2(i, j)$ of the neighboring estimates $\gamma_i, \gamma_j$ as $\sigma_n^2(i, j) = \alpha (x(i) - x(j))^2$, where $x$ is the image and $\alpha = 0.01$ controls how smooth the neighboring estimates should be. $\sigma_n$ encourages discontinuities at strong edges of the image $x$ [26]. The observation noise $\sigma_l^2$ is the average variance of the variance and fourth-moment estimates (for two Gaussian windows with standard deviations of 2 and 4 pixels). We use the same GCRF model to regularize $\ln(\lambda)$, with $\alpha = 0.001$.
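Because both potentials in Eqs. (12) and (13) are Gaussian, the MAP estimate is the solution of a sparse linear system. The following is a minimal 1D-chain sketch of this idea; the function name, the `eps` floor on $\sigma_n^2$ (to avoid division by zero in flat regions), and the fixed $\sigma_l$ are illustrative assumptions, not details from the paper:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

def gcrf_regularize(gamma_hat, x, sigma_l=0.5, alpha=0.01, eps=1e-6):
    """MAP estimate of a 1D-chain GCRF in the spirit of Eqs. (12)-(13).

    Minimizes
        sum_i (g_i - gamma_hat_i)^2 / (2 sigma_l^2)
      + sum_i (g_i - g_{i+1})^2 / (2 sigma_n^2(i, i+1)),
    with sigma_n^2(i, j) = alpha * (x(i) - x(j))^2 + eps.
    """
    n = len(gamma_hat)
    # Pairwise weights between neighbors i and i+1: a small image
    # difference gives a small sigma_n and hence strong smoothing,
    # while a strong edge in x weakens the coupling, preserving
    # discontinuities in the regularized estimate.
    w = 1.0 / (alpha * np.diff(np.asarray(x, float)) ** 2 + eps)
    d_unary = np.full(n, 1.0 / sigma_l ** 2)
    # Normal equations: (D + L) g = D gamma_hat, where D is the
    # diagonal unary precision and L the weighted chain Laplacian.
    main = d_unary.copy()
    main[:-1] += w
    main[1:] += w
    A = diags([main, -w, -w], [0, -1, 1], format="csc")
    b = d_unary * np.asarray(gamma_hat, float)
    return spsolve(A, b)
```

For a noisy $\hat{\gamma}$ signal whose jumps align with edges in $x$, the solver flattens each smooth segment while leaving the transition at the edge comparatively sharp, which is the behavior the edge-adaptive $\sigma_n$ is designed to produce.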

Acknowledgment

This work was done while the first author was an intern at MSR. This research is partially funded by NGA NEGI-1582-04-0004, by ONR-MURI grant N00014-06-1-0734, and by gifts from Microsoft, Google, and Adobe. The first author is also supported by the Samsung Scholarship Foundation.

References

[1] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE TSP, Nov. 2006.

[2] D. Barash. A fundamental relationship between bilateral filtering, adaptive smoothing, and the nonlinear diffusion equation. IEEE TPAMI, 24(6):844–847, June 2002.

[3] E. P. Bennett, M. Uyttendaele, C. L. Zitnick, R. Szeliski, and S. B. Kang. Video and image Bayesian demosaicing with a two color image prior. In ECCV, 2006.

[4] T. E. Bishop, R. Molina, and J. R. Hopgood. Nonstationary blind image restoration using variational methods. In IEEE ICIP, 2007.

[5] M. J. Black, G. Sapiro, D. H. Marimont, and D. Heeger. Robust anisotropic diffusion. IEEE TIP, Mar. 1998.

[6] E. J. Candes and D. L. Donoho. Curvelets: a surprisingly effective nonadaptive representation for objects with edges, 1999.

[7] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001.

[8] M. N. Do and M. Vetterli. The contourlet transform: an efficient directional multiresolution image representation. IEEE TIP, Dec. 2005.

[9] M. Elad. On the origin of the bilateral filter and ways to improve it. IEEE TIP, Oct. 2002.

[10] R. Fergus, B. Singh, A. Hertzmann, S. Roweis, and W. T. Freeman. Removing camera shake from a single photograph. ACM TOG (SIGGRAPH), 2006.

[11] W. T. Freeman and E. H. Adelson. The design and use of steerable filters. IEEE TPAMI, 13(9), Sept. 1991.

[12] Gonzalez and Woods. Digital Image Processing. Prentice Hall, 2008.

[13] D. K. Hammond and E. P. Simoncelli. Image denoising with an orientation-adaptive Gaussian scale mixture model. In IEEE ICIP, 2006.

[14] N. Joshi, C. L. Zitnick, R. Szeliski, and D. Kriegman. Image deblurring and denoising using color priors. In IEEE CVPR, 2009.

[15] J. C. Lagarias, J. A. Reeds, M. H. Wright, and P. E. Wright. Convergence properties of the Nelder-Mead simplex method in low dimensions. SIAM Journal of Optimization, 1998.

[16] A. B. Lee, D. Mumford, and J. Huang. Occlusion models for natural images: a statistical study of a scale-invariant dead leaves model. IJCV, 41:35–59, 2001.

[17] A. Levin, R. Fergus, F. Durand, and W. T. Freeman. Image and depth from a conventional camera with a coded aperture. ACM TOG (SIGGRAPH), 2007.

[18] C. Liu, W. T. Freeman, R. Szeliski, and S. B. Kang. Noise estimation from a single image. In IEEE CVPR, 2006.

[19] G. Matheron. Random Sets and Integral Geometry. John Wiley and Sons, 1975.

[20] S. Roth and M. J. Black. Fields of experts: a framework for learning image priors. In IEEE CVPR, June 2005.

[21] S. Roth and M. J. Black. Steerable random fields. In IEEE ICCV, 2007.

[22] B. Russell, A. Torralba, K. Murphy, and W. T. Freeman. LabelMe: a database and web-based tool for image annotation. IJCV, 2008.

[23] S. S. Saquib, C. A. Bouman, and K. Sauer. ML parameter estimation for Markov random fields with applications to Bayesian tomography. IEEE TIP, 7(7):1029, 1998.

[24] E. P. Simoncelli and E. H. Adelson. Noise removal via Bayesian wavelet coring. In IEEE ICIP, 1996.

[25] C. V. Stewart. Robust parameter estimation in computer vision. SIAM Reviews, 41(3):513–537, Sept. 1999.

[26] M. F. Tappen, C. Liu, E. H. Adelson, and W. T. Freeman. Learning Gaussian conditional random fields for low-level vision. In IEEE CVPR, 2007.

[27] C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. In IEEE ICCV, 1998.

[28] M. Wainwright and E. P. Simoncelli. Scale mixtures of Gaussians and the statistics of natural images. In NIPS, 2000.

[29] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE TIP, 2004.

[30] Y. Weiss and W. T. Freeman. What makes a good model of natural images? In IEEE CVPR, 2007.

[31] M. Welling, G. Hinton, and S. Osindero. Learning sparse topographic representations with products of Student-t distributions. In NIPS, 2002.

[32] S. C. Zhu, Y. Wu, and D. Mumford. Filters, random fields and maximum entropy (FRAME): towards a unified theory for texture modeling. IJCV, 1998.