



IEEE Signal Processing Magazine, May 2003 · 1053-5888/03/$17.00 © 2003 IEEE

High-Resolution Images from Low-Resolution Compressed Video

Super resolution (SR) is the task of estimating high-resolution (HR) images from a set of low-resolution (LR) observations. These observations are acquired by either multiple sensors imaging a single scene or by a single sensor imaging the scene over a period of time. No matter the source of observations, though, the critical requirement for SR is that the observations contain different but related views of the scene. For scenes that are static, this requires subpixel displacements between the multiple sensors or within the motion of the single camera. For dynamic scenes, the necessary shifts are introduced by the motion of objects. Notice that SR does not consider the case of a static scene acquired by a stationary camera. These applications are addressed by the field of image interpolation.

A wealth of research considers modeling the acquisition and degradation of the LR frames and therefore solves the HR problem. (In this article, we utilize the terms SR, HR, and resolution enhancement interchangeably.) For example, literature reviews are presented in [1] and [2] as well as this special section. Work traditionally addresses the resolution enhancement of frames that are filtered and down-sampled during acquisition and corrupted by additive noise during transmission and storage. In this article, though, we review approaches for the SR of compressed video. Hybrid motion compensation and transform coding methods are the focus, which incorporates the family of ITU and MPEG coding standards [3], [4]. The JPEG still image coding systems are also a special case of the approach.

The use of video compression differentiates the resulting SR and traditional problems. As a first difference, video compression methods represent images with a sequence of motion vectors and transform coefficients. The motion vectors provide a noisy observation of the temporal relationships within the HR scene. This is a type of observation not traditionally available to the HR problem. The transform coefficients represent a noisy observation of the HR intensities. This noise results from more sophisticated processing than the traditional processing scenario, as compression techniques may discard data according to perceptual significance.

Additional problems arise with the introduction of compression. As a core requirement, resolution enhancement algorithms operate on a sequence of related but different observations. Unfortunately, maintaining this difference is not the goal of a compression system. This discards some of the differences between frames and decreases the potential for resolution enhancement.

In the rest of this article, we survey the field of SR processing for compressed video. The introduction of motion vectors, compression noise, and additional redundancies within the image sequence makes this problem fertile ground for novel processing methods. In conducting this survey, though, we develop and present all techniques within the Bayesian framework. This adds consistency to the presentation and facilitates comparison between the different methods.

C. Andrew Segall, Rafael Molina, and Aggelos K. Katsaggelos

The article is organized as follows. We define the acquisition system utilized by the surveyed procedures. Then we formulate the HR problem within the Bayesian framework and survey models for the acquisition and compression systems. This requires consideration of both the motion vectors and transform coefficients within the compressed bit stream. We survey models for the original HR image intensities and displacement values. We discuss solutions for the SR problem and provide examples of several approaches. Finally, we consider future research directions.

Acquisition Model and Notation

Before we can recover an HR image from a sequence of LR observations, we must be precise in describing how the two are related. We begin with the pictorial depiction of the system in Figure 1. As can be seen from the figure, a continuous (HR) scene is first imaged by an LR sensor. This filters and samples the original HR data. The acquired LR image is then compressed with a hybrid motion compensation and transform coding scheme. The resulting bit stream contains both motion vectors and quantized transform coefficients. Finally, the compressed bit stream serves as the input to the resolution enhancement procedure. This SR algorithm provides an estimate of the HR scene.

In Figure 1, the HR data represents a time-varying scene in the image plane coordinate system. This is denoted as f(x, y, t), where x, y, and t are real numbers that indicate horizontal, vertical, and temporal locations. The scene is filtered and sampled during acquisition to obtain the discrete sequence g_l(m, n), where l is an integer time index, 1 ≤ m ≤ M, and 1 ≤ n ≤ N. The sequence g_l(m, n) is not observable to any of the SR procedures. Instead, the frames are compressed with a video compression system that results in the observable sequence y_l(m, n). It also provides the motion vectors v_{l,i}(m, n) that predict pixel y_l(m, n) from the previously transmitted y_i(m, n).

Images in the figure are concisely expressed with matrix-vector notation. In this format, one-dimensional vectors represent two-dimensional images. These vectors are formed by lexicographically ordering the image by rows, which is analogous to storing the frame in raster-scan format. Hence, the acquired and observed LR images g_l(m, n) and y_l(m, n), respectively, are expressed by the MN × 1 vectors g_l and y_l. The motion vectors that predict y_l from y_i are represented by the 2MN × 1 vector v_{l,i} that is formed by stacking the transmitted horizontal and vertical offsets. Furthermore, since we utilize digital techniques to recover the HR scene, the HR frame is denoted as f_l. The dimension of this vector is PMPN × 1, where P represents the resolution enhancement factor.

Relationships between the original HR data and detected LR frames are further illustrated in Figure 2. Here, we show that the LR image g_l and HR image f_l are related by

g_l = AHf_l,  l = 1, 2, 3, …  (1)

where H is a PMPN × PMPN matrix that describes the filtering of the HR image and A is an MN × PMPN down-sampling matrix. The matrices A and H model the acquisition system and are assumed to be known. For the moment, we assume that the detector does not introduce noise.
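As a toy illustration of (1), the sketch below builds small, hypothetical A and H matrices (a circulant moving-average blur and a keep-every-Pth-sample decimator, both assumptions of this sketch rather than calibrated operators) and applies them to a 1-D "frame":

```python
import numpy as np

def blur_matrix(n, taps=3):
    """Circulant moving-average blur H of size n x n (illustrative)."""
    H = np.zeros((n, n))
    for i in range(n):
        for k in range(-(taps // 2), taps // 2 + 1):
            H[i, (i + k) % n] = 1.0 / taps
    return H

def downsample_matrix(n, P):
    """A keeps every P-th of n samples, giving an (n // P) x n matrix."""
    A = np.zeros((n // P, n))
    for i in range(n // P):
        A[i, i * P] = 1.0
    return A

P = 2
f = np.arange(8, dtype=float)   # HR "frame" of length PM = 8
H = blur_matrix(8)
A = downsample_matrix(8, P)
g = A @ H @ f                   # LR observation of length M = 4, per (1)
```

The same construction extends to 2-D images by lexicographic ordering, at the cost of much larger (but sparse) matrices.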


Figure 1. An overview of the SR problem. An HR image sequence is captured at low resolution by a camera or other acquisition system. The LR frames are then compressed for storage and transmission. The goal of the SR algorithm is to estimate the original HR sequence from the compressed information.

Figure 2. Relationships between the HR and LR images. HR frames are denoted as f_l and are mapped to other time instances by the operator C(d_{i,l}). The acquisition system transforms the HR frames to the LR sequence g_l, which is then compressed to produce y_l. Notice that the compressed frames are also related through the motion vectors v_{i,l}.


Frames within the HR sequence are also related through time. This is evident in Figure 2. Here, we assume a translational relationship between the frames that is written as

f_l(m, n) = f_k(m + d^m_{l,k}(m, n), n + d^n_{l,k}(m, n)) + r_{l,k}(m, n)  (2)

where d^m_{l,k}(m, n) and d^n_{l,k}(m, n) denote the horizontal and vertical components of the displacement d_{l,k}(m, n) that relates the pixel at time k to the pixel at time l, and r_{l,k}(m, n) accounts for errors within the model. (Noise introduced by the sensor can also be incorporated into this error term.) In matrix-vector notation, (2) becomes

f_l = C(d_{l,k}) f_k + r_{l,k}  (3)

where C(d_{l,k}) is the PMPN × PMPN matrix that maps frame f_k to frame f_l, d_{l,k} is the column vector defined by lexicographically ordering the values of the displacements between the two frames, and r_{l,k} is the registration noise. Note that while this displacement model is prevalent in the literature, a more general motion model could be employed.
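The warp C(d) in (3) can be sketched as a permutation matrix for a single global integer displacement; the article's displacements are per-pixel and possibly subpixel, and the circular boundary handling below is purely an assumption of the sketch:

```python
import numpy as np

def warp_matrix(M, N, dm, dn):
    """(M*N) x (M*N) permutation realizing f_l(m, n) = f_k(m + dm, n + dn),
    with circular boundary handling (an assumption of this sketch)."""
    n_pix = M * N
    C = np.zeros((n_pix, n_pix))
    for m in range(M):
        for n in range(N):
            src = ((m + dm) % M) * N + (n + dn) % N
            C[m * N + n, src] = 1.0
    return C

f_k = np.arange(16, dtype=float).reshape(4, 4)
C = warp_matrix(4, 4, 1, 0)                  # global shift of one row
f_l = (C @ f_k.ravel()).reshape(4, 4)        # warped frame, per (3) with r = 0
```

Subpixel motion would replace the single 1.0 entry per row with interpolation weights, making C a sparse interpolation matrix rather than a permutation.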

Having considered the relationship between LR and HR images prior to compression, let us turn our attention to the compression process. During compression, frames are divided into blocks that are encoded with one of two available methods. For the first method, a linear transform such as the discrete cosine transform (DCT) is applied to the block. The operator decorrelates the intensity data, and the resulting transform coefficients are independently quantized and transmitted to the decoder. For the second method, predictions for the blocks are first generated by motion compensating previously transmitted image frames. The compensation is controlled by motion vectors that define the spatial and temporal offset between the current block and its prediction. The prediction is then refined by computing the prediction error, transforming it with a linear transform, quantizing the transform coefficients, and transmitting the quantized information.

Elements of the video compression procedure are shown in Figure 3. In the figure, we show an original image frame and its transformed and quantized representation in (a) and (b), respectively. This represents the first type of compression method, called intra-coding, which also encompasses still image coding methods such as the JPEG procedures. The second form of compression is illustrated in Figure 4, where the original and previously compressed image frames appear in (a) and (b), respectively. This is often called inter-coding, where the original frame is first predicted from the previously compressed data. Motion vectors for the prediction are shown in Figure 4(c), and the motion compensated estimate appears in Figure 4(d). The difference between the estimate and the original frame results in the displaced frame difference (or error residual), which is transformed with the DCT and quantized by the encoder. The quantized displaced frame difference for Figure 4(a) and (d) appears in part (e). At the encoder, the motion compensated estimate and quantized displaced frame difference are combined to create the decoded frame appearing in Figure 4(b).

From the above discussion, we define the relationship between the acquired LR frame and its compressed observation as

y_l = T^{−1} Q[ T( g_l − MC_l(y_l^P, v_l) ) ] + MC_l(y_l^P, v_l),  l = 1, 2, 3, …  (4)

where Q[·] represents the quantization procedure; T and T^{−1} are the forward and inverse transform operations, respectively; MC_l(y_l^P, v_l) is the motion compensated prediction of g_l formed by motion compensating previously decoded frame(s) as defined by the encoding method; and y_l^P and v_l denote the set of decoded frames and motion vectors that predict y_l, respectively. We want to make clear here that MC_l depends on v_l and only a subset of y.


Figure 3. Intra-coding example. The image in (a) is transformed and quantized to produce the compressed frame in (b).


For example, when a bit stream contains a sequence of P-frames, then y_l^P = y_{l−1} and v_l = v_{l,l−1}. However, as there is a trend towards increased complexity and noncausal predictions within the motion compensation procedure, we keep the above notation for generality.

With a definition for the compression system, we can now be precise in describing the relationship between the HR frames and the LR observations. Combining (1), (3), and (4), the acquisition system depicted in Figure 1 is denoted as

y_l = AHC(d_{l,k}) f_k + e_{l,k}  (5)

where e_{l,k} includes the errors introduced during compression, registration, and acquisition.

Problem Formulation

With the acquisition system defined, we now formulate the SR reconstruction problem. The reviewed methods utilize a window of LR compressed observations to estimate a single HR frame. Thus, the goal is to estimate the HR frame f_k and displacements d given the decoded intensities y and motion vectors v. Here, displacements between f_k and all of the frames within the processing window are encapsulated in d, as d = {d_{l,k} | l = k − TB, …, k + TF}, and TB + TF + 1 establishes the number of frames in the window. Similarly, all of the decoded observations and motion vectors within the processing window are contained in y and v, as y = {y_l | l = k − TB, …, k + TF} and v = {v_l | l = k − TB, …, k + TF}.

We pursue the estimate within the Bayesian paradigm. Therefore, the goal is to find f̂_k, an estimate of f_k, and d̂, an estimate of d, such that

f̂_k, d̂ = arg max_{f_k, d} P(f_k, d) P(y, v | f_k, d).  (6)

In this expression, P(y, v | f_k, d) provides a mechanism to incorporate the compressed bit stream into the enhancement procedure, as it describes the probabilistic modeling of the process to obtain y and v from f_k and d. Similarly, P(f_k, d) allows for the integration of prior knowledge about the original HR scene and displacements. This is somewhat simplified in applications where the displacement values are assumed known, as the SR estimate becomes

f̂_k = arg max_{f_k} P(f_k) P(y, v | f_k, d)  (7)

where d contains the previously found displacements.

Modeling the Observation

Having presented the SR problem within the Bayesian framework, we now consider the probability distributions in (6). We begin with the distribution P(y, v | f_k, d) that models the relationship between the original HR intensities and displacements and the decoded intensities and motion vectors. For the purposes of this review, it is rewritten as

P(y, v | f_k, d) = ∏_l P(y_l | f_k, d) P(v_l | f_k, d, y)  (8)

where P(y_l | f_k, d) is the distribution of the noise introduced by quantizing the transform coefficients and P(v_l | f_k, d, y) expresses any information derived from the motion vectors. Note that (8) assumes independence between the decoded intensities and motion vectors throughout the image sequence. This is well motivated when the encoder selects the motion vectors and quantization intervals without regard to the future bit stream. Any dependence between these two quantities should be considered as future work.

Figure 4. Inter-coding example. The image in (a) is inter-coded to generate the compressed frame in (b). The process begins by finding the motion vectors in (c), which generate the motion compensated prediction in (d). The difference between the prediction and the input image is then computed, transformed, and quantized. The resulting residual appears in (e) and is added to the prediction in (d) to generate the compressed frame in (b).

Quantization Noise

To understand the structure of compression errors, we need to model the degradation process Q in (4). This is a nonlinear operation that discards data in the transform domain, and it is typically realized by dividing each transform coefficient by a quantization scale factor and then rounding the result. The procedure is expressed as

[T y_k](i) = Round( [T g_k](i) / q(i) ) q(i)  (9)

where [T y_k](i) denotes the ith transform coefficient of the compressed frame y_k, q(i) is the quantization factor for coefficient i, and Round(·) is an operator that maps each value to the nearest integer.
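The quantizer of (9) and the error bound it implies are easy to check numerically; the step size and the test coefficients below are arbitrary toy values:

```python
import numpy as np

def quantize(coeff, q):
    """Round-to-nearest uniform quantizer, as in (9)."""
    return q * np.round(coeff / q)

rng = np.random.default_rng(0)
coeffs = rng.uniform(-100.0, 100.0, size=1000)  # toy transform coefficients
q = 10.0                                        # quantization scale factor
err = quantize(coeffs, q) - coeffs              # quantization error per coefficient
```

Every entry of err has magnitude at most q/2, which is the bound formalized in (10).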

Two prominent models for the quantization noise appear in the SR literature. The first follows from the fact that quantization errors are bounded by the quantization scale factor, that is,

−q(i)/2 ≤ [T y_k](i) − [T g_k](i) ≤ q(i)/2  (10)

according to (9). Thus, it seems reasonable that the recovered HR image (when mapped to LR) has transform coefficients within the same interval. This is often called the quantization constraint, and it is enforced with the distribution

P_1(y_l | f_k, d) =
  const,  if −q(i)/2 ≤ [T(AHC(d_{l,k})f_k − MC_l(y_l^P, v_l)) − T(y_l − MC_l(y_l^P, v_l))](i) ≤ q(i)/2  ∀i
  0,      elsewhere.  (11)

As we are working within the Bayesian framework, we note that this states that the quantization errors are uniformly distributed within the quantization interval. Thus, [T(y_k − g_k)](i) ~ U[−q(i)/2, q(i)/2], and so E([T(y_k − g_k)](i)) = 0 and var([T(y_k − g_k)](i)) = q(i)²/12.
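These moments can be verified empirically; for a source distribution that is smooth at the scale of q, the quantization error behaves approximately as U[−q/2, q/2]:

```python
import numpy as np

rng = np.random.default_rng(1)
q = 8.0
# Coefficients drawn from a distribution much wider than q, so the
# error within each quantization bin is close to uniform.
coeffs = rng.normal(0.0, 50.0, size=200_000)
err = q * np.round(coeffs / q) - coeffs   # quantization error, per (9)

mean_err = err.mean()                     # should be near 0
var_err = err.var()                       # should be near q^2 / 12
```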

Several authors employ the quantization constraint for SR processing. For example, it is utilized by Altunbasak et al. [5], [6], Gunturk et al. [7], Patti and Altunbasak [8], and Segall et al. [9], [10]. With the exception of [7], quantization is considered to be the sole source of noise within the acquisition system. This simplifies the construction of (11). However, since the distribution P_1(y_l | f_k, d) in (11) is not differentiable, care must still be taken when finding the HR estimate. This is addressed later in this article.

The second model for the quantization noise is constructed in the spatial domain. This is appealing, as it motivates a Gaussian distribution that is differentiable. To understand this motivation, consider the following. First, the quantization operator in (9) quantizes each transform coefficient independently. Thus, noise in the transform domain is not correlated between transform indices. Second, the transform operator is linear. With these two conditions, quantization noise in the spatial domain becomes a linear sum of independent noise processes. The resulting distribution tends to be Gaussian, and it is expressed as [11]

P_2(y_l | f_k, d) ∝ exp( −½ (y_l − AHC(d_{l,k})f_k)^T K_Q^{−1} (y_l − AHC(d_{l,k})f_k) )  (12)

where K_Q is a covariance matrix that describes the noise. The normal approximation for the quantization noise appears in work by Chen and Schultz [12], Gunturk et al. [13], Mateos et al. [14], [15], Park et al. [16], [17], and Segall et al. [9], [18], [19]. A primary difference between these efforts lies in the definition and estimation of the covariance matrix. For example, a white noise model is assumed by Chen and Schultz and by Mateos et al., while Gunturk et al. develop the distribution experimentally. Segall et al. consider a high bit-rate approximation for the quantization noise. Lower rate compression scenarios are addressed by Park et al., where the covariance matrix and HR frame are estimated simultaneously.

In concluding this subsection, we mention that the spatial domain noise model also incorporates errors introduced by the sensor and motion models. This is accomplished by modifying the covariance matrix K_Q [20]. Interestingly, since these errors are often independent of the quantization noise, incorporating the additional noise components further motivates the Gaussian model.
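One way such a covariance matrix can be constructed, sketched here under the high bit-rate assumption that transform-domain errors are independent with variance q(i)²/12, is to map the diagonal transform-domain covariance through the inverse transform, K_Q = T^{−1} D T^{−T}; a hand-built 1-D DCT stands in for the codec's transform and the step sizes are illustrative:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix T (T @ T.T = I)."""
    T = np.zeros((n, n))
    for k in range(n):
        for m in range(n):
            T[k, m] = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    T[0] *= np.sqrt(1.0 / n)
    T[1:] *= np.sqrt(2.0 / n)
    return T

q = np.array([16.0, 11.0, 10.0, 16.0])  # per-coefficient steps (illustrative)
T = dct_matrix(4)
D = np.diag(q ** 2 / 12.0)              # transform-domain error variances
K_Q = T.T @ D @ T                       # spatial-domain covariance; T^{-1} = T.T
```

Because q varies across coefficients, K_Q is not white: the spatial noise is correlated, which is precisely what distinguishes this model from the simpler white-noise assumptions mentioned above.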

Motion Vectors

Incorporating the quantization noise is a major focus of much of the SR for compressed video literature. However, it is also reasonable to use the motion vectors, v, within the estimates for f̂_k and d. These motion vectors introduce a departure from traditional SR techniques. In traditional approaches, the observed LR images provide the only source of information about the relationship between HR frames. When compression is introduced, though, motion vectors provide an additional observation for the displacement values. This information differs from what is conveyed by the decoded intensities.

There are several methods that exploit the motion vectors during resolution enhancement. At a high level, each tries to model some similarity between the transmitted motion vectors and the actual HR displacements. For example, Chen and Schultz [12] constrain the motion vectors to be within a region surrounding the actual subpixel displacements. This is accomplished with the distribution

P_1(v_l | f_k, d, y) =
  const,  if |v_{l,i}(j) − [A_D d_{l,i}](j)| ≤ Δ  ∀j, i ∈ PS
  0,      elsewhere  (13)

where A_D is a matrix that maps the displacements to the LR grid, Δ denotes the maximum difference between the transmitted motion vectors and estimated displacements, PS represents the set of previously compressed frames employed to predict f_k, and [A_D d_{l,i}](j) is the jth element of the vector A_D d_{l,i}. Similarly, Mateos et al. [15] utilize the distribution

P_2(v_l | f_k, d, y) ∝ exp( −γ_l Σ_{i ∈ PS} ||v_{l,i} − A_D d_{l,i}||² )  (14)

where γ_l specifies the similarity between the transmitted and estimated information.
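The exponent of (14) is straightforward to evaluate numerically; in this sketch A_D is simply a scaling by 1/P (a simplifying assumption) and all vectors are toy data:

```python
import numpy as np

def mv_log_likelihood(v, d, A_D, gamma):
    """Return -log P_2 (up to an additive constant) for one frame:
    gamma * || v - A_D d ||^2, the exponent of (14)."""
    residual = v - A_D @ d
    return gamma * float(residual @ residual)

P = 2.0
A_D = np.eye(4) / P                      # maps HR displacements to the LR grid
d = np.array([2.0, 0.0, -2.0, 4.0])      # HR displacements (toy)
v = np.array([1.0, 0.0, -1.0, 2.0])      # transmitted motion vectors (toy)
cost = mv_log_likelihood(v, d, A_D, gamma=0.5)   # zero: v matches A_D d exactly
```

A larger gamma pulls the displacement estimate toward the transmitted motion vectors; the paragraph that follows explains why a single frame-wide gamma can be a poor assumption.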

There are two disadvantages to modeling the motion vectors and HR displacements as similar throughout the frame. As a first problem, the significance of the motion vectors depends on the underlying compression ratio, which typically varies within the frame. As a second problem, the quality of the motion vectors is dictated by the underlying intensity values. Segall et al. [18], [19] account for these errors by modeling the displaced frame difference within the encoder. This incorporates the motion vectors and is written as

P_3(v_l | f_k, d, y) ∝ exp( −½ (MC_l(y_l^P, v_l) − AHC(d_{l,k})f_k)^T K_MV^{−1} (MC_l(y_l^P, v_l) − AHC(d_{l,k})f_k) )  (15)

where K_MV is the covariance matrix for the prediction error between the original frame and its motion compensated estimate MC_l(y_l^P, v_l). Estimates for K_MV are derived from the compressed bit stream and therefore reflect the amount of compression.

Modeling the Original Sequence

We now consider the second distribution in (6), namely P(f_k, d). This distribution contains a priori knowledge about the HR intensities and displacements. In the literature, it is assumed that this information is independent. Thus, we write

P(f_k, d) = P(f_k) P(d)  (16)

for the remainder of this survey.

Several researchers ignore the construction of the a priori densities, focusing instead on other facets of the SR problem. For example, portions of the literature solely address the modeling of compression noise, e.g., [8], [7], [5]. This is equivalent to using the noninformative prior for both the original HR image and displacement data, so that

P(f_k) ∝ const and P(d) ∝ const.  (17)

In these approaches, the noise model determines the HR estimate, and resolution enhancement becomes a maximum likelihood problem. Since the problem is ill-posed, though, care must be exercised so that the approach does not become unstable or noisy.

Intensities

Prior distributions for the intensity information are motivated by the following two statements. First, it is assumed that pixels in the original HR images are correlated. This is justified for the majority of acquired images, as scenes usually contain a number of smooth regions with (relatively) few edge locations. As a second statement, it is assumed that the original images are devoid of compression errors. This is also a reasonable statement. Video coding often results in structured errors, such as blocking artifacts, that rarely occur in uncompressed image frames.

To encapsulate the statement that images are correlated and absent of blocking artifacts, the prior distribution

P(f_k) ∝ exp( −λ_1 ||Q_1 f_k||² − λ_2 ||Q_2 AHf_k||² )  (18)

is utilized in [9] (and references therein). Here, Q_1 represents a linear high-pass operation that penalizes SR estimates that are not smooth, Q_2 represents a linear high-pass operator that penalizes estimates with block boundaries, and λ_1 and λ_2 control the influence of the norms. A common choice for Q_1 is the discrete two-dimensional Laplacian; a common choice for Q_2 is the simple difference operation applied at the boundary locations.
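A small numerical sketch of the exponent in (18), with Q_1 the discrete 2-D Laplacian and Q_2 a difference across (assumed 2 × 2) block boundaries; the weights and images are toy values, and only vertical boundaries are penalized for brevity:

```python
import numpy as np

def laplacian(img):
    """Discrete 2-D Laplacian with replicated borders (Q_1)."""
    p = np.pad(img, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * img)

def prior_energy(img, lam1=1.0, lam2=1.0, block=2):
    """lam1*||Q1 img||^2 + lam2*||Q2 img||^2 with Q2 taken as simple
    differences across vertical block boundaries only (a simplification)."""
    smooth = np.sum(laplacian(img) ** 2)
    edges = img[:, block::block] - img[:, block - 1:-1:block]
    return lam1 * smooth + lam2 * np.sum(edges ** 2)

flat = np.full((4, 4), 7.0)                       # smooth, unblocked image
blocky = np.kron(np.array([[0.0, 10.0],           # strong 2x2 blocking
                           [10.0, 0.0]]), np.ones((2, 2)))
```

The flat image incurs zero penalty while the blocky one is heavily penalized, which is exactly the behavior the prior is designed to encode.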

Other distributions could also be incorporated into the estimation procedure. For example, Huber's function could replace the quadratic norm. This is discussed in [12].

Displacements

The noninformative prior in (17) is the most common distribution for P(d) in the literature. However, explicit models for the displacement values have recently been presented in [18] and [19]. There, the displacement information is assumed to be independent between frames, so that

P(d) = ∏_l P(d_l).  (19)

Then, the displacements within each frame are assumed to be smooth and absent of coding artifacts. To penalize these errors, the displacement prior is given by

P(d_l) ∝ exp( −λ_3 ||Q_3 d_l||² )  (20)

where Q_3 is a linear high-pass operator and λ_3 is the inverse of the noise variance of the normal distribution. The discrete two-dimensional Laplacian is typically selected for Q_3.

Realization of the SR Methods

With the SR for compressed video techniques summarized by the previous distributions, we turn our attention to computing the enhanced frame. Formally, this requires the solution of (6), where we estimate the HR intensities and displacements given some combination of the proposed distributions. The joint estimate is found by taking logarithms of (6) and solving

f̂_k, d̂ = arg max_{f_k, d} log P(f_k, d) P(y, v | f_k, d)  (21)

with a combination of gradient descent, nonlinear projection, and full-search methods. Scenarios where d is already known or separately estimated are a special case of the resulting procedure.

One way to evaluate (21) is with the cyclic coordinate descent procedure [21]. With this approach, an estimate for the displacements is first found by assuming that the HR image is known, so that

d̂^{q+1} = arg max_d log P(d) P(y, v | f̂_k^q, d)  (22)

where q is the iteration index for the joint estimate. (For the case where d is known, (22) becomes d̂^{q+1} = d for all q.) The intensity information is then estimated by assuming that the displacement estimates are exact, that is,

f̂_k^{q+1} = arg max_{f_k} log P(f_k) P(y, v | f_k, d̂^{q+1}).  (23)

The displacement information is reestimated with the result from (23), and the process iterates until convergence.
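The alternation in (22)-(23) can be sketched on a toy quadratic objective with closed-form updates; the scalar "displacement" d and the objective J(f, d) = ||y − d f||² + ||f − f_prior||² below only stand in for the article's likelihood and priors:

```python
import numpy as np

y = np.array([2.0, 4.0, 6.0])        # toy observation
f_prior = np.array([1.0, 2.0, 3.0])  # toy prior mean for the frame

def update_d(f):
    """Analog of (22): least-squares minimizer of J over d, f held fixed."""
    return float(f @ y) / float(f @ f)

def update_f(d):
    """Analog of (23): minimizer of J over f, d held fixed."""
    return (d * y + f_prior) / (d * d + 1.0)

f = np.ones(3)                       # initial frame estimate
for _ in range(100):                 # iterate to (practical) convergence
    d = update_d(f)
    f = update_f(d)
```

On this toy problem the iterates converge to d = 2 and f = f_prior, the joint minimizer; in the surveyed methods, the inner updates are block-matching or gradient steps rather than closed forms, but the alternation structure is the same.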

The remaining question is how to solve (22) and (23) for the distributions presented in the previous sections. This is considered in the following subsections.

Finding the Displacements

As we have mentioned before, the noninformative prior in (17) is a common choice for P(d). We will consider its use first in developing algorithms, as the displacement estimate in (22) is simplified and becomes

d̂^{q+1} = arg max_d log P(y, v | f̂_k^q, d).  (24)

This estimation problem is quite attractive, as displacement values for different regions of the frame are now independent.
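A full-search block matcher of the kind used to solve (24) can be sketched with a plain SSD cost; the model-specific costs of the surveyed methods would replace it, and the circular shifts are an assumption of the sketch:

```python
import numpy as np

def block_match(target, ref, search=2):
    """Return the (dm, dn) integer shift minimizing the SSD between the
    target and a shifted reference, over a (2*search + 1)^2 window."""
    best, best_cost = (0, 0), np.inf
    for dm in range(-search, search + 1):
        for dn in range(-search, search + 1):
            cand = np.roll(ref, (dm, dn), axis=(0, 1))
            cost = np.sum((target - cand) ** 2)
            if cost < best_cost:
                best, best_cost = (dm, dn), cost
    return best

ref = np.zeros((8, 8))
ref[2, 3] = 1.0                                  # single bright pixel
target = np.roll(ref, (1, -1), axis=(0, 1))      # known true displacement
dm, dn = block_match(target, ref)                # recovers the shift
```

Because (24) decouples across regions, a search of this kind can run per block; subpixel accuracy requires interpolating the reference, which the integer search above omits.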

Block-matching algorithms are well suited for solving (24), and the construction of P(y, v | f_k, d) controls the performance of the block-matching procedure. For example, Mateos et al. [15] combine the spatial domain model for the quantization noise with the distribution for P_2(v_l | f_k, d, y) in (14). The resulting block-matching cost function is

d̂_{l,k}^{q+1} = arg min_{d_{l,k}} [ (y_l − AHC(d_{l,k})f̂_k^q)^T K_Q^{−1} (y_l − AHC(d_{l,k})f̂_k^q) + γ_l ||v_{l,k} − A_D d_{l,k}||² ].  (25)

Similarly, Segall et al. [18], [19] utilize the spatial domain noise model and (15) for P_3(v_l | f_k, d, y). The cost function then becomes

d̂_{l,k}^{q+1} = arg min_{d_{l,k}} [ (y_l − AHC(d_{l,k})f̂_k^q)^T K_Q^{−1} (y_l − AHC(d_{l,k})f̂_k^q)
  + (MC_l(y_l^P, v_l) − AHC(d_{l,k})f̂_k^q)^T K_MV^{−1} (MC_l(y_l^P, v_l) − AHC(d_{l,k})f̂_k^q) ].  (26)

Finally, Chen and Schultz [12] substitute the distribution for P(v | f_k, d, y) in (13), which results in the block-matching cost function

$$\hat{d}_{l,k}^{\,q+1} = \arg\min_{d_{l,k} \in C_{MV}} \left(y_l - AHC(d_{l,k})\hat{f}_k^{\,q}\right)^T K_Q^{-1} \left(y_l - AHC(d_{l,k})\hat{f}_k^{\,q}\right) \tag{27}$$

where $C_{MV}$ follows from (13) and denotes the set of displacements that satisfy the condition $\left| v_{l,k}(i) - \left[A_D d_{l,k}\right](i) \right| < \Delta\ \forall i$.

When the quantization constraint in (11) is combined with the noninformative P(d), estimation of the displacement information is

$$\hat{d}_{l,k}^{\,q+1} = \arg\min_{d_{l,k} \in C_Q} -\log P\!\left(v \mid \hat{f}_k^{\,q}, d_{l,k}, y\right) \tag{28}$$

where $C_Q$ denotes the set of displacements that satisfy the constraint

$$-\frac{q(i)}{2} \le \left[ T\!\left( AHC(d_{l,k})\hat{f}_k^{\,q} - MC(y_l^P, v_l) \right) \right](i) \le \frac{q(i)}{2} \quad \forall i.$$

This is a tenuous statement. P(v | f_k, d, y) is traditionally defined at locations with motion vectors; however, motion vectors are not transmitted for every block in the compressed video sequence. To overcome this problem, authors typically estimate d separately. This is analogous to altering P(y | f_k, d) when estimating the displacements.



When P(d) is not described by the noninformative distribution, differential methods become common estimators for the displacements. These methods are based on the optical flow equation and are explored in Segall et al. [18], [19]. In these works, the spatial domain quantization noise model in (12) is combined with distributions for the motion vectors in (15) and displacements in (20). The estimation problem is then expressed as

$$\begin{aligned} \hat{d}_{l,k}^{\,q+1} = \arg\min_{d_{l,k}} \Big[ &\left(y_l - AHC(d_{l,k})\hat{f}_k^{\,q}\right)^T K_Q^{-1} \left(y_l - AHC(d_{l,k})\hat{f}_k^{\,q}\right) \\ &+ \left(MC(y_l^P, v_l) - AHC(d_{l,k})\hat{f}_k^{\,q}\right)^T K_{MV}^{-1} \left(MC(y_l^P, v_l) - AHC(d_{l,k})\hat{f}_k^{\,q}\right) \\ &+ \lambda_3\, d_{l,k}^T Q_3^T Q_3\, d_{l,k} \Big] \end{aligned} \tag{29}$$

Finding the displacements is accomplished by differentiating (29) with respect to $d_{l,k}$ and setting the result equal to zero. This leads to a successive approximations algorithm [18].

An alternative differential approach is utilized by Park et al. [20], [16]. In these works, the motion between LR frames is estimated with the block-based optical flow method suggested by Lucas and Kanade [22]. Displacements are estimated for the LR frames in this case.

Finding the Intensities

Methods for estimating the HR intensities from (23) are largely determined by the quantization noise model. For example, consider the least complicated combination of the quantization constraint in (11) with the noninformative distributions for P(f_k) and P(v | f_k, d, y). The intensity estimate is then stated as

$$\hat{f}_k^{\,q+1} \in F_Q \tag{30}$$

where $F_Q$ denotes the set of intensities that satisfy the constraint

$$-\frac{q(i)}{2} \le \left[ T\!\left( AHC(\hat{d}_{l,k})f_k^{\,q+1} - MC(y_l^P, v_l) \right) \right](i) \le \frac{q(i)}{2} \quad \forall i.$$

Note that the solution to this problem is not unique, as the set-theoretic method only limits the magnitude of the quantization error in the system model. A frame that satisfies the constraint is therefore found with the projection onto convex sets (POCS) algorithm [23], where sources for the projection equations include [8], [9].
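The projection at the heart of a POCS solution can be sketched as follows. This shows only the transform-domain clipping step implied by the quantization constraint, assumes an orthonormal transform T (so its transpose is the inverse), and ignores the HR-domain operators A, H, and C(d) that the full projection equations in [8], [9] must account for; all names here are assumptions of the sketch.

```python
import numpy as np

def project_onto_CQ(x, pred, T, q):
    """Clip each transform coefficient of the residual (x - pred) to the
    quantizer interval [-q/2, q/2], then return to the spatial domain.
    T is assumed orthonormal; q holds the per-coefficient step sizes."""
    r = T @ (x - pred)                 # transform-domain residual
    r_clipped = np.clip(r, -q / 2.0, q / 2.0)
    return pred + T.T @ r_clipped      # back to the spatial domain
```

Iterating such projections over all constraint sets yields a frame consistent with the transmitted quantization intervals, which is exactly the (nonunique) solution property noted above.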

A different approach must be followed when incorporating the spatial domain noise model in (12). If we still assume a noninformative distribution for P(f_k) and P(v | f_k, d, y), the estimate for the HR intensities is

$$\hat{f}_k^{\,q+1} = \arg\min_{f_k} \sum_l \left(y_l - AHC(\hat{d}_{l,k}^{\,q+1}) f_k\right)^T K_Q^{-1} \left(y_l - AHC(\hat{d}_{l,k}^{\,q+1}) f_k\right) \tag{31}$$

This can be found with a gradient descent algorithm [9].

Alternative combinations of P(f_k) and P(v | f_k, d, y) lead to more involved algorithms. Nonetheless, the fundamental difference lies in the choice for the quantization noise model. For example, combining the distribution for P(f_k) in (18) with the noninformative prior for P(v | f_k, d, y) results in the estimate

$$\hat{f}_k^{\,q+1} = \arg\min_{f_k} \left[ \lambda_1 \left\| Q_1 f_k \right\|^2 + \lambda_2 \left\| Q_2 AH f_k \right\|^2 - \log P\!\left(y \mid f_k, d\right) \right] \tag{32}$$

Interestingly, the estimate for f_k is not changed by substituting (13) or (14) for the distribution P(v | f_k, d, y), as these distributions are noninformative to the intensity estimate. An exception is the distribution in (15) that models the displaced frame difference. When this P(v | f_k, d, y) is combined with the model for P(f_k) in (18), the HR estimate becomes

$$\begin{aligned} \hat{f}_k^{\,q+1} = \arg\min_{f_k} \Big[ &\lambda_1 \left\| Q_1 f_k \right\|^2 + \lambda_2 \left\| Q_2 AH f_k \right\|^2 \\ &+ \sum_l \left(MC(y_l^P, v_l) - AHC(\hat{d}_{l,k}^{\,q+1}) f_k\right)^T K_{MV}^{-1} \left(MC(y_l^P, v_l) - AHC(\hat{d}_{l,k}^{\,q+1}) f_k\right) \\ &- \log P\!\left(y \mid f_k, d\right) \Big] \end{aligned} \tag{33}$$
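The unconstrained estimates in (31) through (33) share the same quadratic structure: a sum of Mahalanobis-type data terms plus smoothness penalties, which plain gradient descent minimizes. A minimal sketch follows, in which B_l abbreviates the combined operator A H C(d_{l,k}); the function and parameter names are placeholders of this sketch, not the authors' notation.

```python
import numpy as np

def map_intensity_estimate(ys, Bs, Kinv, Q, lam, f0, alpha=0.05, n_iter=300):
    """Gradient descent on sum_l (y_l - B_l f)^T Kinv (y_l - B_l f)
    + lam * ||Q f||^2, the quadratic pattern shared by (31)-(33)."""
    f = f0.copy()
    for _ in range(n_iter):
        grad = 2.0 * lam * Q.T @ (Q @ f)           # smoothness penalty
        for y, B in zip(ys, Bs):
            grad -= 2.0 * B.T @ (Kinv @ (y - B @ f))  # data terms
        f -= alpha * grad
    return f
```

With lam = 0 this reduces to the estimate in (31); adding the penalty reproduces the regularized behavior of (32) and (33), where the step size alpha plays the role the text later assigns to the convergence parameter.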

In concluding the subsection, we utilize the estimate in (32) to compare the performance of the quantization noise models. Two iterative algorithms are employed. For the case that the quantization constraint in (11) is utilized, the estimate is found with the iteration

$$\hat{f}_k^{\,q+1,s+1} = P_Q\!\left\{ \hat{f}_k^{\,q+1,s} - \alpha_f \left( \lambda_1 Q_1^T Q_1 \hat{f}_k^{\,q+1,s} + \lambda_2 H^T A^T Q_2^T Q_2 AH \hat{f}_k^{\,q+1,s} \right) \right\} \tag{34}$$

where $\hat{f}_k^{\,q+1,s+1}$ and $\hat{f}_k^{\,q+1,s}$ are estimates for $\hat{f}_k^{\,q+1}$ at the (s+1)th and sth iterations of the algorithm, respectively, $\alpha_f$ controls the convergence and rate of convergence of the algorithm, and $P_Q$ denotes the projection operator that finds the solution to (30). When the spatial domain noise model in (12) is utilized, we estimate intensities with the iteration



$$\begin{aligned} \hat{f}_k^{\,q+1,s+1} = \hat{f}_k^{\,q+1,s} - \alpha_f \Big\{ &-\sum_l C(\hat{d}_{l,k}^{\,q+1})^T H^T A^T K_Q^{-1} \left(y_l - AHC(\hat{d}_{l,k}^{\,q+1}) \hat{f}_k^{\,q+1,s}\right) \\ &+ \lambda_1 Q_1^T Q_1 \hat{f}_k^{\,q+1,s} + \lambda_2 H^T A^T Q_2^T Q_2 AH \hat{f}_k^{\,q+1,s} \Big\} \end{aligned} \tag{35}$$

where $K_Q$ is the covariance matrix found in [10].

The iterations in (34) and (35) are applied to a single compressed bit stream. The bit stream is generated by subsampling an original 352 × 288 image sequence by a factor of two in both the horizontal and vertical directions. The decimated frames are then compressed with an MPEG-4 encoder operating at 1 Mb/s. This is a high bit-rate simulation that maintains differences between temporally adjacent frames. (Lower bit rates generally result in less resolution enhancement for a given processing window.) The original HR frame and decoded result appear in Figure 5.

For the simulations, we first incorporate the noninformative P(f_k) so that λ1 = λ2 = 0. Displacement information is then estimated prior to enhancement. This is motivated by the quantization constraint, which complicates the displacement estimate, and the method in [10] is utilized. Seven frames are incorporated into the estimate with TB = TF = 3, and we choose α_f = 0.125.


Figure 6. Resolution enhancement with two estimation techniques: (a) super-resolved image employing the quantization constraint for P(y | f_k, d); (b) super-resolved image employing the normal approximation for P(y | f_k, d). The method in (a) is susceptible to artifacts when registration errors occur, which is evident around the calendar numbers and within the upper-right part of the frame. PSNR values for (a) and (b) are 28.8 and 33.4 dB, respectively.

Figure 5. Acquired image frames: (a) original image and (b) decoded result after bilinear interpolation. The original image is down-sampled by a factor of two in both the horizontal and vertical directions and then compressed.


Estimates in Figure 6(a) and (b) show the super-resolved results from (34) and (35), respectively. As a first observation, notice that both estimates lead to a higher resolution image. This is best illustrated by comparing the estimated frames and the bilinearly interpolated result in Figure 5(b). (Observe the text and texture regions in the right-hand part of the frame; both are sharper than the interpolated image.) However, close inspection of the two HR estimates shows that the image in Figure 6(a) is corrupted by artifacts near sharp boundaries. This is attributable to the quantization constraint noise model, and it is introduced by registration errors as well as the nonunique solution of the approach. In comparison, the normal approximation for the quantization noise in (12) is less sensitive to registration errors. This is a function of the covariance matrix $K_Q$ and the unique fixed point of the algorithm. PSNR results for the estimated frames support the visual assessment. The quantization constraint and normal approximation models result in a PSNR of 28.8 and 33.4 dB, respectively.

A second set of experiments appears in Figure 7. In these simulations, the prior model in (18) is now utilized for P(f_k). Parameters are unchanged, except that λ1 = λ2 = 0.25 and α_f = 0.05. This facilitates comparison of the incorporated and noninformative priors for P(f_k). Estimates from (34) and (35) appear in Figure 7(a) and (b), respectively. Again, we see evidence of resolution enhancement in both frames. Moreover, incorporating the quantization constraint no longer results in artifacts. This is a direct benefit of the prior model P(f_k) that regularizes the solution. For the normal approximation method, we see that the estimated frame is now smoother. This is a weakness of the method, as it is sensitive to parameter selection. PSNR values for the two sequences are 32.4 and 30.5 dB, respectively.

As a final simulation, we employ (15) for P(v | f_k, d, y). This incorporates a model for the motion vectors, and it requires an expansion of the algorithm in (32) as well as a definition for $K_{MV}$. We utilize the methods in [18]. Parameters are kept the same as in the previous experiments, except that λ1 = 0.01, λ2 = 0.02, and α_f = 0.125. The estimated frame appears in Figure 8. Resolution improvement is evident throughout the image, and it leads to the largest PSNR value of all simulated algorithms: 33.7 dB.

Directions of Future Research

In concluding this article, we want to identify several research areas that will benefit the field of SR from compressed video. As a first area, we believe that the simultaneous estimate of multiple HR frames should lead to improved solutions. These enlarged estimates incorporate additional spatio-temporal descriptions for the sequence and provide increased flexibility in modeling the scene. For example, the temporal evolution of the displacements can be modeled. Note that there is some related work in the field of compressed video processing; see, for example, Choi et al. [24].

Accurate estimates of the HR displacements are critical for the SR problem, and methods that improve these estimates are a second area of research. Optical flow techniques seem suitable for the general problem of resolution enhancement. However, there is work to be done in designing methods for the blurred, subsampled, aliased, and blocky observations provided by a decoder. Toward this goal, alternative probability distributions within the estimation procedures are of interest. This is related to the work by Simoncelli et al. [25] and Nestares and Navarro


Figure 7. Resolution enhancement with two estimation techniques: (a) super-resolved image employing the quantization constraint for P(y | f_k, d); (b) super-resolved image employing the normal approximation for P(y | f_k, d). The distribution in (18) is utilized for resolution enhancement. This regularizes the method in (a). However, the technique in (b) is sensitive to parameter selection and becomes overly smooth. PSNR values for (a) and (b) are 32.4 and 30.5 dB, respectively.


[26]. Also, coarse-to-fine estimation methods have the potential for further improvement; see, for instance, Luettgen et al. [27]. Finally, we mention the use of banks of multidirectional/multiscale representations for estimating the necessary displacements. A review of these methods appears in Chamorro-Martínez [28].

Prior models for the HR intensities and displacements will also benefit from future work. For example, the use of piecewise-smooth models for the estimates will improve the HR problem. This is realized with line processes [29] or an object-based approach. For example, Irani and Peleg [30], Irani et al. [31], and Weiss and Adelson [32] present the idea of segmenting frames into objects and then reconstructing each object individually. This could benefit the resolution enhancement of compressed video. It also leads to an interesting estimation scenario, as compression standards such as MPEG-4 provide object boundary information within the bit stream.

Finally, resolution enhancement is often a precursor to some form of image analysis or feature recognition task. Recently, there is a trend to address these problems directly. The idea is to learn priors from the images and then apply them to the HR problem. The recognition of faces is a current focus, and relevant work is found in Baker and Kanade [33], Freeman et al. [34], [35], and Capel and Zisserman [36]. With the increasing use of digital video technology, it seems only natural that these investigations consider the processing of compressed video.

Acknowledgments

The work of Segall and Katsaggelos was supported in part by the Motorola Center for Communications, Northwestern University, while the work of Molina was supported by the "Comisión Nacional de Ciencia y Tecnología" under contract TIC2000-1275.

C. Andrew Segall received the B.S. and M.S. degrees from Oklahoma State University in 1995 and 1997, respectively, and the Ph.D. degree from Northwestern University in 2002, all in electrical engineering. He is currently a post-doctoral researcher at Northwestern University, Evanston, Illinois. His research interests are in image processing and include recovery problems for compressed video, scale space theory, and nonlinear filtering. He is a member of the IEEE, Phi Kappa Phi, Eta Kappa Nu, and SPIE.

Rafael Molina received the degree in mathematics (statistics) in 1979 and the Ph.D. degree in optimal design in linear models in 1983. He became professor of computer science and artificial intelligence at the University of Granada, Granada, Spain, in 2000. His areas of research interest are image restoration (applications to astronomy and medicine), parameter estimation, image and video compression, HR image reconstruction, and blind deconvolution. He is a member of SPIE, the Royal Statistical Society, and the Asociación Española de Reconocimiento de Formas y Análisis de Imágenes (AERFAI).

Aggelos K. Katsaggelos received the Diploma degree in electrical and mechanical engineering from the Aristotelian University of Thessaloniki, Greece, in 1979 and the M.S. and Ph.D. degrees in electrical engineering from the Georgia Institute of Technology in 1981 and 1985, respectively. In 1985 he joined the Department of Electrical and Computer Engineering at Northwestern University, where he is currently professor, holding the Ameritech Chair of Information Technology. He is also the director of the Motorola Center for Communications. He is a Fellow of the IEEE and an Ameritech Fellow; a member of the Associate Staff, Department of Medicine, at Evanston Hospital; and a member of SPIE. He has been very active within the IEEE, serving on the Publication Board of the Proceedings of the IEEE and IEEE Signal Processing Society, the IEEE Technical Committee on Visual Signal Processing and Communications, the IEEE Technical Committee on Image and Multi-Dimensional Signal Processing, and the Board of Governors of the IEEE Signal Processing Society, to name a few. He was editor-in-chief of IEEE Signal Processing Magazine and associate editor for IEEE Transactions on Signal Processing. He is the editor


Figure 8. Example HR estimate when information from the motion vectors is incorporated. This provides further gains in resolution improvement and the largest quantitative measure in the simulations. The PSNR for the frame is 33.7 dB.

The use of compressed low-resolution video data results in a challenging super-resolution problem.


or coeditor of several books, including Digital Image Restoration (Springer-Verlag, 1991). He holds eight international patents and received the IEEE Third Millennium Medal, the IEEE Signal Processing Society Meritorious Service Award, and an IEEE Signal Processing Society Best Paper Award.

References

[1] S. Chaudhuri, Ed., Super-Resolution Imaging. Norwell, MA: Kluwer, 2001.

[2] S. Borman and R. Stevenson, "Spatial resolution enhancement of low-resolution image sequences: A comprehensive review with directions for future research," Lab. for Image and Signal Analysis, Univ. Notre Dame, Tech. Rep., 1998.

[3] A.N. Netravali and B.G. Haskell, Digital Pictures: Representation and Compression. New York: Plenum, 1995.

[4] V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards. Norwell, MA: Kluwer, 1995.

[5] Y. Altunbasak, A.J. Patti, and R.M. Mersereau, "Super-resolution still and video reconstruction from MPEG-coded video," IEEE Trans. Circuits Syst. Video Technol., vol. 12, pp. 217-226, Apr. 2002.

[6] Y. Altunbasak and A.J. Patti, "A maximum a posteriori estimator for high resolution video reconstruction from MPEG video," in Proc. IEEE Int. Conf. Image Processing, 2000, vol. 2, pp. 649-652.

[7] B.K. Gunturk, Y. Altunbasak, and R. Mersereau, "Bayesian resolution-enhancement framework for transform-coded video," in Proc. IEEE Int. Conf. Image Processing, 2001, vol. 2, pp. 41-44.

[8] A.J. Patti and Y. Altunbasak, "Super-resolution image estimation for transform coded video with application to MPEG," in Proc. IEEE Int. Conf. Image Processing, 1999, vol. 3, pp. 179-183.

[9] C.A. Segall, A.K. Katsaggelos, R. Molina, and J. Mateos, "Super-resolution from compressed video," in Super-Resolution Imaging, S. Chaudhuri, Ed. Norwell, MA: Kluwer, 2001, pp. 211-242.

[10] C.A. Segall, R. Molina, A.K. Katsaggelos, and J. Mateos, "Bayesian high-resolution reconstruction of low-resolution compressed video," in Proc. IEEE Int. Conf. Image Processing, 2001, vol. 2, pp. 25-28.

[11] M.A. Robertson and R.L. Stevenson, "DCT quantization noise in compressed images," in Proc. IEEE Int. Conf. Image Processing, 2001, vol. 1, pp. 185-188.

[12] D. Chen and R.R. Schultz, "Extraction of high-resolution video stills from MPEG image sequences," in Proc. IEEE Int. Conf. Image Processing, 1998, vol. 2, pp. 465-469.

[13] B.K. Gunturk, Y. Altunbasak, and R.M. Mersereau, "Multiframe resolution-enhancement methods for compressed video," IEEE Signal Processing Lett., vol. 9, pp. 170-174, June 2002.

[14] J. Mateos, A.K. Katsaggelos, and R. Molina, "Resolution enhancement of compressed low resolution video," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, 2000, vol. 4, pp. 1919-1922.

[15] J. Mateos, A.K. Katsaggelos, and R. Molina, "Simultaneous motion estimation and resolution enhancement of compressed low resolution video," in Proc. IEEE Int. Conf. Image Processing, 2000, vol. 2, pp. 653-656.

[16] S.C. Park, M.G. Kang, C.A. Segall, and A.K. Katsaggelos, "Spatially adaptive high-resolution image reconstruction of low-resolution DCT-based compressed images," in Proc. IEEE Int. Conf. Image Processing, 2002, vol. 2, pp. 861-864.

[17] S.C. Park, M.G. Kang, C.A. Segall, and A.K. Katsaggelos, "Spatially adaptive high-resolution image reconstruction of low-resolution DCT-based compressed images," IEEE Trans. Image Processing, submitted for publication.

[18] C.A. Segall, R. Molina, A.K. Katsaggelos, and J. Mateos, "Reconstruction of high-resolution image frames from a sequence of low-resolution and compressed observations," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, 2002, vol. 2, pp. 1701-1704.

[19] C.A. Segall, A.K. Katsaggelos, R. Molina, and J. Mateos, "Bayesian resolution enhancement of compressed video," IEEE Trans. Image Processing, submitted for publication.

[20] S.C. Park, M.G. Kang, C.A. Segall, and A.K. Katsaggelos, "High-resolution image reconstruction of low-resolution DCT-based compressed images," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, 2002, vol. 2, pp. 1665-1668.

[21] D.G. Luenberger, Linear and Nonlinear Programming. Reading, MA: Addison-Wesley, 1984.

[22] B.D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proc. Imaging Understanding Workshop, 1981, pp. 121-130.

[23] D.C. Youla and H. Webb, "Image restoration by the method of convex projections: Part 1, Theory," IEEE Trans. Med. Imag., vol. MI-1, no. 2, pp. 81-94, 1982.

[24] M.C. Choi, Y. Yang, and N.P. Galatsanos, "Multichannel regularized recovery of compressed video sequences," IEEE Trans. Circuits Syst. II, vol. 48, pp. 376-387, 2001.

[25] E.P. Simoncelli, E.H. Adelson, and D.J. Heeger, "Probability distributions of optical flow," in Proc. IEEE Computer Soc. Conf. Computer Vision and Pattern Recognition, 1991, pp. 310-315.

[26] O. Nestares and R. Navarro, "Probabilistic estimation of optical flow in multiple band-pass directional channels," Image and Vision Computing, vol. 19, pp. 339-351, 2001.

[27] M.R. Luettgen, W. Clem Karl, and A.S. Willsky, "Efficient multiscale regularization with applications to the computation of optical flow," IEEE Trans. Image Processing, vol. 3, pp. 41-64, 1994.

[28] J. Chamorro-Martínez, "Development of computational models for representing image sequences and their application to motion estimation" (in Spanish), Ph.D. dissertation, Univ. of Granada, 2001.

[29] J. Konrad and E. Dubois, "Bayesian estimation of motion vector fields," IEEE Trans. Pattern Anal. Machine Intell., vol. 14, no. 9, pp. 910-927, 1992.

[30] M. Irani and S. Peleg, "Motion analysis for image enhancement: Resolution, occlusion, and transparency," J. Visual Commun. Image Process., vol. 4, pp. 324-335, 1993.

[31] M. Irani, B. Rousso, and S. Peleg, "Computing occluding and transparent motions," Int. J. Comput. Vision, vol. 12, no. 1, pp. 5-16, Jan. 1994.

[32] Y. Weiss and E.H. Adelson, "A unified mixture framework for motion segmentation: Incorporating spatial coherence and estimating the number of models," in Proc. IEEE Computer Soc. Conf. Computer Vision and Pattern Recognition, 1996, pp. 321-326.

[33] S. Baker and T. Kanade, "Limits on super-resolution and how to break them," IEEE Trans. Pattern Anal. Machine Intell., vol. 24, pp. 1167-1183, 2002.

[34] W.T. Freeman, E.C. Pasztor, and O.T. Carmichael, "Learning low-level vision," Int. J. Comput. Vision, vol. 40, pp. 24-57, 2000.

[35] W.T. Freeman, T.R. Jones, and E.C. Pasztor, "Example-based super-resolution," IEEE Comput. Graph. Appl., pp. 56-65, 2002.

[36] D. Capel and A. Zisserman, "Super-resolution from multiple views using learnt image models," in Proc. IEEE Computer Soc. Conf. Computer Vision and Pattern Recognition, 2001, vol. 2, pp. 627-634.
