Super-resolution on video with GPU

Chien-hsin Hsueh∗, CSIE, NTNU
Hsing-Han Ho†, CSIE, NTU
Kuen-Shiou Tsai‡, CSIE, NTU

Abstract

This work introduces a practical approach to super-resolution, the process of reconstructing a high-resolution image from low-resolution inputs. The emphasis of our work is to super-resolve frames from dynamic video sequences and to improve efficiency with the GPU. In this work, we have implemented two super-resolution algorithms to reconstruct high-resolution images for different types of frame motion. As the quality of super-resolved images relies highly on the correctness of image alignment between consecutive frames, we employ the macroblock optical flow method to accurately estimate motion between each image pair. An efficient and reliable scheme for the GPU is designed to improve the performance of our super-resolution algorithms. We also implement a video player to demonstrate our results. A number of complex and dynamic video sequences are tested to demonstrate the applicability and reliability of our algorithm.

Keywords: super-resolution, upsampling, image sequence, GPU, CUDA

1 Introduction

The goal of Super-Resolution (SR) methods is to recover a high-resolution image from one or more low-resolution input images. In the classical multi-image SR, a set of low-resolution images of the same scene is taken (at subpixel misalignments). Each low-resolution image imposes a set of linear constraints on the unknown high-resolution intensity values. If enough low-resolution images are available (at subpixel shifts), then the set of equations becomes determined and can be solved to recover the high-resolution image.

Most of the proposed super-resolution algorithms are reconstruction-based algorithms, which rest on sampling theorems. However, due to the constraints on the motion models of the input video sequences, it is difficult to apply reconstruction-based algorithms. Most algorithms have either implicitly or explicitly assumed that the image pairs are related by a global parametric transformation, which may not hold in dynamic video. It is challenging to design a super-resolution algorithm for arbitrary video sequences. Video frames in general cannot be related through a global parametric transformation due to the arbitrary movement of individual pixels between image pairs. Hence local motion models, such as optical flow, need to be used for image alignment.

In this work, we have implemented two super-resolution algorithms to reconstruct high-resolution images for different types of frame motion. The first one is Fast and General Super-Resolution (FGSR), which can handle general video with good performance. As the quality of super-resolved images relies highly on the correctness of image alignment between consecutive frames, we employ the macroblock optical flow method to accurately estimate motion between each image pair. The second algorithm is Fast and Robust Super-Resolution (FRSR), which reconstructs the high-resolution image from video whose frames are related by global motion. We also design

∗e-mail: [email protected]
†e-mail: [email protected]
‡e-mail: [email protected]

several efficient and reliable schemes for the GPU to improve the performance of our super-resolution algorithms.

This paper is organized as follows. In the next section we give a brief survey of existing work on this topic. In Section 3, we describe our implementation of the super-resolution algorithms, and in the following section we discuss the results obtained and compare them with other methods. Finally, in Section 5, we describe the drawbacks revealed by this method and summarize our conclusions.

2 Related work

Existing super-resolution algorithms can be roughly divided into two main categories. One is reconstruction-based algorithms while the other is learning-based algorithms.

Reconstruction-based Super-Resolution The basis of reconstruction-based super-resolution is uniform/non-uniform sampling theory. It assumes the original high-resolution signal (image) can be well predicted from the low-resolution input samples (images). Most super-resolution algorithms fall into this category. In most cases, the enforced smoothness constraint suppresses high-frequency components, and hence the results are usually blurred. Regularization methods can be used when the scene is strongly rigid, such as in the case of a binary text image. Super-resolution can also be performed simultaneously in time and in space.

Several refinements have been proposed to address the robustness issue of super-resolution algorithms. One approach handles the case of moving objects by motion segmentation; an accurate motion segmentation is hence crucial. Unfortunately, accurate segmentation is hard to obtain in the presence of aliasing and noise. More recently, a robust median estimator has been used in an iterative super-resolution algorithm.

Learning-based Super-Resolution This kind of algorithm creates high-frequency image details by using a generative model learned from a set of training images. Several algorithms have been proposed for specific types of scene, such as faces and text. However, learning-based super-resolution algorithms are ill-suited to dynamic real-world video sequences.

3 Algorithm

The input of our algorithm includes 1) multiple low-resolution video frames (the target frame and its neighboring frames) and 2) the desired magnification factor. The output is a high-resolution image reconstructed at the target frame.

In this work, we implement two super-resolution algorithms to reconstruct high-resolution images from several neighboring frames. We refer to two papers, [Farsiu et al. 2004] and [Jiang et al. 2003], and simplify the original algorithms. Parts of the code have been accelerated with CUDA to improve the execution time. Both are reconstruction-based algorithms: one is Fast and General Super-Resolution (FGSR), while the other is Fast and Robust Super-Resolution (FRSR). In the following sections, we describe the implementation of each algorithm in detail.

3.1 FGSR

FGSR stands for Fast and General Super-Resolution, which can generate a super-resolved image from any dynamic video sequence. Before we describe the details, let us first define some notation:

• x denotes a target low-resolution image

• f denotes the desired high-resolution image

• f^(n) is the approximation of f obtained after the n-th iteration

• g_k denotes the k-th low-resolution image

• s_k denotes the result of optical flow from the low-resolution image g_k to the target image f

Figure 1: The procedure of FGSR. f^(n) is updated by adding the weighted difference between the aligned frame and f^(n); in this way, the detail of the super-resolved image is improved iteratively.

Figure 1 illustrates the basic procedure of the FGSR algorithm. It starts with an initial estimate f^(0) of the high-resolution image f, obtained by bicubic interpolation. After all the neighboring frames g_k are up-sampled, the optical flow process (from g_k to the target frame) is carried out to obtain the simulated high-resolution images s_k. If g_k is well aligned with f, the residual s_k − f^(n) should improve the detail of f^(n). We can iteratively project s_k − f^(n) to refine the approximation. The weight β is defined as

β = 1 / (temporal distance + 1)

that is, the reciprocal of the temporal distance (plus one) between s_k and the target frame x. β is the weight with which the residual s_k − f^(n) is projected onto f^(n+1). A lower β corresponds to a frame that is farther from the target, and hence to a weaker alignment with the target frame and less influence on the detail added to f.
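
Since this update is a simple per-pixel operation, it maps naturally onto one GPU thread per output pixel. The following CUDA kernel is a minimal sketch of the projection step, assuming f^(n), f^(n+1), and s_k are single-channel float images in row-major device buffers; the kernel and variable names are illustrative, not the actual code of this work.

__global__ void fgsr_update_kernel(float* f_next, const float* f_n,
                                   const float* s_k, float beta,
                                   int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int idx = y * width + x;
    // Push the current estimate toward the aligned frame s_k,
    // weighted by beta = 1 / (temporal distance + 1).
    f_next[idx] = f_n[idx] + beta * (s_k[idx] - f_n[idx]);
}

// Host-side launch for one neighbor at temporal distance d (sketch):
//   float beta = 1.0f / (d + 1);
//   dim3 block(16, 16), grid((W + 15) / 16, (H + 15) / 16);
//   fgsr_update_kernel<<<grid, block>>>(d_fnext, d_fn, d_sk, beta, W, H);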

In this work, we accelerate the bicubic interpolation on the GPU. By parallelizing the interpolation, we reduce its execution time by a factor of six to ten, depending on the image size. In Figure 2, the CPU time increases linearly with the image size, while the GPU time stays nearly constant. This part of the algorithm has high parallel-processing potential, which significantly increases the overall execution speed.
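
To illustrate how this part parallelizes, the sketch below shows one possible CUDA bicubic (Catmull-Rom) up-sampling kernel, with one thread per output pixel and clamped borders. It assumes single-channel, row-major float images; it is a minimal sketch under these assumptions, not the exact kernel used in this work.

__device__ float cubic_weight(float t)
{
    // Catmull-Rom weights (a = -0.5) for |t| in [0, 2)
    t = fabsf(t);
    if (t < 1.0f) return 1.5f * t * t * t - 2.5f * t * t + 1.0f;
    if (t < 2.0f) return -0.5f * t * t * t + 2.5f * t * t - 4.0f * t + 2.0f;
    return 0.0f;
}

__global__ void bicubic_upsample(const float* src, int sw, int sh,
                                 float* dst, int dw, int dh)
{
    int dx = blockIdx.x * blockDim.x + threadIdx.x;
    int dy = blockIdx.y * blockDim.y + threadIdx.y;
    if (dx >= dw || dy >= dh) return;

    // Map the destination pixel back into source coordinates.
    float sx = (dx + 0.5f) * sw / dw - 0.5f;
    float sy = (dy + 0.5f) * sh / dh - 0.5f;
    int ix = (int)floorf(sx);
    int iy = (int)floorf(sy);

    float sum = 0.0f, wsum = 0.0f;
    for (int m = -1; m <= 2; ++m)
        for (int n = -1; n <= 2; ++n) {
            int px = min(max(ix + n, 0), sw - 1);   // clamp to the image border
            int py = min(max(iy + m, 0), sh - 1);
            float w = cubic_weight(sx - (ix + n)) * cubic_weight(sy - (iy + m));
            sum  += w * src[py * sw + px];
            wsum += w;
        }
    dst[dy * dw + dx] = sum / wsum;
}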

3.2 FRSR

FRSR stands for Fast and Robust Super-Resolution, which is specific to video frames related by global motion. The following notation is used in FRSR:

• x denotes a target low-resolution image

• f denotes the estimated high-resolution image

Figure 2: Execution time on the CPU and the GPU. Depending on the input image size, the GPU accelerates the bicubic interpolation by six to ten times.

• f^(n) is the approximation of f obtained after the n-th iteration

• g_k denotes the k-th low-resolution image

• m_k denotes the mapping from the low-resolution image g_k to the target image f

Figure 3: Median of g_{0,k}. Each pixel value of f is estimated by the median of the corresponding values in g_{0,k}.

Median of the neighboring frames At the beginning of the algorithm, we estimate the initial guess of the high-resolution image with a median operator. Figure 3 illustrates the procedure. First, we align all neighboring frames to the target frame. As shown by Zhao et al. [Zhao and Sawhney 2002], accurate alignment is the key to the success of reconstruction-based super-resolution algorithms; we employ the macroblock optical flow algorithm in our work. Second, for each pixel of f^(0), we choose the median of all g_{0,k} as the pixel value, as sketched below. The initial estimated high-resolution image tends to be blurred, so the next step is to deblur it to enhance the detail.
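
A minimal CUDA sketch of this per-pixel median fusion follows. It assumes the registered, up-sampled frames g_{0,k} are packed contiguously in one device buffer (frame k at offset k*width*height) and that the number of frames is small; the buffer and kernel names are illustrative.

#define MAX_FRAMES 9   // assumed upper bound on the number of neighboring frames

__global__ void median_fuse(const float* frames, int num_frames,
                            float* f0, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int idx = y * width + x;
    float v[MAX_FRAMES];
    for (int k = 0; k < num_frames; ++k)
        v[k] = frames[k * width * height + idx];

    // Insertion sort of the small per-pixel sample set.
    for (int i = 1; i < num_frames; ++i) {
        float key = v[i];
        int j = i - 1;
        while (j >= 0 && v[j] > key) { v[j + 1] = v[j]; --j; }
        v[j + 1] = key;
    }
    f0[idx] = v[num_frames / 2];   // median (upper median for an even count)
}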

Bilateral Non-Iterative Artifact Removal We add a non-iterative outlier removal step, after data fusion and before the deblurring-interpolation step, using the bilateral filter. Our refinement method essentially calculates the correlation of different measurements (pixels from different frames) with each other and removes the inconsistent data. The computed correlation is based on the bilateral idea, so high-frequency (edge) information is differentiated from outliers. We assign a weight to each pixel in the measurements based on its bilateral correlation with the corresponding pixels in the data-fused image. After computing these weights, pixels with very small weights are removed from the data set. As pixels containing high-frequency information receive higher weights than pixels located in low-frequency areas, it is reasonable to compute and compare the penalty weights for blocks of pixels rather than for single pixels.
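
One way such a weighting can be realized on the GPU is sketched below: each registered measurement g_{0,k} is compared block-wise against the data-fused image f0 with a combined spatial and range (intensity) Gaussian, so a pixel that disagrees with its neighborhood in f0 receives a small weight and can later be discarded by thresholding. The exact weighting, parameters, and names here are assumptions for illustration, not the precise scheme of this work.

__global__ void bilateral_weight(const float* g0k, const float* f0,
                                 float* weight, int width, int height,
                                 int radius, float sigma_s, float sigma_r)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float center = g0k[y * width + x];
    float w = 0.0f;
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx) {
            int nx = min(max(x + dx, 0), width - 1);
            int ny = min(max(y + dy, 0), height - 1);
            float diff = center - f0[ny * width + nx];
            float ws = expf(-(dx * dx + dy * dy) / (2.0f * sigma_s * sigma_s));
            float wr = expf(-(diff * diff) / (2.0f * sigma_r * sigma_r));
            w += ws * wr;   // large when the measurement agrees with f0 locally
        }
    weight[y * width + x] = w;
}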

Robust Regularization Super-resolution is an ill-posed problem [Nguyen et al. 2001] [Tekalp 1995]. For the underdetermined cases, there exists an infinite number of solutions. The solution for square and overdetermined cases is not stable, which means that a small amount of noise in the measurements will result in large perturbations in the final solution. Therefore, considering regularization in the super-resolution algorithm as a means of picking a stable solution is very useful, if not necessary. Also, regularization can help the algorithm remove artifacts from the final answer and improve the rate of convergence. Of the many possible regularization terms, we desire one which results in HR images with sharp edges and is easy to implement.

One of the most widely referenced regularization cost functions is the Tikhonov cost function [Elad and Feuer 1999]:

γ_T(X) = ∥ΓX∥_2^2

where Γ is usually a high-pass operator such as a derivative, a Laplacian, or even the identity matrix. The intuition behind this regularization method is to limit the total energy of the image (when Γ is the identity matrix) or to force spatial smoothness (for derivative or Laplacian choices of Γ). As noisy and edge pixels both contain high-frequency energy, they will be removed in the regularization process and the resulting denoised image will not contain sharp edges. Certain types of regularization cost functions work efficiently for some special types of images but are not suitable for general images (such as maximum entropy regularization, which produces sharp reconstructions of point objects, such as star fields in astronomical images [Gibson and Bovik 2000]). One of the most successful regularization methods for denoising and deblurring is the Total Variation (TV) method [Rudin et al. 1992]. The total variation criterion penalizes the total amount of change in the image as measured by the L1 norm of the gradient and is defined as

γ_TV(X) = ∥∇X∥_1

where ∇ is the gradient operator. The most useful property of the total variation criterion is that it tends to preserve edges in the reconstruction [Gibson and Bovik 2000] [Rudin et al. 1992] [Chan et al. 2001], as it does not severely penalize steep local gradients. Based on the spirit of the total variation criterion and the bilateral filter, the regularizer called Bilateral-TV is computationally cheap to implement and preserves edges. The regularizing function is

γ_BTV(X) = Σ_{l=0}^{P} Σ_{m=0}^{P} α^(l+m) ∥X − S_x^l S_y^m X∥_1

where the matrices (operators) S_x^l and S_y^m shift X by l and m pixels in the horizontal and vertical directions, respectively, representing several scales of derivatives. The scalar weight α, 0 < α < 1, is applied to give a spatially decaying effect to the summation of the regularization term.
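
Evaluating this regularizer is also embarrassingly parallel. The sketch below computes the per-pixel contribution of the Bilateral-TV cost, with the total obtained by summing the output buffer afterwards (for example with a reduction); the shifts are realized with clamped borders, and all names here are illustrative assumptions rather than the actual implementation.

__global__ void btv_cost(const float* X, float* cost, int width, int height,
                         int P, float alpha)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float xc = X[y * width + x];
    float acc = 0.0f;
    for (int l = 0; l <= P; ++l)
        for (int m = 0; m <= P; ++m) {
            if (l == 0 && m == 0) continue;          // zero shift contributes nothing
            int sx = min(max(x - l, 0), width - 1);  // horizontal shift by l pixels
            int sy = min(max(y - m, 0), height - 1); // vertical shift by m pixels
            acc += powf(alpha, (float)(l + m)) * fabsf(xc - X[sy * width + sx]);
        }
    cost[y * width + x] = acc;
}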

4 Results

To verify our algorithm, we tested it with two video clips, namely Text (98 x 114, 30 fps, Figure 4) and Bottle (128 x 96, 30 fps, Figure 5(a)). All experiments and timing statistics are carried out and recorded on an Intel Core i5-750 2.66 GHz CPU with 2 GB of memory and an nVidia N240 GPU with 1 GB of video memory.

We first compare our first method, FGSR, with naive bilinear interpolation in Figure 4(c). The Text example (Figure 4(a)) shows the target frame in a video clip with panning motion. To generate the result, four neighboring low-resolution frames plus the target frame are used. The displacement between consecutive frames is almost 10 pixels in some cases. The super-resolved image is magnified two times, i.e. to 98x114 in resolution. The result from bilinear interpolation exhibits blocky artifacts (see Figure 4(c)) when compared with our result. Next, we use our second SR method, FRSR, on the same target frame. In Figure 4(e), we can see that the result generated by FRSR is slightly better than those of FGSR and bicubic interpolation.

In the Bottle example, we compare these two methods with the ground-truth image. The low-resolution input frames are simulated by down-sampling the original frames (original resolution: 256x192) to 128x96. The ground truth of the target frame is shown in Figure 5(a). We blow up part of the image to highlight the differences between the images produced by bilinear interpolation (Figure 5(b)), FGSR (Figure 5(c)), and FRSR (Figure 5(d)). The experimental results (Table 1) show that the image generated by the FGSR algorithm outperforms bilinear interpolation by 0.1927 dB in terms of peak signal-to-noise ratio (PSNR), and the result generated by the FRSR algorithm outperforms bilinear interpolation by 0.7007 dB and FGSR by 0.5080 dB.
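
The PSNR values in Table 1 follow the usual definition for 8-bit images, PSNR = 10 log10(255^2 / MSE). A minimal host-side sketch of this measure, assuming single-channel images of equal size, is:

#include <cmath>
#include <cstddef>

double psnr(const unsigned char* ref, const unsigned char* test, std::size_t n)
{
    double mse = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double d = (double)ref[i] - (double)test[i];
        mse += d * d;                       // accumulate squared error
    }
    mse /= (double)n;
    if (mse == 0.0) return INFINITY;        // identical images
    return 10.0 * std::log10(255.0 * 255.0 / mse);   // PSNR in dB
}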

Figure 4: The Text example. In this experiment, the low-resolution input frames are simulated by down-sampling the original frames (b) (original resolution: 98x114) to 49x57. One target frame of the down-sampled version is shown in (a). We magnify the target frame two times and compare the results of bilinear interpolation (c), FGSR (d), and FRSR (e).

We also implement a video player (see Figure 6), which can play the video sequence in real time. The user can zoom in/out on the frame to change the resolution of the selected area, and switch the upsampling algorithm to compare the results. However, we have not yet integrated all of our super-resolution algorithms into it due to lack of time; so far, one can only switch between nearest-neighbor, bilinear, and bicubic modes.

Figure 5: Comparison with the ground truth (a). In this experiment, we magnify the target frame of Bottle two times in both dimensions using bilinear interpolation (b), FGSR (c), and FRSR (d).

Our implementation can handle mild motion blur and spatially varying blur in real-world video clips, efficiently removing noise and aliasing while generating high-resolution images. However, severe blurring requires more effort.

Table 1: A comparison of PSNR between different super-resolution algorithms.

Image     Bilinear       FGSR           FRSR
Bottle    14.4501 dB     14.6428 dB     15.1508 dB
Text      13.5860 dB     13.2931 dB     13.3693 dB

Figure 6: The video player, shown in (a) nearest-neighbor and (b) bicubic interpolation mode. The user can zoom in/out on the frame and switch the upsampling algorithm to compare the results.

5 Conclusion

In this work, we implement two practical super-resolution algorithms that are capable of reconstructing high-resolution images from complex and dynamic video sequences, which may contain mild motion blur. By integrating the GPU into the iterative reconstruction process of the super-resolution algorithms, the super-resolved images are generated in a short period of time. Two mild and dynamic video sequences are tested to demonstrate the applicability of these two algorithms. The performance of our algorithm depends on the variation of the global parametric transformations.

To further improve the speed, it would be possible to find a new algorithm that avoids the iterative calculation: because of the dependency between iterations, it is hard to accelerate on the GPU. Another direction is to reconstruct the high-resolution image with a learning-based algorithm. A learning-based algorithm needs to model correlations in image structure over extended neighborhoods. The modeling complexity can be reduced remarkably if we construct the prior model on image patches instead of full-size images, which can be easily parallelized on the GPU.

References

CHAN, T., OSHER, S., AND SHEN, J. 2001. The digital TV filter and nonlinear denoising. Image Processing, IEEE Transactions on 10, 2 (Feb.), 231–241.

ELAD, M., AND FEUER, A. 1999. Super-resolution reconstruction of continuous image sequences. In Image Processing, 1999. ICIP 99. Proceedings. 1999 International Conference on, vol. 3, 459–463.

FARSIU, S., ROBINSON, M., ELAD, M., AND MILANFAR, P. 2004. Fast and robust multiframe super resolution. Image Processing, IEEE Transactions on 13, 10 (Oct.), 1327–1344.

GIBSON, J. D., AND BOVIK, A., Eds. 2000. Handbook of Image and Video Processing. Academic Press, Inc., Orlando, FL, USA.

JIANG, Z., WONG, T.-T., AND BAO, H. 2003. Practical super-resolution from dynamic video sequences. In Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, vol. 2, II-549–II-554.

NGUYEN, N., MILANFAR, P., AND GOLUB, G. 2001. A computationally efficient superresolution image reconstruction algorithm. Image Processing, IEEE Transactions on 10, 4 (Apr.), 573–583.

RUDIN, L. I., OSHER, S., AND FATEMI, E. 1992. Nonlinear total variation based noise removal algorithms. In Proceedings of the Eleventh Annual International Conference of the Center for Nonlinear Studies on Experimental Mathematics: Computational Issues in Nonlinear Science, Elsevier North-Holland, Inc., Amsterdam, The Netherlands, 259–268.

TEKALP, A. M. 1995. Digital Video Processing. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.

ZHAO, W., AND SAWHNEY, H. S. 2002. Is super-resolution with optical flow feasible? In ECCV '02: Proceedings of the 7th European Conference on Computer Vision, Part I, Springer-Verlag, London, UK, 599–613.