
Implementation of a Frame Rate Up-Conversion Filter

Fredrik Vestermark

May 24, 2013

Master's Thesis in Computing Science, 30 credits

Supervisor at CS-UmU: Frank Drewes
Examiner: Eddie Wadbro

Umeå University
Department of Computing Science

SE-901 87 UMEÅ
SWEDEN


Abstract

When providing video for devices that support different frame rates, some kind of frame rate conversion has to be done. Simple methods such as frame repetition, frame averaging and frame dropping are commonly used to change the frame rate, but these methods introduce jerky motions or blurring artifacts. To avoid these artifacts, many algorithms have been developed to approximate new frames based on estimated movements between successive frames (motion compensation).

This thesis summarizes a selection of previously proposed motion compensation techniques for frame rate up-conversion, and presents a prototype implementation that uses a subset of the described methods.

The prototype lacks a number of features present in modern frame rate conversion tools, but objective evaluations show an improved result compared to frame repetition and frame averaging. Subjective evaluations show that even though the prototype performs well in many cases, the artifacts are more disturbing than those introduced by simple methods. The results are analyzed with a focus on specific unwanted artifacts, their causes and possible ways to minimize them.


Contents

1 Introduction
2 Problem Description
  2.1 Problem Statement
  2.2 Methods
  2.3 Related Work
3 Frame Rate Conversion
  3.1 Frame Averaging
  3.2 Block Motion Estimation
    3.2.1 Block Distortion
    3.2.2 Estimation Techniques
  3.3 Search Algorithms
    3.3.1 Search Speed
    3.3.2 Full Search
    3.3.3 Diamond Search
    3.3.4 Hexagon Based Search
    3.3.5 Other Hexagon Based Algorithms
  3.4 Motion Vector Refinement
  3.5 Heuristics
    3.5.1 One-step-stop
    3.5.2 Spatial and Temporal Correlation
    3.5.3 Subsampling
  3.6 Motion Compensation
    3.6.1 Block Motion Compensation
    3.6.2 Overlapping Block Motion Compensation
    3.6.3 Control Grid Interpolation
4 Implementation
  4.1 Structure
    4.1.1 Motion Estimation
    4.1.2 Motion Compensation
  4.2 Evaluation
    4.2.1 Analysis
5 Discussion
  5.1 Limitations and Future Work
6 Acknowledgements
References
A Diagrams
  A.1 Box Plots
  A.2 Line Charts


List of Figures

2.1 Frame generated via frame repetition
3.1 Frame generated via frame averaging
3.2 Temporal shifting
3.3 Forward motion estimation
3.4 Bilateral motion estimation
3.5 Overlapping motion estimation
3.6 Full search
3.7 Diamond search
3.8 Hexagon based search
3.9 Bad spatial MV refinement
3.10 1:2 pixel decimation
3.11 Alternating 1:4 pixel decimation of distortion method
3.12 Bilateral motion compensation
3.13 Direct motion compensation
3.14 Pixel-based OBMC
3.15 Control grid
4.1 System structure and communication
4.2 Examples of a quick scene change
4.3 Examples of a slow fade between two scenes
4.4 Example of artifacts along the edges
4.5 Approximations of a frame from the clip Park Joy
4.6 Approximations of a frame from the clip Foreman
4.7 Example of a bad approximation
A.1 Box plots of the PSNR for images generated from the test videos
A.2 Line charts of frame rate up-conversion PSNR for test sequences


List of Tables

2.1 Test videos used when evaluating the quality of frame rate conversions
3.1 Hexagon based search algorithms and SIR claimed by their authors


Chapter 1

Introduction

Different video devices support different formats, and sometimes it is necessary to convert video to a different format or frame rate. Film, as an example, is produced in 24 frames per second (FPS), but shown on TV in 25 FPS in most of Europe and 30 FPS in most of America. While the audio and frame rate can be sped up from 24 to 25 FPS without being noticeable for humans, movements and sounds get unnatural if sped up to 30 FPS, which makes conversion necessary. It is impossible to perform a completely flawless frame rate conversion (FRC) in the general case, but a number of algorithms to generate approximated intermediate frames exist.

This thesis presents a filter for motion compensated frame rate up-conversion (MC-FRUC) that can perform FRUC with higher image quality than frame repetition and frame averaging.

The topic is based on an external work at the IT consulting firm CodeMill[1], whose partner Vidispine AB offers an API media asset management platform[2]. They currently use frame repetition as their FRC method but are interested in offering a module which gives their customers a better visual experience.

The report is structured as follows:

Chapter 2 describes the problem in further detail and the restrictions that were initially made.

Chapter 3 describes a few methods for block motion estimation and compensation.

Chapter 4 includes a description of the system structure and an image quality evaluation based on statistical comparisons between implemented motion compensation methods.

Chapter 5 includes a discussion about the project and the results, what limitations had to be made and future work.

Chapter 6 expresses my gratitude to those that in one way or another have helped me during the writing of this thesis.

[1] http://www.codemill.se
[2] http://vidispine.com


Chapter 2

Problem Description

2.1 Problem Statement

As previously mentioned, Vidispine currently uses frame repetition to implement FRC. Frame repetition is a very fast method, but it introduces jerky motions to moving objects if the new frame rate is not an integer multiple of the old one, since frames are repeated a different number of times (Castagno et al., 1996). For example, every other image is repeated when the frame rate is increased by a factor of 3/2. Figure 2.1 shows an example of frame repetition.

Figure 2.1: Frame generated via frame repetition surrounded by the reference frames. The generated frame is identical to the first one.
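The repetition pattern can be made concrete with a small index mapping. The following is an illustrative sketch, not code from the thesis, assuming constant-rate input and output streams:

```c
#include <stdio.h>

/* Map each output frame of frame repetition to the source frame it shows.
 * For a conversion by a factor of 3/2 (here 24 FPS -> 36 FPS), the mapping
 * yields 0, 0, 1, 2, 2, 3, 4, 4, ...: every other source frame is shown
 * twice, which causes the jerky motion described above. */
int main(void)
{
    const int in_fps = 24, out_fps = 36;
    for (int out = 0; out < 9; out++)
        printf("output %d <- input %d\n", out, out * in_fps / out_fps);
    return 0;
}
```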

The goal of the project was to implement a frame rate conversion filter that would be able to convert video with higher quality than the currently used frame repetition, preferably with higher quality than frame averaging and in real-time. No restriction on video resolution was specified in the original problem specification, but real-time 720p/24 (720 pixels high progressive 24 FPS video) to 720p/30 conversion was agreed upon as an aim, since 720p is a common format and conversion from 24 FPS to 30 FPS is a common problem. An agreement was also made to aim at support for resolutions up to 4320p if time permitted.

CodeMill wished to have the filter implemented using the FFmpeg library libavfilter[1] instead of a native Vidispine filter, mainly for two reasons:

– libavfilter support is already planned to be integrated into Vidispine.

– If the filter turns out to require too much maintenance, it can be released to the FFmpeg community under LGPL, who can use and help improve the filter. Eventually it may fulfill Vidispine's requirements.

[1] https://ffmpeg.org/libavfilter.html

2.2 Methods

As a first step, a frame repetition filter and an edge detection filter were implemented to get some basic understanding of how frames are represented and how libavfilter works.

The plan was to start with literature studies on block motion compensation, different search algorithms and optimizations, followed by implementation of a few of the algorithms. In reality these steps overlapped, since practical implementation improved the understanding of the studied algorithms. Additionally, an existing tool for FRUC was briefly studied. Afterwards, the structure of the implemented filter was improved and the code was tidied up.

To assure that the filter would meet the image quality goal, it was continuously tested with some standard benchmarking videos from Xiph.org[2] and trailers[3][4][5] for videos released under CC BY 3.0[6]. More information on how the evaluation was performed can be found in Section 4.2.1.

Video                    Resolution  Frame rate  Source
Foreman                  352x288     29.97       Xiph.org
Bus                      352x288     30          Xiph.org
Park Joy                 1280x720    50          Xiph.org
Big Buck Bunny Trailer   1920x1080   25          Official web page
Sintel Trailer           1280x544    24          Official web page
Tears of Steel           1920x800    24          Official YouTube channel

Table 2.1: Test videos used when evaluating the quality of frame rate conversions.

2.3 Related Work

Motion estimation and motion compensation are well studied areas with thousands of related articles. The use of motion compensation in frame interpolation was first proposed by H.C. Bergmann in 1984, but motion compensation was used in video coding even earlier (Luessi and Katsaggelos, 2009). Recent related work consists of, but is not limited to, algorithms that improve estimation speed (Zhu et al., 2002; Tsai and Pan, 2004; Huang and Chang, 2011), motion estimation accuracy (Porto et al., 2012; Huang and Chang, 2011) and the visual quality of interpolations (Choi et al., 2000).

MVTools is a plugin for AviSynth which can perform motion compensated frame rate conversion. Since it is licensed under GPL, it cannot be used in Vidispine, but it was used for comparison and some inspiration.

Notably, a thesis similar to this one was written by Jostell and Isberg (2012). Their aim was to up-convert video from a surveillance camera from 1080p/20 to 1080p/60, and the biggest difference is that they heavily focused on real-time application, whereas the main focus of this thesis is on video quality. Jostell and Isberg present a successful implementation of real-time up-conversion with higher video quality than simple up-conversion methods, but it introduces some blocking artifacts.

[2] http://media.xiph.org/video/derf/
[3] http://www.bigbuckbunny.org/index.php/trailer-page/
[4] http://www.sintel.org/download
[5] http://www.youtube.com/watch?v=WwHux5QZfC8
[6] http://creativecommons.org/licenses/by/3.0/


Chapter 3

Frame Rate Conversion

In this chapter, the low-complexity FRC method frame averaging (FA) is described, followed by methods to perform motion estimation (ME) and motion compensation (MC). Example figures show the interpolated frame Fi surrounded by the previous reference frame Fp and the current reference frame Fc. To simplify, Fi is located at a temporal position exactly in the middle of Fp and Fc if not stated otherwise.

3.1 Frame Averaging

The process of blending frames together by interpolating pixel values is called frame averaging. Pixel values in Fi are linearly interpolated from the corresponding pixel values in Fp and Fc, weighted depending on the relative temporal position. It reduces flickering at the cost of moving objects getting blurred (Wong and Au, 1995). Figure 3.1 shows the result of a FA.

Figure 3.1: Frame generated via frame averaging surrounded by the reference frames. Blurring artifacts are especially visible on the fence and the statue with background.
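As a concrete reference point for the rest of the chapter, the following is a minimal sketch of frame averaging for a single 8-bit plane; the function name and frame representation (row-major planes with a common stride) are assumptions, not taken from the thesis code:

```c
#include <stdint.h>

/* Blend two 8-bit planes into the interpolated frame Fi.  `t` in [0, 1]
 * is the relative temporal position of Fi between Fp (t = 0) and
 * Fc (t = 1); t = 0.5 gives the plain average. */
static void frame_average(const uint8_t *fp, const uint8_t *fc, uint8_t *fi,
                          int width, int height, int stride, double t)
{
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++) {
            int i = y * stride + x;
            fi[i] = (uint8_t)((1.0 - t) * fp[i] + t * fc[i] + 0.5);
        }
}
```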

When performing FA to achieve a frame rate that is not an integer multiple of the original frame rate, some of the original frames will not be present in the up-converted video due to non-matching temporal frame positions. Castagno et al. (1996) argue that in such cases the original frame with a slightly shifted temporal position, as shown in Figure 3.2, would usually give a better result than straightforward frame averaging. Their subjective experimental results verify that the jerkiness introduced by the temporal shifting is not disturbing in a 50 FPS to 75 FPS conversion, even when viewing a uniform motion.


(a) Straightforward frame averaging. (b) Frame averaging with temporal shifting.

Figure 3.2: Frame rate up-conversion from 50 FPS to 75 FPS, with temporal shifting to reduce the number of frames with artifacts. The thick lines represent frames. Black frames are frames in the original video, green frames are losslessly copied from the original video and red frames are interpolated from two neighbouring frames.

3.2 Block Motion Estimation

Block motion estimation (BME) aims to find motions between two or more images by systematically comparing a block of pixels in one frame with blocks of the same size in the other frame. This is an important preparatory step of block motion compensation, and the estimated motion vectors should approximate true motions, since these estimated motions are later used to approximate new frames by drawing these blocks in intermediate positions.

Block motion estimation has other application areas such as video compression, but these estimation methods cannot necessarily be used in frame rate conversion, since video compression usually does not require the true motions (Zhai et al., 2005).

3.2.1 Block Distortion

Measurement of the distortion between two blocks is one of the keystones in making an accurate motion estimation. A lower distortion implies a higher similarity. This section discusses a few block distortion methods, where the intensity of the pixel at coordinate (x, y) in frame n is denoted by f_n(x, y). Note that chroma values are ignored in these examples, but can be incorporated in implementations. The width and height of a block in the following formulas are denoted by W and H respectively. All methods are presented with both the total difference of all pixel values and the corresponding mean version, but Jostell and Isberg (2012) point out that the mean versions of the distortion methods below make use of floating point precision and are therefore more expensive than the sum versions.

Absolute Differences

The sum of absolute differences (SAD) is a formula that sums the absolute values of all differences in intensity between corresponding pixels in two blocks. It is computationally cheap, and can be calculated with Formula 3.1. Similarly, the mean absolute difference (MAD) can be calculated by dividing SAD by the number of compared pixels (Wong and Au, 1995), as shown in Formula 3.2.

\[
\mathrm{SAD}(x_1, y_1, x_2, y_2) = \sum_{i=0}^{W-1} \sum_{j=0}^{H-1} \left| f_k(x_1+i,\, y_1+j) - f_{k-1}(x_2+i,\, y_2+j) \right| \tag{3.1}
\]

\[
\mathrm{MAD}(x_1, y_1, x_2, y_2) = \frac{\mathrm{SAD}(x_1, y_1, x_2, y_2)}{W \cdot H} \tag{3.2}
\]

A problem caused by the use of absolute differences in general is that two blocks where only a few pixels indicate a high difference can easily be considered to have a lower distortion than two blocks with a slightly higher but evenly distributed overall distortion.

Squared Differences

The previously mentioned problem can be solved by using squared differences. The formula for summed squared differences (SSD), also known as summed squared error, is shown in Formula 3.3. The corresponding mean squared difference (MSD), also known as mean squared error (MSE), is shown in Formula 3.4.

\[
\mathrm{SSD}(x_1, y_1, x_2, y_2) = \sum_{i=0}^{W-1} \sum_{j=0}^{H-1} \left( f_k(x_1+i,\, y_1+j) - f_{k-1}(x_2+i,\, y_2+j) \right)^2 \tag{3.3}
\]

\[
\mathrm{MSD}(x_1, y_1, x_2, y_2) = \frac{\mathrm{SSD}(x_1, y_1, x_2, y_2)}{W \cdot H} \tag{3.4}
\]

According to Xiong and Zhu (2008), MSD is the most widely accepted distortion method, but the less accurate SAD is commonly used because it does not require the use of multiplication. They also suggest the multiplication-free distortion method weighted SAD (WSAD), which gives more accurate results than SAD. It is however not covered in this thesis.
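Formulas 3.1 and 3.3 translate directly into code. The following sketch assumes 8-bit luma planes stored row-major with a common stride; the names and signatures are illustrative, not the thesis implementation:

```c
#include <stdint.h>
#include <stdlib.h>

/* SAD between the WxH block at (x1, y1) in frame k and the WxH block
 * at (x2, y2) in frame k-1 (Formula 3.1). */
static long sad(const uint8_t *fk, const uint8_t *fk1, int stride,
                int x1, int y1, int x2, int y2, int W, int H)
{
    long sum = 0;
    for (int j = 0; j < H; j++)
        for (int i = 0; i < W; i++)
            sum += abs(fk [(y1 + j) * stride + x1 + i] -
                       fk1[(y2 + j) * stride + x2 + i]);
    return sum;
}

/* SSD (Formula 3.3): the same loop with a squared difference, which
 * penalizes a few large errors harder than SAD does. */
static long ssd(const uint8_t *fk, const uint8_t *fk1, int stride,
                int x1, int y1, int x2, int y2, int W, int H)
{
    long sum = 0;
    for (int j = 0; j < H; j++)
        for (int i = 0; i < W; i++) {
            int d = fk [(y1 + j) * stride + x1 + i] -
                    fk1[(y2 + j) * stride + x2 + i];
            sum += d * d;
        }
    return sum;
}
```

The mean versions (Formulas 3.2 and 3.4) simply divide the returned sums by W · H.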

3.2.2 Estimation Techniques

During motion estimation, blocks in the previous frame Fp and the current frame Fc are compared to each other. How the comparison is performed depends on the estimation technique, but common to the techniques described below is that they generate a field of motion vectors which describes movements between Fp and Fc.

Uni- and Bidirectional Motion Estimation

In unidirectional motion estimation, as described by Luessi and Katsaggelos (2009, p. 374), either Fp or Fc is first split into a non-overlapping grid of blocks. For each of these blocks, the other reference frame is searched for similar blocks within a range limited by a search window. The best match is used to create a motion vector representing the estimated movement. Figure 3.3 shows the estimation of a forward motion vector. Backward motion vectors can be estimated in the same way by simply interchanging the frames Fp and Fc.

Bidirectional motion estimation is performed by estimating both forward and backward motion vectors (Tang and Au, 1997, p. 1444), and can be used to generate more accurate frames.

The strength of uni- and bidirectional motion estimation is that the same motion vectors can be used to generate any number of frames between the original frames. However, it needs post-processing in the motion compensation step to handle hole areas and overlapping blocks, which is described in detail in Section 3.6.


Figure 3.3: Forward motion estimation of a single block within a limited search window, and what an interpolated frame Fi using that motion vector would look like.

Bilateral Motion Estimation

Bilateral motion estimation is designed to remove the need to handle hole areas and overlapping blocks. It does this by using the to-be-interpolated frame Fi as a starting point, as opposed to unidirectional ME, which uses Fp or Fc. The ME is performed by dividing Fi into a non-overlapping grid of blocks and searching for blocks in Fp that match blocks in Fc around that location, as shown in Figure 3.4. The matching blocks in Fp and Fc are located at the relative positions (dx, dy) and (−dx, −dy) respectively. Note that the blocks in Fi do not contain any data and are only used as location reference points.

Figure 3.4: Bilateral motion estimation of a single block within a limited search window, and what an interpolated frame Fi using that motion vector could look like.

The most apparent problem in bilateral estimation is that it is hard to capture the movements of small fast-moving objects, which are instead treated as two separate objects: one existing in Fp but not in Fc, and one existing in Fc but not in Fp. In the compensation step, this may result in a visual experience similar to frame averaging, or in the moving object blinking in and out of existence.

Choi et al. (2000), who suggest this motion estimation technique, call it bidirectional motion estimation, but more recently published papers seem to prefer the term bilateral motion estimation. The most obvious reasons for using the latter term are that the original term already had a different definition and that the word bilateral better describes the technique.

Overlapping Block Motion Estimation

Ha et al. (2004) suggest improving the accuracy of the motion estimation by using large blocks when computing block distortion, blocks that are larger than the ones in the original contiguous non-overlapping grid. An example is shown in Figure 3.5. This is called overlapped block motion estimation (OBME), since the grid can be seen as a grid of overlapping blocks.

Figure 3.5: Overlapping bilateral ME of a single block and what an interpolated frame Fi using that motion vector could look like. The large dashed squares in Fp and Fc are compared by the distortion method, as opposed to the solid squares in non-overlapping ME.

OBME can be performed in combination with both unidirectional and bilateral ME for improved accuracy, but at the cost of more expensive computations.

3.3 Search Algorithms

Searching for similarities between blocks in frames is computationally intensive, which has led to many different algorithms that reduce the number of search points (SPs). A search point is a point in a reference frame for which a block is evaluated against a block in the other reference frame, using a distortion method. Reducing the number of SPs can significantly speed up the motion estimation, but usually results in decreased quality (Tham et al., 1998, p. 369). Furthermore, Porto et al. (2012) claim that the output quality of fast algorithms such as the ones covered in this chapter can decrease significantly with increased video definition, since at high definitions they are vulnerable to getting stuck in local minima.

This section covers the search algorithms diamond search (DS) and hexagon based search (HEXBS) with some variations. Some other commonly used ME methods such as three-step search, new three-step search and four-step search were not considered during this thesis due to time limitations, and because results from Tham et al. (1998) and Zhu et al. (2002) indicate that DS and HEXBS generally evaluate substantially fewer SPs than the previously mentioned ME methods.

For simplicity, the algorithms described in this section demonstrate forward ME, but they can be altered and used for backward ME and bilateral ME as well.

3.3.1 Search Speed

The speed of a BME search algorithm depends on the number of SPs that are evaluated by the algorithm. The speed improvement ratio (SIR), i.e. how many times faster an algorithm alg2 is compared to another algorithm alg1, is defined as in Formula 3.5, where SP(algorithm) denotes the average number of search points that have to be evaluated by algorithm. Chiang et al. (2007), Zhu et al. (2002) and Thambidurai et al. (2007), among others, use this definition.

\[
\mathrm{SIR}(\mathit{alg}_1, \mathit{alg}_2) = \frac{\mathrm{SP}(\mathit{alg}_1) - \mathrm{SP}(\mathit{alg}_2)}{\mathrm{SP}(\mathit{alg}_2)} \tag{3.5}
\]

Guanfeng et al. (2003) and Tsai and Pan (2004), among others, however use a slightly different version, which shows the percentage speedup (see Formula 3.6). Formula 3.5 was used during this thesis since Formula 3.6 was falsely considered to be incorrect.

\[
\mathrm{SIR}(\mathit{alg}_1, \mathit{alg}_2) = \frac{\mathrm{SP}(\mathit{alg}_1) - \mathrm{SP}(\mathit{alg}_2)}{\mathrm{SP}(\mathit{alg}_1)} \tag{3.6}
\]

However, it is important to note that the estimated SIR often depends on the video resolution and the size of movements in the test material. As long as it has not been shown that the SIR is constant under changed video resolution, which seldom seems to be stated in research papers, the SIR should be treated with scepticism.
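As an illustration with hypothetical numbers (not taken from any of the cited papers): if alg1 evaluates 225 SPs per block on average and alg2 evaluates 15, Formula 3.5 gives SIR = (225 − 15)/15 = 14, i.e. alg2 is 14 times faster, whereas Formula 3.6 gives (225 − 15)/225 ≈ 0.93, i.e. alg2 evaluates about 93% fewer search points.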

3.3.2 Full Search

Full search (FS), also known as exhaustive search, is the unoptimized algorithm which finds the block Bp within the search window that is most similar to Bc by evaluating every search point within the search window, iterating over the search area by moving one pixel at a time, as illustrated in Figure 3.6.

(a) Starting step. (b) Ending step.

Figure 3.6: Full search. Search points evaluated in the current step are white, previously evaluated ones are grey and the best match in the previous step is green.

3.3.3 Diamond Search

Center-biased algorithms such as diamond search (DS) favour small movements over large movements. They generally evaluate some SPs that represent no or little movement and repeatedly evaluate SPs contiguous to the best match until no better match is found. The implementation suggested by Tham et al. (1998) is performed in three steps, namely starting, searching and ending.

– Starting is performed by evaluating a diamond shape built up by the search point at the center position [x, y], four points located at [x ± 2, y] and [x, y ± 2] that create the vertices of the diamond shape, and four points located at [x ± 1, y ± 1] creating the faces, as shown in Figure 3.7a. If the best match is found at the center position, the algorithm jumps directly to the ending step; otherwise it jumps to searching.

– Searching is an optimized version of starting, performed by evaluating previously unevaluated search points in a diamond shape located around the best candidate search point from the previous step. Five new search points have to be evaluated if the best match is located at a vertex, or three search points if it is located at a face, as shown in Figure 3.7b and Figure 3.7c. If the best match is found at the center position, the algorithm jumps to the ending step; otherwise it performs a new search.

– Ending is performed by evaluating every search point located at [x ± 1, y] and [x, y ± 1] around the best match, as shown in Figure 3.7d. The location of the best match is returned by the algorithm.

(a) Starting step. (b) Vertex search. (c) Face search. (d) Ending step.

Figure 3.7: Diamond search. Search points evaluated in the current step are white, previously evaluated ones are grey and the best match in the previous step is green.

3.3.4 Hexagon Based Search

Hexagon based search (HEXBS), as suggested by Zhu et al. (2002), is performed similarly to DS, but with a search shape formed as a hexagon instead of a diamond. Figure 3.8 shows the different search steps, where the starting and searching steps evaluate new search points at [x, y], [x ± 1, y ± 2] and [x ± 2, y]. The ending step, identical to the DS ending step, evaluates [x ± 1, y] and [x, y ± 1] and returns the position of the best match.

(a) Starting step. (b) Searching step. (c) Ending step.

Figure 3.8: Hexagon based search. Search points evaluated in the current step are white, previously evaluated ones are grey and the best match in the previous step is green.

Experimental results presented by Zhu et al. (2002) show a SIR between 21.41% and 36.74%, at the cost of increasing the mean absolute difference (MAD) by between 0.68% and 1.68% compared to DS.


3.3.5 Other Hexagon Based Algorithms

A number of other search algorithms that evaluate search points in the shape of a hexagon were developed after HEXBS, of which a few are listed in Table 3.1. Their main characteristics are described below.

Modified hexagon based search uses the search heuristics one-step-stop and subsampling of the distortion method. These heuristics and/or simpler variants of them are described in Section 3.5. The SIR is claimed to be 708.45 for video sequences with medium and large motions and 3685.74 for video sequences with small motions (converted to the definition of SIR used in this paper); substantially higher than the other algorithms in Table 3.1. The average increase in MAD is between 0.35% and 7.7% over DS in their test sequences. However, the parameters are fine-tuned to suit the test videos, and the results can therefore not be properly compared.

Predict hexagon search takes into account that, according to studies made by Tsai and Pan (2004), there is a probability of 65% that the best motion vector is the zero vector, and a probability of 85% that the vector is within a 5 × 5 area around the center point. The probabilities are of course resolution dependent. Additionally, studies show a higher probability for blocks to move horizontally or vertically rather than diagonally, which is also reflected in the search patterns.

Proposed HEXBS uses HEXBS with an adaptive-sized search area, based on the well-known fact that spatially adjacent blocks most likely have similar motions and that the search window therefore can be shrunken and displaced accordingly (Chiang et al., 2007).

Algorithm  Avg. SIR  Year  Source
MHBS       2197.10   2003  Calc. from Guanfeng et al. (2003, Table IV p. 1208)
PHS        71.68     2004  Calc. from Tsai and Pan (2004, p. 611)
PHEXBS     57.62     2007  Calc. from Chiang et al. (2007, p. 1153)

Table 3.1: Experimental SIR over HEXBS claimed by the respective authors. It is important to note that different test videos are used for each paper, and the results are therefore not sufficient to make a trustworthy comparison.

3.4 Motion Vector Refinement

Sometimes an estimated motion vector is pointing in a completely different direction than the surrounding ones, which typically indicates that the estimation was invalid. Such vectors are commonly removed by applying a median filter on the MVF (Ha et al., 2004). A negative side effect of median filters is that edges of moving objects may become blurred (Xu et al., 2011).

Refining vectors based on spatially adjacent vectors that generate a lower distortion than the original vector is another possible refinement method, inspired by predictive search. While it generally creates a more uniform MV field, there are some cases where at least one of the surrounding blocks has a large MV that generates a better match but is far from the true motion vector. In such cases the bad vector may propagate, and the result may look as in Figure 3.9. A combination of median filtering and the described motion vector refinement tends to give an at least subjectively slightly more accurate result.
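One common realization of the median filtering mentioned above is a component-wise 3x3 median over the MV field. The sketch below reuses the MV struct from the diamond search sketch in Section 3.3.3 and clamps the neighbourhood at the field borders; the names are illustrative:

```c
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    return *(const int *)a - *(const int *)b;
}

/* Component-wise 3x3 median filter over a bw x bh field of block MVs. */
static void median_filter_mvf(const MV *in, MV *out, int bw, int bh)
{
    for (int by = 0; by < bh; by++)
        for (int bx = 0; bx < bw; bx++) {
            int xs[9], ys[9], n = 0;
            for (int j = -1; j <= 1; j++)
                for (int i = -1; i <= 1; i++) {
                    int x = bx + i, y = by + j;
                    if (x >= 0 && x < bw && y >= 0 && y < bh) {
                        xs[n] = in[y * bw + x].dx;
                        ys[n] = in[y * bw + x].dy;
                        n++;
                    }
                }
            qsort(xs, n, sizeof(int), cmp_int);
            qsort(ys, n, sizeof(int), cmp_int);
            out[by * bw + bx].dx = xs[n / 2];
            out[by * bw + bx].dy = ys[n / 2];
        }
}
```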


(a) Original image. (b) Badly reconstructed image.

Figure 3.9: Example of a bad spatial MV refinement based on the best of the spatially adjacent vectors, where the background (a cloudy sky) and the foreground (a dragon) are moving fast in different directions.

3.5 Heuristics

As previously hinted, search algorithms can be sped up significantly by using heuristics. This section presents heuristics that are based on the observation that blocks are generally stationary, have similar movement to neighbouring blocks, and that neighbouring pixels usually generate similar distortion.

3.5.1 One-step-stop

One-step-stop is a speed improvement method performed by defining a block distortion threshold value under which blocks are presumed to be stationary (Guanfeng et al., 2003). It is based on the observation that many blocks are stationary, and is especially applicable in low resolution video.

3.5.2 Spatial and Temporal Correlation

Zafar et al. (1991) show that there is a high correlation between the motion vectors of spatially adjacent blocks, and in an accompanying paper (Zhang and Zafar, 1991) that there is a high correlation between motion vectors of temporally adjacent blocks as well. This implies that a good MV approximation of a block can often be made from previously estimated MVs of spatially and temporally adjacent blocks, and an accurate estimation can be made with less computation.

Chalidabhongse and Kuo (1997) exploit the MV correlation by computing the block distortion using MVs of the adjacent blocks, selecting the one with the lowest distortion, and then performing further refinement based on that motion vector. In addition to the speedup, they show that the resulting MV field is less noisy than if motion estimations are performed independently. Chung and Chang (2003) exploit the spatial correlation in a slightly different way and perform a local full search around all of the spatially adjacent MVs before choosing the best match.


3.5.3 Subsampling

Subsampling methods use only a subset of the information available to approximate their results.

Block Distortion

The number of pixels compared by a block distortion method can be cut in half by subsampling the pixel block, thereby only evaluating every other pixel in, for example, a chessboard-like pattern (Wong and Au, 1995), as shown in Figure 3.10.

Figure 3.10: 1:2 pixel decimation of distortion method. Pixels measured for distortion are marked with a black square.
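In code, the 1:2 decimation is a small change to the SAD loop sketched in Section 3.2.1 (same assumptions as there: 8-bit luma planes, row-major, common stride, illustrative names):

```c
#include <stdint.h>
#include <stdlib.h>

/* 1:2 chessboard-subsampled SAD: only pixels where (i + j) is even are
 * compared, halving the work (cf. Figure 3.10). */
static long sad_sub2(const uint8_t *fk, const uint8_t *fk1, int stride,
                     int x1, int y1, int x2, int y2, int W, int H)
{
    long sum = 0;
    for (int j = 0; j < H; j++)
        for (int i = j & 1; i < W; i += 2)   /* shift start on odd rows */
            sum += abs(fk [(y1 + j) * stride + x1 + i] -
                       fk1[(y2 + j) * stride + x2 + i]);
    return sum;
}
```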

Liu and Zaccarin (1993) suggest a pattern which reduces the evaluations to one fourth of the full estimation. To achieve a result close to the original distortion method, they apply four patterns, alternating between them depending on the location of the search point, as shown in Figure 3.11.

(a) Pattern one. (b) Pattern two. (c) Pattern three. (d) Pattern four. (e) Alternation.

Figure 3.11: Alternating 1:4 pixel decimation of distortion method. Pixels where distortion is evaluated are marked with a black square in (a)-(d), whereas (e) shows which pattern should be applied at which search point.

Even though these methods could in theory increase the speed significantly, few recently published papers seem to mention them. Fast hardware support for performing SAD on multiple pixels at the same time (Jostell and Isberg, 2012, p. 40) is a possible cause.

Motion Vector Field

Motion vector field subsampling is a method where only a subsample of the motion vector field is fully estimated. The remaining MVs are estimated based on adjacent MVs, taking advantage of spatial correlation. Liu and Zaccarin (1993) suggest subsampling the MV field in a chessboard pattern and, for each previously unevaluated block, evaluating the motion vectors of the four surrounding blocks and choosing the best one.


3.6 Motion Compensation

Motion compensation is used to approximate the frame Fi from the estimated MV field and the reference frames Fp and Fc. This paper covers block motion compensation and motion compensation by control grid interpolation.

3.6.1 Block Motion Compensation

In block motion compensation (BMC), the MVs are interpreted as the translational motion of blocks between two frames, which corresponds very well to block motion estimation.

When performing BMC based on a bilateral MV field, Fi can be interpolated directly from the pixel values of blocks from Fp and Fc at locations calculated from the MVs, as shown in Figure 3.12, without creating any holes or overlapped regions in the interpolated image.

Figure 3.12: Bilateral motion compensation of a single block. The block in frame Fi is interpolated from the two blocks in Fp and Fc, according to the previously estimated motion vector.
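A sketch of the per-block interpolation for a bilateral MV field, assuming Fi halfway between Fp and Fc and omitting the clipping of displaced coordinates for brevity (names illustrative):

```c
#include <stdint.h>

/* The N x N block at (bx, by) in Fi is averaged from the block displaced
 * by (dx, dy) in Fp and the block displaced by (-dx, -dy) in Fc,
 * mirroring the bilateral estimation in Figure 3.4. */
static void bilateral_compensate_block(const uint8_t *fp, const uint8_t *fc,
                                       uint8_t *fi, int stride, int bx,
                                       int by, int N, int dx, int dy)
{
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++) {
            int p = fp[(by + j + dy) * stride + bx + i + dx];
            int c = fc[(by + j - dy) * stride + bx + i - dx];
            fi[(by + j) * stride + bx + i] = (uint8_t)((p + c + 1) / 2);
        }
}
```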

The simplest form of block motion compensation based on uni- and bidirectional MV fields is performed by directly drawing blocks in Fi, pulling the blocks in Fp half of the vector length towards the corresponding block in Fc, or the other way around. In this case, however, blocks will most certainly overlap and there will be holes in the generated image, as shown in Figure 3.13. These holes can be filled by, for example, estimating motion vectors depending on MVs adjacent to the hole (Choi et al., 2000). Wong and Au (1995), however, first convert the old MV field into a bilateral MV field with holes by following the trajectories and adding the MVs to a list of candidate vectors. The best MV for each block is then selected before performing the hole filling.

Figure 3.13: Direct motion compensation of a single block, surrounded by the reference frames.


3.6.2 Overlapping Block Motion Compensation

The block motion compensation algorithms themselves suffer from blocking artifacts, but as long as the motion vector field is a non-overlapping contiguous grid (bilateral MV fields have this feature, but not uni- or bidirectional ones), these artifacts can easily be reduced by using overlapping block motion compensation (OBMC). The original block sizes are increased, while keeping the blocks centered around the original location. This generates overlapping blocks, and pixel values in overlapping regions around the block edges can then be interpolated from all surrounding blocks.

Choi et al. (2000) perform OBMC (a version of it that was originally suggested by Kuo and Kuo (1997)) by increasing the block size and dividing the blocks into three types of regions, namely non-overlapping regions (R1), regions with two overlapping blocks (R2) and regions with four overlapping blocks (R3), as shown in Figure 3.14. The pixel values for R1 are generated by using only the original motion vector, as in straightforward BMC, while the pixel values in R2 and R3 are based equally on the pixel values of the blocks overlapping each other at that location.

Figure 3.14: Pixel-based OBMC. The grid of solid lines represents the original blocks of size N, generated during block motion estimation, while the dashed lines represent areas created when increasing the block size by w on each side.
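Since R2 and R3 pixels are weighted equally among their overlapping blocks, one way to realize this (an assumption about the implementation, not the scheme's literal formulation) is to accumulate each enlarged block into a sum buffer and divide by a per-pixel counter afterwards:

```c
#include <stdint.h>

/* Add one motion-compensated, enlarged block into the accumulators.
 * Overlaps in R2/R3 regions are averaged automatically when the final
 * frame is produced as fi[i] = acc[i] / cnt[i]. */
static void obmc_accumulate(uint32_t *acc, uint16_t *cnt, int stride,
                            int x0, int y0, int size,
                            const uint8_t *block /* size x size pixels */)
{
    for (int j = 0; j < size; j++)
        for (int i = 0; i < size; i++) {
            acc[(y0 + j) * stride + x0 + i] += block[j * size + i];
            cnt[(y0 + j) * stride + x0 + i] += 1;
        }
}
```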

3.6.3 Control Grid Interpolation

Sullivan and Baker (1991) suggest using Control Grid Interpolation (CGI) in video coding, in order to encode Fc in terms of Fp and motion vectors. Instead of interpreting the MV field as a set of translational motion vectors of individual blocks, they view it as a control grid which describes a spatial transformation of Fp into Fc, as shown in Figure 3.15. This transformation generates a smooth image without blocking artifacts, and catches zooming and warping fairly well. On the downside, it performs badly at abrupt changes in direction (Ishwar and Moulin, 2000). In the example with the rolling ball in Figure 3.15, the transformation of the ball is accurate, but the area closest to the ball is also skewed as a result.

Ishwar and Moulin (2000) introduce Switched CGI, which can be summarized as switching between CGI and (O)BMC depending on the motion between adjacent blocks. CGI is used in smooth parts of the MV field, whereas (O)BMC is used where motion changes abruptly. An approach similar to SCGI was used by Jeon et al. (2003) to implement hole filling in their frame rate up-conversion algorithm.


Figure 3.15: Example of a control grid representing the motion of a ball rolling to the right and approaching the camera, and MVs approximated from the original MVs (the control grid).

Just as with BMC, the implementation of CGI as an MC method is trivial for bilateral MV fields, but generates holes and overlapping pixels if used with uni- and bidirectional MV fields.
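The core of CGI is that each pixel gets its own motion vector, bilinearly interpolated from the four control points of its grid cell. The sketch below shows that interpolation only, not the full warp; names are illustrative and the MV struct is reused from Section 3.3.3:

```c
/* Per-pixel vector inside one grid cell from its corner control points
 * (v00 top-left, v10 top-right, v01 bottom-left, v11 bottom-right).
 * (u, v) in [0, 1] are the pixel's relative coordinates in the cell. */
static void cgi_vector(MV v00, MV v10, MV v01, MV v11,
                       double u, double v, double *dx, double *dy)
{
    *dx = (1 - v) * ((1 - u) * v00.dx + u * v10.dx)
        +      v  * ((1 - u) * v01.dx + u * v11.dx);
    *dy = (1 - v) * ((1 - u) * v00.dy + u * v10.dy)
        +      v  * ((1 - u) * v01.dy + u * v11.dy);
}
```

Warping Fp (or blending warped versions of Fp and Fc) with these per-pixel vectors yields the smooth, blocking-free transformation described above.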


Chapter 4

Implementation

As discussed in the introduction, the practical goal of the project underlying this thesis was to implement a frame rate up-conversion filter which produces intermediate frames with higher quality than frame repetition and frame averaging. The filter, named xfps due to its ability to change frame rate, was implemented using the programming language C and the framework libavfilter. It supports unidirectional, bidirectional and bilateral motion estimation as well as direct compensation and control grid interpolation.

Most of the non-libavfilter related implementation is strongly related to the algorithms described in Chapter 3, and the algorithms are thereby already described, even though not all previously described features are implemented. Xfps supports both motion compensated frame rate up-conversion and frame averaging, but the term xfps refers to motion compensated up-conversion in the text below, unless explicitly stated otherwise.

4.1 Structure

The system structure can be coarsely divided into the main parts motion estimation and motion compensation. Figure 4.1 shows a slightly simplified version of the structure and communication.

4.1.1 Motion Estimation

The most important inputs to the motion estimation function are the two reference frames and the time stamp at which an intermediate frame will be created during motion compensation (only used in bilateral ME). Default values or user specified values also available during motion estimation are:

– Motion estimation block size.

– Maximum search distance.

– Block distortion threshold, under which a block should be presumed to be stationary or have the same motion as neighbouring blocks.

First, a MV field is instantiated (unidirectional, bidirectional or bilateral, depending on the preselected ME method) and a reference is sent together with all of the input parameters to a motion prediction function, which estimates the MVs in the MV field. When the function returns, after the motion predictive search, the MV field is passed on to motion compensation.

Figure 4.1: System structure and communication. Solid output edges denote mandatory steps, dashed edges denote that one of the choices has to be selected and dotted edges denote optional steps.

Predictive Search

Motion vectors in the MV field are estimated from left to right, top to bottom, in all currently implemented motion prediction methods. The actual prediction is performed by evaluating the MVs of previously evaluated spatially neighbouring blocks. The MV which generates the lowest block distortion is then used as a starting point when searching for the best match, using a search method preselected by the user.

The implementation currently supports a spatially predictive search which selects the best MV from all previously estimated spatially adjacent blocks in the MV field, a left-spatially predictive search which only uses the MV of the block located to the left, and a non-predictive search.
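A sketch of the spatially predictive start, with the same illustrative `cost` callback and MV struct as in Section 3.3.3 (the exact set of neighbours is an assumption, not lifted from the xfps source):

```c
/* Pick the best starting MV among the zero vector and the already
 * estimated neighbours (left, top, top-left, top-right) of block
 * (bx, by); blocks are visited left-to-right, top-to-bottom. */
static MV best_predictor(const MV *mvf, int bw, int bx, int by,
                         long (*cost)(int dx, int dy))
{
    static const int nb[4][2] = {{-1, 0}, {0, -1}, {-1, -1}, {1, -1}};
    MV best = {0, 0};
    long best_cost = cost(0, 0);

    for (int k = 0; k < 4; k++) {
        int x = bx + nb[k][0], y = by + nb[k][1];
        if (x < 0 || x >= bw || y < 0)
            continue;               /* neighbour outside the field */
        MV cand = mvf[y * bw + x];
        long c = cost(cand.dx, cand.dy);
        if (c < best_cost) { best_cost = c; best = cand; }
    }
    return best;
}
```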

Search Methods

The following search methods are currently supported by the filter:

– Full search (see Section 3.3.2); a straightforward implementation, implemented mainly for comparison.

– Diamond search (see Section 3.3.3); implemented due to its simplicity and slightly better approximations than HEXBS.


– Hexagon based search (see Section 3.3.4); implemented since it is fast compared to many other algorithms, popular, and many even faster search algorithms are based on it (as covered in Sections 3.3.4 and 3.3.5).

The steps in DS, HEXBS and others all have in common that they take two reference frames, a location and a motion vector (the best match from the previous step) as input, evaluate the block distortion for a set of MVs relative to the best match and then return the new best match. Due to this, a generic function which takes the previously mentioned parameters and also a list of relative MVs was written to simplify the implementation of new search algorithms.

To further simplify the implementation of search algorithms and to avoid repeated comparisons of the same search point, helper structures and functions which cache distortion values were implemented. The values are cached in a two-dimensional array of the same size as the search window. Distortion values are evaluated or read from the cache with a getter method which returns a high distortion for invalid (out-of-window) comparisons.
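A sketch of such a cache, sized to the search window and using a sentinel for not-yet-computed entries; the names are illustrative, and the getter mirrors the "high distortion for invalid comparisons" behaviour described above:

```c
#include <limits.h>

typedef struct {
    long *val;     /* (2R+1) x (2R+1) entries, LONG_MAX meaning unset */
    int   radius;  /* search window radius R */
} DistCache;

static long cached_cost(DistCache *dc, int dx, int dy,
                        long (*compute)(int dx, int dy))
{
    int R = dc->radius;
    if (dx < -R || dx > R || dy < -R || dy > R)
        return LONG_MAX;           /* invalid (out-of-window) comparison */
    long *slot = &dc->val[(dy + R) * (2 * R + 1) + (dx + R)];
    if (*slot == LONG_MAX)
        *slot = compute(dx, dy);   /* evaluate once, then reuse */
    return *slot;
}
```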

Together, the two types of helper functions make it possible to add new search methods by implementing only the general case, since the special cases are handled automatically.

4.1.2 Motion Compensation

The implementation supports two motion compensation methods, namely direct compensation and control grid interpolation, described in Section 3.6.1 and 3.6.3 respectively. The input parameters are the two reference frames, the MV field and an output buffer. From this information, it draws the intermediate frame in the output buffer, and returns.

Control grid interpolation is currently the only practically useful implementation of motion compensation, since the direct BMC does not perform hole filling or block artifact reduction. Notably, the implemented CGI method performs hole filling on uni- and bidirectional MV fields by converting them into bilateral MV fields before generating the new frame, which makes it suitable for all kinds of MV fields discussed in this report.

The implementation of direct compensation is minimalistic and does not perform any hole filling or block artifact reduction, which makes it unsuitable for most practical purposes. Nevertheless, it can be of interest when measuring the performance of other compensation methods.

4.2 Evaluation

Evaluation of video quality was performed both objectively and subjectively with the videos listed in Section 2.2. The output of xfps was compared to the output of frame repetition, frame averaging and MVTools, to verify that the goals were reached and to get an idea of how good the image quality is compared to other tools. The evaluation was performed by dropping every other frame, reconstructing the dropped frames and finally comparing each reconstructed frame to the corresponding dropped frame.

Since the frame rate is halved, motion sizes are doubled between successive frames, and as a result the estimated MVs are less accurate. One could argue that the test videos therefore should have double the frame rate of the video which the implementation is intended for, but the use of high temporal resolution videos does not seem to be common practice. One possible reason is that most of the commonly used standard test videos are recorded at common frame rates, and scientists are likely to be interested in comparing their results with previous results. Another reason may be the fact that larger errors are easier for the human eye to detect, which simplifies the subjective evaluation process. Nevertheless, the conditions should be as similar as possible when evaluating whether or not a filter is sufficient in a particular case.

The frame rate up-conversion was performed with bidirectional motion estimation of 8x8 pixel blocks, and the compensation used control grid interpolation or similar. MVTools was used with diamond search and xfps was used with hexagon based search. It is however noteworthy that MVTools has far more features than xfps, and a few of these features were accidentally left activated due to ignorance. Since MVTools was only used to indicate how well xfps performed compared to other tools, the conversion was not repeated.

For objective comparison of frames, the mathematical model peak signal-to-noise ratio (PSNR) was used due to its common usage in video quality assessment. Importantly, Huynh-Thu and Ghanbari (2008) show that PSNR correlates well with subjective video quality when comparing codecs operating on a fixed content, but that the correlation between PSNR and subjective quality is low when considering how well the codecs work on different content. In other words, PSNR can be used to compare the output quality of two different codecs which operate on the same clip or frame, but not to compare how well a codec performs on different clips or frames. The ImageMagick implementation of PSNR[1] was used to generate the PSNR values for the diagrams shown in Appendix A.

[1] http://www.imagemagick.org/script/compare.php
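For 8-bit samples (peak value 255), PSNR follows directly from the MSE of Formula 3.4: PSNR = 10 log10(255^2 / MSE). The following is a minimal sketch of a whole-frame comparison (not the ImageMagick code):

```c
#include <math.h>
#include <stdint.h>

/* PSNR in dB between a reconstructed frame and the dropped original,
 * both given as n samples of 8-bit data. */
static double psnr(const uint8_t *ref, const uint8_t *rec, int n)
{
    double mse = 0.0;
    for (int i = 0; i < n; i++) {
        double d = (double)ref[i] - (double)rec[i];
        mse += d * d;
    }
    mse /= n;
    return mse == 0.0 ? INFINITY : 10.0 * log10(255.0 * 255.0 / mse);
}
```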

4.2.1 Analysis

Objective quality comparisons (see the box plots in Figure A.1) show that MVTools generates the best result of the compared filters, followed by xfps, frame averaging and frame repetition. By analyzing the line charts in Figure A.2 and comparing the corresponding generated frames, several interesting observations can be made.

Rapid Scene Change

During rapid scene changes, such as in generated frame 114 in the Big Buck Bunny Trailer (frame approximations shown in Figure 4.2), all methods generate a low PSNR, as can be seen in Figure A.2d. Frame repetition generates the lowest PSNR, followed by xfps and frame averaging, which is also used by MVTools. Frame averaging is subjectively the best method in this case (with a perfectly satisfying result as long as only one intermediate frame is generated), followed by frame repetition (since it introduces jerky motions) and xfps (worst, since the image is obfuscated).

However, MVTools has a few large negative spikes in Figure A.2b where it uses frame averaging even though there is no scene change. Nevertheless, if a scene change is not detected and the result looks similar to Figure 4.2d, then the artifacts of a false positive detection of a scene change (which results in frame averaging) are considerably smaller than those of a false negative (which results in invalid motion compensation).

Fading

Sometimes a scene slowly fades to black, which creates different pixel intensities for blocks that otherwise would result in a perfect match. This in turn generates bad estimations and a flickering image when using xfps, but MVTools (and frame averaging) handles this problem much better.
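One way to make block matching more robust to fades, loosely in the spirit of the intensity adjustment of Thaipanich et al. (2009) mentioned in Chapter 5, is to equalize the mean luma of the reference frames before estimation. The sketch below is an assumption about how such a step could look, applied to a working copy used only for estimation; it is not taken from xfps or MVTools.

    /* Scale the luma of 'next' (a copy used only for motion estimation) so that
       its mean matches that of 'prev', reducing the intensity difference a fade
       introduces between otherwise well-matching blocks. */
    static void equalize_mean_luma(const unsigned char *prev, unsigned char *next,
                                   long npixels)
    {
        double sum_prev = 0.0, sum_next = 0.0;
        for (long i = 0; i < npixels; i++) {
            sum_prev += prev[i];
            sum_next += next[i];
        }
        if (sum_next == 0.0)
            return;                       /* all-black frame; nothing to scale */
        double gain = sum_prev / sum_next;
        for (long i = 0; i < npixels; i++) {
            double v = next[i] * gain + 0.5;
            next[i] = v > 255.0 ? 255 : (unsigned char)v;
        }
    }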

A case of fading that is harder to account for is a fade between two scenes with moving objects. In this case the previously described intensity problem is combined with the fact that there are two semi-correct motion vectors for each block, but none that is truly correct unless objects in the two scenes have the same motion. An example is shown in Figure 4.3. Subjectively, xfps does not generate reasonable results in such cases, whereas MVTools does in most cases. These kinds of artifacts are not necessarily clearly visible in the PSNR, since the area of the errors is small and most of the image is well compensated.

Slow Motions

Frame averaging performs very well in some cases, especially in the trailers (see the PSNR in Figures A.2c and A.2d). It turns out that these passages generally contain no or very small movements. Based on this observation, it would be suitable to perform frame averaging instead of MC when two frames are very similar, both to remove the risk of using invalid motion vectors and to reduce the computational power needed.
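A sketch of such a fallback is shown below, using the global mean absolute luma difference between the two reference frames. A very low difference suggests a near-static passage where averaging is both safe and cheap, and a very high difference suggests a scene change where motion compensation would be invalid (cf. the rapid scene change discussion above). The thresholds are illustrative assumptions, not tuned values.

    enum interp_mode { MODE_AVERAGE, MODE_MOTION_COMPENSATE };

    /* Choose interpolation mode from the mean absolute luma difference (MAD)
       between the two reference frames. Thresholds are illustrative only. */
    static enum interp_mode pick_mode(const unsigned char *prev,
                                      const unsigned char *next, long npixels)
    {
        long long sad = 0;
        for (long i = 0; i < npixels; i++) {
            int d = prev[i] - next[i];
            sad += d < 0 ? -d : d;
        }
        double mad = (double)sad / (double)npixels;
        if (mad < 1.0)                   /* nearly static: ME brings little */
            return MODE_AVERAGE;
        if (mad > 40.0)                  /* likely scene change: MVs invalid */
            return MODE_AVERAGE;
        return MODE_MOTION_COMPENSATE;
    }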

Frame Edges

One recurring type of artifact is caused by invalid motion estimation along the edges, as can be seen along the bottom edge in Figure 4.4. The invalid estimations occur when an object or part of an object moves in or out of view between two frames, since the implemented search algorithms cannot estimate the movement of an object if it only exists in one of the frames.

Figure 4.3: Examples of artifacts introduced by a slow fade from one scene to another, from the Sintel Trailer: (a) original frame, (b) frame averaging, (c) xfps, (d) MVTools.

These kinds of artifacts may be reduced by using OBME in combination with allowing the block distortion method to work on blocks which are not entirely inside the frame, possibly implemented by assuming that the distortion of out-of-frame pixels equals the mean distortion of the rest of the block. However, some pixels in the interpolated frame may then not be located inside any of the reference frames in the case of bilateral ME. An example would be the upper left and lower right corners of the frame when the camera pans up and to the left at the same time.
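A sketch of the idea follows, for the simpler unidirectional case where the block in the current frame is fully visible and only the candidate in the reference frame may extend past the edge. Under the stated assumption, out-of-frame pixels contribute the mean distortion of the in-frame pixels, which is equivalent to scaling the partial SAD up to the full block area. The function name and signature are illustrative.

    #include <limits.h>

    /* SAD between the block at (bx, by) in 'cur' and a candidate at
       (bx+vx, by+vy) in 'ref' that may be partly out-of-frame. Out-of-frame
       pixels are assumed to contribute the mean distortion of the visible
       pixels, i.e. the partial SAD is scaled to the full block area.
       Returns UINT_MAX if no pixel of the candidate is visible. */
    static unsigned sad_clipped(const unsigned char *cur, const unsigned char *ref,
                                int stride, int w, int h,
                                int bx, int by, int vx, int vy, int bsize)
    {
        unsigned long long sad = 0;
        int visible = 0;
        for (int y = 0; y < bsize; y++) {
            for (int x = 0; x < bsize; x++) {
                int rx = bx + x + vx, ry = by + y + vy;
                if (rx < 0 || ry < 0 || rx >= w || ry >= h)
                    continue;             /* accounted for by the scaling below */
                int d = cur[(by + y) * stride + (bx + x)] - ref[ry * stride + rx];
                sad += d < 0 ? -d : d;
                visible++;
            }
        }
        if (visible == 0)
            return UINT_MAX;
        return (unsigned)(sad * (unsigned)(bsize * bsize) / (unsigned)visible);
    }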

Subjective Evaluation

In the end it has to be based on subjective evaluation whether or not the filter is good enough for the intended use. The generated videos were viewed at full speed, and subjectively the quality was usually slightly higher with xfps than with frame repetition and frame averaging. However, the perceived improvement seems to decrease with increased spatial video resolution, and the estimation is still very computationally intense, which is also observed by Porto et al. (2012). Figure 4.5 shows a representative example of the quality of a conversion of high spatial resolution video, and Figure 4.6 shows the quality when the spatial resolution is low. It is, however, not the average case that is the main reason why the filter is considered not good enough for Vidispine, but the bad cases. Figure 4.7 is a good example of how bad frame rate up-conversion may be devastating to the user experience.


Figure 4.4: Examples of artifacts introduced when part of an object moves out of the image: (a) original frame, (b) frame averaging, (c) xfps, (d) MVTools. Note especially the bottom of the frames generated by xfps and MVTools.

Figure 4.5: Approximations of a frame in the clip Park Joy: (a) original frame, (b) frame averaging, (c) xfps. The clip contains small movements of small objects and large movements of a large object. The frame is subjectively representative of this video.

Figure 4.6: A motion compensated image containing small movements of small objects and large movements of a large object: (a) original frame, (b) frame averaging, (c) xfps. The frame is both subjectively and objectively representative of this video.


Figure 4.7: An example of a badly approximated image and what it should have looked like, cropped from a frame in the Sintel trailer: (a) motion compensated frame, (b) original frame.


Chapter 5

Discussion

The objective evaluations made in Section 4.2.1 indicate that the implemented frame rate up-conversion filter generates improved video quality compared to frame repetition and frame averaging. However, the subjective evaluations show that the result is not good enough for high-resolution video, especially in the worst cases. MVTools outperforms xfps, but cannot be used in Vidispine due to licensing incompatibilities.

In the problem statement, an aim was set alongside the goals, namely real-time conversion from 720p/24 to 720p/30. Unfortunately, little time was spent on optimizing the time-efficiency of the implementation, and the current implementation generates between 1 and 2 frames per second on an Aspire 3750 with an Intel Core i3-2330M processor and 4 GB DDR3 SDRAM at 1066 MHz. Due to the little time spent on optimizations, no efforts were made to scientifically quantify these results. A faster computer may improve the speed slightly, but is unlikely to come even close to real-time conversion.

In conclusion, a lot of optimization remains before xfps is ready for release, regarding both video quality and speed.

5.1 Limitations and Future Work

The development of xfps has been put on hold due to a large number of limitations. These limitations are listed below, in some cases together with a possible solution should development continue.

– The implementations of the functions for measuring distortion between pixels and the function for interpolating pixel values only work with video in the YUVJ family of pixel formats. Support for more YUV-based pixel formats could be added to the functions with little effort, but RGB-based pixel formats have to be converted to YUV or similar before common distortion methods can be used successfully.

– The prototype lacks proper error handling, and invalid or incompatible parameters may cause segmentation faults.

– Motion estimation does not work well on high-resolution video or fast movements. One solution may be to use a hierarchical frame structure with the reference frames scaled to different resolutions and perform coarse-to-fine motion estimation (Jeon et al., 2003), or to use iterative random search (Porto et al., 2012). MVTools uses a variant of the coarse-to-fine estimation, which is likely a major reason why MVTools works so well.

– Motion estimation does not work well if a fade occurs between two successive reference frames. Thaipanich et al. (2009) suggest a method to adjust the frame intensity before performing motion estimation.

– Scene changes cause poor image quality. A fallback to frame averaging on large distortion between two frames may work as a solution, but a false positive scene change detection may also result in poor image quality. More advanced scene change detection based on the difference between successively estimated MV fields (Shu and Chau, 2005) or on histogram analysis (Kang et al., 2012) has been shown to improve the accuracy of the detection.

– The comparatively slow functions malloc and calloc are overused in the implementation. Reusing allocated memory to a larger extent is likely to increase the execution speed; a sketch of this is given after this list.

– The prototype runs on the CPU in a single thread. Jostell and Isberg (2012) used OpenMP for parallelization, OpenCL to enable GPU processing, and SSE (PSADBW) to compute the SAD of up to 16 pixels at a time on the CPU. These libraries, as well as Open Computer Vision (OpenCV), may be of interest when improving the hardware support.

– Block distortion methods are currently unable to compare blocks that are partly out-of-frame. Section 4.2.1 mentions how this may be implemented, but also the new problems that would arise with such an implementation.

– Luessi and Katsaggelos (2009) suggest reusing MVs from the video bitstream to improve the estimation speed. They acknowledge that problems exist in many algorithms that reuse MVs from the bitstream, and compensate for these without a decrease in image quality. Notably, though, they only use low-resolution video, and the results may not be valid for high-resolution video.

– Pixel-based MV selection is a method that can be used to reduce the skewing artifacts introduced by control grid interpolation and similar motion compensation techniques (Tran and LeDinh, 2011).
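To illustrate the memory-reuse item above: working buffers can be allocated once when the filter is configured and reused for every frame, replacing per-frame malloc/calloc calls with a cheap reset. All names in this sketch are made up for illustration and do not come from the xfps source.

    #include <stdlib.h>
    #include <string.h>

    /* Per-filter scratch memory, allocated once at configuration time. */
    struct fruc_scratch {
        unsigned char *interp;    /* interpolated luma plane */
        int           *mv_field;  /* two ints (vx, vy) per block */
        size_t         plane_size;
        size_t         nblocks;
    };

    static int scratch_init(struct fruc_scratch *s, int w, int h, int bsize)
    {
        s->plane_size = (size_t)w * h;
        s->nblocks    = (size_t)((w + bsize - 1) / bsize)
                      * (size_t)((h + bsize - 1) / bsize);
        s->interp   = malloc(s->plane_size);
        s->mv_field = malloc(s->nblocks * 2 * sizeof *s->mv_field);
        return (s->interp && s->mv_field) ? 0 : -1;
    }

    /* Cheap per-frame reset; replaces a per-frame calloc of the MV field. */
    static void scratch_reset(struct fruc_scratch *s)
    {
        memset(s->mv_field, 0, s->nblocks * 2 * sizeof *s->mv_field);
    }

    static void scratch_free(struct fruc_scratch *s)
    {
        free(s->interp);
        free(s->mv_field);
        s->interp = NULL;
        s->mv_field = NULL;
    }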

Since MVTools (http://avisynth.org.ru/mvtools/mvtools.html) performs substantially better than xfps, it would be interesting to take a closer look at its functionality. The seemingly most interesting parameters of the motion estimation function (MVAnalyse) can be divided into two groups:

– For coherence between vectors in the MV field: lambda, lsad and pnew.

– For improved estimation at luma flicker and fades: dct.


Chapter 6

Acknowledgements

Many thanks go out to employers and employees at CodeMill, to my supervisors Frank Drewes and Tomas Hardin, and to everybody else who supported me in one way or another.


References

Castagno, R., Haavisto, P., and Ramponi, G. (1996). A method for motion adaptive frame rate up-conversion. IEEE Transactions on Circuits and Systems for Video Technology, 6(5):436–446.

Chalidabhongse, J. and Kuo, C.-C. (1997). Fast motion vector estimation using multiresolution-spatio-temporal correlations. IEEE Transactions on Circuits and Systems for Video Technology, 7(3):477–488.

Chiang, J., Kuo, W., and Su, L. (2007). Fast motion estimation using hexagon-based search pattern in predictive search range. In Proceedings of the 16th International Conference on Computer Communications and Networks, 2007, pages 1149–1153.

Choi, B., Lee, S., and Ko, S. (2000). New frame rate up-conversion using bi-directional motion estimation. IEEE Transactions on Consumer Electronics, 46(3):603–609.

Chung, K.-L. and Chang, L.-C. (2003). A new predictive search area approach for fast block motion estimation. IEEE Transactions on Image Processing, 12(6):648–652.

Guanfeng, Z., Guizhong, L., and Rui, S. (2003). A modified hexagon-based search algorithm for block motion estimation. In Proceedings of the 2003 International Conference on Neural Networks and Signal Processing, volume 2, pages 1205–1208.

Ha, T., Lee, S., and Kim, J. (2004). Motion compensated frame interpolation by new block-based motion estimation algorithm. IEEE Transactions on Consumer Electronics, 50(2):752–759.

Huang, H. and Chang, S. (2011). Block motion estimation based on search pattern and predictor. In 2011 IEEE Symposium on Computational Intelligence for Multimedia, Signal and Vision Processing, pages 47–51.

Huynh-Thu, Q. and Ghanbari, M. (2008). Scope of validity of PSNR in image/video quality assessment. Electronics Letters, 44(13):800–801.

Ishwar, P. and Moulin, P. (2000). On spatial adaptation of motion-field smoothness in video coding. IEEE Transactions on Circuits and Systems for Video Technology, 10(6):980–989.

Jeon, B., Lee, G., Lee, S., and Park, R. (2003). Coarse-to-fine frame interpolation for frame rate up-conversion using pyramid structure. IEEE Transactions on Consumer Electronics, 49(3):499–508.

Jostell, J. and Isberg, A. (2012). Frame rate up-conversion of real-time high-definition remote surveillance video. Master's thesis, Chalmers University of Technology, Sweden.


Kang, S.-J., Cho, S. I., Yoo, S., and Kim, Y. H. (2012). Scene change detection using multiple histograms for motion-compensated frame rate up-conversion. Journal of Display Technology, 8(3):121–126.

Kuo, T.-Y. and Kuo, C.-C. J. (1997). Complexity reduction for overlapped block motion compensation (OBMC). In Proc. SPIE Visual Communications and Image Processing, volume 3024, pages 303–314.

Liu, B. and Zaccarin, A. (1993). New fast algorithms for the estimation of block motion vectors. IEEE Transactions on Circuits and Systems for Video Technology, 3(2):148–157.

Luessi, M. and Katsaggelos, A. (2009). Efficient motion compensated frame rate upconversion using multiple interpolations and median filtering. In 2009 16th IEEE International Conference on Image Processing, pages 373–376.

Porto, M., Cristani, C., Dall'Oglio, P., Grellert, M., Mattos, J., Bampi, S., and Agostini, L. (2012). Iterative random search: a new local minima resistant algorithm for motion estimation in high-definition videos. Multimedia Tools and Applications, pages 1–21.

Shu, H. and Chau, L.-P. (2005). A new scene change feature for video transcoding. In 2005 IEEE International Symposium on Circuits and Systems (ISCAS 2005), pages 4582–4585.

Sullivan, G. J. and Baker, R. L. (1991). Motion compensation for video compression using control grid interpolation. In 1991 International Conference on Acoustics, Speech, and Signal Processing, pages 2713–2716.

Tang, C. and Au, O. (1997). Unidirectional motion compensated temporal interpolation. In Proceedings of the 1997 IEEE International Symposium on Circuits and Systems, volume 2, pages 1444–1447.

Thaipanich, T., Wu, P., and Kuo, C. (2009). Low complexity algorithm for robust video frame rate up-conversion (FRUC) technique. IEEE Transactions on Consumer Electronics, 55(1):220–228.

Tham, J., Ranganath, S., Ranganath, M., and Kassim, A. (1998). A novel unrestricted center-biased diamond search algorithm for block motion estimation. IEEE Transactions on Circuits and Systems for Video Technology, 8(4):369–377.

Thambidurai, P., Ezhilarasan, M., and Ramachandran, D. (2007). Efficient motion estimation algorithm for advanced video coding. In International Conference on Computational Intelligence and Multimedia Applications, 2007, volume 3, pages 47–52.

Tran, T. and LeDinh, C. (2011). Frame rate converter with pixel-based motion vectors selection and halo reduction using preliminary interpolation. IEEE Journal of Selected Topics in Signal Processing, 5(2):252–261.

Tsai, T. and Pan, Y. (2004). A novel predict hexagon search algorithm for fast block motion estimation on H.264 video coding. In The 2004 IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, volume 1, pages 609–612.

Wong, C. and Au, O. (1995). Fast motion compensated temporal interpolation for video. In Visual Communications and Image Processing '95, pages 1108–1118.


Xiong, B. and Zhu, C. (2008). A new multiplication-free block matching criterion. IEEE Transactions on Circuits and Systems for Video Technology, 18(10):1441–1446.

Xu, C., Chen, Y., Gao, Z., Ye, Y., and Shan, T. (2011). Frame rate up-conversion with true motion estimation and adaptive motion vector refinement. In 2011 4th International Congress on Image and Signal Processing, volume 1, pages 353–356.

Zafar, S., Zhang, Y.-Q., and Baras, J. S. (1991). Predictive block-matching motion estimation for TV coding—Part I: Inter-block prediction. IEEE Transactions on Broadcasting, 37(3):97–101.

Zhai, J., Yu, K., Li, J., and Li, S. (2005). A low complexity motion compensated frame interpolation method. In IEEE International Symposium on Circuits and Systems, 2005, pages 4927–4930.

Zhang, Y.-Q. and Zafar, S. (1991). Predictive block-matching motion estimation for TV coding—Part II: Inter-frame prediction. IEEE Transactions on Broadcasting, 37(3):102–105.

Zhu, C., Lin, X., and Chau, L. (2002). Hexagon-based search pattern for fast block motion estimation. IEEE Transactions on Circuits and Systems for Video Technology, 12(5):349–355.


Appendix A

Diagrams


A.1 Box Plots

[Box plots omitted. Each panel shows the PSNR distribution for the methods fps, xfps:avg, xfps:mc and mvtools: (a) Sintel trailer, (b) Park Joy, (c) Tears of Steel trailer, (d) Big Buck Bunny, (e) Foreman, (f) Bus.]

Figure A.1: Box plots of the PSNR for images generated from the test videos.


A.2 Line Charts

[Line charts omitted. Each panel shows PSNR per generated frame for the methods fps, xfps:avg, xfps:mc and mvtools: (a) Sintel trailer, (b) Park Joy, (c) Tears of Steel trailer, (d) Big Buck Bunny, (e) Foreman, (f) Bus.]

Figure A.2: Line charts of frame rate up-conversion PSNR for the test sequences.