[IEEE 2011 IEEE International Conference on Multimedia and Expo (ICME) - Barcelona, Spain...

NEW FRAME RATE UP-CONVERSION BASED ON

FOREGROUND/BACKGROUND SEGMENTATION

Mehmet Mutlu Cekic and Ulug Bayazit

Istanbul Technical University

[email protected], [email protected]

ABSTRACT

Frame rate up-conversion (FRUC) increases the quality of a

video by increasing its temporal frequency. Motion

compensated and non-motion compensated frame rate up

conversion techniques make up the two main classes of

techniques used in this area. Halo artifacts and jaggy edges

cause the quality of video to be reduced both subjectively

and objectively in these techniques. In this paper, we

introduce a new method of motion compensated FRUC that

uses foreground background segmentation to address these

problems. In typical motion compensated FRUC, motion

vectors of existent frames are scaled to define the motion

vectors of interpolated frames. In the proposed method,

those pixels of an interpolated frame with no

unidirectionally assigned motion vectors are assigned

motion vectors by interpolation based on known motion

vectors of their neighbors and their foreground or

background identity. The proposed method significantly

improves the video quality and yields better PSNR results

than the methods of [1] and [2].

Index Terms: frame rate up-conversion, motion

vectors, segmentation, background subtraction

1. INTRODUCTION

Advances in display technologies of televisions or

monitors take place almost every day. Recently, the

popularity of high definition (HD) TVs has made the

100Hz-120Hz video display technology a desirable feature.

Due to the limited bandwidth of the transmission network,

video is generally captured and encoded at a low frame rate

before being broadcast, but needs to be converted to a higher

frame rate video at the receiver with the purpose of

displaying fast motion seamlessly. For instance, a video

decoded at 50 frames per second (fps) can be converted to

100 fps or 200 fps video before being displayed. This

technique is called frame rate up conversion (FRUC).

There are two types of FRUC methods used in video

processing. These are non-motion compensated and motion

compensated video frame rate up conversion (MC-FRUC).

Non-motion compensated FRUC does not exploit the motion

of the objects. It is implemented as either frame repetition or

linear interpolation of the frames [3]. Due to the lack of

adequate computational resources, this technique was

popular in the past. On the downside, it causes artifacts in

the video like motion blur or motion judder.
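The two non-motion compensated options can be sketched in a few lines. This is a minimal illustration, assuming frames are NumPy arrays of equal shape; the function names `repeat_frame` and `average_frames` are hypothetical and not taken from [3].

```python
import numpy as np

def repeat_frame(prev_frame, next_frame):
    """Frame repetition: the inserted frame is a copy of the previous one."""
    return prev_frame.copy()

def average_frames(prev_frame, next_frame):
    """Linear interpolation at the temporal midpoint: the pixel-wise mean."""
    mid = (prev_frame.astype(np.float64) + next_frame.astype(np.float64)) / 2.0
    return np.rint(mid).astype(prev_frame.dtype)

# A bright pixel moving from column 0 to column 2 is not placed at column 1;
# averaging leaves two half-intensity copies instead, which appears as blur.
prev_frame = np.array([[200, 0, 0]], dtype=np.uint8)
next_frame = np.array([[0, 0, 200]], dtype=np.uint8)
ghosted = average_frames(prev_frame, next_frame)
```

Neither variant uses motion information, which is why a moving object is either frozen (repetition, causing judder) or duplicated at half intensity (averaging, causing blur).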

In the last ten years, advances in microchip technology for

receiver-end processors have made it feasible to support

motion compensated frame rate up-conversion methods of

higher complexity. When compared to

the non-motion compensated methods, these methods

efficiently reduce the occurrence of the aforementioned

artifacts resulting in a higher interpolated frame quality.

Motion estimation (ME) and motion compensation (MC)

make up the two main stages of motion compensated frame

rate up conversion.

In the ME stage, the motion vectors (MV) are determined

by using typically two successive, reconstructed frames that

are available following video decoding. Generally, the block

matching (BM) method is used for motion estimation where

the minimized error measure between the target and

displaced blocks in the reconstructed frames is typically the

sum of absolute differences (SAD) measure.
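As an illustration of SAD-based block matching, the sketch below performs an exhaustive (full) search over a small window. It is only a reference point: the block size, search radius, and function names are assumptions made for this example, and the exhaustive search is a simpler stand-in for the faster search patterns used in practice.

```python
import numpy as np

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    diff = block_a.astype(np.int64) - block_b.astype(np.int64)
    return int(np.abs(diff).sum())

def full_search(cur, ref, top, left, block=8, radius=4):
    """Return (dy, dx, cost) of the displacement into `ref` that minimizes
    the SAD for the block whose top-left corner is (top, left) in `cur`."""
    height, width = ref.shape
    target = cur[top:top + block, left:left + block]
    best = (0, 0, sad(target, ref[top:top + block, left:left + block]))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > height or x + block > width:
                continue  # candidate block falls outside the reference frame
            cost = sad(target, ref[y:y + block, x:x + block])
            if cost < best[2]:
                best = (dy, dx, cost)
    return best

# Synthetic check: `cur` is `ref` shifted down by 2 rows and left by 3 columns.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(32, 32))
cur = np.roll(ref, shift=(2, -3), axis=(0, 1))
motion = full_search(cur, ref, top=12, left=12)
```

Because the minimum-SAD displacement is chosen purely on matching error, it need not coincide with the true motion of the object, which is the limitation discussed next.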

Block matching yields acceptable video coding

performance. However, it does not yield vectors

representing true motion which are very important for

constructing high quality interpolated frames. 3-D Recursive

Search Block Matching introduced by De Haan [4]

addresses and solves this deficiency of block matching.

Some other techniques like ME for overlapped motion

vectors [5], motion vector classification [1], correlation of

motion vectors [6] and adaptive true motion estimation [7]

are also used to overcome ME problems in obtaining true

motion vectors. Techniques which improve ME performance

over boundaries of regions or blocks with different motion

such as Variable Size Block Matching [8] and Adaptive

Overlapped Block Matching Compensation [9] have been

reported to also improve the quality of the interpolated

frames.

In the motion compensated interpolation stage, a new

frame between two successive frames is constructed via

estimated MVs. This stage can introduce

problems such as halo artifacts or jaggy

edges at occlusion areas. Handling occlusion artifacts is not

a trivial problem and many solutions have been proposed to

decrease these artifacts. The most popular of these is based

on using an occlusion map, as suggested in [10] and [11].

Another approach is constructing color segmentation maps

obtained by a Markovian approach [12]. Ince [13] proposed

a new method called geometry based estimation which

overcomes occlusion artifacts by using geometric

mismatches.

Identifying moving objects is a very popular problem in

computer vision. The most popular solution is background

subtraction. Typically, stationary background pixels are

determined and then subtracted from the whole frame in this

technique. Thereby, pixels of moving foreground objects are

determined. One such method used for background

subtraction is due to Zivkovic [14] where a Gaussian

mixture model (GMM) is employed for estimating the

probability density function. GMM modelling helps to cope

with pixel values of foreground objects having complex

distributions [14]. Another approach is based on color

segmentation or background modeling. This method

proposes using color information obtained from background

subtraction and shadow detection to improve object

segmentation and background update [15].

In this paper, a method is proposed for enhancing the

quality of the interpolated frame by using foreground and

background identities of pixels determined by the

background subtraction method. With this technique, the

motion vectors of the pixels of the interpolated frame with

no motion vectors mapped to them from the existent frames

can be successfully determined. The next section describes

the proposed method, and Section 3 presents experimental results

providing comparisons against the methods of [1], [2] and

[16] as well as demonstrating the gain with the use of

segmentation. The final section presents concluding remarks.

2. THE PROPOSED METHOD

Let subscript 0 denote the previous frame and subscript 1

denote the current frame of the two successive frames of a

video sequence. The interpolated frame midway between (in

time) these two frames is denoted by subscript ½. The

notation $mv_{i,j}(x, y)$ denotes a motion vector at coordinates

$(x, y)$ in frame $i$ that points to frame $j$. In the current

implementation, the initial motion field between frames 1

and 0 is obtained by using the hexagonal search block matching

algorithm [17] at full pixel accuracy.

During the first stage of the proposed method, the

foreground region is determined by the background

subtraction method. Later, when the interpolated frame is

constructed, this information, that indicates the positions of

the moving object pixels, is utilized.

The second stage is the scaling of the motion vectors

from $mv_{1,0}(x, y)$ to $mv_{1,1/2}(x, y)$ for each pixel. Since the

motion vector $mv_{1,1/2}(x, y)$ is obtained by scaling, it does

not have full pixel accuracy. Let $\tilde{x}$ and $\tilde{y}$ denote

the coordinates of the half pixel accurate point in frame ½,

which can be found via the following equations:

$$\tilde{x} = x + mv_{1,1/2,x}(x, y)$$

$$\tilde{y} = y + mv_{1,1/2,y}(x, y)$$

The pixel positions in frame ½ in a half pixel

neighborhood around $(\tilde{x}, \tilde{y})$ are then determined. The

coordinates $(x', y')$ of such a pixel satisfy

$$\left(|x' - \tilde{x}| \le 0.5\right) \ \text{AND} \ \left(|y' - \tilde{y}| \le 0.5\right)$$

The motion vector(s) at coordinates $(x', y')$ in frame ½ can now

be expressed by using the motion vectors from frame 1 to

frame ½:

$$mv_{1/2,1}(x', y') = -mv_{1,1/2}(x, y)$$

$$mv_{1/2,0}(x', y') = mv_{1,1/2}(x, y)$$

Before we can start to reconstruct frame ½ one important

step is needed to perform the interpolation successfully. The

background/foreground identities of pixels in frame ½ need

to be determined. Since $mv_{1/2,1}(x', y') = -mv_{1,1/2}(x, y)$,

if the pixel at coordinates $(x, y)$ belongs to the foreground, then

the pixel at coordinates $(x', y')$ belongs to the foreground, too.

Otherwise, the pixel at coordinates $(x', y')$ belongs to the

background.

As a consequence of the many-to-one motion vector

mapping from frame 1 to frame ½, multiple motion vectors

may be assigned to a pixel of frame ½ (termed overlapped

pixels), no motion vector may be assigned to a pixel of

frame ½ (termed gap pixels) or a single motion vector may

be assigned to a pixel of frame ½. Each of the three types is

handled differently in the interpolation stage.
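The projection of frame 1 vectors onto frame ½ and the resulting three pixel types can be sketched as follows. This is a simplified illustration under stated assumptions: motion vectors and foreground flags are stored in dictionaries keyed by `(x, y)`, vectors are given as `(dx, dy)` in full pixels, and the names `project_to_midframe` and `classify` are hypothetical.

```python
import math
from collections import defaultdict

def project_to_midframe(mv10, fg1, width, height):
    """mv10[(x, y)] = (dx, dy): full-pixel vector from frame 1 to frame 0.
    fg1[(x, y)] = True if the frame 1 pixel is foreground.
    Returns {(x', y'): [((x, y), scaled_vector, is_foreground), ...]}."""
    hits = defaultdict(list)
    for (x, y), (dx, dy) in mv10.items():
        hx, hy = dx / 2.0, dy / 2.0      # scaled vector from frame 1 to frame 1/2
        xt, yt = x + hx, y + hy          # half pixel accurate landing point
        # Integer positions within half a pixel of the landing point.
        for xp in range(math.ceil(xt - 0.5), math.floor(xt + 0.5) + 1):
            for yp in range(math.ceil(yt - 0.5), math.floor(yt + 0.5) + 1):
                if 0 <= xp < width and 0 <= yp < height:
                    # The frame 1/2 pixel inherits the identity of its source pixel.
                    hits[(xp, yp)].append(((x, y), (hx, hy), fg1[(x, y)]))
    return hits

def classify(hits, width, height):
    """Label every frame 1/2 pixel as 'gap', 'single', or 'overlapped'."""
    labels = {}
    for y in range(height):
        for x in range(width):
            count = len(hits.get((x, y), []))
            labels[(x, y)] = "gap" if count == 0 else "single" if count == 1 else "overlapped"
    return labels

# One row of four pixels; only the foreground pixel at x = 2 moves (by 2 pixels).
mv10 = {(0, 0): (0, 0), (1, 0): (0, 0), (2, 0): (2, 0), (3, 0): (0, 0)}
fg1 = {(0, 0): False, (1, 0): False, (2, 0): True, (3, 0): False}
hits = project_to_midframe(mv10, fg1, width=4, height=1)
labels = classify(hits, width=4, height=1)
```

In this toy row the moving pixel vacates position 2 (a gap pixel) and lands on position 3 together with the static pixel already there (an overlapped pixel).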

For the overlapped pixel type, more than one motion

vector from frame 1 to frame ½ maps to the same location

and the most reliable one needs to be singled out. For a

given $(x', y')$, the motion vector $mv_{1,1/2}(x_i, y_i)$ with the

minimum value of $(x' - \tilde{x}_i)^2 + (y' - \tilde{y}_i)^2$ is used. If there is

more than one such motion vector, the one for which

$mv_{1,0}(x_i, y_i)$ has the least block matching error (SAD)

between frames 1 and 0 is used:

$$i^* = \arg\min_i \, SAD\left(mv_{1,0}(x_i, y_i)\right)$$

$$mv_{1/2,1}(x', y') = -mv_{1,0}(x_{i^*}, y_{i^*}) / 2$$
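The selection rule for overlapped pixels amounts to a two-level minimum: first the squared distance to the landing point, then the SAD as a tiebreaker. A minimal sketch follows; the candidate record layout (keys `landing`, `sad`, `mv10`) and the function name are assumptions made for illustration.

```python
def pick_overlap_candidate(xp, yp, candidates):
    """candidates: list of dicts with keys
         'landing': half pixel accurate landing point of the source pixel,
         'sad':     block matching error of its frame 1 to frame 0 vector,
         'mv10':    that vector as (dx, dy).
    Returns the vector from frame 1/2 to frame 1 assigned to pixel (xp, yp)."""
    def rank(candidate):
        xt, yt = candidate["landing"]
        squared_distance = (xp - xt) ** 2 + (yp - yt) ** 2
        return (squared_distance, candidate["sad"])  # SAD only breaks ties

    best = min(candidates, key=rank)
    dx, dy = best["mv10"]
    return (-dx / 2.0, -dy / 2.0)

candidates = [
    {"landing": (3.0, 0.0), "sad": 10, "mv10": (2, 0)},
    {"landing": (3.0, 0.0), "sad": 4, "mv10": (4, 2)},
    {"landing": (3.5, 0.0), "sad": 1, "mv10": (1, 0)},
]
chosen = pick_overlap_candidate(3, 0, candidates)
```

Here the third candidate has the lowest SAD but lands farther from the pixel, so the choice is made between the first two, and the one with the smaller SAD wins.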

Then, $mv_{1/2,1}(x', y')$ is used to transfer the pixel value from

frame 1 to frame ½ by bilinear interpolation of pixel values.

Since half pixel accurate motion vectors are used, bilinear

interpolation in frame 1 is performed with the following

equations:

$$x'' = x' + mv_{1/2,1,x}(x', y') \qquad (1)$$

$$y'' = y' + mv_{1/2,1,y}(x', y') \qquad (2)$$

$$x_1 = \lfloor x'' \rfloor, \quad x_2 = \lceil x'' \rceil \qquad (3)$$

$$y_1 = \lfloor y'' \rfloor, \quad y_2 = \lceil y'' \rceil \qquad (4)$$

$$f_{1/2}(x', y') = \frac{1}{4} \sum_{j=1}^{2} \sum_{k=1}^{2} f_1(x_j, y_k) \qquad (5)$$
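Equations (3)-(5) can be read as averaging the four pixels obtained from the floor and ceiling of each coordinate. The sketch below assumes exactly that reading, takes the frame as a row-major 2-D array indexed `frame[y][x]`, and omits bounds handling; the function name is hypothetical.

```python
import math

def fetch_half_pel(frame, x2, y2):
    """Average of the four pixels around the (possibly half pixel) point (x2, y2).

    When a coordinate is an integer, its floor and ceiling coincide, so the
    same column (or row) is counted twice and the average is unchanged."""
    xs = (math.floor(x2), math.ceil(x2))
    ys = (math.floor(y2), math.ceil(y2))
    total = 0.0
    for xj in xs:
        for yk in ys:
            total += float(frame[yk][xj])  # row index is y, column index is x
    return total / 4.0

frame = [[10, 20],
         [30, 40]]
center = fetch_half_pel(frame, 0.5, 0.5)   # mean of all four pixels
on_grid = fetch_half_pel(frame, 1, 0)      # exact pixel value
between = fetch_half_pel(frame, 0.5, 0)    # mean of two horizontal neighbors
```

The same routine serves for transfers from frame 0 when the vector from frame ½ to frame 0 is used instead.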

A gap pixel does not have any motion vector assigned to

it. A motion vector can be determined by linear interpolation

of the assigned motion vectors of nearby pixels. Two types

of regions for a gap pixel at coordinates $(x', y')$ can be

identified:

1. A gap pixel at coordinates $(x', y')$ is surrounded on

all sides by foreground (or on all sides by background)

pixels (vertically when going up or down or horizontally

when going left or right in frame ½ all four nearest pixels

with known motion vectors belong to foreground

(background) region as depicted in Figure 1(a)). In this case,

we determine $mv_{1/2,0}(x', y')$ by the bilinear interpolation

using the motion vectors of the nearest foreground

(background) pixels having the same vertical or horizontal

coordinate as

$$mv_{1/2,0}(x', y') = \frac{1}{2} \left[ \frac{\delta}{\alpha+\delta} \times mv_{1/2,0}(x'-\alpha, y') + \frac{\alpha}{\alpha+\delta} \times mv_{1/2,0}(x'+\delta, y') + \frac{\sigma}{\beta+\sigma} \times mv_{1/2,0}(x', y'-\beta) + \frac{\beta}{\beta+\sigma} \times mv_{1/2,0}(x', y'+\sigma) \right]$$

where $\alpha$, $\beta$, $\delta$, and $\sigma$ are the distances to the nearest

neighboring pixels that are not gap pixels. Then, we use

$mv_{1/2,0}(x', y')$ to unidirectionally transfer the bilinear

interpolated pixel value from frame 0 to frame ½ similar to

(1)-(5).
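The interpolation for a gap pixel whose four nearest non-gap neighbors share one identity can be sketched as the mean of a horizontal and a vertical linear interpolation. The argument names below follow the distances in the text (alpha to the left neighbor, delta to the right, beta to the upper, sigma to the lower); the function name is hypothetical.

```python
def interpolate_gap_mv(left, right, up, down, alpha, delta, beta, sigma):
    """Each neighbor vector is an (mvx, mvy) tuple; distances are in pixels."""
    def lerp(vec_a, dist_a, vec_b, dist_b):
        # Inverse-distance weighting: the closer neighbor gets the larger weight.
        total = dist_a + dist_b
        return tuple((dist_b * a + dist_a * b) / total for a, b in zip(vec_a, vec_b))

    horizontal = lerp(left, alpha, right, delta)
    vertical = lerp(up, beta, down, sigma)
    return tuple((h + v) / 2.0 for h, v in zip(horizontal, vertical))

gap_mv = interpolate_gap_mv(
    left=(2, 0), right=(4, 0), up=(0, 2), down=(0, 6),
    alpha=1, delta=1, beta=1, sigma=3,
)
```

In the example the upper neighbor is three times closer than the lower one, so the vertical estimate (0, 3) lies nearer to the upper vector (0, 2) than to the lower vector (0, 6).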

2. A gap pixel at coordinates $(x', y')$ is surrounded on

some sides by background and on other sides by foreground

(vertically when going up or down or horizontally when

going left or right in frame ½ some of the four nearest pixels

belong to background and others belong to the foreground).

If the three nearest non-gap pixels have the same

foreground/background identity as in Figure 1(b), we

determine $mv_{1/2,0}(x', y')$ by linear interpolation of their

motion vectors. On the other hand, if two nearest non-gap

pixels have foreground identity and the other two have

background identity (which typically occurs when opposite

neighboring pixels have opposite identity as in Figure 1(c)),

a tiebreaking rule is applied as follows:

Fig. 1. a) For a gap pixel for which the identities of the nearest

four non-gap pixels on four sides are the same, the motion vector is

determined by the average of the bilinear interpolations of the

motion vectors of these pixels with the same horizontal and same

vertical coordinate. b) For a gap pixel for which the two nearest

non-gap pixels on two sides are foreground (light shade) and the

other two background (dark shade), the average of the distances

between the motion vectors in frame 0 pointed to from the nearest

non-gap foreground pixels and the motion vector in frame 0

pointed to from the gap pixel by the linear interpolation of the

foreground motion vectors in frame ½ is computed and compared

against a similarly computed average distance for the background

pixels. In this case, the gap pixel motion vector is linearly

interpolated from the motion vectors of the two foreground pixels

since the average distance is smaller for the foreground (since

$v_{1,B}$ disagrees with $v_{2,B} = 0$ and $v_{3,B} = 0$, shown as dots).

The motion vector of the pixel in frame 0, referenced by the

linear interpolation of the two foreground motion vectors

from the gap pixel position in frame ½, is computed first:

$$mv^{F}_{1/2,0}(x', y') = \frac{\delta}{\alpha+\delta} \times mv_{1/2,0}(x'-\alpha, y') + \frac{\alpha}{\alpha+\delta} \times mv_{1/2,0}(x'+\delta, y')$$

$$v_{1,F} = mv_{0,-1}\left(x' + mv^{F}_{1/2,0,x}(x', y'),\ y' + mv^{F}_{1/2,0,y}(x', y')\right)$$

The average of the distances between this vector and each of

the two motion vectors of pixels in frame 0 referenced by the

motion vectors of the two foreground pixels in frame ½ is

determined next:

$$v_{2,F} = mv_{0,-1}\left(x' - \alpha + mv_{1/2,0,x}(x'-\alpha, y'),\ y' + mv_{1/2,0,y}(x'-\alpha, y')\right)$$

$$v_{3,F} = mv_{0,-1}\left(x' + \delta + mv_{1/2,0,x}(x'+\delta, y'),\ y' + mv_{1/2,0,y}(x'+\delta, y')\right)$$

$$d_F = \left( \lVert v_{1,F} - v_{2,F} \rVert + \lVert v_{1,F} - v_{3,F} \rVert \right) / 2$$

A similar average distance computation is then performed

for the background pixels to yield:

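$$mv^{B}_{1/2,0}(x', y') = \frac{\beta}{\sigma+\beta} \times mv_{1/2,0}(x', y'+\sigma) + \frac{\sigma}{\sigma+\beta} \times mv_{1/2,0}(x', y'-\beta)$$

$$v_{1,B} = mv_{0,-1}\left(x' + mv^{B}_{1/2,0,x}(x', y'),\ y' + mv^{B}_{1/2,0,y}(x', y')\right)$$

$$v_{2,B} = mv_{0,-1}\left(x' + mv_{1/2,0,x}(x', y'+\sigma),\ y' + \sigma + mv_{1/2,0,y}(x', y'+\sigma)\right)$$

$$v_{3,B} = mv_{0,-1}\left(x' + mv_{1/2,0,x}(x', y'-\beta),\ y' - \beta + mv_{1/2,0,y}(x', y'-\beta)\right)$$

$$d_B = \left( \lVert v_{1,B} - v_{2,B} \rVert + \lVert v_{1,B} - v_{3,B} \rVert \right) / 2$$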

If the average distance for the foreground (background)

pixels exceeds the average distance for the background

(foreground) pixels, the background (foreground) pixels’

motion vectors are used to interpolate the motion vector of

the gap pixel.

$$j^* = \arg\min_{j \in \{B, F\}} d_j$$

$$mv_{1/2,0}(x', y') = mv^{j^*}_{1/2,0}(x', y')$$

The above decision rule relies on the assumption of

consistency of motion vectors of spatially adjacent pixels of

the same identity in frames 0 and ½.
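The tiebreaking rule can be sketched as a comparison of two average distances. The sketch assumes the distance between vectors is the Euclidean norm (the norm is not stated explicitly in the text) and resolves an exact tie in favor of the foreground, which is also an assumption; the function names are hypothetical.

```python
import math

def average_distance(v1, v2, v3):
    """d = (|v1 - v2| + |v1 - v3|) / 2, using Euclidean distances."""
    return (math.dist(v1, v2) + math.dist(v1, v3)) / 2.0

def choose_gap_mv(mv_fg, v_fg, mv_bg, v_bg):
    """mv_fg, mv_bg: gap pixel vectors interpolated from the foreground pair
    and from the background pair, respectively.
    v_fg, v_bg: triples (v1, v2, v3) of the frame 0 vectors referenced by the
    interpolated vector and by the two neighboring vectors of that identity.
    Returns the selected vector and the identity it came from."""
    d_fg = average_distance(*v_fg)
    d_bg = average_distance(*v_bg)
    if d_fg <= d_bg:           # tie resolved toward the foreground (assumption)
        return mv_fg, "F"
    return mv_bg, "B"

# Mirrors the situation in the caption of Fig. 1: the foreground vectors agree,
# while the background triple contains one vector that disagrees with two zeros.
selected, identity = choose_gap_mv(
    mv_fg=(2.0, 0.0), v_fg=((2, 0), (2, 0), (2, 0)),
    mv_bg=(0.0, 0.0), v_bg=((3, 4), (0, 0), (0, 0)),
)
```

The identity whose referenced vectors in frame 0 are most consistent with one another supplies the vector for the gap pixel.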

Once the motion vector $mv_{1/2,0}(x', y')$ is determined, it

is used to transfer the bilinear interpolated pixel value from

frame 0 to frame ½ similar to (1)-(5).

For a pixel in frame ½ at coordinates $(x', y')$ with one

motion vector assigned to it, both $mv_{1/2,1}(x', y')$ and

$mv_{1/2,0}(x', y')$ are used to transfer the average of bilinear

interpolated values from frames 1 and 0. Hence for such a

pixel the motion compensation is bidirectional.

The method described above has been integrated with the

foreground/background segmentation algorithm of [14] that

is appropriate for videos for which camera view is

stationary.

3. EXPERIMENTAL RESULTS

In order to show the effectiveness of the proposed method,

performance comparisons with [1] and [2] on seven

uncompressed video sequences were conducted. The sequences

used in the simulations were Akiyo, Foreman, Carphone,

Garden, Mobile, News, and Stefan with respective

resolutions of 352 x 288, 176 x 144, 176 x 144, 352 x 240,

352 x 288, 352 x 288, and 352 x 240. The first 50 interpolated

frames of Carphone, Foreman, Garden, News, and Stefan

and 150 interpolated frames of Akiyo were used for

performance comparisons with the reported results of [1], [2]

and [16]. For Mobile, the first 50 interpolated frames were used

for the comparison with [1] and 150 frames were used

for the comparison with [2].

Alongside the standard PSNR quality measure, the

structural similarity test (SSIM) has also been used to assess

performance. SSIM compares local patterns of pixel

intensities that have been normalized for luminance and

contrast and is defined in [18] as

$$SSIM(I_1, I_2) = \frac{(2\mu_1\mu_2 + C_1)(2\sigma_{1,2} + C_2)}{(\mu_1^2 + \mu_2^2 + C_1)(\sigma_1^2 + \sigma_2^2 + C_2)}$$

where $\mu_i$ and $\sigma_i^2$ are the mean and variance of luma values

in an 8x8 window of the i’th image and $\sigma_{i,j}$ is the covariance

between luma values in corresponding 8x8 windows of the i’th

and j’th images. $C_1$ and $C_2$ are constants based on the

dynamic range of the luma values that stabilize the division

when the denominator is weak.
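For a single pair of windows, the SSIM expression can be evaluated as in the sketch below. The constants follow the common choice $C_1 = (0.01 L)^2$ and $C_2 = (0.03 L)^2$ with dynamic range $L = 255$, and population (divide by N) statistics are used; both are assumptions here, since the exact settings used in the experiments are not stated.

```python
import numpy as np

def ssim_window(window_1, window_2, dynamic_range=255.0, k1=0.01, k2=0.03):
    """SSIM between two equally sized luma windows (e.g. 8x8)."""
    w1 = np.asarray(window_1, dtype=np.float64)
    w2 = np.asarray(window_2, dtype=np.float64)
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    mu1, mu2 = w1.mean(), w2.mean()
    var1, var2 = w1.var(), w2.var()
    covariance = ((w1 - mu1) * (w2 - mu2)).mean()
    numerator = (2 * mu1 * mu2 + c1) * (2 * covariance + c2)
    denominator = (mu1 ** 2 + mu2 ** 2 + c1) * (var1 + var2 + c2)
    return float(numerator / denominator)

rng = np.random.default_rng(1)
window = rng.integers(0, 256, size=(8, 8))
same = ssim_window(window, window)            # identical windows
inverted = ssim_window(window, 255 - window)  # negatively correlated windows
```

A frame-level score is typically obtained by averaging this value over all window positions; identical windows give a score of 1, and the score drops as structure diverges.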

Tables 1 and 2 present the comparison of PSNR and

SSIM results obtained with the methods in [1] and [2],

respectively. When compared to [2], the proposed method

gives about 7.3 dB higher peak signal to noise ratio (PSNR)

for Akiyo. Moreover, gains of about 1.8 dB for Foreman, 0.4 dB for

Carphone, 3.9 dB for Mobile and 3.4 dB for Garden are

observed when the proposed method is compared to [1].
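For reference, PSNR between an original frame and its interpolated counterpart is conventionally computed from the mean squared error as sketched below. The peak value of 255 assumes 8-bit luma, and the function name is hypothetical.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal to noise ratio in dB between two equally sized frames."""
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(peak ** 2 / mse))

# A uniform error of one gray level gives an MSE of 1.
value = psnr(np.zeros((4, 4)), np.ones((4, 4)))
```

Because the scale is logarithmic, a gain of 3 dB corresponds to roughly halving the mean squared error.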

Table 3 shows a comparison of the proposed method

with the correlation based motion vector processing method

proposed in [16]. While the performance of [16] is better

than the proposed method for the Football video, the

performances of [16] and the proposed method are

comparable for the other two videos tested.

Table 4 shows the performance gain with the

foreground/background segmentation algorithm in the

proposed method. In the no segmentation case used as a

reference, motion vectors of gap pixels are reconstructed by

bilinear interpolation of the motion vectors of the nearest

four non-gap pixels. It is seen that PSNR results for four of

the seven sequences are significantly better when

segmentation is integrated.

Figure 2 presents a frame of Akiyo interpolated with the

proposed method and the corresponding frame interpolated

with [2] for comparison. The artifacts observed at the lips

with [2] do not appear with the proposed method. In Figure

3, even though the contour details are sharper with fewer

ringing artifacts for the method of [2], the interpolation

with the proposed method does not suffer from the sharp,

unexpected edge discontinuities and blocking artifacts

seen with [2].

4. CONCLUSION

In this paper, we introduced a foreground/background

segmentation based frame rate up-conversion method. For

MC-FRUC, the difficulty of identifying covered-uncovered

regions introduces artifacts into the interpolated frame. In

this work, the foreground/background segmentation data was

integrated with the MC-FRUC algorithm as a solution to this

problem. Experimental results on several test video

sequences have shown that this new method has better

performance in terms of PSNR and SSIM than that of [1]

and [2], and comparable performance to that of [16] on

interpolated frames. The proposed method is free of

blocking artifacts and thus provides a high subjective view

quality.

Owing largely to the segmentation algorithm integrated,

the proposed method is currently designed to work on only

video with stationary background. Future research is aimed

at extending the proposed method by integrating it with

more advanced segmentation algorithms that can handle

panning or more complex motions of the background as well

as overlapping motions of multiple foreground objects.

Table 1. Performance comparisons with the method of [1].

Video        Method in [1]           Proposed Method
Sequence     PSNR (dB)   SSIM        PSNR (dB)   SSIM
Carphone     33.6726     0.9542      34.066      0.970
Foreman      33.2036     0.9543      35.016      0.970
Mobile       23.7419     0.9106      27.670      0.942
Garden       24.6681     0.8938      28.117      0.945


Table 2. Performance comparisons with the method of [2].

Video        Method in [2]    Proposed Method
Sequence     PSNR (dB)        PSNR (dB)   SSIM
Akiyo        39.458           46.752      0.996
News         32.386           37.065      0.981
Mobile       20.757           28.252      0.951
Stefan       22.347           26.813      0.916


Table 3. Performance comparisons with the method of [16].

Video        Method in [16]          Proposed Method
Sequence     PSNR (dB)   SSIM        PSNR (dB)   SSIM
Football     25.110      0.780       22.237      0.670
Foreman      31.700      0.960       32.343      0.927
Stefan       26.490      0.900       26.568      0.905

Table 4. Performance gain with the use of segmentation.

Video        No Segmentation         Segmentation
Sequence     PSNR (dB)   SSIM        PSNR (dB)   SSIM
Akiyo        46.743      0.996       46.752      0.996
News         37.080      0.981       37.065      0.976
Foreman      30.832      0.917       32.343      0.927
Carphone     33.092      0.963       34.066      0.967
Mobile       28.077      0.946       28.252      0.951
Garden       27.430      0.939       28.117      0.945
Stefan       26.284      0.913       26.813      0.916

Fig. 2. Akiyo sequence subjective comparison of a) Original frame

40 b) Method in [2] and c) Proposed Method

Fig. 3. Mobile sequence subjective comparison of a) Original

frame 40 b) Method in [2] and c) Proposed Method

5. REFERENCES

[1] X. Gao, Y. Yang, B. Xiao, “Adaptive frame rate up-

conversion based on motion classification,” Elsevier,

International Journal of Signal Processing, vol. 88, no.

12, pp. 2979-2988, 2008.

[2] S.J Kung, D.G Yoo, S.K. Lee, Y.H Kim, “Design and

Implementation of Median Filter based Adaptive

Motion Vector Smoothing for Motion Compensated

Frame Rate Up-Conversion,” IEEE 13th International

Symposium Consumer Electronics ISCE’09, pp.745-

748, Kyoto, Japan, May 2009.

[3] A. N. Netravali and J. D. Robbins, “Motion-adaptive

interpolation of television frames,” Proc. Picture

Coding Symposium, June 1981.

[4] G. De Haan, Paul W.A.C. Biezen, H. Huijgen, O. A.

Ojo, “True Motion Estimation With 3-D Recursive

Search Block Matching,” IEEE Trans. Circuit Sys.

Video Tech., pp. 368-388, Oct. 1993.

[5] J. K. Su, R. M. Mersereau, “Motion estimation methods

for overlapped block motion compensation,” IEEE

Trans. Image Proc., vol. 9, no.9, pp. 1509-1521, Sept.

2000.

[6] A. M. Huang and T.Q. Nguyen, “Correlation-based

motion vector processing for motion compensated

frame interpolation,” IEEE Trans. Image Proc., vol. 17

no.5, pp. 694–708, May 2008.

[7] M. Cetin and I. Hamzaoglu, “An adaptive true motion

estimation algorithm for frame rate conversion of high

definition video,” International Conference on Pattern

Recognition, pp.4109-4112, Istanbul, Turkey, August

2010.

[8] M. H. Chan, Y. B. Yu, and A. G. Constantinides,

“Variable size block matching motion compensation

with application to video coding,” Proc. Inst. Elect.

Eng. , vol. 137, no. 4, pp. 205–212, Aug. 1990.

[9] B.-D. Choi, J.-W. Han, C.-S. Kim, S-J. Ko, “Motion-

Compensated Frame Interpolation Using Bilateral

Motion Estimation and Adaptive Overlapped Block

Motion Compensation,” IEEE Trans. Circuits and

Systems for Video Tech., vol. 17, no. 4, pp. 407-416,

April 2007

[10] Wei Hong, “Low-Complexity Occlusion Handling for

Motion-Compensated Frame Rate Up-Conversion,” Int.

Conf. Consumer Electronics 2009, ICCE’09, pp. 1-2,

Las Vegas, NV, January 2009.

[11] B. Cizmeci and H.F. Ates. “Occlusion Aware Motion

Compensation for Video Frame Rate Up-Conversion,”

Proc. IASTED International Conf. on Signal and Image

Processing (SIP), Maui, Hawaii, August 2010.

[12] P.M Jodoin, C. Rosenberger, M. Mignotte, “Detecting

Half-Occlusion with a Fast Region Based Fusion

Procedure,” British Machine Vision Conference, pp.

417-426, Edinburgh, UK, 2006.

[13] S. Ince and J.Konrad, “Geometry-Based Estimation of

Occlusions From Video Frame Pairs,” IEEE Trans.

Acoustics, Speech and Signal Processing, vol 2, pp.

ii/933-ii/936, March 2005.

[14] Zoran Zivkovic, “Improved adaptive Gaussian mixture

model for background subtraction,” Proceedings of

17th Intl. Conf. Pattern Recognition, ICPR 2004, vol 2,

pp. 28-31, Cambridge, UK, August 2004.

[15] R. Cucchiara, M. Piccardi and A.Prati, “Detecting

Moving Objects, Ghosts, and Shadows in Video

Streams,” IEEE Pattern Analysis and Machine

Intelligence, vol. 25, no. 10, pp. 1337-1342, Sept. 2003.

[16] A.M. Huang, T. Nguyen, “Correlation-Based Motion

Vector Processing with Adaptive Interpolation Scheme

for Motion-Compensated Frame Interpolation,” IEEE

Transactions on Image Processing , vol. 18, no. 4, pp.

740-752, April 2009.

[17] A. Hamosfakidis and Y. Paker, “A Novel Hexagonal

Search Algorithm for Fast Block Matching Motion

Estimation,” EURASIP Journal on Applied Signal

Processing, vol. 2002, no. 6, pp. 595-600, June 2002.

[18] H. R. Sheikh, E.P Simoncelli, Z. Wang, A.C Bovik,

“Image quality assessment: from error visibility to

structural similarity,” IEEE Trans. Image Proc., vol. 13,

no. 4, pp. 600–612, April 2004.

[19] Y.M Chen, Ivan.V. Bajic, C.Qian, “Frame Rate Up-

Conversion of Compressed Video Using Region

Segmentation and Depth Ordering,” Proc. IEEE

PacRim'09, pp. 431-436, Victoria, BC, August 2009.
