
Int. J. Electron. Commun. (AEÜ) 63 (2009) 831–840 · doi:10.1016/j.aeue.2008.06.009

www.elsevier.de/aeue

A novel algorithm to segment foreground from a similarly colored background

Xiang Zhang∗, Jie Yang
Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, Shanghai 200240, China
∗Corresponding author. Tel./fax: +86 21 34204033. E-mail address: [email protected] (X. Zhang).

Received 26 February 2008; accepted 16 June 2008

Abstract

Color similarity between the background and the foreground causes most moving object detection algorithms to fail. This paper proposes a novel algorithm designed to segment the foreground from a similarly colored background. Central to this algorithm is the idea that the motion cue of the moving object is useful for foreground modeling. We predict the position of the moving object in the current frame using historical motion information, and then use the prediction to construct a predictive model. The mixture foreground model is a union of the predictive model and the general foreground model. Final segmentation is obtained by combining a likelihood modification technique with the mixture foreground model. Experimental results on typical sequences show that the proposed algorithm is effective.
© 2008 Elsevier GmbH. All rights reserved.

Keywords: Foreground segmentation; Color similarity; Foreground modeling; Confusion point

1. Introduction

Foreground segmentation from an image sequence is an important precursor of many computer vision applications. Color similarity between the background and the foreground causes most algorithms to fail. A technical report on the authors' earlier work shows that the confusion point is the essence of the misclassification caused by color similarity, and develops a likelihood modification technique (LMT) to deal with color similarity to some extent. Examples of color similarity are shown in Fig. 1a. Because of the similar color distributions between the foreground and the background, many foreground pixels in Fig. 1b are misclassified by Sheikh's algorithm [1], which may be the best moving object detection algorithm when color similarity is ignored.

All foreground segmentation algorithms can be classified as either unsupervised or supervised. Most algorithms fall into the first, unsupervised category, while a few algorithms used for layer extraction and tracking are supervised, such as [2,3].

Background modeling is the earliest criterion exploited for moving object detection. According to whether the background is modeled at each pixel or a common model is kept for all pixels, background modeling approaches can be classified into two categories: pixel-level approaches and global-level approaches. In [4,5], the color of each pixel is modeled as a single Gaussian distribution. The mixture of Gaussians (MoG) model was presented later in [6,7]. The nonparametric statistical model was first introduced by Elgammal et al. [8], and it has become the most widely used model because it is capable of modeling the multi-modality of dynamic scenes. Given a nonparametric statistical model, kernel density estimation (KDE) is used to calculate the membership probabilities of observations. General KDE is computationally expensive, so several fast KDE methods were developed in [9,10]. Although the entire background is represented by a single distribution in [1], it should be classified as a pixel-level method, since location is used for modeling, which


Fig. 1. (a) Two frames with color similarity; (b) segmentations by Sheikh's algorithm.

implicitly provides pixel-level information. A few predictive models [11,12] should also be classified as pixel-level models. Predictive models can be created using many predictive theories, such as linear dynamic systems [13], but they are quite inefficient in practical scenarios because of their limitations in representing real dynamic scenes. Unlike pixel-level approaches, a 3D Gaussian distribution is shared by the entire background in [2]. In [3], a 2D Gaussian distribution is used to model the color of the entire foreground. Generally speaking, better performance can be obtained with pixel-level models, but they encounter difficulties with computational efficiency and pixel misalignment. With recent developments in computing hardware, many pixel-level approaches, such as MoG and statistical models, are able to achieve real-time speed. By combining pixel-level modeling with neighborhood information, nominal pixel misalignment, such as that caused by wind and ground vibration, is successfully dealt with, as in [1]. Thus pixel-level approaches are more popular than global-level approaches.

In recent years, foreground modeling [1,2,14] has been used in conjunction with background modeling for accurate detection. Foreground models can be constructed in a fashion consistent with background models. The nonparametric statistical model is still the most widely used model for foreground modeling.

Once membership probabilities are obtained, pixel labels were in the past usually decided by direct thresholding [6,8]. Since foreground modeling was adopted for object detection, an energy function has been created by incorporating membership probabilities together with prior information, which can then be minimized with various energy minimization tools, such as graph cut [15,16] and belief propagation [17], to make an overall optimal inference about pixel labels. Another key factor of modeling is feature selection. Intensity and color are the most frequently used features. More complex features have been exploited for modeling in recent years, such as contrast, stereo, motion and Harris corners [18]. Contrast [14] has been shown to be a useful cue for making segmentation align with contours of high image contrast. However, it is detrimental to segmentation of similarly colored objects, since there is no distinct edge between a similarly colored foreground and background. Criminisi et al. [19] apply stereo for layer extraction, and later they exploit spatial and temporal derivatives to model


motion instead of optical flow. A combination of optical flow and color is used to model the background in [20]. Another combination, of color and texture, is used for moving object modeling in Ercan's algorithm [3]. Ercan's algorithm can also be used to detect similarly colored objects. There, a region of interest (ROI) is selected in the initialization stage, and a Gibbs–Markov random field is then used to model the target in the ROI; Ercan's algorithm is therefore supervised, while the proposed algorithm is unsupervised. The foreground model in Ercan's algorithm is a global-level model, while the proposed model is a pixel-level model. Ercan's algorithm classifies observations by thresholding, but the proposed algorithm uses an energy minimization tool for classification. Since the topic of Ercan's algorithm is object tracking, background modeling is not used, and segmentation is performed at the coarse region level. In line with the demands of tracking, Ercan's algorithm shows good results when tracking similarly colored objects. In contrast, the proposed algorithm models both the background and the foreground simultaneously, and aims to segment every similarly colored foreground pixel to obtain higher detection accuracy.

Motion prediction has long been used for tracking [3], while we use it for segmentation. However, the motion information is not used for statistical modeling, but rather to shift the foreground model. The centroid of the moving object in the current frame is predicted by a Kalman filter. Then the prediction is used to construct a predictive model by shifting a sample set. Considering the possible prediction error, the mixture foreground model is constructed by combining the predictive model with the general foreground model. Final segmentation is obtained by combining LMT and the mixture foreground model.

This paper is organized as follows: modeling and the concept of the confusion point are given in Section 2. LMT is introduced in Section 3. Section 4 presents details of the mixture foreground model. Experimental results are shown in Section 5, followed by the conclusion in Section 6.

2. Modeling

An input frame of an image sequence is represented as an array I = (I_1, I_2, ..., I_n, ..., I_N), where n is the index of the image lattice. The objective of object detection is to assign each pixel a label from the set {background, foreground}. Background and foreground are modeled as two independent nonparametric statistical models in this paper, with a 5D feature space composed of a 2D location space and a 3D color space, as in [1]. Consider pixel I_n at time instant t, before which all pixels in a neighborhood of position n form a sample set ψ_{b,n} = (y_1, ..., y_m, ..., y_M). This sample set is the nonparametric background model of I_n. The background probability, i.e. the probability that I_n belongs to the background, can be estimated by

p(I_n \mid \psi_{b,n}) = M^{-1} \sum_{m=1}^{M} f_H(I_n - y_m)    (1)

where f is a d-variate kernel function and H is the bandwidth matrix. In order to adapt to background changes, a sliding window of length ρ_b frames is maintained.
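For concreteness, here is a minimal sketch (our own, not the authors' implementation) of the estimate in Eq. (1), assuming a Gaussian product kernel with a diagonal bandwidth matrix H and 5D (x, y, R, G, B) feature vectors; `samples` stands for the sample set ψ_{b,n} drawn from the sliding window:

```python
import numpy as np

def background_probability(feature, samples, bandwidths):
    """Eq. (1): KDE estimate of p(I_n | psi_b,n).

    feature    -- 5D feature (x, y, R, G, B) of pixel I_n
    samples    -- (M, 5) array: the sample set psi_b,n collected from
                  the sliding window of the last rho_b frames
    bandwidths -- length-5 diagonal of the bandwidth matrix H
                  (a diagonal H is an assumption made for simplicity)
    """
    diff = (np.asarray(samples) - feature) / bandwidths   # (M, 5)
    # d-variate Gaussian product kernel f_H, with d = 5
    norm = np.prod(bandwidths) * (2.0 * np.pi) ** 2.5
    kernel = np.exp(-0.5 * np.sum(diff ** 2, axis=1)) / norm
    return kernel.mean()                                  # M^-1 * sum
```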

Consider the same pixel I_n; all pixels labeled as foreground in a neighborhood of position n before time t form the general foreground model ψ_{f,n} = (y_1, ..., y_m, ..., y_M). The foreground probability can be estimated by

p(I_n \mid \psi_{f,n}) = \alpha \gamma + (1 - \alpha) M^{-1} \sum_{m=1}^{M} f_H(I_n - y_m)    (2)

where α is a weight coefficient and γ is a random variable with uniform probability. Similarly, another sliding window of length ρ_f is maintained for ψ_{f,n}. Given the background and foreground models, an energy function is constructed as in [1], which is then minimized by the graph cut technique to classify the observations in this paper. The likelihood ratio λ of I_n is defined as

\lambda(I_n) = \frac{-\ln p(I_n \mid \psi_{f,n})}{-\ln p(I_n \mid \psi_{b,n})}    (3)

A foreground pixel that exhibits a color similar to the background is defined as a fallible pixel. The confusion point (Fig. 2a) is the essence of the misdetections of fallible pixels caused by color similarity. Assuming the sum of p(I_n | ψ_{f,n}) and p(I_n | ψ_{b,n}) is 1, the negative log-likelihood (NL) of p(I_n | ψ_{f,n}) can be written as

-\ln p(I_n \mid \psi_{f,n}) = -\ln(1 - p(I_n \mid \psi_{b,n}))    (4)

With p(I_n | ψ_{b,n}) as the x-axis, the confusion point lies at the cross-point of the two negative log-likelihood functions (NLFs): the NLF of the foreground and the NLF of the background.
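To make this concrete: under the assumption of Eq. (4) that the two probabilities sum to one, the NLFs −ln(x) and −ln(1 − x) cross at x = 0.5, where λ = 1. The toy computation below (our notation, not from the paper) shows how a pixel's side of the confusion point decides its label:

```python
import numpy as np

def likelihood_ratio(p_fg, p_bg):
    """Eq. (3): lambda(I_n) = -ln p(I_n|psi_f,n) / -ln p(I_n|psi_b,n)."""
    return np.log(p_fg) / np.log(p_bg)

# With p_fg = 1 - p_bg (Eq. (4)), the NLFs cross at p_bg = 0.5,
# i.e. at the confusion point, where lambda = 1.
for p_bg in (0.3, 0.5, 0.7):
    lam = likelihood_ratio(1.0 - p_bg, p_bg)
    side = "foreground" if lam < 1 else "background"
    print(f"p_bg = {p_bg:.1f}  lambda = {lam:.2f}  -> {side}")
```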

3. LMT

Experiments show that all misclassified fallible pixels and all background pixels lie in two partially overlapping intervals: a fallible interval (x_2, x_3) with the confusion point at its center, and a background interval (x_4, x_5). Fig. 2a shows that pixels to the left and right of the confusion point tend to be classified as foreground and background, respectively. Fallible pixels will be correctly classified if the confusion point can be moved toward the right side of the fallible interval. This can be achieved by weighting the NLFs (Fig. 2b). The foreground NLF and the background NLF are weighted by c_f and c_b, respectively, in Fig. 2b, with c_f ∈ (0, 1) and c_b ∈ (1, ∞). However, weighting over the whole definition area results in an increase in false alarms. LMT (Fig. 2c) is developed to shift the confusion point by weighting the NLFs only in a local interval. The shifted confusion point x_1 is defined as

x_1 = x_3 + (x_4 - x_3)/4    (5)

Thus the shifted confusion point is close to the lower limit of the overlap interval. c_b and c_f can be computed as follows:

c_b / c_f = x_1    (6)

c_b + c_f = 2    (7)


Fig. 2. (a) The confusion point; (b) NLFs weighted over the whole definition area; (c) LMT.

Fig. 3. (a) and (b) show the histogram of λ of all misclassified fallible pixels and the histogram of λ of all background pixels in the upper image of Fig. 1b, respectively. (c) shows the histogram of λ of all misclassified fallible pixels in the lower image of Fig. 1b.

Eq. (6) accounts for the fact that the weighted NLF of the foreground equals the weighted NLF of the background when I_n = x_1. An additional constraint is imposed: the absolute differences between the original and the new weight coefficients, |1 − c_f| and |1 − c_b|, must be equal for the two NLFs, resulting in Eq. (7).

Given the weight coefficients, the weighting operation of LMT is performed in the weighting interval (x_2, x_3 − Δ). The misdetection rate drops with a small Δ, but the false alarm rate rises at the same time. We use a fixed learning sequence to look for the best Δ among a group of predefined candidates, all lying in the interval (0, 0.3), based on the empirical conclusion that the best Δ always lies in this interval. Foreground segmentation is performed on the learning sequence with each candidate Δ, and we then choose the Δ whose false alarm rate is the smallest of all false alarm rates larger than a predefined threshold.

Let us compute the histogram of λ of all misclassified fallible pixels, and also the histogram of λ of all background pixels, in the upper image of Fig. 1b; they are shown in Figs. 3a and b, respectively. Fig. 3 verifies the existence of the fallible interval and the background interval. The histogram in Fig. 3a should be centered at λ = 1, but the center point is obviously larger than 1 due to the setting of the model parameters. However, all parameters can be learned in practice, and thus the above analysis still applies.

Fig. 3c shows the histogram of λ of all misclassified fallible pixels in the lower image of Fig. 1b. The model parameters used in Fig. 1 are ρ_b = 20, ρ_f = 3 and α = 0.015. If another group of parameters with the same ρ_b and ρ_f but α = 0.2 is used, all fallible pixels lie in another interval, (0.7, 1.3), for the two frames in Fig. 1a. These and further experiments show that the fallible interval is robust to scene changes but dependent on the modeling. Given fixed model parameters, Δ is also robust to scenes, but occasionally we need to adjust Δ slightly for moving objects with different velocities. This relies on the observation that fallible pixels are sparsely distributed in the fallible interval for fast-moving objects and densely distributed for slow-moving objects. If the moving objects in the learning sequence move very slowly, segmentations of fast-moving objects with the learned parameters may not be good enough. However, the moving object in the learning sequence we used has a moderate velocity, and there are not always very fast-moving or very slow-moving objects in the scene, so we do not adjust Δ in any experiment in this paper.
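Solving Eqs. (6) and (7) jointly gives the closed forms c_f = 2/(1 + x_1) and c_b = 2x_1/(1 + x_1). The sketch below summarizes the weight computation and the Δ selection described above; `false_alarm_rate` is a hypothetical evaluation routine over the learning sequence, not something defined in the paper:

```python
def lmt_weights(x1):
    """Solve Eqs. (6)-(7): c_b / c_f = x1 and c_b + c_f = 2."""
    c_f = 2.0 / (1.0 + x1)       # c_f in (0, 1) for x1 > 1
    c_b = 2.0 * x1 / (1.0 + x1)  # c_b in (1, 2) for x1 > 1
    return c_b, c_f

def select_delta(candidates, false_alarm_rate, threshold):
    """Choose the Delta whose false alarm rate on the learning
    sequence is the smallest of those larger than the threshold."""
    rates = {d: false_alarm_rate(d) for d in candidates}
    above = {d: r for d, r in rates.items() if r > threshold}
    return min(above, key=above.get)

# Example: x1 = 1.5 gives c_b = 1.2 and c_f = 0.8, so c_b/c_f = 1.5.
```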

4. The mixture foreground model



Fig. 4. (a) Two frames with color similarity; (b) segmentations by Sheikh's algorithm; (c) segmentations by the new foreground model; (d) segmentations by the proposed algorithm.

Fig. 5. (a) A frame with color similarity; (b) segmentation by the proposed algorithm with β = 0.3; (c) segmentation by the proposed algorithm with β = 0.7.

Let us redefine the general foreground model in Section 2 as a sample set ψ_f comprising all pixels labeled as foreground in the last ρ_f frames. When computing the foreground probability of I_n, we first choose the subset ψ_{f,n} from ψ_f, and then Eq. (2) is used. ψ_f can be written as ψ_f = (Y^{t−1}, Y^{t−2}, ..., Y^{t−ρ_f}), where Y^{t−i} includes all pixels labeled as foreground in frame t − i. Please note that LMT is not used for Y^{t−i}.

If some elements in ψ_f are substituted by segmentations obtained with LMT, a new foreground model is created. The new foreground model can be written as (Y_L^{t−1}, ..., Y_L^{t−ρ_L}, Y^{t−ρ_L−1}, ..., Y^{t−ρ_f}), where Y_L^{t−i} is the set of pixels labeled as foreground with LMT in frame t − i.

Motion information has long been used for tracking and background modeling, but it is not considered by any of the preceding foreground models, including the general foreground model and the new foreground model. However, we assert that motion information is useful for foreground modeling, and we use the two examples shown in Fig. 4 to verify this conclusion.
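As an illustration of the bookkeeping, here is a minimal sketch (our data layout, assuming per-frame binary masks) of the sliding windows behind the general and the new foreground models:

```python
from collections import deque

class ForegroundWindows:
    """Sample-set windows for the foreground models of Section 4.

    `plain` holds Y^{t-i} (segmentations without LMT) and `lmt`
    holds Y_L^{t-i} (segmentations with LMT) for the same frames.
    """

    def __init__(self, rho_f, rho_L):
        assert rho_L <= rho_f
        self.rho_L = rho_L
        self.plain = deque(maxlen=rho_f)  # newest frame first
        self.lmt = deque(maxlen=rho_f)

    def push(self, seg_plain, seg_lmt):
        """Store both segmentations of the newest frame."""
        self.plain.appendleft(seg_plain)
        self.lmt.appendleft(seg_lmt)

    def general_model(self):
        """psi_f = (Y^{t-1}, ..., Y^{t-rho_f})."""
        return list(self.plain)

    def new_model(self):
        """(Y_L^{t-1}, ..., Y_L^{t-rho_L}, Y^{t-rho_L-1}, ..., Y^{t-rho_f})."""
        return list(self.lmt)[:self.rho_L] + list(self.plain)[self.rho_L:]
```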


Fig. 6. (b) Segmentation of frame 2 in Fig. 4a with the learned Δ; (a) and (c) segmentations with a smaller and a larger Δ than the learned one, respectively. Please note that the foreground model used in this experiment is the general foreground model.

Fig. 7. The first test sequence.

Fig. 4a shows two frames with color similarity, with frame 1 and frame 2 denoting the upper and lower images, respectively. The color of the hair, frock and skirt of the pedestrian is similar to that of the background. Figs. 4b and c show segmentations by Sheikh's algorithm and by the new foreground model, respectively. Let us focus only on the fallible pixels on the skirt. Most fallible pixels on the skirt are misclassified by Sheikh's algorithm. Almost all fallible pixels on the skirt in frame 1 are detected by the new foreground model, but most of them still cannot be detected in the lower image of Fig. 4c.

The reason for the misdetections in Fig. 4c is not only the confusion point but also the rapid motion of the pedestrian. Since the pedestrian is moving on the ground plane, the


Fig. 8. The second test sequence.

vertical displacement can be ignored. The horizontal displacement of the pedestrian from the previous frame to frame 1 is less than 10 pixels, whereas it is more than 25 pixels for frame 2. The pixels in the foreground model of frame 2, no matter whether it is the general foreground model or the new foreground model, are far away from their corresponding pixels in frame 2. Consequently, many fallible pixels in frame 2 are still misclassified in Fig. 4c. In contrast, pixels in the foreground model of frame 1 are not too far from their corresponding pixels in frame 1, and therefore most fallible pixels are detected. This example indicates that a foreground model that ignores motion information is not enough for moving object detection. It is natural to assume that shifting the foreground model to the position of the moving object will reduce the misdetection rate.

The movement of an object is reduced to the movement of its centroid in this paper. The centroid C of the moving object is denoted as C = (c_h, c_v), where c_h and c_v are the horizontal and vertical coordinates, respectively. The centroid of the moving object in frame t is denoted as C^t. All centroids of the moving object before the current frame form a centroid vector. For simplicity, only single-object detection is considered, and so the computation of the centroid is carried out directly on the final segmentation of each frame.


Fig. 9. The third test sequence.

Based on the centroid vector (C^{t−1}, C^{t−2}, ..., C^0) available at the current frame t, the centroid of the moving object in the current frame can be predicted by many prediction algorithms. A simple Kalman filter is used here. The predicted centroid is denoted as PC^t. A predictive foreground model is then constructed by shifting all elements in a sample set (Y_L^{t−1}, ..., Y_L^{t−ρ_{f1}}) from their centroids to the position of PC^t. The predictive foreground model ψ_p is denoted as

\psi_p = (Y_L^{t-1}(C^{t-1}, PC^t), \ldots, Y_L^{t-\rho_{f1}}(C^{t-\rho_{f1}}, PC^t))    (8)

where Y_L^{t−i}(C^{t−i}, PC^t) is constructed by shifting the centroid of Y_L^{t−i} from C^{t−i} to PC^t. The shift of the centroid is very simple: we move all pixels in Y_L^{t−i} along the positive direction of the horizontal axis by (pc_h^t − c_h^{t−i}) pixels, and then move all pixels along the positive direction of the vertical axis by (pc_v^t − c_v^{t−i}) pixels. Zeros are filled into the blanks left by the shifted pixels, and shifted pixels that fall outside the image are discarded. Please note that another learning rate ρ_{f1} is used in Eq. (8). The selection of ρ_{f1} is important. Generally speaking, we should choose a small ρ_{f1}; when the shape of the moving object changes slowly, a relatively large ρ_{f1} can be considered.
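Below is a sketch of the prediction-and-shift step under an assumed constant-velocity Kalman model on the centroid (state [c_h, c_v, v_h, v_v]); the transition, measurement and noise matrices are illustrative choices, not values given in the paper:

```python
import numpy as np

F = np.array([[1., 0., 1., 0.],   # constant-velocity transition
              [0., 1., 0., 1.],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.]])
Hm = np.array([[1., 0., 0., 0.],  # only the centroid is observed
               [0., 1., 0., 0.]])
Q = np.eye(4) * 1e-2              # process noise (illustrative)
R = np.eye(2) * 1.0               # measurement noise (illustrative)

def kalman_step(x, P, measured_centroid):
    """One predict/update cycle; returns the predicted centroid PC^t
    together with the updated state and covariance."""
    x_pred, P_pred = F @ x, F @ P @ F.T + Q
    pc = Hm @ x_pred                        # predicted centroid PC^t
    y = measured_centroid - Hm @ x_pred     # innovation
    S = Hm @ P_pred @ Hm.T + R
    K = P_pred @ Hm.T @ np.linalg.inv(S)
    return pc, x_pred + K @ y, (np.eye(4) - K @ Hm) @ P_pred

def shift_mask(mask, centroid, pc):
    """Shift Y_L^{t-i} from its centroid C^{t-i} to PC^t (Eq. (8)):
    vacated pixels are zero-filled, pixels leaving the image are dropped."""
    dh = int(round(pc[0] - centroid[0]))    # horizontal shift
    dv = int(round(pc[1] - centroid[1]))    # vertical shift
    out = np.zeros_like(mask)
    h, w = mask.shape
    src = mask[max(0, -dv):h - max(0, dv), max(0, -dh):w - max(0, dh)]
    out[max(0, dv):max(0, dv) + src.shape[0],
        max(0, dh):max(0, dh) + src.shape[1]] = src
    return out
```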


The mixture foreground model is developed out of the consideration that the segmentation by ψ_p is not very reliable when a prediction error occurs. The mixture model, denoted ψ_m, is the union of the predictive model and the general foreground model, so ψ_m = (ψ_p, ψ_f). Final segmentation is obtained by combining LMT and ψ_m. However, we do not use ψ_m to compute membership probabilities directly. Since pixels cluster densely around the predicted centroid in ψ_p but are relatively sparsely distributed in ψ_f, the foreground probability computed using ψ_p is larger than that computed using ψ_f. That is to say, when ψ_m is used directly, the contribution of ψ_p to the foreground probability is larger than that of ψ_f, and the resulting segmentation is similar to the segmentation using only ψ_p. To balance the contributions of ψ_p and ψ_f, the two models are used to compute foreground probabilities separately, and the weighted sum of the two probabilities is taken as the final foreground probability:

p(I_n \mid \psi_m) = \beta \, p(I_n \mid \psi_f) + (1 - \beta) \, p(I_n \mid \psi_p)    (9)

where β is a constant used to balance the two probabilities. We choose β with 0.5 < β < 1; this accounts for the fact that p(I_n | ψ_p) is larger than p(I_n | ψ_f), so a large β is necessary to weaken the contribution of p(I_n | ψ_p). The final segmentations of Fig. 4a by the proposed algorithm are shown in Fig. 4d. Two further experiments are given at the end of this section to show how the parameters β and Δ impact segmentation results: segmentations with different β are shown in Fig. 5, and segmentations with different Δ are shown in Fig. 6. Please note that the mixture foreground model is not used in Fig. 6.
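Putting the pieces together, Eq. (9) is a plain weighted sum of the two KDE estimates. A sketch reusing the `background_probability` helper from the earlier snippet as a generic KDE (for brevity, the uniform αγ term of Eq. (2) is omitted; the default β is illustrative):

```python
def mixture_foreground_probability(feature, psi_f, psi_p, bandwidths, beta=0.7):
    """Eq. (9): p(I_n | psi_m) = beta * p(I_n | psi_f)
                               + (1 - beta) * p(I_n | psi_p),
    with 0.5 < beta < 1 damping the denser predictive model psi_p."""
    p_f = background_probability(feature, psi_f, bandwidths)
    p_p = background_probability(feature, psi_p, bandwidths)
    return beta * p_f + (1.0 - beta) * p_p
```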

5. Experimental results

Quantitative analysis of three typical sequences is presented in this section. The proposed algorithm processes about 8 fps at a frame size of 240 × 320 on a 2.8 GHz Pentium 4 processor with 1 GB of RAM. The most time-consuming step is the computation of the KDE. Without further optimization, the proposed algorithm can run on live video by dropping frames. It is expected that faster KDE methods will be developed in the future.

A white van moves in front of a parked white car in the first sequence, which is from the PETS 2001 database. Two typical frames are shown in the top row of Fig. 7. Segmentations of four sequential frames in this sequence by Sheikh's algorithm and by the proposed algorithm are shown in the middle and bottom rows, respectively. This experiment shows that almost all fallible pixels can be correctly classified by the proposed algorithm.

The second sequence is from the Georgia Tech gait dataset. Since the pedestrian is a small object in this sequence, only sub-images including the pedestrian are shown in the top row of Fig. 8. The white T-shirt of the pedestrian is easily confused with the background. Segmentations by Sheikh's algorithm, the new foreground model and the proposed algorithm are shown from the second row to the bottom row, respectively. Fig. 8 shows that the proposed algorithm is also effective for small object detection.

Two typical frames of the third sequence are shown in the top row of Fig. 9. The white garment of the pedestrian is easily misclassified as background. Segmentations of four frames in this sequence by Sheikh's algorithm and by the proposed algorithm are shown in the second and third rows, respectively. Segmentations by the mixture foreground model without LMT are shown in the bottom row. Fig. 9 shows that segmentations by the proposed algorithm are close to the ground truth, and that both LMT and the mixture foreground model are necessary for this good result.

6. Conclusions

An algorithm combining LMT and a mixture foreground model is developed in this paper to segment the foreground from a similarly colored background. The principal proposition of this work is that the motion information of the moving object is important, and we provide a mixture foreground model to take advantage of this motion information. Experiments show that the proposed algorithm performs satisfactorily in several challenging settings: indoor and outdoor scenes, dynamic and static scenes, and small and large objects.

Acknowledgments

The authors are grateful to the anonymous reviewers for their comments, which have helped us to greatly improve this article. This study is supported by the China 863 High Tech. Plan (no. 2007AA01Z164) and by the National Natural Science Foundation of China (nos. 60602012, 60772097 and 60675023).

References

[1] Sheikh Y, Shah M. Bayesian modeling of dynamic scenes for object detection. IEEE Trans Pattern Anal Mach Intell 2005;27:1778–92.

[2] Criminisi A, Cross G, Blake A, Kolmogorov V. Bilayer segmentation of live video. In: IEEE conference on computer vision and pattern recognition, New York, 2006. p. 53–60.

[3] Ozyildiz E, Krahnstover N, Sharma R. Adaptive texture and color segmentation for tracking moving objects. Pattern Recognition 2002;35:2013–29.

[4] Wren C, Azarbayejani A, Darrel T, Pentland A. Pfinder: real-time tracking of the human body. IEEE Trans Pattern Anal Mach Intell 1997;19:780–5.

[5] Koller D, Weber J, Huang T, Malik J, Ogasawara G, Rao B, et al. Towards robust automatic traffic scene analysis in real-time. In: Proceedings of the international conference on pattern recognition, Israel, 1994. p. 126–31.

[6] Friedman N, Russell S. Image segmentation in video sequences: a probabilistic approach. In: Proceedings of the 13th conference on uncertainty in artificial intelligence, Providence, 1997. p. 175–81.

[7] Stauffer C, Grimson W. Learning patterns of activity using real-time tracking. IEEE Trans Pattern Anal Mach Intell 2000;22:747–57.

[8] Elgammal A, Duraiswami R, Davis LS. Background and foreground modeling using non-parametric kernel density estimation for visual surveillance. Proc IEEE 2002;90:1151–63.

[9] Elgammal A, Duraiswami R, Davis LS. Efficient kernel density estimation using the fast Gauss transform with applications to color modeling and tracking. IEEE Trans Pattern Anal Mach Intell 2003;25:1499–504.

[10] Yang C, Duraiswami R, Gumerov N, Davis LS. Improved fast Gauss transform and efficient kernel density estimation. In: IEEE international conference on computer vision, Nice, 2003. p. 664–71.

[11] Zhong J, Sclaroff S. Segmenting foreground objects from a dynamic textured background via a robust Kalman filter. In: IEEE international conference on computer vision, Nice, 2003. p. 44–50.

[12] Monnet A, Mittal A, Paragios N, Ramesh V. Background modeling and subtraction of dynamic scenes. In: IEEE international conference on computer vision, Nice, 2003. p. 1305–12.

[13] Doretto G, Chiuso A, Wu Y, Soatto S. Dynamic textures. Int J Comput Vision 2003;51(2):91–109.

[14] Sun J, Zhang W, Tang X, Shum HY. Background cut. In: European conference on computer vision, Graz, 2006. p. 628–41.

[15] Kolmogorov V, Zabih R. What energy functions can be minimized via graph cuts? IEEE Trans Pattern Anal Mach Intell 2004;26:147–59.

[16] Boykov Y, Veksler O, Zabih R. Fast approximate energy minimization via graph cuts. IEEE Trans Pattern Anal Mach Intell 2001;23:1222–39.

[17] Mahamud S. Comparing belief propagation and graph cuts for novelty detection. In: IEEE conference on computer vision and pattern recognition, New York, 2006. p. 1154–59.

[18] Zhu Q, Avidan S, Cheng K. Learning a sparse, corner-based representation for time-varying background modeling. In: IEEE international conference on computer vision, Beijing, 2005. p. 678–85.

[19] Kolmogorov V, Criminisi A, Blake A, Cross G, Rother C. Bi-layer segmentation of binocular stereo video. In: IEEE conference on computer vision and pattern recognition, San Diego, 2005. p. 1186.

[20] Mittal A, Paragios N. Motion-based background subtraction using adaptive kernel density estimation. In: IEEE conference on computer vision and pattern recognition, Washington, 2004. p. 302–09.

Xiang Zhang received his B.E. and M.S. degrees in Communication Engineering from the University of Electronic Science and Technology of China. Since 2006 he has been a Ph.D. candidate at the Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, China. His research interests include image and video processing, and video coding and decoding.

Jie Yang received his Ph.D. degree in computer science from the University of Hamburg, Germany. He is now a professor and doctoral supervisor at the Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, China. His research interests are in the areas of image processing, pattern recognition and data mining.