Intelligent Multimedia Systems Mean-Shift tracker · 2015-03-29 · Intelligent Multimedia Systems...



Intelligent Multimedia Systems

Mean-Shift tracker

Lena Bayeva, Patrick Diviacco

December 21, 2009

Abstract

The mean-shift algorithm is an efficient real-time object tracking algorithm that can be used in a wide range of object-tracking applications. In this work we evaluate mean-shift tracker performance with three color schemes (normalized rgb, opponent, and grey-scale) in different image domains. We discuss the implementation, strengths, and weaknesses of the algorithm, and show that performance can be improved when specific color models are applied to particular imaging domains, both color and grey-scale. We show that mean-shift performs best with the normalized rgb color space for high-contrast color domains and with the opponent color space for lower-contrast color domains, and that the grey-scale color model is the only one that works for black and white image frames. The smallest mean-shift performance error was obtained for 12 and 16 histogram bins.

1 Introduction

The visual tracking of objects is an important and challenging problem with many practical applications, such as tracking humans at various scales in surveillance, video conferencing, and man-machine interaction, tracking cars and airplanes in traffic control, and tracking particles in medicine. Mean-shift is a real-time tracking algorithm used to track objects in video sequences.[4]

The mean-shift tracking algorithm is an iterative scheme based on comparing the histogram of the original object in the current image frame with the histograms of candidate regions in the next image frame. The aim is to maximize the correlation between the two histograms.

Different color systems can be used to represent the model of the object. We investigate tracking performance with normalized rgb, opponent, and grey-scale color models.

Visual tracking needs to address changes in the visual appearance of the images as well as occlusions. We tested the robustness of the algorithm by tracking objects in different domains: a player in a soccer match and a biker on the road.

We first provide some background on tracking algorithms such as brute-force and mean-shift in section 2. We then describe target object representation via histograms created for three different color models in section 3. In section 4 we include implementation details and pseudocode, and in sections 5, 6, and 7 we discuss the results and draw conclusions.


2 Related Works

We have studied previously proposed tracking algorithms such as brute-force and mean-shift, which are described in the sections below, and selected mean-shift for our purpose of efficient real-time object tracking.

2.1 Brute-force algorithm

The brute-force algorithm performs an exhaustive search, comparing the candidate histogram at every possible location in the current frame with the target histogram of the object to track from the previous frame.

Since this is infeasible for every possible location in the entire image, the search space is constrained to a region around the previous location. If the target in the previous frame was at location (x, y), the search space is restricted to the range from (x-width, y-height) to (x+width, y+height), where (width, height) determines the size of the search window.

Brute-force search is simple to implement and always finds a solution if one exists, that is, if the object to track is still in the image. However, since its cost is proportional to the search window size, it is not very useful for real-time tracking.
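The windowed exhaustive search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in this paper: the frame is a single-channel list of rows with values in [0, 1], the histogram is unweighted, and the names `histogram`, `bhattacharyya`, and `brute_force_locate` are hypothetical.

```python
import math

def bhattacharyya(p, q):
    """Similarity of two normalised histograms (1.0 means identical)."""
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))

def histogram(frame, cx, cy, half_w, half_h, bins):
    """Normalised histogram of the rectangle centred at (cx, cy).
    Assumption: pixel values lie in [0, 1]; pixels outside the frame are skipped."""
    counts = [0.0] * bins
    for y in range(cy - half_h, cy + half_h + 1):
        for x in range(cx - half_w, cx + half_w + 1):
            if 0 <= y < len(frame) and 0 <= x < len(frame[0]):
                counts[min(int(frame[y][x] * bins), bins - 1)] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]

def brute_force_locate(frame, target_hist, prev, half_w, half_h,
                       search_w, search_h, bins):
    """Test every centre from (x - search_w, y - search_h) to
    (x + search_w, y + search_h) and keep the most similar candidate."""
    px, py = prev
    best, best_score = prev, -1.0
    for cy in range(py - search_h, py + search_h + 1):
        for cx in range(px - search_w, px + search_w + 1):
            cand = histogram(frame, cx, cy, half_w, half_h, bins)
            score = bhattacharyya(cand, target_hist)
            if score > best_score:
                best, best_score = (cx, cy), score
    return best
```

Every candidate centre requires a full histogram, so the number of histogram evaluations grows with the area of the search window, which is the cost that makes this approach unattractive for real-time use.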

2.2 Kernel-Based Object Tracking

The mean-shift algorithm is an iterative scheme based on comparing a color model of the target object with that of the candidate at each frame of the video sequence. The aim is to maximize the correlation between the two models in order to localize the object in the video.

To find a candidate object location corresponding to the target in the current frame, the mean-shift algorithm minimizes the distance between the target model and a candidate, as given by a similarity function. Since the similarity function is smooth, the algorithm uses the gradient information provided by the mean-shift vector to find a locally optimal solution.

A summary of the algorithm, which employs a metric derived from the Bhattacharyya coefficient as the similarity measure and a mean-shift procedure to perform the optimization, is given below; for a detailed analysis see [4]:

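• For the region defined as the target model, normalize pixel locations to the range [-1,1] to obtain $\{x_i^*\}_{i=1..n}$. Compute an isotropic kernel $k(\|x_i^*\|^2)$ with a convex and monotonically decreasing kernel profile $k(x)$ that assigns smaller weights to pixels farther away from the center.[4] We use the Epanechnikov kernel profile:

$$k(x) = \begin{cases} \frac{1}{2} c_d^{-1} (d+2)(1-x) & \text{if } x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

where $d$ is the dimension of the pixel-coordinate space ($d = 2$ for images) and $c_d$ is the volume of the unit $d$-dimensional sphere [4].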

• Given the initial target location in the first frame centered at $y_0$, compute a weighted histogram of pixel values by summing, for each bin, the kernel weights of the pixels that fall into it. This yields a weighted histogram $q(y_0) = \{q_u(y_0)\}_{u=1..m}$ ($u$ is a histogram bin index) with a bias towards central pixels. Such a bias helps to overcome occlusion and minor object changes from frame to frame.

• Compute the weighted candidate histogram $p(y_0)$ for the next frame using the same procedure.

• Estimate the similarity between $q$ and $p$, given by an estimate of the Bhattacharyya coefficient: $\rho[p(y_0), q] = \sum_{u=1}^{m} \sqrt{p_u(y_0)\, q_u}$.

• Derive the weights $\{w_i\}_{i=1..n}$, where $w_i = \sum_{u=1}^{m} \sqrt{\frac{q_u}{p_u(y_0)}}\, \delta[b(x_i) - u]$, $b(x_i)$ is the bin index of pixel $x_i$, and $\delta[b(x_i) - u]$ determines bin membership: it is one when its argument is zero and zero otherwise.

• Find the next location of the target candidate, a weighted average given by $y_1 = \frac{\sum_{i=1}^{n} x_i w_i}{\sum_{i=1}^{n} w_i}$.


• Compute $\{p_u(y_1)\}_{u=1..m}$ and evaluate $\rho[p(y_1), q]$.

• While $\rho[p(y_1), q] < \rho[p(y_0), q]$, update $y_1 \leftarrow \frac{1}{2}(y_0 + y_1)$ and re-evaluate $\rho[p(y_1), q]$. Moving back to the midpoint prevents the algorithm from overshooting the center and thus makes it more efficient.

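• Stop when the shift falls below a threshold $\varepsilon$: $\|y_1 - y_0\| < \varepsilon$. Otherwise set $y_0 \leftarrow y_1$ and repeat from the candidate histogram computation.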
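The steps listed above can be sketched in Python as follows. This is an illustrative sketch under several assumptions, not the MATLAB code evaluated in this paper: the frame is a single-channel list of rows with values in [0, 1], the constant factor of equation (1) is dropped because the histograms are normalized afterwards, positions are kept as floats and rounded only when sampling pixels, and all function names are hypothetical.

```python
import math

def epanechnikov_kernel(half_w, half_h):
    """Weight 1 - ||x||^2 for each pixel offset whose normalised norm is below 1."""
    kernel = {}
    for dy in range(-half_h, half_h + 1):
        for dx in range(-half_w, half_w + 1):
            r2 = (dx / half_w) ** 2 + (dy / half_h) ** 2  # offsets scaled to [-1, 1]
            if r2 < 1.0:
                kernel[(dx, dy)] = 1.0 - r2
    return kernel

def bin_index(value, bins):
    """Bin index b(x) of a pixel value in [0, 1]."""
    return min(int(value * bins), bins - 1)

def weighted_histogram(frame, center, kernel, bins):
    """Kernel-weighted, normalised histogram around `center`."""
    cx, cy = round(center[0]), round(center[1])
    hist = [0.0] * bins
    for (dx, dy), k in kernel.items():
        x, y = cx + dx, cy + dy
        if 0 <= y < len(frame) and 0 <= x < len(frame[0]):
            hist[bin_index(frame[y][x], bins)] += k
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist

def bhattacharyya(p, q):
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))

def shift_once(frame, y0, q, kernel, bins):
    """New location y1: average of pixel positions weighted by sqrt(q_u / p_u)."""
    p = weighted_histogram(frame, y0, kernel, bins)
    cx, cy = round(y0[0]), round(y0[1])
    sx = sy = sw = 0.0
    for (dx, dy) in kernel:
        x, y = cx + dx, cy + dy
        if 0 <= y < len(frame) and 0 <= x < len(frame[0]):
            u = bin_index(frame[y][x], bins)
            w = math.sqrt(q[u] / p[u])  # p[u] > 0 because this pixel contributed to p
            sx, sy, sw = sx + x * w, sy + y * w, sw + w
    return (sx / sw, sy / sw) if sw > 0 else y0

def mean_shift(frame, y0, q, kernel, bins, eps=0.5, max_iter=20):
    for _ in range(max_iter):
        rho0 = bhattacharyya(weighted_histogram(frame, y0, kernel, bins), q)
        y1 = shift_once(frame, y0, q, kernel, bins)
        # Move back to the midpoint while the similarity got worse.
        while (bhattacharyya(weighted_histogram(frame, y1, kernel, bins), q) < rho0
               and math.dist(y0, y1) >= eps):
            y1 = ((y0[0] + y1[0]) / 2, (y0[1] + y1[1]) / 2)
        if math.dist(y0, y1) < eps:
            return y1
        y0 = y1
    return y0

# Toy example: a bright 5x5 object centred at (12, 10) on a dark background.
frame = [[0.1] * 20 for _ in range(20)]
for yy in range(8, 13):
    for xx in range(10, 15):
        frame[yy][xx] = 0.9
kernel = epanechnikov_kernel(3, 3)
q = weighted_histogram(frame, (12, 10), kernel, bins=4)   # target model
estimate = mean_shift(frame, (10, 9), q, kernel, bins=4)  # start off-centre
```

Because the Epanechnikov profile is linear, its derivative is constant inside the kernel support, which is why the new location reduces to a plain weighted average of pixel positions. In this toy example the estimate moves from the off-centre start towards the object centre.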

3 Target Representation

The target and candidate images are represented by histograms, which reduces their dimensionality and makes them easy to compare. A color histogram, a representation of the distribution of colors of the object to track, provides a compact summary of the distribution of data in an image.

Among the valuable properties of a histogram are its relative invariance to translation and rotation about the viewing axis, and only a slight variance with the angle of view. Because comparing the histogram signatures of two images matches the color content of one image to the other, the color histogram is particularly well suited to recognizing an object of unknown position and rotation within a scene. [3]

If the set of possible color values is sufficiently small, each of those colors may be placed in a bin by itself; the histogram is then merely the count of pixels that have each possible color. Most often, the space is divided into an appropriate number of ranges, or bins, often arranged as a regular grid and each containing similar color values.

Many color invariant models can be used to build a color histogram in order to achieve robustness against changes in illumination and object appearance. In our implementation of the mean-shift algorithm we used the normalized rgb, grey-scale, and opponent color models, described in the following sections.

3.1 Normalized rgb color model

The rgb color system has three color features r, g, and b, obtained from the RGB color space by normalizing the color channels with intensity. Thus, the rgb color system is insensitive to surface orientation, illumination direction, and illumination intensity, and depends only on highlights and changes in the color of the illumination.[1] These properties make it suitable for tracking objects in the color image frames of the "soccer" and "motorcycle" domains.

$$r = \frac{R}{R+G+B} \tag{2}$$

$$g = \frac{G}{R+G+B} \tag{3}$$

$$b = \frac{B}{R+G+B} \tag{4}$$
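Equations (2) to (4) translate directly into code. The sketch below is illustrative only; the function name `normalized_rgb` is hypothetical, and returning neutral values for a completely black pixel is an assumption made here to avoid division by zero.

```python
def normalized_rgb(R, G, B):
    """Normalise each channel by the intensity sum R + G + B."""
    total = R + G + B
    if total == 0:
        # Assumption: a black pixel carries no chromatic information.
        return (1 / 3, 1 / 3, 1 / 3)
    return (R / total, G / total, B / total)

# Scaling all channels by the same factor leaves r, g, b unchanged.
dim = normalized_rgb(50, 100, 50)
bright = normalized_rgb(100, 200, 100)
```

The two example pixels differ only in intensity and map to the same (r, g, b) triple, which is the intensity invariance described above.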

3.2 Opponent color model

The opponent color space is represented by three channels, each of which can be used to construct a 1-D histogram; these are combined into a single image histogram.[2] The first two channels (O1 and O2, see below) contain the color information, and the third channel represents intensity. We apply the 2-D opponent color space (O1, O2) to track objects in color images, leaving out the intensity values. This color space is invariant to highlights and intensity, but is sensitive to object geometry and shading.

$$O_1 = \frac{R - G}{\sqrt{2}} \tag{6}$$

$$O_2 = \frac{R + G - 2B}{\sqrt{6}} \tag{7}$$
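A small sketch of equations (6) and (7) makes the invariance concrete: adding the same offset to R, G, and B cancels out in both channels. The function name `opponent` is hypothetical and the pixel values are made up for illustration.

```python
import math

def opponent(R, G, B):
    """Return the two chromatic opponent channels (O1, O2)."""
    o1 = (R - G) / math.sqrt(2)
    o2 = (R + G - 2 * B) / math.sqrt(6)
    return o1, o2

pixel = opponent(0.2, 0.4, 0.6)
shifted = opponent(0.3, 0.5, 0.7)   # same pixel with +0.1 on every channel
grey_pixel = opponent(0.5, 0.5, 0.5)
```

Here `pixel` and `shifted` agree up to floating-point rounding, and any grey pixel maps to (0, 0), since grey carries no chromatic information.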


3.3 Grey-scale color model

The grey-value system is based on image intensity values. Even though the grey value is heavily influenced by imaging conditions (viewing direction, object geometry, direction of illumination, and intensity and color of the illuminant), it proves to be quite useful for object detection when color information is not available and other color spaces fail to discriminate objects from the background, i.e. with high-contrast black and white image frames.

The grey-value color model is calculated from the original R, G, B values of the red, green, and blue channels of an image. The transformation from the RGB color system is given by:

$$GREY = 0.299R + 0.587G + 0.114B \tag{9}$$

We apply this color model to track an object in the grey-scale domain video of the "motorcycle".
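The grey-value transformation is a weighted sum of the three channels. The sketch below uses the standard luma weights 0.299, 0.587, and 0.114, which sum to 1 so that a pure white pixel keeps full intensity; the function name `grey_value` is hypothetical.

```python
def grey_value(R, G, B):
    """Weighted sum of the channels using the standard luma coefficients."""
    return 0.299 * R + 0.587 * G + 0.114 * B

white = grey_value(1.0, 1.0, 1.0)
green = grey_value(0.0, 1.0, 0.0)
```

Green contributes most to the result and blue least, reflecting the different perceived brightness of the three primaries.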

4 Implementation and pseudo code

We have implemented a mean-shift tracking algorithm in MATLAB. The program allows the user to select an object in the first frame of the video. The algorithm then runs, tracking the object and displaying it inside a rectangular frame.

4.1 Pseudocode

The pseudocode for the mean-shift tracker with the main functions is given below:

meanShift:

• normalizeImage - normalize target image for a given color space

• getEpiKernel - compute Epanechnikov kernel for a target region

• getTargetPDF - compute weighted histogram for a given color space

• for all frames:

    ◦ read current frame

    ◦ normalizeImage - normalize current frame for a given color space

    ◦ locatePlayerMS - find new candidate center coordinates

    ◦ draw a rectangle around the new center (to display tracking result)

4.2 Locate Player with Mean-shift

The locatePlayerMS function implements the core of the mean-shift tracker. The algorithm starts by initializing the candidate location with the coordinates of the object in the previous frame. It then computes the weighted histograms for the target and candidate images, derives pixel weights from them, and applies these weights to the normalized pixel coordinates (previously computed for the Epanechnikov kernel); the horizontal and vertical shifts are obtained by summing the weighted normalized pixel coordinates. The candidate location is then shifted accordingly. This procedure is applied iteratively until the shift distance between the previous and newly calculated candidate locations is smaller than a threshold. Finally, the algorithm corrects for overshooting, i.e. a mean-shift step that is too large, by taking the midpoint between the previous and newly calculated locations and computing a new histogram there. The candidate histograms are compared to the target histogram via a similarity function (Bhattacharyya distance); the algorithm stops when the candidate distance is small.


4.3 Normalization

We normalize image pixel values inside the normalizeImage function according to one of the three color systems we have selected: rgb, opponent, or grey-scale. The pixel values for all color systems are normalized such that they fall into the [0,1] range. This allows us to use a general histogram construction algorithm and to easily compare mean-shift performance for different color models.

In addition to converting RGB values to rgb, we also take black pixel values into account, setting values that are smaller than some threshold (e.g. 0.03) to that threshold. For the opponent color space we normalize pixel values to the range [0,1], thus eliminating negative values for histogram construction. Finally, grey-value normalization computes the intensity image values.
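One possible way to realize these normalization steps is sketched below. The exact scaling used in our MATLAB code is not specified here, so the min-max scaling of the opponent channels is an assumption: with R, G, B in [0, 1], O1 lies in [-1/√2, 1/√2] and O2 in [-2/√6, 2/√6]. The names `normalize_opponent` and `clamp_dark` are hypothetical.

```python
import math

O1_MAX = 1 / math.sqrt(2)   # largest |O1| for R, G, B in [0, 1]
O2_MAX = 2 / math.sqrt(6)   # largest |O2| for R, G, B in [0, 1]

def normalize_opponent(R, G, B):
    """Map the opponent channels (O1, O2) linearly into [0, 1]."""
    o1 = (R - G) / math.sqrt(2)
    o2 = (R + G - 2 * B) / math.sqrt(6)
    return ((o1 + O1_MAX) / (2 * O1_MAX), (o2 + O2_MAX) / (2 * O2_MAX))

def clamp_dark(value, threshold=0.03):
    """Raise values below the threshold to the threshold (0.03 as in the text)."""
    return max(value, threshold)

neutral = normalize_opponent(0.5, 0.5, 0.5)   # grey maps to the mid-point
red = normalize_opponent(1.0, 0.0, 0.0)
```

After this scaling every channel can be passed to the same histogram routine, because the bin index depends only on a value in [0, 1].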

5 Results and evaluation

The mean-shift tracker was evaluated for three different color systems (normalized rgb, opponent, and grey-scale) on videos from three different domains.

The following tests were performed:

• Tracking a football player in a soccer match video. The imaging conditions are favorable to tracking: the colors are high in contrast (i.e. orange and white t-shirts against green grass) and there are no changes in illumination or viewing direction. The main challenge in this domain is tracking in the presence of occlusions and slight changes in object appearance. We show tracker performance with occlusions, i.e. when a player is blocked by players of the opposite team.

• Tracking a biker on the road filmed from a helicopter. The imaging conditions play a role in this domain, since the color contrast is low and the illumination and viewing conditions vary.

• Tracking the same biker, but with the frames digitally converted to grey-scale. Here we rely only on the intensity values in the image frames.

Figures 1, 2, and 3 show samples of all three video sequences. Each sequence starts with the first frame of the video, in which the user selects a target object by drawing a rectangle around it. In the following frames a red rectangle shows the region of the image selected by the mean-shift algorithm as the current location of the object.

Figure 1: Tracking sequence in biker domain


Figure 2: Tracking sequence in gray-scale biker domain

Figure 3: Tracking sequence in soccer domain

The tracked object coordinates and the ground truth coordinates have been plotted in 2-D graphs. The X and Y axes represent the x and y coordinates of the object in the video frame.

Figure 4 shows three plots with the coordinates of the tracked soccer player for each of the three color systems discussed above. For all color systems the difference between the ground truth coordinates and the centers found by mean-shift is large in two places (around 300 and 420 on the X axis). This is due to an occlusion that lasts for a few frames, during which the bottom part of the soccer player is covered by an opposing player. As a consequence, the red rectangle moves up and selects only part of the player and the grass of the field. This is explained by the fact that the initial target model includes the color of the grass surrounding the player but not the color of the opposing player's t-shirt.

Although the mean-shift tracker deviates from the true object coordinates during the occlusion, it is still able to track the object and recovers once the occlusion disappears. This shows that the algorithm is robust to slight changes in object appearance, partly due to the isotropic kernel, which favors pixel values closer to the center, and partly due to the fact that the color variation in the object is minimal. The tracker, however, risks losing the object if it reappears far away from the last observed location.


Figure 4: Soccer domain: ground truth and tracking coordinates with grey (left), opponent (middle) and normalized rgb (right) color model

Figure 5 shows the three plots with the coordinates of the biker in the color video, while Figure 6 shows a plot with the coordinates of the biker in the grey-scale video using the grey color model. For the latter, the plots for the normalized rgb and opponent color spaces have been omitted, since mean-shift failed to track the object with these color models. In all the plots we can see a noticeable divergence between the ground truth and mean-shift Y coordinates for X values around 350-400. This is explained by the fact that the object size in the video progressively decreases as the viewpoint of the scene changes, while the size of the target model region remains constant. As a consequence, the red rectangle selects a region bigger than the object itself. This issue can be addressed by incorporating scaling into the mean-shift tracker algorithm, as discussed in [4].

Figure 5: Biker domain: ground truth and tracking coordinates with grey (left), opponent (middle) and normalized rgb (right) color model

The objects were correctly tracked in all experiments except for the grey-scale video. In the latter, only the tracker with the grey-scale (intensity) model succeeded in following the biker; see Figure 6.

In order to compare the accuracy of the mean-shift algorithm for different image domains and color spaces, we computed an error defined as the mean of the distances between the mean-shift and ground truth coordinates over all frames; see Figure 7. The ground truth was obtained by manually labeling the object center coordinates.
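The error measure can be sketched as follows. Since the results are reported separately as x and y errors, the sketch assumes a per-axis mean absolute difference between tracked and ground truth centers; the function name `mean_axis_errors` and the coordinates are made up for illustration.

```python
def mean_axis_errors(tracked, truth):
    """Mean absolute x and y difference between tracked and true centres."""
    n = len(tracked)
    x_err = sum(abs(t[0] - g[0]) for t, g in zip(tracked, truth)) / n
    y_err = sum(abs(t[1] - g[1]) for t, g in zip(tracked, truth)) / n
    return x_err, y_err

# Hypothetical centres for three frames.
tracked = [(10, 10), (12, 9), (15, 14)]
truth = [(10, 12), (11, 9), (13, 13)]
x_err, y_err = mean_axis_errors(tracked, truth)
```

For these three frames the x differences are 0, 1, and 2 and the y differences are 2, 0, and 1, so both mean errors equal 1.0 pixel.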


Figure 6: Biker domain (gray scale): Ground truth and tracking coordinates with grey color model

We can see that the normalized rgb color system gives the best performance in the "soccer" domain, while the opponent color system does better in the "biker" domain. In both cases the grey-scale color model performs worse, as it is heavily influenced by the imaging conditions. It is, however, the only model that was able to track the object in the grey video frames.

color model       Soccer           Biker            Biker (gray-scale)
                  x err   y err    x err   y err    x err   y err
normalized rgb    3.62    4.52     11.73   4.04     -       -
opponent          4.29    4.21     7.31    3.45     -       -
grey              4.9     5.49     18.52   6.22     18.21   6.76

Figure 7: Mean error table for different color systems and image domains

In addition to evaluating mean-shift performance for different color systems and image domains, we ran experiments to see whether the number of histogram bins affects tracker performance. We computed the mean error for bin counts in the range 4-52 with a step of 4 on the soccer match video frames. We did not find any particular trend in the error across the different numbers of bins; the error variance was only 0.44 avg. pixels. The smallest errors were obtained for 12 bins (8.53 avg. pixels) and 16 bins (8.14 avg. pixels).


Finally, we compared the computational performance of the exhaustive search (brute-force) and mean-shift algorithms on the soccer match and biker domains. The results are summarized in the table in Figure 8. The mean-shift algorithm is 4.43 and 9.36 times faster in the soccer and biker domains respectively. The sequences were tested on a 2 GHz machine with 2 GB RAM. We used the normalized rgb color model and 16-bin histograms for both trackers. We varied the size of the search window in the brute-force algorithm to match the size of the objects in each video. As expected, the mean-shift algorithm was several times faster than brute-force search.

                 Soccer          Biker
mean-shift       44 s            31 s
brute-force      195 s           290 s
search window    40x40 pixels    100x70 pixels

Figure 8: Computational time for brute-force and mean-shift trackers
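The speed-up factors follow directly from the timings in Figure 8. The short sketch below recomputes them from the rounded table values; the variable names are hypothetical, and small deviations from factors quoted elsewhere are presumably due to the timings being rounded to whole seconds.

```python
# Running times in seconds, taken from Figure 8: (mean-shift, brute-force).
times = {"Soccer": (44, 195), "Biker": (31, 290)}

# Speed-up of mean-shift over brute-force search per domain.
speedup = {domain: brute / mean_shift
           for domain, (mean_shift, brute) in times.items()}
```

From the rounded timings the ratios are about 4.43 for the soccer domain and about 9.35 for the biker domain, so mean-shift is roughly four to nine times faster, with the larger gain for the larger search window.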

6 Discussion

With our experiments we have confirmed the inherent properties of the mean-shift tracker along with its weaknesses. Among the weaknesses of the algorithm are poor performance with occlusions and with changes in object appearance. When an occlusion occurs, the tracker risks losing the object if it reappears far away from the last observed location. This weakness is inherent in the algorithm, as the search for the candidate object is done in the neighborhood of its location in the previous frame. It might be possible to augment the algorithm with heuristics that explore areas farther away from the center and rediscover the object.

Our implementation of mean-shift is also not very robust to changes in object appearance, i.e. changes in size and viewing angle. Changes in size can be addressed by incorporating scaling into the algorithm, as discussed in [4]. In addition, the target histogram can be updated from time to time to make the algorithm more robust to changes in object appearance.

It is also clear from the experiments that a mean-shift tracker has to take into account the image domain in which the tracking is done and select the appropriate color system to obtain the best performance. For example, the 2-D opponent color system is best suited to images with changing lighting conditions, as it is shift-invariant with respect to intensity [2]. Normalized rgb is sensitive to highlights and changes in the color of the illumination, but is independent of surface orientation and of illumination intensity and direction [1], which makes it suitable for high-contrast color images with uniform illumination. Finally, the grey-scale color model is the best fit for black and white domains, where only intensity values are available.


7 Conclusions

We have implemented a mean-shift tracker and evaluated its performance for different image domains (i.e. color and grey video frames) and with different color systems (normalized rgb, opponent, and grey-scale). We obtained the best performance for the high-contrast color (soccer match) domain with the normalized rgb color system, for the low-contrast color (biker on the freeway) domain with the opponent color system, and for the black and white image domain with the grey-scale color system. We also observed that changing the number of histogram bins did not influence tracker performance much: the standard deviation in the mean pixel error was only 0.44 avg. pixels. The error was smallest for 12 and 16 histogram bins. The performance was evaluated against manually annotated ground truth. We observed several tracker weaknesses, i.e. sensitivity to occlusions and object changes, and noted possible improvements to the algorithm, i.e. scaling and target histogram updates. The tracker will therefore perform best when configured for a specific image domain.

8 References

[1] Th. Gevers, "Color in Image Search Engines," survey on color for image retrieval, in Multimedia Search, ed. M. Lew, Springer Verlag, January 2001. http://staff.science.uva.nl/~gevers/pub/survey_color.pdf

[2] K. E. A. van de Sande, Th. Gevers, and C. G. M. Snoek, "Evaluating Color Descriptors for Object and Scene Recognition." http://www.science.uva.nl/research/publications/2010/vandeSandeTPAMI2010/vandeSandeTPAMI2010-EvaluatingColorDescriptors.pdf

[3] Th. Gevers and A. W. M. Smeulders, "Image Search Engines: An Overview," Faculty of Science, University of Netherlands.

[4] D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-Based Object Tracking," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 25, no. 5, pp. 564-578, May 2003. http://www.caip.rutgers.edu/~comanici/Papers/KernelTracking.pdf
