International Journal of Mathematics and Computer Sciences (IJMCS) ISSN: 2305-7661 Vol.34 Oct 2014 www.scholarism.net

Object Tracking in Video Sequence Images Based on Color Histogram and Central Voting

Ahmad Samaeifar1, Iman Attarzadeh2, Bita Shadgar3

1Department of Computer, Dezful Branch, Islamic Azad University, Dezful, Iran, Email: [email protected]

2Department of Computer, Dezful Branch, Islamic Azad University, Dezful, Iran, Email: [email protected]

3Department of Computer, Shahid Chamran University of Ahvaz, Iran, Email: [email protected]

Abstract

Machine vision, which combines image processing methods with machine learning tools, enables computers to understand image content intelligently. Object tracking is a fundamental operation for many high-level machine vision applications, such as motion-based recognition, automated surveillance, video indexing, human-computer interaction, traffic monitoring and vehicle guidance, and it currently receives a great deal of attention. In this paper, we introduce a new algorithm for tracking in video sequences. We present a specific color histogram model for representing the tracked target. Using this model, a central voting method based on the generalized Hough transform estimates the object position in each frame. In this method, each pixel in the target area of the previous frame casts one vote for the new position of the target center in the new frame, and the candidate pixel with the maximum vote is taken as the target center. After the new center position is determined, a recursive process separates the target from the background and updates the model parameters. The simulation results indicate successful target tracking, even when the target changes in size or has the same color as the image background.

Key words

Machine vision, Object tracking, Generalized Hough transform, Color histogram, Central voting.


1-Introduction

Object tracking in video sequences poses several problems that we aim to overcome. First, conventional tracking systems may be confused when the image background is changing. The next problem is the complexity of the object's motion. In addition, whether the tracked objects are rigid or non-rigid, partial occlusion, and changes in image lighting are also issues that challenge a tracking system [1].

The two main components that determine the performance of a video tracking system are the target representation model and the algorithm that detects the target position in the image. The target representation model is the framework in which the target is defined; more precisely, it specifies which features represent the target in the algorithm under investigation. The position detection algorithm specifies how the image is searched to find the target. Depending on the model used for target representation, there are different algorithms for searching for and finding the target.

Contour-based tracking performs very well even when the object is not rigid [2, 3], but it is computationally very expensive, which makes it unsuitable for online tracking. Furthermore, this method cannot distinguish the target from the background when the target has the same colors as the background. Tracking based on feature points has produced good results [4, 7] for targets with a complicated color structure. In this method, the image is described by feature points, and the target position in the desired frame is found using a Newton-Raphson optimization algorithm. Tracking with this method is fast and reliable. However, its performance degrades when the target changes pose or rotates, or when the image contains occlusion.

Representing the target by a color histogram is a very popular and common approach in tracking applications, because of its robustness to noise and to sudden changes in the position of the tracked target [8-10]. The CAM_shift algorithm tracks video targets using a color histogram model of the target. To find the target position from one frame to the next, a series of iterations is performed; once the target position is found, a rectangle is placed around it, and this rectangle is repositioned in each frame. Although this algorithm was designed for face tracking, it can be used to track other objects [8]. The algorithm presented in [9] uses kernel coefficients and applies the color histogram model to the target in weighted form, so that pixels close to the target boundary, which are exposed to background color and more likely to cause errors, have lower coefficients and therefore less influence on the result than pixels located at the center.

In this paper we present a new algorithm for both important stages of tracking, namely target representation and position finding. To represent the target, we present a new class of color histogram that is derived by extending the Spatiogram method. In this method, in addition to which pixels fall in which histogram bin, the distance between pixels is also taken into account. A central voting method based on the Hough transform is used to obtain the position of the object in each image. After the position of the object is obtained, its dimensions are estimated and stored for use in the next frame; in a recursive stage, the object is separated from the background and the parameters in use are updated.


In the following, we explain the color histogram method, central voting using the Hough transform, and the other concepts required to study the method presented in this paper.

2-Generalized Hough transform

The Hough transform was originally designed for detecting lines in images, but through its generalization it has come to be widely used for detecting shapes in images. This transform consists of two parts: forming the R-Table, which represents the shape, and the voting method, which detects the desired shape in the image.

2-1-Formation of R-Table:

For an arbitrary shape in the image under investigation, the gradient direction and the position vector relative to a reference point are calculated for each point on the boundary of the shape. The position vector is then stored in a table called the R-Table as a function of the gradient direction. Figure (1) and Table (1) show this procedure:

Figure (1): Implementation of Hough method for finding the center of shapes.

Table (1): R-Table
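The R-Table construction described in this subsection can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `build_r_table`, the use of gradient directions in whole degrees, the 10-degree quantization and the toy boundary points are assumptions that do not come from the paper.

```python
from collections import defaultdict

def build_r_table(boundary, gradient_deg, reference, bin_width=10):
    """Map each quantized gradient direction to the offsets that lead from a
    boundary point to the reference point (hypothetical helper)."""
    r_table = defaultdict(list)
    for (x, y), phi in zip(boundary, gradient_deg):
        key = int(phi % 360) // bin_width          # quantized gradient direction
        r_table[key].append((reference[0] - x, reference[1] - y))
    return dict(r_table)

# Toy shape: four boundary points around the reference point (5, 5),
# with assumed gradient directions given in degrees.
boundary = [(5, 3), (7, 5), (5, 7), (3, 5)]
gradient_deg = [270, 0, 90, 180]
table = build_r_table(boundary, gradient_deg, reference=(5, 5))
```

Each table entry stores the displacement from a boundary point to the reference point, so that a boundary point seen later with the same gradient direction can point back to where the reference point should be.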


3-Central voting to determine the position of the object

For each pixel of the shape in figure (1-b), the gradient direction is calculated. Using this gradient direction and the position vectors stored in the R-Table, a candidate center point is calculated for each pixel and recorded as a vote. A search then identifies the points that the largest number of pixels have selected as their center, and the points with the maximum vote are taken as the centers of the shapes present in the image. Note that if the desired shape is a circle, all points on its perimeter indicate the circle center as their center. The Hough transform also finds the center of any other shape: although the pixels of such a shape do not all agree on one center and several points are proposed, the point with the maximum vote is selected as the center. In figure (1-c), the brightness of each point reflects the number of votes it received; as can be seen, the centers are brighter than the other points.
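The voting step can be illustrated with a small accumulator. The sketch below is not the authors' implementation: `vote_for_center`, the hand-written `r_table` and the toy points are assumptions chosen so that the example runs on its own.

```python
import numpy as np

def vote_for_center(points, gradient_deg, r_table, shape, bin_width=10):
    """Accumulate one vote per (point, R-Table offset) pair and return the
    location with the most votes (hypothetical helper)."""
    acc = np.zeros(shape, dtype=int)               # accumulator indexed as [y, x]
    for (x, y), phi in zip(points, gradient_deg):
        key = int(phi % 360) // bin_width
        for dx, dy in r_table.get(key, []):
            cx, cy = x + dx, y + dy
            if 0 <= cy < shape[0] and 0 <= cx < shape[1]:
                acc[cy, cx] += 1
    cy, cx = np.unravel_index(np.argmax(acc), acc.shape)
    return (int(cx), int(cy)), acc

# Assumed R-Table of a small shape: gradient bin -> offsets to the center.
r_table = {27: [(0, 2)], 0: [(-2, 0)], 9: [(0, -2)], 18: [(2, 0)]}
# The same shape observed at a shifted location in a 12 x 12 image.
points = [(8, 4), (10, 6), (8, 8), (6, 6)]
gradient_deg = [270, 0, 90, 180]
center, acc = vote_for_center(points, gradient_deg, r_table, shape=(12, 12))
```

In this toy case all four boundary points vote for the same location, which is the behavior the text describes for a circle; for other shapes the votes spread out and the maximum of the accumulator is taken.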

3-1-Tracking and determination of object position by Spatiogram:

The Spatiogram presented in [10] represents the tracked object according to equation (1):

$$h(b) = \langle n_b, \mu_b, \Sigma_b \rangle, \qquad b = 1, \dots, B \tag{1}$$

in which $n_b$ is the number of pixels that fall in the $b$-th bin of the object color histogram, $\mu_b$ is the mean position vector of the pixels in the $b$-th bin, and $\Sigma_b$ is the covariance matrix of the positions of the pixels in the $b$-th bin. $B$ is the total number of histogram bins. Assume that we have the Spatiogram $h$ of the target object from the previous frame. The object position in the current frame is found by an algorithm that computes the similarity between $h$ and the Spatiogram of every candidate region $y$ in the current frame, where $y$ is a region with the dimensions of the target (known from the previous frame). The point at which the similarity between the Spatiogram of $y$ and $h$ is maximized is taken as the object position.
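As an illustration of the quantities a Spatiogram stores for each bin (pixel count, mean position and position covariance), the following sketch computes them from an image whose pixels have already been assigned to histogram bins. The function name `spatiogram`, the dictionary layout and the use of the population covariance (division by the pixel count) are assumptions made for this example.

```python
import numpy as np

def spatiogram(bin_index):
    """bin_index: 2-D integer array holding the histogram bin of every pixel.
    Returns {bin: (count, mean position, position covariance)}."""
    ys, xs = np.indices(bin_index.shape)
    result = {}
    for b in np.unique(bin_index):
        mask = bin_index == b
        pts = np.stack([xs[mask], ys[mask]], axis=1).astype(float)  # (x, y) rows
        n = len(pts)
        mu = pts.mean(axis=0)
        diff = pts - mu
        sigma = diff.T @ diff / n                  # population covariance (assumed)
        result[int(b)] = (n, mu, sigma)
    return result

# Toy patch with two bins.
patch = np.array([[0, 0, 1],
                  [0, 1, 1]])
spg = spatiogram(patch)
```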

The similarity between the model Spatiogram $h$ and a candidate Spatiogram $h'$ (whose quantities are marked with a prime) is calculated according to equation (2):

$$\rho(h, h') = \sum_{b=1}^{B} \psi_b \sqrt{n_b \, n'_b} \tag{2}$$

in which the weight parameter $\psi_b$ is calculated from the following equation:

$$\psi_b = \eta \exp\left\{ -\tfrac{1}{2} (\mu_b - \mu'_b)^{T} \hat{\Sigma}_b^{-1} (\mu_b - \mu'_b) \right\}$$

in which $\eta$ is the Gaussian normalization constant, and for $\hat{\Sigma}_b^{-1}$ we have:

$$\hat{\Sigma}_b^{-1} = \Sigma_b^{-1} + (\Sigma'_b)^{-1}$$

The position $y$ that maximizes the similarity function is taken as the position of the target object. The procedure for positioning with the central voting method and the Hough transform is therefore as follows: first, the user manually specifies the object shape in the first frame; the tracking algorithm then applies the Hough transform to the object shape and passes the result to the central voting function, which finds the position of the target object as described above. Equation (2) can also be optimized by the gradient descent method, but the central voting method using the Hough transform runs about three times faster. In the next section we present an algorithm that extends the method presented in this section and improves its performance and precision.

4-Proposed algorithm

The video tracking problem is defined as follows: given information about an object in the current frame, we want to track it in the following frames. It is assumed that the tracked object is either marked in the first frame by an operator or located in the first frame by the tracking system from the features given to it. In either case, the system is assumed to know the object position in the first frame as a rectangle surrounding the object, and the problem is to find this rectangle in the following frames.

4-1-Object display model

Consider the positions of all pixels that belong to the object in the image. The object is described by its center C and by the half-length and half-width of the rectangle that encloses it, figure (2-a).

Figure (2)


We use a particular histogram model to represent the object. For each bin b and each cluster k within that bin, the model stores the average position of all pixels in the k-th cluster of the b-th bin, together with the probability of that cluster. Here a cluster is the set of pixels in one histogram bin whose Euclidean distance from each other does not exceed a threshold. In this model, therefore, not only the color density of the pixels but also the distance between them is treated as a characteristic of the target.

The cluster probability is the number of pixels in the k-th cluster divided by the total number of pixels that make up the target shape; it is given by equation (4). In that equation, C is a normalization constant that ensures that the cluster probabilities sum to one. The equation also uses an indicator that equals one when a pixel falls in the b-th histogram bin and lies within the threshold distance of the other pixels of the cluster, and zero for all other pixels. The following algorithm shows how the cluster of a pixel is determined:

Algorithm 1: Calculation of the cluster of a pixel

Input: Position of the pixel

Output: Cluster of the pixel

1. Calculate the bin b for the pixel.

2. Create a new cluster if there is no other cluster in the b-th bin.

3. Otherwise, for each cluster k in the b-th bin:

(a). Calculate the distance from the pixel to the average position of cluster k.

(b). Place the pixel in cluster k of the b-th bin if the distance calculated in the previous step is less than the threshold.

(c). Update the average position of cluster k.

4. Create a new cluster for the pixel if it was not placed in any of the above clusters.
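The steps of Algorithm 1 can be sketched as follows. The data layout (a dictionary from bin to a list of clusters, each with a running mean and a pixel count), the function name `assign_cluster`, and the choice to accept the first cluster that is close enough are assumptions; the paper does not specify them.

```python
import math

def assign_cluster(model, b, pixel, threshold):
    """Place `pixel` in a cluster of bin `b`, creating one if needed.
    model: {bin: [{"mean": (x, y), "count": n}, ...]}. Returns the cluster index."""
    clusters = model.setdefault(b, [])
    for k, c in enumerate(clusters):
        # Step 3(a): distance from the pixel to the cluster's average position.
        if math.dist(pixel, c["mean"]) < threshold:
            n = c["count"]
            # Steps 3(b) and 3(c): join the cluster and update its running mean.
            c["mean"] = ((c["mean"][0] * n + pixel[0]) / (n + 1),
                         (c["mean"][1] * n + pixel[1]) / (n + 1))
            c["count"] = n + 1
            return k
    # Steps 2 and 4: no suitable cluster exists, so start a new one.
    clusters.append({"mean": (float(pixel[0]), float(pixel[1])), "count": 1})
    return len(clusters) - 1

# Toy input: two distant pixels in bin "a", two neighboring pixels in bin "e".
model = {}
for b, p in [("a", (0, 0)), ("a", (9, 9)), ("e", (4, 4)), ("e", (5, 4))]:
    assign_cluster(model, b, p, threshold=3.0)
```

With this toy input, bin "a" ends up with two clusters and bin "e" with one, which is the situation the paper later describes for bins a and e of its example.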

We use a kernel function [9] to reduce errors caused by the object having the same color as the image background, which may mislead the system. With this function, the pixels in the target rectangle do not all contribute equally to the estimate of the target position: pixels at the center of the rectangle receive larger coefficients than pixels near its margin. As a result, margin pixels, which play a greater role in misleading the system, have less impact on the calculations. The kernel function is defined in terms of the normalized Euclidean distance between a pixel and the center C of the target object, which is calculated from equation (7).

Equation (7) has the form of the equation of an ellipse. In fact, setting the normalized distance equal to 1 describes an ellipse with center C whose large and small diameters are determined by the length and width of the target rectangle, figure (2-b). For pixels located outside this ellipse, most of which are background pixels, the normalized distance from the center is greater than 1, and their kernel coefficient is zero. Pixels inside the ellipse, on the other hand, have coefficients between zero and one, with pixels near the center receiving larger coefficients, figure (2-c).
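The behavior of the kernel can be illustrated with a short sketch. The exact kernel profile of the paper is not recoverable from the text, so the Epanechnikov-style profile `1 - r**2` used here is an assumption, chosen only because it matches the stated properties (zero outside the ellipse, between zero and one inside, largest at the center). The function names and the explicit form of the normalized distance are likewise assumptions.

```python
import numpy as np

def normalized_distance(xs, ys, center, half_w, half_h):
    """Elliptical distance to the center; equals 1 on the ellipse boundary."""
    return np.sqrt(((xs - center[0]) / half_w) ** 2 +
                   ((ys - center[1]) / half_h) ** 2)

def kernel_weight(r):
    """Assumed Epanechnikov-style profile: 1 at the center, 0 for r >= 1."""
    return np.where(r < 1.0, 1.0 - r ** 2, 0.0)

# A 13 x 9 pixel rectangle whose center is at (6, 4).
ys, xs = np.indices((9, 13))
r = normalized_distance(xs, ys, center=(6, 4), half_w=6, half_h=4)
w = kernel_weight(r)
```

Here the corner pixels of the rectangle fall outside the inscribed ellipse and receive zero weight, so only pixels that are likely to belong to the target contribute.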

Unlike the Spatiogram presented in the previous section, in which each bin has one average vector and one covariance, in the algorithm presented in this section each bin contains several average vectors, each of which is the average of all pixels that fall in that bin and lie within a threshold distance of each other. The covariance has thus been removed from this algorithm; in return, the distance between pixels is taken into account. As an example, consider the target object in figure (3-a).


Figure (3)

Table (2)

The Latin letters in the image show the bins of the pixels in the target rectangle. The color histogram of this target is presented in table (2). For simplicity, we use uniform kernel coefficients for all pixels in this example. Note that the two pixels in bin a are not placed in one cluster because they are far from each other; bin a therefore has two average vectors. On the other hand, the two pixels in bin e are close to each other in the image, so they are placed in one cluster, and the average vector of this bin is the average position of these two pixels. Thus bin a consists of two clusters, whereas bin e consists of one.

4-2- Determination of target position:

Determination of the target position consists of two steps: central voting and a recursive stage. In the central voting step, all pixels in the target rectangle obtained from the previous frame take part in voting for the object center, according to the procedure described below. The rules of the central voting step are as follows:

1. Only pixels whose color is present in the model histogram obtained in the previous step are able to vote.

2. Each pixel whose color falls in the b-th bin adds one vote to the pixel that it indicates as the center.

3. The votes of reliable pixels have larger coefficients than those of unreliable pixels. Reliability is explained later.

Figure (3-b) shows the central voting process. The arrows in this image indicate which pixel each pixel votes for as the shape center. The specially marked pixels are those whose color does not fall in any of the bins of the object histogram. The pixels that fall in bins b, c and e vote for their respective candidate center pixels. In this example, the origin of the coordinate system is taken to be point d, and the positions of the pixels in table (2) are given relative to this origin. In this table, each average vector is obtained by averaging the positions of the pixels in a cluster in the X and Y directions. Figure (3-b) is a later frame than figure (3-a); in it we carry out the central voting process using the rules above and table (2), which was obtained from the previous frame. This example shows that the two stages, building the target representation model (table (2)) and determining the position (central voting), are carried out on separate frames, and that position determination always runs one frame ahead of the model-building stage. After computing which pixel each pixel votes for as the center of the target object, the maximum is taken over the pixels, and the pixel with the maximum vote is taken as the center of the target object; in this example, the pixel in bin d has the maximum vote and is selected as the center. The new center is taken as the target position in the current frame and is used for the calculations in the following frames.
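The central voting rules can be sketched as follows. This is an illustrative sketch under several assumptions: the model is taken to store, for each bin, the average vectors of its clusters as offsets relative to the target center; a pixel votes for its own position minus each offset; and the optional `weights` dictionary stands in for the reliability coefficients of rule 3. The name `central_voting` and the toy data are hypothetical.

```python
import numpy as np

def central_voting(bin_image, model, weights=None):
    """bin_image: 2-D array of bin labels, with -1 for colors absent from the model.
    model: {bin: [(dx, dy), ...]}, cluster mean offsets relative to the center."""
    h, w = bin_image.shape
    acc = np.zeros((h, w), dtype=float)
    for y in range(h):
        for x in range(w):
            b = int(bin_image[y, x])
            for dx, dy in model.get(b, []):        # rule 1: unknown colors cast no vote
                cx, cy = int(round(x - dx)), int(round(y - dy))
                if 0 <= cx < w and 0 <= cy < h:
                    # rule 2: one vote; rule 3: optionally scaled by a reliability weight
                    acc[cy, cx] += 1.0 if weights is None else weights[b]
    cy, cx = np.unravel_index(np.argmax(acc), acc.shape)
    return (int(cx), int(cy)), acc

# Toy model built in the previous frame: three bins, one cluster each.
model = {1: [(-1, 0)], 2: [(1, 0)], 3: [(0, 1)]}
# New frame: the same three colored pixels, now arranged around (3, 2).
frame = -np.ones((5, 6), dtype=int)
frame[2, 2], frame[2, 4], frame[3, 3] = 1, 2, 3
center, acc = central_voting(frame, model)
```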

This example made the simplifying assumption that no pixel of the target object has the same color as the background. This assumption is unrealistic: in most cases it does not hold, and the tracking algorithm fails when the target contains pixels with the same color as the background. This problem can be solved by rule 3, which gives reliable pixels a greater say in determining the center than unreliable pixels. A reliable pixel is one whose color is present in the target histogram but does not occur in the image background. In our algorithm, pixels with higher reliability have more influence on the determination of the center than pixels with lower reliability. This is achieved by assigning a higher impact factor to these pixels. The impact factor for pixels whose average vector belongs to the b-th bin of the target histogram is defined in terms of the cluster probability given in equation (4) and the probability of the background pixels that fall in the b-th bin, which is obtained by dividing the number of these pixels by the total number of background pixels.

Consider the set of all pixels that lie between two ellipses centered at C, figure (2-b): an inner ellipse with the small and large diameters of the target, and an outer ellipse with enlarged small and large diameters. The background probability is calculated over this set using a normalization constant, an indicator that is one when a pixel falls in the b-th histogram bin and zero for all other pixels, and a kernel function. This kernel function is defined using the normalized Euclidean distance of equation (7), with the diameters of the outer ellipse used in place of those of the inner ellipse.
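The idea of a background histogram computed over the ring between the two ellipses, and of a weight that favors colors rare in the background, can be sketched as below. Several parts are assumptions and are not taken from the paper: the ring pixels are weighted uniformly instead of by the paper's kernel, the outer ellipse is obtained by multiplying the inner diameters by a factor `scale`, and the ratio used in `impact_factor` is only one plausible form, since the paper's own formula did not survive extraction.

```python
import numpy as np

def ring_background_hist(bin_image, center, half_w, half_h, scale, n_bins):
    """Normalized histogram of the pixels between the inner ellipse and an
    outer ellipse that is `scale` times larger (uniform weights, assumed)."""
    ys, xs = np.indices(bin_image.shape)
    r_in = np.sqrt(((xs - center[0]) / half_w) ** 2 +
                   ((ys - center[1]) / half_h) ** 2)
    ring = (r_in >= 1.0) & (r_in < scale)          # outside inner, inside outer
    counts = np.bincount(bin_image[ring], minlength=n_bins).astype(float)
    total = counts.sum()
    return counts / total if total > 0 else counts

def impact_factor(p_target, p_background):
    """ASSUMED form: near 1 when the color is absent from the background,
    smaller when target and background share the color."""
    return p_target / (p_target + p_background + 1e-12)

# Toy frame: bin 1 in a small central patch, bin 0 everywhere else.
frame = np.zeros((9, 9), dtype=int)
frame[3:6, 3:6] = 1
bg = ring_background_hist(frame, center=(4, 4), half_w=2, half_h=2,
                          scale=2.0, n_bins=2)
```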

The extended kernel function introduced above assigns zero weight to the pixels that do not belong to the background region, as can be seen in figure (2-b). Note that the extent of the background region is determined by a factor that can vary as a function of the target dimensions. This factor is obtained from an equation involving a coefficient that depends on the speed of the object's motion. In most tracking systems this coefficient is set to 2, which assumes that the displacement of the target between two consecutive frames is no larger than the target dimensions. Larger values should be used when the target moves faster.

Choosing this coefficient involves a trade-off. High target speeds call for a large value, and at such speeds the computation must also be fast. But a large value means that a larger area of pixels is selected for processing; a large η therefore increases the amount and duration of computation, which conflicts with the need for high computational speed. We therefore have to compromise between these two requirements to obtain the best setting.

The set of pixels that the algorithm processes in order to find the center is called the algorithm range. The algorithm range, which is directly related to this parameter, should be chosen so that the tracking system does not lose the target. For example, if the target position in the current frame is close to its position in the previous frame, a small range is sufficient for tracking, but a larger range should be chosen when the target moves fast. In figure (3-b), for instance, the dotted lines forming a rectangle with center C mark the range of the tracking system.

The algorithm is repeated after the new center is found. Up to this point the algorithm tracks targets well, but another problem remains: the object may not be rigid, and its dimensions may vary with time. We use a distance parameter to account for these variations in the algorithm. When the target dimensions gradually increase, the algorithm should add new pixels to the voting process and give them the right to vote; when the dimensions shrink, some pixels should be removed from the voting process. In each frame, this parameter determines that all pixels whose distance from the center C is less than or equal to it can take part in voting; all other pixels are treated as background. The new pixels that lie within this distance are also allowed to be candidates for the object center. This step belongs to the recursive stage of the algorithm, which is shown below:

Algorithm 2: Recursive stage

1. Calculate the distance from C for all pixels of the system range rectangle centered at C.

2. If the distance is less than the threshold, place the pixel in the system range; otherwise, treat it as background.
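Algorithm 2 amounts to thresholding the distance of each pixel from the center. The sketch below assumes a plain Euclidean distance and a hypothetical function name `recursive_stage`; the paper may instead use the normalized elliptical distance of equation (7).

```python
import numpy as np

def recursive_stage(range_shape, center, radius):
    """Return a boolean mask over the range rectangle: True for pixels that
    may vote in the next frame, False for pixels treated as background."""
    ys, xs = np.indices(range_shape)
    dist = np.hypot(xs - center[0], ys - center[1])   # step 1: distance from C
    return dist < radius                              # step 2: threshold test

mask = recursive_stage((7, 7), center=(3, 3), radius=2.5)
```

Increasing `radius` from frame to frame admits new pixels into the voting process, and decreasing it removes pixels, which is how the recursive stage follows a target that grows or shrinks.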

Adding the above algorithm to the system allows the tracking algorithm to account for variations in the target dimensions. After the target center is obtained in each frame, a recursive stage determines the set of new pixels that will take part in central voting in the next frame, and the kernel function parameters are then updated. The new dimensions of the target rectangle are obtained by combining the old dimensions with the half-length and half-width of the background rectangle calculated by the kernel function. A weighting parameter determines the proportions in which the old dimensions and the updated values contribute to the new dimensions.
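One way to write such a proportional combination is as a convex update. The notation is an assumption, since the source symbols did not survive: $w_{t-1}, h_{t-1}$ denote the previous dimensions, $\hat{w}_t, \hat{h}_t$ the values obtained from the kernel function in the current frame, and $\gamma \in [0, 1]$ the weighting parameter. Whether the paper applies $\gamma$ to the new values or to the old ones cannot be recovered from the text.

```latex
\begin{aligned}
w_t &= (1 - \gamma)\, w_{t-1} + \gamma\, \hat{w}_t \\
h_t &= (1 - \gamma)\, h_{t-1} + \gamma\, \hat{h}_t
\end{aligned}
```

For example, with $\gamma = 0.1$, $w_{t-1} = 40$ and $\hat{w}_t = 50$, this update gives $w_t = 0.9 \cdot 40 + 0.1 \cdot 50 = 41$.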


5-Updating model parameters

The parameters of the target model and of the image background should be updated during tracking, because variations in the scene can affect the model. Let the image background histogram in the current frame be calculated by placing the kernel function at the object center C. The updated background histogram is then a weighted combination of the previous background histogram and the current one, in which an update ratio controls the relative impact of the two. The larger this ratio, the smaller the impact of the previous histogram and the more the current histogram determines the new value; the smaller this ratio, the less attention the system pays to the current values and the more it relies on the previous histogram. A value of 0.5 is usually used when the background changes rapidly.
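The background update described here behaves like a running average. The sketch below assumes a convex combination with an update ratio called `alpha` and renormalizes the result; both the name and the exact form are assumptions consistent with the description, not the paper's formula.

```python
import numpy as np

def update_background(prev_hist, curr_hist, alpha):
    """Blend the previous and current background histograms.
    A larger alpha gives more influence to the current frame (assumed form)."""
    prev_hist = np.asarray(prev_hist, dtype=float)
    curr_hist = np.asarray(curr_hist, dtype=float)
    new_hist = (1.0 - alpha) * prev_hist + alpha * curr_hist
    return new_hist / new_hist.sum()               # keep it a probability histogram

bg = update_background([0.5, 0.5, 0.0], [0.0, 0.5, 0.5], alpha=0.5)
```

With `alpha=0.5`, the value the paper suggests for rapidly changing backgrounds, the previous and current histograms contribute equally.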

Updating the parameters of the histogram model consists of three parts: combining, adding and pruning. Consider the object histogram calculated in the current frame by placing the kernel function at center C. The parameters of the histogram model are updated by the following procedure:

1. Combine a cluster of the current histogram with a cluster of the model by averaging their average vectors if they match, that is, if the distance between them is less than a between-cluster threshold; the probability value is then updated accordingly. If the distance to all clusters of the model exceeds the threshold and no match is found, the cluster is treated as a new value.

2. Add the cluster to the model histogram if no match was found in histogram h.

3. Delete a cluster from the model if its probability value is less than a threshold. In practice, a threshold of 0.0001 can be a good choice.
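The combine, add and prune steps can be sketched as follows. The probability update formulas of the paper are not recoverable, so this sketch assumes a blend controlled by a parameter `beta`, decays clusters that found no match in the current frame, and renormalizes at the end. The function name `update_model` and the data layout are also assumptions, and the sketch expects at least one cluster to survive pruning.

```python
import math

def update_model(model, current, beta, match_dist, prune_below=1e-4):
    """model/current: {bin: [{"mean": (x, y), "p": probability}, ...]}."""
    touched = set()
    for b, clusters in current.items():
        for c in clusters:
            match = next((m for m in model.get(b, [])
                          if math.dist(m["mean"], c["mean"]) < match_dist), None)
            if match is not None:                  # 1. combine matching clusters
                match["mean"] = tuple((u + v) / 2.0
                                      for u, v in zip(match["mean"], c["mean"]))
                match["p"] = (1.0 - beta) * match["p"] + beta * c["p"]
                touched.add(id(match))
            else:                                  # 2. add unmatched clusters
                new = {"mean": c["mean"], "p": beta * c["p"]}
                model.setdefault(b, []).append(new)
                touched.add(id(new))
    for b in list(model):
        for m in model[b]:
            if id(m) not in touched:               # assumed decay of unseen clusters
                m["p"] *= (1.0 - beta)
        model[b] = [m for m in model[b] if m["p"] >= prune_below]   # 3. prune
    total = sum(m["p"] for cl in model.values() for m in cl)
    for cl in model.values():
        for m in cl:
            m["p"] /= total                        # probabilities sum to one again
    return model

model = {"a": [{"mean": (0.0, 0.0), "p": 0.6}], "e": [{"mean": (4.0, 4.0), "p": 0.4}]}
current = {"a": [{"mean": (1.0, 0.0), "p": 0.5}], "c": [{"mean": (8.0, 8.0), "p": 0.5}]}
updated = update_model(model, current, beta=0.1, match_dist=3.0)
```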


The model has a further update parameter that behaves like the background update ratio. When the model dimensions do not change rapidly, we choose a small value because the model does not change much; when there are large variations from one frame to the next, we choose a large value so that the system can change the model according to the presented algorithm and adapt itself to the variations. In practice we use 0.1 when variations are small, and a larger value when variations are large. After the model parameters are updated, the new probability values should be normalized by dividing them by their sum, so that they lie between zero and one and sum to one over all clusters.

The algorithm presented in this paper results from extending and combining the two methods presented in the previous sections, Spatiogram tracking and the Hough transform. Unlike other methods, in which the target position is obtained from the Spatiogram and the Hough transform directly, this algorithm uses an adaptive voting method based on the Hough transform. It should be noted that before this algorithm is used, the Spatiogram results must be stored in a table like the R-Table.

The algorithm presented in this paper has several advantages over the other methods introduced above. First, each histogram bin is allowed to have several average vectors, which provides more complete information from the histogram and makes the tracking process more accurate. Second, the central voting method, applied adaptively to the pixels of the target rectangle, tracks well, and the system does not make mistakes when the target object has the same color as the image background. Third, separating the target from the background is performed simply in the recursive stage.

6- Experimental Results

Simulations were carried out in the MATLAB programming environment, and this section presents the results of running the presented algorithm on two video sequences. Each video frame was 480 × 720 pixels, and the simulations were run on a computer with a dual-core 3 GHz processor (CPU) and 3 GB of memory (RAM). The MATLAB version used was MATLAB 2013b. In this section we also compare the performance of the presented algorithm with the kernel algorithm for tracking an object in video, and compare it with similar algorithms in terms of computational complexity and other aspects such as performance under occlusion, changes in image lighting, and changes in target dimensions.

To apply the algorithm to a video sequence, the location of the target rectangle in the image is first determined manually in a preprocessing stage and passed to the main code. The location of the target rectangle consists of the pixel coordinates of its vertices, which are extracted from the first video frame and entered into the MATLAB code. A 16 × 16 × 16 histogram in the RGB color model is computed for each frame. The same parameter values were used throughout tracking in all three videos. Because of the large number of frames, the extracted images shown below are sampled every 30 frames.

To study the performance of the algorithm when the tracked object has a color and histogram similar to the background, the algorithm was applied to a video of a football game. The video contains many players wearing the same color, and the position of the target relative to the camera changes constantly, which causes errors in a tracking system. The proposed algorithm overcomes this problem by using the adaptive coefficients in central voting introduced in the previous section, and as can be seen in figure (4) the system tracks the target well:

Figure (4)

The next video is used to study the performance of the tracking system when the tracked object is not rigid and its dimensions vary over time. Changes in target dimensions are among the most challenging problems for a tracking system. The example used here, in which the target dimensions change over time, is a vehicle tracked by a fixed camera. As can be seen in the images of figure (5), the system follows the target vehicle almost until it fades from the image. In this video, not only do the dimensions of the target vehicle shrink as it moves away from the camera, but its pose and position also change:


Figure (5)


Figure (6) illustrates the results of tracking the target with the conventional Spatiogram method. In this algorithm, when the target dimensions change, the system does not adapt itself to the new model; it nearly loses the target and makes mistakes:

Figure (6)

To study the computational complexity of the presented algorithm and compare it with other algorithms, we measured the average time required to process one video frame; the results are presented in table (3). The calculations were performed on a system with a dual-core 3 GHz processor and 3 GB of memory. Although our algorithm is computationally more expensive than some of the conventional algorithms in this table, it is still fast enough for online, real-time tracking.

Table (3): Comparison of the presented algorithm with other algorithms in terms of processing speed [10]


7-Conclusions

In this paper, we introduced an algorithm for tracking an object in a video sequence. In this algorithm, the object color histogram is used as the distinguishing feature of the object for detecting it in the video sequence. We introduced the central voting method based on the Hough transform and the Spatiogram model, and studied how to use them in a general algorithm. We then extended these two methods into a new algorithm that is more capable and flexible than the conventional Spatiogram. In the new method, both the modeling stage and the positioning stage have been extended. In addition, a further stage, model parameter updating, has been added in each frame; it enables the system to keep tracking when the target and the background have the same color and when the target is a non-rigid object. The results of running the algorithm were presented on several video sequences under various conditions. First, we studied its performance in following a football player, a case in which the target has the same color as the background, and the system handled it well. Next, we used the system to track a vehicle, in order to evaluate its performance in non-rigid tracking. Finally, we compared the system with the conventional Spatiogram algorithm in terms of performance, and with several target tracking algorithms in terms of computational complexity. Our results indicate an improvement in performance.

References

[1] A. Yilmaz, O. Javed, M. Shah, Object tracking: a survey, ACM Comput. Surv. 38 (2006) 13.

[2] M. Isard, A. Blake, CONDENSATION—conditional density propagation for visual tracking, Int. J. Comput. Vision 29 (1998) 5–28.

[3] Y. Shi, W.C. Karl, Real-time tracking using level sets, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2, 2005, pp. 34–41.

[4] J. Shi, C. Tomasi, Good features to track, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1994, pp. 593–600.

[5] C. Tomasi, T. Kanade, Detection and tracking of point features, Technical Report, Carnegie Mellon University, 1991.

[6] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vision 60 (2004) 91–110.

[7] H. Bay, A. Ess, T. Tuytelaars, L. Van Gool, Speeded-up robust features (SURF), Comput. Vis. Image Underst. 110 (2008) 346–359.

[8] G.R. Bradski, Real time face and object tracking as a component of a perceptual user interface, IEEE Workshop on Applications of Computer Vision, 1998, pp. 214–219.

[9] D. Comaniciu, V. Ramesh, P. Meer, Kernel-based object tracking, IEEE Trans. Pattern Anal. Mach. Intell. 25 (2003) 564–575.

[10] X. Li, W. Hu, C. Shen, Z. Zhang, A. Dick, A. van den Hengel, A survey of appearance models in visual object tracking, ACM Trans. Intell. Syst. Technol. 4 (4) (2013) 58.
