Monocular Vehicle Detection and Trackingacsweb.ucsd.edu/~yuw176/report/vehicle.pdf · Monocular...

Monocular Vehicle Detection and Tracking

Yufei Wang

Department of Electrical and Computer Engineering

University of California San Diego

La Jolla, California 92037

E-mail: [email protected]

Abstract—This project implements a vehicle detection and tracking system. The framework uses a Haar cascade classifier for vehicle detection and car-light features for validation. To smooth and further refine the detection result, Kalman tracking is applied to every car hypothesis, and a three-stage tracking scheme is employed. Experiments show that car-light validation greatly reduces false alarms while preserving the detection rate, and that tracking improves the detection result further. Given a limited training set, the Haar cascade classifier produces a large number of false alarms, yet the detection system still yields satisfactory results after the refinement and tracking steps.

I. INTRODUCTION

Worldwide, millions of people are killed or injured in motor vehicle collisions, and the financial costs to both society and individuals are significant. Pre-crash vehicle systems are therefore of great interest to researchers as well as vehicle manufacturers.

Building a pre-crash vehicle system is very challenging. One of the main challenges is that it requires accurate detection of on-road vehicles. Detection is a real-world problem: the variety of vehicle appearances and the changing environment are major obstacles. In the past decades, many techniques have been used to detect on-road vehicles, including radar, lidar, and computer vision. Thanks to the development of cameras and computational devices, computer vision approaches can be implemented in real time and have attracted growing attention. Many approaches have been developed for monocular vision-based vehicle detection. For feature extraction, popular features such as HOG, SIFT, SURF, and Haar-like features have been tried, and a variety of classifiers have been employed, such as SVM, AdaBoost, and hidden Markov model classification [5]. For tracking, particle filtering and Kalman filtering are among the popular methods [4], [5].

The main novelty of this project is the use of a car-light feature for target validation. The system involves three steps: car targets are generated with a Haar cascade classifier; target validation is performed with the car-light feature; and tracking smooths and further refines the detection results. The detection and tracking results are evaluated on real-world video.

The rest of this report is organized as follows. Section II gives a brief overview of related work in vehicle detection and object detection. Section III presents the framework of the vehicle detection and tracking system. Section IV gives the experimental results and analysis of the detection and tracking system. Section V concludes and discusses future work.

II. RELATED RESEARCH

Many features and classification methods have been tried on the vehicle-detection problem. Among them, Haar-like features and boosted cascades are preferred by many researchers.

In [1], Haar-like features are used for object detection, and the results are compelling. Their advantage is that they are easy and fast to compute, and rectangular features are well suited to representing vehicles. The AdaBoost classifier was first proposed in [2]; by combining weak classifiers with certain weights, AdaBoost can achieve good results. [6] used Haar-like features with AdaBoost as the classifier for face detection. However, the problem of computational time remains: even a small patch contains very many Haar-like features (of different sizes and positions). Although the integral image greatly reduces the time needed to compute a Haar-like feature, it is still too time-consuming to compute every Haar-like feature for all patches. Therefore, the boosted cascade was proposed. It consists of several stages. An object candidate enters the next stage only if it passes the current one; if it fails any stage, it is classified as background. This greatly reduces computing time. Haar-like features combined with a boosted cascade produce very compelling results in face detection. Rear views of cars have many rectangular patterns that are easily represented by Haar-like features, and the boosted cascade is a very efficient classifier, so the detection system can run in real time. That is why I chose Haar-like features and a boosted cascade. Many others have also used Haar cascades for vehicle detection with satisfactory results [4].
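The integral-image idea mentioned above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in this project: the function names are hypothetical, and NumPy is assumed. It shows why any rectangle sum, and hence any Haar-like feature, costs only a handful of table lookups once the table is built.

```python
import numpy as np

def integral_image(gray):
    """Summed-area table with a zero row and column prepended for easy indexing."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def two_rect_feature(ii, top, left, height, width):
    """Two-rectangle (edge) feature: left half minus right half; width must be even."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum

# Toy 4x4 patch with values 0..15 (illustrative data only).
patch = np.arange(16, dtype=np.uint8).reshape(4, 4)
ii = integral_image(patch)
total = rect_sum(ii, 0, 0, 4, 4)
feature = two_rect_feature(ii, 0, 0, 4, 4)
```

The table is computed once per frame; after that, the cost of a feature no longer depends on the size of its rectangles.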

Vehicle detection normally consists of two steps: first, all regions that can be viewed as vehicle candidates are identified; second, the candidates are verified and tracked. In the first step, many researchers use the shadow underneath a vehicle as a cue indicating its presence. [3] suggests a gradient-based method: due to the shadows, wheels, and bumpers in the bottom rear view of a vehicle, there will be a negative horizontal gradient.

III. METHOD

The framework is shown in Fig. 1. In this section, the Haar cascade is introduced briefly first. Then validation using the car-light feature and the tracking steps are detailed.

A. Haar Cascade Classification

Fig. 1: Framework of System Algorithm

Fig. 2: Car candidates detected by Haar cascade

The combination of Haar-like features and a boosted cascade classifier is widely used for object detection. Haar-like features can be divided into three classes: two-rectangle features, which detect edges; three-rectangle features, which detect lines; and four-rectangle features, which detect diagonal edges. A boosted cascade is a sequence of stages, each consisting of several weak classifiers (mostly decision stumps), and every candidate is passed through the stages in order. Candidates that pass all the stages are classified as positive, while those rejected by any stage are classified as negative. The advantage of the cascade is that the majority of candidates are negative and usually cannot pass the first few stages, so early-stage rejection greatly reduces computing time.
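The early-rejection logic can be sketched as follows. This is a toy illustration under stated assumptions, not a trained cascade: `Stump`, `Stage`, and `cascade_classify` are hypothetical names, the windows are short lists of numbers, and the two "features" (mean and range) merely stand in for real Haar-like features.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Stump:
    feature: Callable[[Sequence[float]], float]  # maps a window to a feature value
    threshold: float
    polarity: int   # +1: fire when value >= threshold, -1: fire when value < threshold
    weight: float   # boosting weight of this weak classifier

    def vote(self, window):
        value = self.feature(window)
        fires = value >= self.threshold if self.polarity > 0 else value < self.threshold
        return self.weight if fires else 0.0

@dataclass
class Stage:
    stumps: List[Stump]
    stage_threshold: float

    def passes(self, window):
        # A stage passes when the weighted votes reach the stage threshold.
        return sum(s.vote(window) for s in self.stumps) >= self.stage_threshold

def cascade_classify(stages, window):
    """Return (is_positive, number_of_stages_evaluated)."""
    for index, stage in enumerate(stages, start=1):
        if not stage.passes(window):
            return False, index  # early rejection: later stages are never evaluated
    return True, len(stages)

# Two toy stages: a brightness test, then a contrast test.
stages = [
    Stage([Stump(lambda w: sum(w) / len(w), 0.5, +1, 1.0)], 1.0),
    Stage([Stump(lambda w: max(w) - min(w), 0.2, +1, 1.0)], 1.0),
]
dark = cascade_classify(stages, [0.1, 0.2, 0.1])
flat = cascade_classify(stages, [0.7, 0.7, 0.7])
good = cascade_classify(stages, [0.6, 0.9, 0.7])
```

Here `dark` is rejected after one stage, `flat` after two, and only `good` passes both stages, which mirrors how most background windows are discarded cheaply.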

B. Car-light Refinement

After the sub-images in a frame are classified as vehicle or non-vehicle, the vehicle candidates are generated. As shown in Fig. 2, there are many false alarms among the candidates. This is due to the limitations of the training set, which are elaborated in the experiment section.

To reduce false alarms as much as possible, a car-light feature is introduced.

Fig. 3: Car-light feature. Top row: detected cars. Middle row: Cr component of the top row. Bottom row: thresholding of the middle row.

On-road tracking of vehicles mainly deals with rear views of vehicles that appear in front of the camera. Regardless of the shape, texture, or color of the vehicles, their rear views share a common feature: they have red lights in the middle portion of the rear view. As shown in Fig. 3, despite variations in vehicle color and lighting conditions, the rear view of a vehicle has red car lights on its left and right sides.

To extract the car-light feature described above, the image is first transformed to the YCbCr color space. The Y component corresponds to luminance, and the Cb and Cr components are the blue-difference and red-difference chroma components, respectively. In this color space, the influence of illumination differences is reduced. The Cr component is the one of interest, and the Cr component of each area is shown in the middle row of Fig. 3. It can be observed that illumination changes can still affect the Cr value of the car lights. Otsu's method is therefore used to obtain an adaptive threshold for the Cr sub-image. With Otsu's method, the impact of illumination is greatly reduced, and the two lights are extracted. The result of Otsu's method is illustrated in the bottom row of Fig. 3. The area to be segmented contains only the middle of the original vehicle area. Finally, the car-light feature is extracted. Currently, a heuristic but effective threshold is applied to the black-and-white car-light image:

$$
\mathrm{CarLight} =
\begin{cases}
1, & E_{bw}(r) - E_{bw}(m) > T \ \text{or} \ E_{bw}(l) - E_{bw}(m) > T \\
0, & \text{otherwise}
\end{cases}
\tag{1}
$$

where $r$, $l$, and $m$ are the three sub-images (right, left, and middle) of the bottom row of Fig. 3, and $E_{bw}(\mathrm{area})$ stands for the mean value of the black-and-white thresholded area. $T$ is a predefined threshold. $\mathrm{CarLight} = 1$ verifies the car candidate, whereas $\mathrm{CarLight} = 0$ rejects the candidate.
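The car-light test of Eq. (1) can be sketched as below. Several details are assumptions rather than facts from this report: the Cr conversion uses the common full-range BT.601 coefficients, the "middle" of the vehicle area is taken to be the middle third of the box by height, the left, middle, and right sub-images are equal thirds by width, and the black-and-white image uses values 0 and 255. All function names are hypothetical, and NumPy is assumed.

```python
import numpy as np

def cr_channel(rgb):
    """Cr (red-difference chroma) of an 8-bit RGB image, full-range BT.601 form."""
    r, g, b = (rgb[..., i].astype(np.float64) for i in range(3))
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.clip(np.rint(cr), 0, 255).astype(np.uint8)

def otsu_threshold(gray):
    """Grey level that maximises the between-class variance (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # probability of the "dark" class
    mu = np.cumsum(prob * np.arange(256))   # cumulative first moment
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu[-1] * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0               # levels where one class is empty
    return int(np.argmax(sigma_b))

def car_light(rgb_candidate, T=20.0):
    """Eq. (1): 1 if the left or right sub-image exceeds the middle one by more than T."""
    h = rgb_candidate.shape[0]
    band = cr_channel(rgb_candidate[h // 3: 2 * h // 3])  # assumed middle band
    bw = np.where(band > otsu_threshold(band), 255, 0)    # black-and-white image
    left, middle, right = np.array_split(bw, 3, axis=1)
    e_l, e_m, e_r = left.mean(), middle.mean(), right.mean()
    return int(e_r - e_m > T or e_l - e_m > T)

# Synthetic 30x30 candidate: grey body with red lamps at the left and right
# of the middle band (illustrative data, not taken from the test videos).
candidate = np.full((30, 30, 3), 120, dtype=np.uint8)
candidate[10:20, 0:9] = (220, 30, 30)
candidate[10:20, 21:30] = (220, 30, 30)
with_lights = car_light(candidate)
no_lights = car_light(np.full((30, 30, 3), 120, dtype=np.uint8))
```

With this synthetic input the candidate with two lamps is accepted and the uniformly grey patch is rejected. A uniformly red patch is also rejected by this rule, because the middle sub-image is then as bright as the sides.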

However, when the vehicle itself is red, as shown in the first column of Fig. 3, the above method is invalid: the middle sub-image then also has a high Cr value, so the contrast between the left or right sub-image and the middle one disappears. Therefore, a different thresholding is employed: thresholding is applied to the Cr image, and if the mean Cr value of the candidate area is larger than a predefined threshold, the area is validated as a detected car area:

$$
\mathrm{RedCarLight} =
\begin{cases}
1, & E_{Cr}(\mathrm{area}) > T_{Cr} \\
0, & \text{otherwise}
\end{cases}
\tag{2}
$$

The car-light feature has the advantage of using color information to eliminate false alarms, which makes it a good complement to the cascade classifier (which uses only gray-level information).
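The red-vehicle rule of Eq. (2) is a single comparison on the mean Cr value. The sketch below also shows one plausible way to combine it with Eq. (1); accepting a candidate when either rule fires is an assumption, since the report does not spell out the combination. The function names are hypothetical, and NumPy is assumed.

```python
import numpy as np

def red_car_light(cr_area, T_cr=200.0):
    """Eq. (2): 1 if the mean Cr value of the candidate area exceeds T_cr."""
    return int(np.asarray(cr_area, dtype=np.float64).mean() > T_cr)

def validate_candidate(car_light_flag, cr_area, T_cr=200.0):
    """Assumed combination: accept when Eq. (1) or Eq. (2) fires."""
    return bool(car_light_flag) or bool(red_car_light(cr_area, T_cr))

# Illustrative Cr patches: a strongly red area and a neutral grey area.
red_patch = np.full((10, 10), 223, dtype=np.uint8)
grey_patch = np.full((10, 10), 128, dtype=np.uint8)
red_ok = validate_candidate(0, red_patch)    # rescued by Eq. (2)
grey_ok = validate_candidate(0, grey_patch)  # rejected by both rules
```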

C. Tracking

After detection, tracking is employed to smooth and further refine the detection result.

1) Kalman tracking: Kalman tracking is employed to smooth the detection result.

I describe the state with six dimensions: $X = [s_x, s_y, \mathit{width}, \mathit{height}, v_x, v_y]'$, where $(s_x, s_y)$ denotes the coordinates of the center of the area, $(\mathit{width}, \mathit{height})$ denotes the size of the area, and $(v_x, v_y)$ denotes the velocity of its center.

For the measurement $Z$, we use a 4D vector: $Z = [z_x, z_y, z_{width}, z_{height}]'$, where $(z_x, z_y)$ represents the position of the observed vehicle, and $(z_{width}, z_{height})$ denotes the observed size of the vehicle.
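A Kalman filter over this state and measurement can be sketched as follows. The report does not give the transition model or the noise covariances, so the constant-velocity model for the center, the constant-size assumption, and the values of `dt`, `q`, and `r` are all illustrative assumptions; the function names are hypothetical, and NumPy is assumed.

```python
import numpy as np

def make_model(dt=1.0, q=1e-2, r=1.0):
    """Matrices for state [s_x, s_y, width, height, v_x, v_y] and a 4D measurement."""
    F = np.eye(6)
    F[0, 4] = dt  # s_x advances by v_x * dt
    F[1, 5] = dt  # s_y advances by v_y * dt
    H = np.zeros((4, 6))
    H[0, 0] = H[1, 1] = H[2, 2] = H[3, 3] = 1.0  # position and size are observed
    Q = q * np.eye(6)  # process noise (assumed value)
    R = r * np.eye(4)  # measurement noise (assumed value)
    return F, H, Q, R

def predict(x, P, F, Q):
    """Prediction step: propagate the state and its covariance."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z, H, R):
    """Update step: correct the prediction with the observation z."""
    y = z - H @ x                   # innovation
    S = H @ P @ H.T + R             # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

# Toy run: a box of fixed size moving 2 pixels per frame in x (illustrative data).
F, H, Q, R = make_model()
x = np.array([100.0, 50.0, 40.0, 30.0, 0.0, 0.0])
P = 10.0 * np.eye(6)
for k in range(1, 21):
    x, P = predict(x, P, F, Q)
    z = np.array([100.0 + 2.0 * k, 50.0, 40.0, 30.0])
    x, P = update(x, P, z, H, R)
```

After a few frames the velocity component `x[4]` settles near 2 pixels per frame even though velocity is never measured directly, which is what lets the filter bridge a frame in which the detector misses the vehicle.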

2) Three-stage hypothesis tracking: On top of Kalman tracking, I assume that a target hypothesis goes through three stages: hypothesis generation, hypothesis tracking, and hypothesis removal.

• Hypothesis generation: when a newly detected area appears and lasts for more than n1 frames, a vehicle hypothesis is generated.

• Hypothesis tracking: every vehicle hypothesis is tracked with a Kalman filter, which consists of two steps: prediction, where the state of each hypothesis is predicted, and update, where the state of each hypothesis is updated based on the prediction and the observation.

• Hypothesis removal: when a hypothesis is not detected for more than n2 frames, the hypothesis is removed.

n1 and n2 are predefined parameters. The three-stage tracking refines the detection result further: it eliminates false alarms that do not survive long enough, and it keeps track of vehicles that are briefly missed in the detection step.
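The three stages can be sketched as a small bookkeeping routine. The report does not describe how detections are associated with hypotheses, so the greedy IoU matching, the `min_iou` value, and the rule that an unconfirmed candidate is dropped on its first miss are assumptions. For brevity the box is simply held at its last detection, where a full system would use the Kalman prediction. All names are hypothetical.

```python
from dataclasses import dataclass

def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

@dataclass
class Hypothesis:
    box: tuple
    hits: int = 1            # number of frames in which the area was detected
    misses: int = 0          # consecutive frames without a matching detection
    confirmed: bool = False  # True once the hypothesis has been generated

def step(hypotheses, detections, n1=2, n2=1, min_iou=0.3):
    """Advance all hypotheses by one frame and return the surviving list."""
    unmatched = list(detections)
    survivors = []
    for hyp in hypotheses:
        best = max(unmatched, key=lambda d: iou(hyp.box, d), default=None)
        if best is not None and iou(hyp.box, best) >= min_iou:
            unmatched.remove(best)
            hyp.box, hyp.hits, hyp.misses = best, hyp.hits + 1, 0
            if hyp.hits > n1:
                hyp.confirmed = True  # generation: lasted more than n1 frames
        else:
            hyp.misses += 1
        # Removal: more than n2 misses, or any miss for an unconfirmed candidate.
        if hyp.misses <= n2 and (hyp.confirmed or hyp.misses == 0):
            survivors.append(hyp)
    survivors.extend(Hypothesis(box=d) for d in unmatched)
    return survivors

# Toy sequence: a car seen in frames 1-3 and 5, a false alarm in frame 2 only.
car, alarm = (10, 10, 20, 20), (100, 100, 10, 10)
frames = [[car], [car, alarm], [car], [], [car], [], []]
tracks, confirmed_per_frame = [], []
for detections in frames:
    tracks = step(tracks, detections)
    confirmed_per_frame.append(sum(h.confirmed for h in tracks))
```

With n1 = 2 and n2 = 1, the car is confirmed in frame 3, survives the missed detection in frame 4, and is removed after two consecutive misses, while the one-frame false alarm is never confirmed.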

IV. RESULTS AND ANALYSIS

A. Experiment Dataset

The dataset for training and testing is the LISA-Q Front FOV data set, which consists of three video sequences of 1600, 300, and 300 frames, respectively. The three video clips have different lighting and traffic conditions. The first sequence is used as training data because it contains a relatively large number of vehicles, and the other two are used as test data.

B. Experiment Parameters

For cascade training, 2000 positive images are randomly chosen from the 1600-frame training sequence. 1300 negative images are chosen from 325 non-vehicle images, whose scenes fall into the categories highway, coast, mountain, open country, building, and street. The training images are resized to 40×40 patches. The number of stages is 20. The maximum false alarm rate for each stage is 0.5, which is relatively loose, and the minimum detection rate for each stage is 0.995. The weak classifiers are decision stumps.
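As a rough guide to what these per-stage targets imply for the whole cascade, the stage rates can be multiplied, following the usual reasoning for boosted cascades [6]. The symbols $f$, $d$, $K$, $F$, and $D$ are introduced here for illustration only: $f$ is the per-stage maximum false alarm rate, $d$ the per-stage minimum detection rate, and $K$ the number of stages.

$$
F \le f^{K} = 0.5^{20} \approx 9.5 \times 10^{-7}, \qquad D \ge d^{K} = 0.995^{20} \approx 0.905
$$

These are targets on the training data, not guarantees on the test videos, so they do not contradict the many false alarms observed in practice with a limited training set.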

For car-light refinement, I choose $T = 20$ and $T_{Cr} = 200$.

For tracking, I choose n1 = 2 and n2 = 1.

Fig. 5: Result of car-light refinement: left two columns: validated candidates; right two columns: rejected candidates

Fig. 6: Result of car-light refinement: left: before validation; right: after validation

C. Result

The system is tested on the two test videos, each 300 frames long. The results are shown in Fig. 4. I use the performance metrics put forward by [4]: TPR = Detected vehicles / Total vehicles; FDR = False positives / (Total vehicles + False positives); FP/Frame = False positives / Frames; TP/Frame = True positives / Frames; FP/Object = False positives / True vehicles.

The results are compared with those of the ALVeRT system in [4], as shown in Table I and Table II.
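The metrics can be computed with a small helper such as the one below. The formulas follow the definitions as stated in the text; the function name and the counts in the usage example are illustrative and are not taken from the experiments. The helper assumes that the number of vehicles and the number of frames are both positive.

```python
def detection_metrics(true_positives, false_positives, total_vehicles, frames):
    """Per-sequence metrics, using the definitions given in the text."""
    return {
        "TPR": true_positives / total_vehicles,
        "FDR": false_positives / (total_vehicles + false_positives),
        "FP/Frame": false_positives / frames,
        "TP/Frame": true_positives / frames,
        "FP/Object": false_positives / total_vehicles,
    }

# Illustrative counts only.
m = detection_metrics(true_positives=90, false_positives=10,
                      total_vehicles=100, frames=50)
```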

D. Result Analysis

1) Car-light refinement: Car-light refinement removes most of the false alarms generated by the previous stage. Some examples are shown in Fig. 5: the two candidates on the left are validated, while the two false alarms on the right are successfully eliminated. The effectiveness is illustrated more clearly in Fig. 6. Before validation there are many false alarms, while after validation only the true vehicle remains. This can also be seen in Table I and Table II, where my method has a lower false alarm rate than ALVeRT in [4].

2) Tracking: The advantage of tracking is that it smooths and refines the detection result. As shown in Fig. 7, without tracking the red car in the middle is missed in frame i, whereas with tracking it is still tracked, because it was detected and tracked in the previous frame.

3) Failure: There are several reasons for failure cases:

• shadows or severe illumination changes

• cars that are not strictly rear-faced

• part of a car is sometimes mistakenly detected as a vehicle

Fig. 4: Experiment results. Each row shows one experiment result. Each column corresponds to the intermediate result of a certain stage. Columns: 1. input frame; 2. detection using the cascade classifier; 3. refinement with the car-light feature; 4. after tracking.

| Tracking System | TPR | FDR | FP/Frame | TP/Frame | FP/Object |
|---|---|---|---|---|---|
| Detection-tracking system | 98.67% | 0.33% | 0.003 | 0.99 | 0.003 |
| ALVeRT | 91.7% | 25.5% | 0.39 | 1.14 | 0.31 |

TABLE I: Data Set 2: March 9, 2009, 9 A.M., Urban, Cloudy

| Tracking System | TPR | FDR | FP/Frame | TP/Frame | FP/Object |
|---|---|---|---|---|---|
| Detection-tracking system | 93.33% | 4.05% | 0.127 | 2.8 | 0.042 |
| ALVeRT | 99.8% | 8.5% | 0.28 | 3.17 | 0.09 |

TABLE II: Data Set 3: April 21, 2009, 12:30 P.M., Highway, Sunny

Fig. 7: Result of tracking: left: without tracking; right: with tracking

V. CONCLUSION AND FUTURE WORK

The project builds a vehicle detection and tracking system based on monocular vision. A Haar cascade is used for target generation, a "car-light feature" is found useful for target validation, and three-stage tracking combined with Kalman tracking is used. The main future work I would like to do is to further refine the car-light feature. The feature works quite well on the two test videos, but when it is applied to more datasets with more diverse lighting conditions and more vehicles, problems can be expected, because the car-light feature I developed is currently heuristic and does not use machine learning algorithms. It could be generalized to build a more robust feature.

REFERENCES

[1] C. Papageorgiou, M. Oren, and T. Poggio. A general framework for object detection. In International Conference on Computer Vision, 1998.

[2] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. 2011.

[3] A. Khammari, F. Nashashibi, Y. Abramson, and C. Laurgeau. Vehicle detection combining gradient analysis and adaboost classification. Intelligent Transportation Systems, 2005. Proceedings., pages 66–71, 2005.

[4] S. Sivaraman and M. M. Trivedi. A general active-learning framework for on-road vehicle recognition and tracking. Intelligent Transportation Systems, IEEE Transactions on, 11(2):267–276, June 2010.

[5] S. Sivaraman and M. M. Trivedi. Looking at vehicles on the road: A survey of vision-based vehicle detection, tracking, and behavior analysis. Intelligent Transportation Systems, IEEE Transactions on, 14(4):1773–1795, 2013.

[6] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on, volume 1, pages 511–518, 2001.
