
Real-Time Imaging 8, 357–377 (2002). doi:10.1006/rtim.2002.0279, available online at http://www.idealibrary.com

Real-Time Eye, Gaze, and Face Pose Tracking for Monitoring Driver Vigilance

This paper describes a real-time prototype computer vision system for monitoring driver vigilance. The main components of the system are a remotely located CCD video camera, a specially designed hardware system for real-time image acquisition and for controlling the illuminator and the alarm system, and various computer vision algorithms for simultaneously, non-intrusively, and in real time monitoring various visual bio-behaviors that typically characterize a driver's level of vigilance. The visual behaviors include eyelid movement, face orientation, and gaze movement (pupil movement). The system was tested in a simulated environment with subjects of different ethnic backgrounds, genders, and ages, with and without glasses, and under different illumination conditions, and it was found to be very robust, reliable, and accurate.

© 2002 Elsevier Science Ltd. All rights reserved.

Qiang Ji¹ and Xiaojie Yang²

¹Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180, USA. E-mail: [email protected]

²Department of Computer Science, University of Nevada, Reno, NV 89557, USA. E-mail: [email protected]

Introduction

The ever-increasing number of traffic accidents in the US due to diminished driver vigilance has become a problem of serious concern to society. Drivers with a diminished vigilance level suffer a marked decline in their abilities of perception, recognition, and vehicle control, and therefore pose a serious danger to their own lives and the lives of other people. Statistics show that a leading cause of fatal or injury-causing traffic accidents is drivers with a diminished vigilance level. In the trucking industry, 57% of fatal truck


accidents are due to driver fatigue. It is the number one cause of heavy truck crashes. Seventy percent of American drivers report driving fatigued. With ever-growing traffic, this problem will only get worse. For this reason, developing systems that actively monitor a driver's level of vigilance and alert the driver to any insecure driving condition is essential for preventing accidents.

Many efforts [1–9] have been reported in the literature on developing active safety systems intended to reduce the number of automobile accidents due to



reduced vigilance. Among the different techniques, the best detection accuracy is achieved with techniques that measure physiological conditions such as brain waves, heart rate, and pulse rate [8,10]. Because they require physical contact with drivers (e.g. attaching electrodes), these techniques are intrusive and cause annoyance. Good results have also been reported with techniques that monitor eyelid movement and gaze with a head-mounted eye tracker or special contact lens. Results from monitoring head movement [11] with a head-mounted device are also encouraging. These techniques, though less intrusive, are still not practically acceptable. A driver's state of vigilance can also be characterized by the behavior of the vehicle he or she operates. Vehicle behaviors, including speed, lateral position, turning angle, and moving course, are good indicators of a driver's alertness level. While these techniques may be implemented non-intrusively, they are nevertheless subject to several limitations, including the vehicle type, driver experience, and driving conditions [2].

People in fatigue exhibit certain visual behaviors that are easily observable from changes in their facial features such as the eyes, head, and face. Typical visual characteristics observable from the image of a person with a reduced alertness level include slow eyelid movement [12, 13], a smaller degree of eye openness (or even closed eyes), frequent nodding [14], yawning, a narrowed gaze (narrowness in the line of sight), sluggish facial expression, and sagging posture. To make use of these visual cues, another increasingly popular and non-invasive approach for monitoring fatigue is to assess a driver's vigilance level through visual observation of his or her physical condition using a camera and state-of-the-art computer vision technologies. Techniques using computer vision aim to extract, from the driver's video images, visual characteristics that typically characterize his or her vigilance level. A recent workshop [15] sponsored by the Department of Transportation (DOT) on driver vigilance concluded that computer vision represents the most promising non-invasive technology for monitoring driver vigilance.

Many efforts have been reported in the literature on developing active real-time image-based fatigue monitoring systems [1–5, 7, 8, 13, 16–19]. These efforts focus primarily on detecting driver fatigue. For example, Ishii et al. [7] introduced a system for characterizing a driver's mental state from his facial expression. Saito et al. [1] proposed a vision system to detect a driver's physical and mental conditions from

line of sight (gaze). Boverie et al. [3] described a system for monitoring driving vigilance by studying eyelid movement. Their preliminary evaluation revealed promising results for characterizing a driver's vigilance level using eyelid movement. Ueno et al. [2] described a system for drowsiness detection that recognizes whether a driver's eyes are open or closed and, if open, computes the degree of openness. Their study showed that the performance of their system is comparable to that of techniques using physiological signals.

Despite the success of the existing approaches and systems for extracting driver characteristics using computer vision technologies, current efforts in this area focus on using only a single visual cue, such as eyelid movement, line of sight, or head orientation, to characterize a driver's state of alertness. A system relying on a single visual cue may encounter difficulty when the required visual features cannot be acquired accurately or reliably. For example, drivers with glasses pose serious problems for techniques based on detecting eye characteristics: glasses can cause glare and may be totally opaque to light, making it impossible for a camera to monitor eye movement. Furthermore, the degree of eye openness varies from person to person. Another potential problem with the use of a single visual cue is that the obtained visual feature may not always be indicative of one's mental condition. For example, irregular head movement or line of sight (such as briefly looking back or at the mirror) may yield false alarms for such a system.

All these visual cues, however imperfect individually, can provide an accurate characterization of a driver's level of vigilance if combined systematically. It is our belief that the simultaneous extraction and use of multiple visual cues can reduce the uncertainty and resolve the ambiguity present in the information from a single source. The system we propose can simultaneously, non-intrusively, and in real time monitor several visual behaviors that typically characterize a person's level of alertness while driving. These visual cues include eyelid movement, pupil movement, and face orientation. The fatigue parameters computed from these visual cues are subsequently combined probabilistically to form a composite fatigue index that can robustly, accurately, and consistently characterize one's vigilance level. Figure 1 gives an overview of our driver vigilance monitoring system.

The paper focuses on the development of computer vision algorithms and the necessary hardware compo-

Figure 1. A flowchart of the proposed vigilance monitoring system. (The operator's images are processed to extract visual cues (face orientation, eyelid movement, gaze), which are fused with contextual information (physical fitness, sleep history, time of day, temperature) via Bayesian networks; if fatigue is detected, a warning is activated.)


nents to extract the needed visual cues. The issue of sensory data fusion will be dealt with in a separate paper. Figure 2 gives an overview of our visual cue extraction system for driver fatigue monitoring. The system starts with pupil detection and tracking, which is then used for eyelid movement monitoring, gaze

Figure 2. Overview of the driver vigilance monitoring system. (Video images feed pupil detection and tracking; on success, the properties of the pupils drive pupil and glint tracking for gaze determination, eyelid movement monitoring (eye closure speed, PERCLOS), and face orientation monitoring.)

estimation, and face orientation determination, respectively.

Image Acquisition System

Introduction

Image understanding of visual behaviors starts with image acquisition. The purpose of image acquisition is to acquire video images of the driver's face in real time. The acquired images should have relatively consistent photometric properties under different climatic/ambient conditions and should produce distinguishable features that facilitate the subsequent image processing. To this end, the person's face is illuminated using a near-infrared (NIR) illuminator. The use of an infrared (IR) illuminator serves three purposes: first, it minimizes the impact of different ambient light conditions, ensuring image quality under varying real-world conditions including poor illumination, day, and night; second, it allows us to produce the bright pupil effect, which constitutes the foundation for detecting and tracking the proposed visual cues; third, since NIR is barely visible to the driver, it minimizes any interference with the driver's driving.

According to the original patent from Hutchinson [20], a bright pupil can be obtained if the eyes are illuminated with a NIR illuminator beaming light along the camera's optical axis at a certain wavelength. At the NIR wavelength, pupils reflect almost all the IR light they receive back along the path to the camera, producing the bright pupil effect, very similar to the red-eye effect in photography. If illuminated off the camera's optical axis, the pupils appear dark since the reflected light does not enter the camera lens. This produces the so-called dark pupil effect. Figure 3 illustrates the principle of the bright and dark pupil effects. An example of bright/dark pupils is given in Figure 4.

To produce the desired bright pupil effect, the IR illuminator should theoretically be located along the optical axis of the lens. Our experiments reveal that it is physically difficult to place IR light-emitting diodes (LEDs) along the optical axis, since they may block the view of the camera and limit its operational field of view. Others [11] suggested the use of a beam splitter to achieve this purpose without blocking the camera view. That set-up, however, is complex.

Figure 3. Principle of the bright and dark pupil effects: IR light incident along the camera's optical axis is reflected by the pupil back into the CCD camera (bright pupil), whereas light from an off-axis infrared source is reflected away from the lens (dark pupil).

Figure 5. IR light source configuration (front and side views): an outer ring and an inner ring of IR LEDs mounted around the CCD camera's adjustable lens, with an infrared bandpass filter in front of the lens.


We present a simple geometric arrangement of the IR LEDs, similar to that of Morimoto et al. [21], that achieves the same bright pupil effect with minimal reduction in the camera's operational view. Specifically, our IR illuminator consists of two sets of IR LEDs, distributed evenly and symmetrically along the circumference of two coplanar concentric rings, as shown in Figure 5. The center of both rings coincides with the camera's optical axis.

The illuminator is mounted in front of the lens, and the sizes of the LED rings were determined empirically so that a dark pupil image is produced when the outer ring is turned on and a bright pupil image is produced when the inner ring is turned on. The chosen sizes represent a trade-off between producing the ideal bright/dark pupil images and minimizing the occlusion of the camera view. The ring configuration achieves the same bright pupil effect as LEDs mounted along the optical axis. Besides the advantages of easy installation and a minimal reduction in the camera's operational field of view, the LEDs are arranged symmetrically around the camera's optical axis, so they can

Figure 4. IR-illuminated eyes: (a) dark pupil image generated by IR LEDs off the camera's optical axis; (b) bright pupil image generated by IR LEDs along the camera's optical axis.

cancel the shadows generated by the LEDs. Furthermore, the use of multiple IR LEDs generates a light strong enough that the illumination from the illuminator dominates the IR radiation reaching the driver's face, greatly minimizing the IR effect from other sources. This ensures the bright pupil effect under different climatic conditions. The use of more than one LED also allows the bright pupil effect to be produced for subjects farther away (about 3 ft) from the camera. To further minimize interference from light sources beyond IR and to maintain uniform illumination under different climatic conditions, a narrow bandpass NIR filter is attached to the front of the lens to attenuate light outside the NIR range (700–900 nm). A physical set-up of the IR illuminator is shown in Figure 6.

The IR light source illuminates the user's eyes and generates two kinds of pupil images: bright and dark

Figure 6. An actual photograph of the two-ring IR illuminator configuration.

Figure 7. (a) Bright and (b) dark pupil images with glints.

Figure 8. Examples of the acquired images with the desired bright pupil effect: (a) without glasses; (b) with glasses; (c) with sunglasses.


pupil images, as shown in Figure 7. The bright pupil image is produced when the inner ring of IR LEDs is turned on, and the dark pupil image is produced when the outer ring is turned on. Note that the glint* appears in both the dark and bright pupil images. Figure 8 presents additional examples of images acquired using the image acquisition system described above. These images demonstrate the robustness of the system: the desired bright pupil effect is clear for images at different distances, orientations, and magnifications, with and without glasses. It even works to a certain degree with sunglasses.

For implementation in a vehicle, we propose to use two miniature CCD cameras embedded in the dashboard of the vehicle, as shown in Figure 9. The first camera is a narrow-angle camera focusing on the driver's eyes to monitor eyelid movement, while the second is a wide-angle camera focusing on the driver's head to track and monitor head movement. The two cameras may be motorized so that their movements can be adjusted through a controller to obtain the best picture of the driver. Both cameras should have a rather large depth of field so that they stay focused over a working distance of 0.8–1.5 m. The viewing angle is about ±45° in both the horizontal and vertical directions.

*The small bright spot near the pupil, produced by corneal reflection of the IR light.

Pupil Detection and Tracking

The goal of pupil detection and tracking is to support the subsequent eyelid movement monitoring, gaze determination, and face orientation estimation. Robust, accurate, and real-time pupil detection is therefore crucial. Pupil detection and tracking starts with pupil detection.

For this research, we detect pupils based on their intensity, shape, and size. Due to the use of special IR illumination, pupils appear distinctively brighter than the rest of the face. Pupil intensity is, therefore, the primary feature employed to detect pupils. To further separate pupils from other bright objects in the images, additional properties of the pupils are used, including pupil size, shape, the spatial relationship between the pupils, and their motion characteristics. Given the detected pupils, the pupils are then tracked efficiently from frame to frame in real time based on Kalman

Figure 9. Camera configuration inside a vehicle: camera 1 and camera 2, each with infrared LEDs, embedded in the dashboard.

Figure 11. Background illumination interference removal: (a) the even image field obtained with both ambient and IR light; (b) the odd image field obtained with only ambient light; (c) the image resulting from subtraction of (b) from (a).


filtering. Figure 10 gives an overview of the eye tracking system.

The system has two phases: pupil detection and pupil tracking. Pupil detection starts with a pre-processing step to remove external illumination interference, followed by a global search of the whole image to locate the first pair of pupils. Pupil tracking searches for the pupils locally based on their positions in prior frames. In the sections that follow, we discuss each aspect in detail.

Figure 10. Pupil detection and tracking system flowchart. (Video images enter pupil detection: 1) background removal; 2) localization via global search; 3) verification. On success, pupil tracking takes over: 1) prediction via Kalman filtering; 2) localization via local search; 3) verification using contextual knowledge (intensity, shape, size, motion). If tracking fails, pupil detection is reactivated.)

Pupil detection

Pupil detection involves locating the pupils in the image. It consists of two steps: illumination interference removal and pupil localization.

Illumination interference removal via image subtraction. The detection algorithm starts with a pre-processing step to minimize interference from illumination sources other than the IR illuminator, including sunlight and ambient light. Figure 11(a) shows an image where parts (upper left) of the background look very bright, almost as bright as the pupil. A simple global thresholding of the image based solely on intensity may not always uniquely detect the pupils due to the presence of other bright regions in the image. To uniquely detect pupils, the other bright areas in the image must be removed

Figure 13. External IR illumination interference removal: (a) the image field obtained with both the external and internal IR lights; (b) the image field obtained with only the external IR and the ambient light; (c) the image resulting from subtraction of (b) from (a).


or they may adversely affect pupil detection. The removal of background clutter is accomplished by subtracting the image obtained with only ambient light illumination from the one illuminated by both the IR illuminator and the ambient light. The resulting image contains the illumination effect of only the IR illuminator, and therefore has bright pupils and a relatively dark background. This method has been found very effective in improving the robustness and accuracy of our eye tracking algorithm under strong external illumination interference, even under strong incoming light or with another IR source nearby, as shown in Figures 11–13.

For real-time eye tracking, the image subtraction must be implemented efficiently. To achieve this, we designed a video decoder that detects, from each interlaced image frame (the camera output), the even and odd field signals, which are then used to alternately turn the outer and inner IR rings on to produce the dark and bright pupil image fields. A program then separates each frame into two image fields (even and odd), representing the bright and dark pupil images respectively. The odd image field is then digitally subtracted from the even image field to produce the difference image. The images in Figures 11–13 were produced in this way. Figure 14 depicts the block diagram of the image subtraction controller.
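As a concrete illustration only, the following is a minimal numpy sketch of the field-separation and subtraction step. It assumes a grayscale interlaced frame in which the even lines carry the IR-plus-ambient (bright pupil) field and the odd lines the ambient-only field, as in Figure 11; `grab_frame` is a hypothetical stand-in for the frame grabber interface.

```python
import numpy as np

def split_fields(frame: np.ndarray):
    """Split an interlaced frame into its even and odd fields.

    Assumed convention: even scan lines were exposed with the inner IR ring
    on (bright pupil) and odd lines with ambient light only (dark pupil).
    """
    even = frame[0::2, :].astype(np.int16)   # bright-pupil field
    odd = frame[1::2, :].astype(np.int16)    # ambient-only field
    return even, odd

def difference_image(frame: np.ndarray) -> np.ndarray:
    """Subtract the odd (ambient-only) field from the even (IR + ambient)
    field, leaving mainly the IR illumination: bright pupils on a dark
    background."""
    even, odd = split_fields(frame)
    return np.clip(even - odd, 0, 255).astype(np.uint8)

# Usage (hypothetical 640x480 interlaced grayscale frame):
# frame = grab_frame()               # frame grabber interface, not defined here
# diff = difference_image(frame)     # input to pupil detection
```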

Determining the initial pupil positions. Given the image resulting from the removal of the external illumination disturbance, pupils may be detected by searching the

Figure 12. Strong incoming illumination interference removal: (a) the image field obtained with both incoming light and IR light; (b) the image field obtained with only the incoming light; (c) the image resulting from subtraction of (b) from (a).

entire image to locate two bright regions that satisfy certain size, shape, and distance constraints. To do so, a search window scans through the image. At each location, the portion of the image covered by the window is examined to determine the modality of its intensity distribution. It is assumed that the intensity distribution is unimodal if the window does not cover a pupil and bimodal if it does. Thresholding is then applied to the window image if its intensity distribution is determined to be bimodal. The threshold is determined automatically by minimizing the Kullback information distance [22]. This yields a binary image containing a blob that may be a pupil. The blob is then validated based on its shape, size, distance to the other detected pupil, and motion characteristics to ensure that it is a pupil. The validation step is critical since some regions of the image, such as the glares on glasses (see Figure 15; glares cannot be removed by the image subtraction procedure), are equally bright and may be mistaken for pupils without the verification procedure. The window moves to the next position if validation fails; the centroid of the blob is returned as the position of the detected pupil if validation succeeds. The process is then repeated to detect the other pupil.
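The sketch below illustrates this global search on the difference image, again as a hedged example rather than the exact implementation: the bimodality test is a crude variance check, the threshold is chosen by brute-force minimization of a cross-entropy (Kullback) criterion that stands in for [22], and the window and area limits are illustrative values only.

```python
import numpy as np

def kullback_threshold(window: np.ndarray) -> int:
    """Pick the gray level that minimizes a cross-entropy (Kullback
    information distance) criterion between the window and its binarized
    version, searched by brute force over all thresholds."""
    g = window.astype(np.float64).ravel() + 1e-9   # avoid log(0)
    best_t, best_cost = 128, np.inf
    for t in range(1, 255):
        lo, hi = g[g <= t], g[g > t]
        if lo.size == 0 or hi.size == 0:
            continue
        cost = -(lo.sum() * np.log(lo.mean()) + hi.sum() * np.log(hi.mean()))
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

def looks_bimodal(window: np.ndarray, spread: float = 30.0) -> bool:
    """Crude modality test: a window containing a bright pupil on a dark
    background has a much larger intensity spread than background alone."""
    return window.std() > spread

def find_pupil_candidates(diff: np.ndarray, win: int = 64, step: int = 32):
    """Scan the difference image with a search window; threshold windows
    whose intensity distribution appears bimodal and return blob centroids
    as pupil candidates (to be validated by size/shape/distance checks)."""
    candidates = []
    for r in range(0, diff.shape[0] - win, step):
        for c in range(0, diff.shape[1] - win, step):
            w = diff[r:r + win, c:c + win]
            if not looks_bimodal(w):
                continue
            t = kullback_threshold(w)
            ys, xs = np.nonzero(w > t)
            if 20 <= ys.size <= 800:        # plausible pupil area (illustrative)
                candidates.append((r + ys.mean(), c + xs.mean()))
    return candidates
```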

Pupil tracking via Kalman filtering

To continuously monitor the person, it is important to track his/her pupils from frame to frame in real time.

Figure 14. Block diagram of the image subtraction circuitry: the video decoder extracts the even and odd field signals from the CCD camera output to drive the IR illuminator, while the frame grabber delivers the image fields to the computer.


This can be done by performing pupil detection in every frame. This brute-force method, however, would significantly slow down pupil tracking, making real-time operation impossible, since it requires searching the entire image for each frame. Tracking can be done more efficiently with a prediction-and-localization scheme. Prediction determines the approximate locations of the pupils in the next frame based on their current locations; localization determines the exact locations via a local search. The prediction step enables efficient and accurate localization because it limits the search in subsequent frames to a small area.

Two important factors need to be considered when implementing this scheme. The first is the search window size for the localization step. A window that is too large results in unnecessary search and is time consuming, while a window that is too small may easily lose the pupils. Several factors may influence the

Figure 15. The glares on the eyeglass frame are equally as bright as the pupils. Verification eliminates them from being considered as pupils.

search window size, including the pupil size and the uncertainty of the predicted position. Pupil size varies with distance and subject, and the positional uncertainty depends on the feature detector and the noise characteristics of the image. An effective approach is to use an adaptive search window, whereby the search area is determined automatically from the pupil size and the localization error. Kalman filtering [23] provides a mechanism to accomplish this. A Kalman filter is a set of recursive algorithms that estimate the position and uncertainty of a moving object in the next frame, that is, where to look for the pupils and how large a region around the predicted position should be searched to find the pupils with a certain confidence. It recursively conditions the current estimate on all of the past measurements, and the process is repeated with the previous posterior estimates used to project or predict new a priori estimates. This recursive nature is one of the most appealing features of the Kalman filter, as it makes practical implementation much more feasible.

Our pupil tracking method based on Kalman filtering can be formalized as follows. A sequence of image frames is captured and sampled at each frame $t$, which is then processed to determine the pupil position. The state of a pupil at each time instance (frame) is characterized by its position and velocity. Let $(c_t, r_t)$ represent the pupil pixel position (its centroid) at time $t$ and $(u_t, v_t)$ its velocity at time $t$ in the $c$ and $r$ directions, respectively. The state vector at time $t$ can therefore be represented as $\mathbf{x}_t = (c_t\ r_t\ u_t\ v_t)^T$.

According to the theory of Kalman filtering [24], $\mathbf{x}_{t+1}$, the state vector at the next frame $t+1$, is linearly related to the current state $\mathbf{x}_t$ by the system model

$$\mathbf{x}_{t+1} = F \mathbf{x}_t + \mathbf{w}_t \qquad (1)$$

where $F$ is the state transition matrix and $\mathbf{w}_t$ represents the system perturbation, normally distributed as $\mathbf{w}_t \sim N(0, Q)$.

If we assume that pupil movement between two consecutive frames is small enough to consider the motion of the pupil position from frame to frame as uniform, the state transition matrix can be parameterized as

$$F = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

We further assume that a fast feature detector provides $\mathbf{z}_t = (\hat{c}_t, \hat{r}_t)$, the measured pupil position at time $t$.


Therefore, the measurement model in the form needed by the Kalman filter is

$$\mathbf{z}_t = H \mathbf{x}_t + \mathbf{v}_t \qquad (2)$$

where the matrix $H$ relates the current state to the current measurement and $\mathbf{v}_t$ represents the measurement uncertainty, normally distributed as $\mathbf{v}_t \sim N(0, R)$. For simplicity, and since $\mathbf{z}_t$ only involves position, $H$ can be represented as

$$H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$

The feature detector (e.g. thresholding or correlation) searches the region determined by the covariance matrix $\Sigma^-_{t+1}$ to find the feature point at time $t+1$. The detected point is then combined with the prediction estimate to produce the final estimate. The search region changes automatically based on $\Sigma^-_{t+1}$.

For the subsequent discussion, let us define a few more variables. Let $\mathbf{x}^-_{t+1}$ be the state at time $t+1$ estimated using the system model alone; it is often referred to as the prior state estimate. $\mathbf{x}_{t+1}$ differs from $\mathbf{x}^-_{t+1}$ in that it is estimated using both the system model (Eqn (1)) and the measurement model (Eqn (2)); it is usually referred to as the posterior state estimate. Let $\Sigma^-_{t+1}$ and $\Sigma_{t+1}$ be the covariance matrices of the state estimates $\mathbf{x}^-_{t+1}$ and $\mathbf{x}_{t+1}$, respectively; they characterize the uncertainties associated with the prior and posterior state estimates. The goal of Kalman filtering is therefore to estimate $\mathbf{x}_{t+1}$ and $\Sigma_{t+1}$ given $\mathbf{x}_t$, $\Sigma_t$, $\mathbf{z}_t$, and the system and measurement models. The Kalman filtering algorithm for state prediction and updating is summarized below.

State prediction. Given the current state $\mathbf{x}_t$ and its covariance matrix $\Sigma_t$, state prediction involves two steps: state projection ($\mathbf{x}^-_{t+1}$) and error covariance estimation ($\Sigma^-_{t+1}$):

$$\mathbf{x}^-_{t+1} = F \mathbf{x}_t \qquad (3)$$

$$\Sigma^-_{t+1} = F \Sigma_t F^T + Q_t \qquad (4)$$

State updating. Given the prior estimate $\mathbf{x}^-_{t+1}$, its covariance matrix $\Sigma^-_{t+1}$, and the current measurement $\mathbf{z}_{t+1}$ obtained from a feature detection (via either simple thresholding or correlation) in the neighborhood determined by $\Sigma^-_{t+1}$, state updating derives the posterior state estimate and its covariance matrix.

The first task during the measurement update is to compute the Kalman gain $K_{t+1}$:

$$K_{t+1} = \Sigma^-_{t+1} H^T \left( H \Sigma^-_{t+1} H^T + R \right)^{-1} \qquad (5)$$

The gain matrix $K$ can be physically interpreted as a weighting factor that determines the relative contributions of the measurement $\mathbf{z}_{t+1}$ and the prediction $H \mathbf{x}^-_{t+1}$ to the posterior state estimate $\mathbf{x}_{t+1}$. The next step is to take the actual measurement $\mathbf{z}_{t+1}$ and generate the posterior state estimate by incorporating the measurement, as in Eqn (6):

$$\mathbf{x}_{t+1} = \mathbf{x}^-_{t+1} + K_{t+1} \left( \mathbf{z}_{t+1} - H \mathbf{x}^-_{t+1} \right) \qquad (6)$$

The final step is to obtain the posterior error covariance estimate:

$$\Sigma_{t+1} = \left( I - K_{t+1} H \right) \Sigma^-_{t+1} \qquad (7)$$

After each time and measurement update pair, the Kalman filter recursively conditions the current estimate on all of the past measurements, and the process is repeated with the previous posterior estimates used to project or predict a new a priori estimate.

In order for the Kalman filter to work, it must be initialized. First, we need to specify an initial state. We start the Kalman filter tracker after the pupils have been detected successfully in two consecutive frames. Let the two frames be $t$ and $t+1$. The initial state vector $\mathbf{x}_0$ can be specified as

$$c_0 = c_{t+1}, \quad r_0 = r_{t+1}, \quad u_0 = c_{t+1} - c_t, \quad v_0 = r_{t+1} - r_t$$

We also need to specify the initial covariance matrix $\Sigma_0$ for the initial state $\mathbf{x}_0$. Since $\Sigma_t$ is updated iteratively as we acquire more images, we can initialize it to large values. Suppose the predicted position has a $\pm 10$ pixel error from the true value in both directions and the speed has a $\pm 5$ pixel error in the $u$ and $v$ directions; the initial estimation error covariance $\Sigma_0$ is then defined as

$$\Sigma_0 = \begin{bmatrix} 100 & 0 & 0 & 0 \\ 0 & 100 & 0 & 0 \\ 0 & 0 & 25 & 0 \\ 0 & 0 & 0 & 25 \end{bmatrix}$$

Figure 17. Trajectory of the real and estimated pupil positions (rows vs. columns) in a 30-frame sequence. Crosses indicate the pupil positions estimated by the Kalman filter; small circles indicate the actual tracked pupil positions. The two trajectories match very well.


Besides $\mathbf{x}_0$ and $\Sigma_0$, we also need to estimate the system and measurement error covariance matrices $Q$ and $R$. Based on observation of the pupil motion, we can safely assume the following system perturbation model: the standard deviation of the positional system error is 4 pixels in both directions, and the standard deviation of the velocity error is 2 pixels/frame. The system covariance matrix can therefore be quantified as

$$Q = \begin{bmatrix} 16 & 0 & 0 & 0 \\ 0 & 16 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 4 \end{bmatrix}$$

Similarly, we assume a measurement error of 2 pixels in both the $x$ and $y$ directions, so

$$R = \begin{bmatrix} 4 & 0 \\ 0 & 4 \end{bmatrix}$$

Both $Q$ and $R$ are assumed to be stationary (constant).
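For reference, a minimal numpy sketch of the prediction and update cycle of Eqns (3)–(7), using the $F$, $H$, $Q$, $R$, and $\Sigma_0$ values given above, might look as follows (the frame grabbing and feature detection steps are only indicated in comments):

```python
import numpy as np

# State: x = (c, r, u, v); constant-velocity model as in Eqns (1)-(7).
F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
Q = np.diag([16.0, 16.0, 4.0, 4.0])   # system noise: 4 px position, 2 px/frame velocity
R = np.diag([4.0, 4.0])               # measurement noise: 2 px

def predict(x, P):
    """State projection and error covariance estimation, Eqns (3)-(4)."""
    x_prior = F @ x
    P_prior = F @ P @ F.T + Q
    return x_prior, P_prior

def update(x_prior, P_prior, z):
    """Measurement update, Eqns (5)-(7)."""
    S = H @ P_prior @ H.T + R
    K = P_prior @ H.T @ np.linalg.inv(S)            # Kalman gain
    x_post = x_prior + K @ (z - H @ x_prior)
    P_post = (np.eye(4) - K @ H) @ P_prior
    return x_post, P_post

# Initialization from two consecutive detections (c_t, r_t) and (c_t1, r_t1):
# x = np.array([c_t1, r_t1, c_t1 - c_t, r_t1 - r_t], dtype=float)
# P = np.diag([100.0, 100.0, 25.0, 25.0])
# Per frame: x_prior, P_prior = predict(x, P); run the local feature detector
# in the region determined by P_prior to obtain z; x, P = update(x_prior, P_prior, z).
```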

Using the state prediction and updating equations in conjunction with the initial conditions, the state vector and its covariance matrix are estimated at each frame. We observe that the covariance matrix $\Sigma_t$ gradually stabilizes, as expected, after a few image frames. The state covariance matrix $\Sigma_t$ gives the uncertainty of the pupil's position in the $t$th frame. The search area can be determined from $\Sigma_t$ as shown in Figure 16, where the ellipse represents the search region and the major and minor axes of the ellipse are determined by the two eigenvectors of $\Sigma_t$. In practice, to speed up the computation, the values of $\Sigma_t[0][0]$ and $\Sigma_t[1][1]$ are used to compute the search

Figure 16. Pupil detection and tracking using Kalman filtering: the pupil detected at $(c_t, r_t)$ at time $t$, the predicted pupil position $(c^-_{t+1}, r^-_{t+1})$, and the search area at time $t+1$ whose size is determined by the uncertainty $\Sigma^-_{t+1}$ associated with the prediction.

area size. Specifically, the search area dimensions are chosen as $50 + 2\Sigma[0][0]$ and $50 + 2\Sigma[1][1]$, where $50 \times 50$ is the basic window size. This means that the larger $\Sigma[0][0]$ and $\Sigma[1][1]$ are, the greater the uncertainty of the estimate and the larger the search area. The search area is therefore adaptively adjusted. The pupil detection procedure is reactivated if tracking fails. Tracking fails if the tracked region has a unimodal intensity distribution (background only), as indicated by its variance, or if the binary blob detected in the tracked region does not satisfy geometric constraints such as size and shape.
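Stated as code, the adaptive window rule is a two-line helper; the sketch below reuses the prior covariance `P_prior` from the Kalman example above.

```python
def search_window(P_prior, base=50):
    """Adaptive search window (width, height), grown with the positional
    uncertainty on the diagonal of the prior covariance, following the
    50 + 2*Sigma[0][0] and 50 + 2*Sigma[1][1] rule described above."""
    return int(base + 2 * P_prior[0][0]), int(base + 2 * P_prior[1][1])
```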

To study the validity of the Kalman filter for pupil tracking, we examine the differences between the predicted and the actual pupil locations. Figure 17 shows the trajectory of the actual and estimated pupil positions in a 30-frame image sequence using the Kalman filter. It is clear from this figure that the predicted and the actual pupil positions match well.

Tracking results. The Kalman filter tracker has been implemented and tested on a 300 MHz Sun Ultra 30 workstation with an image resolution of 640 × 480. The pupil tracking program and algorithm are rather robust under different face orientations and distances. Specifically, our pupil tracking program allows the subject to move freely in the view of the

Figure 18. The pupil tracking program tracks the pupils as the face rotates.

Figure 19. Tracking result in 10 consecutive frames with the Kalman filtering tracking scheme.


camera under the illumination of both an IR source and ambient light. It automatically finds and tracks the pupils and can recover from tracking failures. For pupils temporarily out of the camera view, it instantly relocates the pupils as soon as they reappear in the camera view. The system runs at a frame rate of about 20 frames/s. To demonstrate the pupil tracking program, Figure 18 shows a sequence of consecutive frames with rectangles indicating the locations of the detected pupils. Additional images are shown in Figures 19 and 20, which show pupil tracking for people with glasses. Figure 21 shows the pupil tracking result under strong external illumination interference from a light in front of the subject. A video demonstration of our eye tracking system may be found at http://www.cs.unr.edu/~qiangji/fatigue.html.

The major advantages of this eye detection and tracking approach are its simplicity, robustness, and accuracy. It is non-invasive and can, therefore, operate even without the knowledge of the subject. Furthermore, the technique does not require any physical or geometric model; it is based on the spectral (reflective) properties of the pupil. Although special illumination is required, the illumination set-up is simple and the scene background is irrelevant. Pupils can be detected and tracked in real time over a wide range of scales and illumination conditions.

Computation of Eyelid Movement Parameters

Introduction

Eyelid movement is one of the visual behaviors that reflect a person's level of fatigue. The primary purpose of pupil tracking is to monitor eyelid movements and to compute the relevant eyelid movement parameters. There are several ocular measures that characterize eyelid

Figure 20. Kalman filtering tracking result with glasses.

Figure 22. Definition of eye closure duration and eye open/close speed: pupil size over time crosses the 80% and 20% of maximum pupil size levels at times t1, t2, t3, and t4.


movement, such as eye blink frequency, eye closure duration, eye closure speed, and the recently developed parameter PERCLOS. PERCLOS measures the percentage of eye closure over time, excluding the time spent on normal eye closures. It has been found to be the most valid ocular parameter for characterizing driver fatigue [13]. A study performed by Wierwille et al. [25] showed that alert drivers have a much lower PERCLOS measurement than drowsy drivers. Another ocular parameter that could potentially be a good indicator of fatigue is eye closure/opening speed, i.e. the amount of time needed to fully close the eyes and to fully open the eyes. Our preliminary study indicates that eye closure speed is distinctively different for a drowsy and an alert subject. For this research, we focus on real-time computation of these two parameters to characterize eyelid movement.

To obtain these measurements (PERCLOS and eye closure speed), we propose to continuously track the subject's pupils and determine in real time the amount of eye closure based on the area of the pupils occluded by the eyelids. Specifically, an eye closure

Figure 21. Examples of pupil tracking under strong external illumination interference.

occurs when the size of the detected pupil shrinks to a fraction (say 20%) of its nominal size. As shown in Figure 22, an individual eye closure duration is defined as the time difference between the two consecutive time instants, t2 and t3, between which the pupil size is 20% or less of the maximum pupil size. An individual eye closure speed is defined as the time period from t1 to t2 or from t3 to t4, during which the pupil size is between 20% and 80% of its nominal size.
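The following is a minimal sketch of how such closure events could be extracted from the tracked pupil-size samples. The 20%/80% thresholds follow Figure 22, while the sampling interface (per-frame pupil areas and timestamps) and the handling of incomplete closures are assumptions; opening speed (t3 to t4) would be measured symmetrically.

```python
def closure_events(pupil_sizes, timestamps, nominal):
    """Detect individual eye closures from a pupil-size time series.

    A closure is the interval [t2, t3] during which the pupil area stays at
    or below 20% of its nominal (fully open) size; the closing speed is the
    time taken to go from 80% down to 20% (t1 to t2), as in Figure 22.
    Returns a list of (closure_duration, closing_time) pairs in the same
    time units as `timestamps`.
    """
    hi, lo = 0.8 * nominal, 0.2 * nominal
    events, t1, t2 = [], None, None
    for t, s in zip(timestamps, pupil_sizes):
        if t1 is None and s < hi:
            t1 = t                                 # started closing (below 80%)
        if t2 is None and s <= lo:
            t2 = t                                 # fully closed (below 20%)
        if t2 is not None and s > lo:
            events.append((t - t2, t2 - t1))       # (duration, closing time)
            t1 = t2 = None
        if t1 is not None and t2 is None and s >= hi:
            t1 = None                              # reopened without closing fully
    return events
```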

Computation of PERCLOS and average eye closure speed (AECS)

Individual eye closure duration and eye closure speed can reflect a driver's level of alertness. Both PERCLOS and AECS at a particular time instance are computed over a fixed time interval (30 s). The average eye closure/opening speed is computed as the arithmetic average of all eye closure speeds over that fixed time period (30 s). A one-time measurement of either parameter, however, cannot accurately quantify a person's alertness because of its random nature. A more accurate and robust way is to compute the running average (time tracking) of each parameter. The running average computes the two parameters at each time using the current data and the data at previous time instances. To obtain the running average of the PERCLOS measurement, for example, the program continuously tracks the person's pupil size and monitors eye closure at each time instance. The cumulative eye closure duration over time is used to compute PERCLOS. We compute the PERCLOS measure based on

Figure 24. Running average PERCLOS measurement (percent) vs. time (ms) over the 6-min simulation.


the percentage of eye closure in 30 s. This time limit isarbitrary and can be adjusted by the user.
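As an illustration only, a running 30 s window over the per-frame closure decisions and the closure speeds could be maintained as in the sketch below. Note that, unlike the measure defined above, this simplified version does not exclude normal blinks from the PERCLOS count, and the `EyelidMonitor` class and its interface are hypothetical.

```python
from collections import deque

class EyelidMonitor:
    """Running PERCLOS and AECS over a sliding window (default 30 s).

    `update` is called once per processed frame with the timestamp (s) and
    the current pupil area; closed frames are those where the area is at or
    below 20% of the nominal open-eye size.
    """
    def __init__(self, nominal_area, window_s=30.0):
        self.nominal = nominal_area
        self.window = window_s
        self.samples = deque()          # (timestamp, closed?) per frame
        self.closure_times = deque()    # (timestamp, closing duration)

    def update(self, t, pupil_area, closing_time=None):
        self.samples.append((t, pupil_area <= 0.2 * self.nominal))
        if closing_time is not None:    # e.g. from closure_events() above
            self.closure_times.append((t, closing_time))
        # Drop entries older than the window.
        while self.samples and t - self.samples[0][0] > self.window:
            self.samples.popleft()
        while self.closure_times and t - self.closure_times[0][0] > self.window:
            self.closure_times.popleft()

    def perclos(self):
        """Fraction of frames in the window with the eyes closed."""
        closed = sum(1 for _, c in self.samples if c)
        return closed / max(len(self.samples), 1)

    def aecs(self):
        """Average eye closure speed: mean closing time in the window."""
        if not self.closure_times:
            return 0.0
        return sum(d for _, d in self.closure_times) / len(self.closure_times)
```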

Eye closure simulation result analysis

In this section, we describe the experiments we performed to study the validity of the two eyelid movement parameters. To produce realistic data, a human subject deliberately blinked her eyes in different ways in front of our system to simulate the different eyelid movement patterns possibly associated with different levels of fatigue. Note that we are not measuring fatigue itself, but rather different eyelid movement patterns. The first experiment studied the difference in eye closure speed between an alert individual and a drowsy individual. Figure 23 shows the average time needed for an alert individual to close the eyes vs. the time needed for a drowsy person to close the eyes. It is clear from the simulation data that eye closure is much slower (takes longer) for a fatigued individual than for a vigilant one. This suggests that eye closure speed could potentially be used as a metric to quantify the level of fatigue, subject to more rigorous validation.

The second experiment lasted 6 min, with the subject alert during the first 2.5 min and fatigued afterwards. The goal of this experiment was to study: (1) whether the change in fatigue level can be detected by both parameters; and (2) whether the two parameters are correlated. Our system recorded the eyelid movement and computed the two parameters over the

Figure 23. Eye closure speed comparison (average pupil size in pixels vs. time in seconds). The normal eye closure takes only 0.17 s, whereas the abnormal eye closure takes as long as 1.68 s.

period in real time. Figures 24 and 25 plot the PERCLOS and AECS parameters over the entire 6-min period.

Figure 24 shows that in the early part of the simulation (before 150,000 ms, or 2.5 min), the PERCLOS measurement is below 30%, which represents the alert state. Beyond 150,000 ms, however, PERCLOS measures over 40%, a significant increase representing the fatigued state. Interestingly, Figure 25 follows a similar trend, i.e. it takes a much shorter time to close the eyes when the subject is alert (before 150,000 ms) and a much longer time when the subject is drowsy (after

Figure 25. Running average eye closure speed (ms) vs. time (ms) over the 6-min simulation.


150,000 ms). We can conclude from this experiment that: (1) both parameters can detect different levels of vigilance; and (2) PERCLOS and AECS covary, which demonstrates that the two parameters are correlated to a certain degree. The covariation between the two parameters, however, needs to be further validated using human subjects experiencing real fatigue.

In summary, our experiments show that there is a significant difference in eyelid movement, measured in terms of PERCLOS and AECS, for different levels of vigilance. If a driver is alert while driving, the PERCLOS and running average eye closure speed parameters should be less than 30% and 0.5 s, respectively.

We have successfully developed a prototype hardware and software system that tracks pupils and computes the PERCLOS and AECS parameters in real time. The system issues an alarm (a repeated, loud beeping) to alert the driver if the computed PERCLOS and AECS measures exceed given thresholds (e.g. >30% and >0.5 s). We have tested this system with the subject's face at different orientations and distances and with substantial face movement, and it works rather robustly in each situation.
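Stated as code, the alarm decision is a simple threshold test on the two running measures; the sketch below reuses the hypothetical `EyelidMonitor` from the earlier example, and the choice of triggering when either measure exceeds its threshold (rather than both) is an assumption.

```python
def fatigue_alarm(monitor, perclos_limit=0.30, aecs_limit=0.5):
    """Trigger the warning when a running measure crosses its threshold
    (PERCLOS > 30% or average eye closure speed > 0.5 s)."""
    return monitor.perclos() > perclos_limit or monitor.aecs() > aecs_limit
```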

This is very significant progress in that: (1) it is real time; (2) it is robust; (3) it computes the most valid fatigue measure, PERCLOS, a measure recommended by the US DOT for monitoring driver fatigue [13]; and (4) it is non-invasive and can operate without the knowledge of the user due to its use of IR illumination. It can also find applications in human–computer interaction, in assisting people with disabilities, and in studying people's eye blink patterns.

Face (head) Orientation Estimation

Face (head) pose contains information about one's attention, gaze, and level of fatigue. Face pose determination is concerned with computing the three-dimensional (3D) face orientation and position to detect head behaviors such as head tilts. The nominal face orientation while driving is frontal. If the driver's face is oriented in another direction (e.g. down or sideways) for an extended period of time or too frequently (e.g. various head tilts), this is due either to fatigue or to inattention. Face pose estimation, therefore, can detect both fatigued and inattentive drivers.

Methods for face pose estimation can be classified into three main categories: model-based, appearance-based, and feature-based. Model-based approaches typically recover the face pose by establishing the relationship between a 3D face model and its two-dimensional (2D) projection [26–28]. Appearance-based approaches are based on view interpolation; their goal is to construct an association between appearance and face orientation [29–31]. Although appearance-based methods are simpler, they are expected to be less accurate than model-based approaches and are mainly used for pose discrimination. Feature-based approaches determine face pose from some facial features and their image positions using a conventional point-based pose estimation method. The major challenge with feature-based approaches is to robustly detect and track the required facial features from frame to frame under varying illumination conditions, facial expressions, and head orientations.

In our application, the precise degree of head orientation is not necessary; what we are interested in is detecting whether the driver's head deviates from its nominal position and orientation for an extended time or too frequently.

Face orientation determination

We propose a new model-based approach. Our approach recovers the 3D face pose from a monocular view of the face under full perspective projection. Our study shows that there is a direct correlation between 3D face pose and properties of the pupils such as pupil size, inter-pupil distance, and pupil shape. Figure 26 shows pupil measurements under different head orientations. The following observations are apparent from these images:

• The inter-pupil distance decreases as the face rotates away from the frontal orientation.

• The ratio between the average intensities of the two pupils either increases to over one or decreases to less than one as the face rotates away or rotates up/down.

• The shapes of the two pupils become more elliptical as the face rotates away or rotates up/down.

• The sizes of the pupils also decrease as the face rotates away or rotates up/down.

The above observations serve as the basis for estimatingface orientation from pupils.

Based on the above observations, we can develop a face pose estimation algorithm by exploiting the relationships between face orientation and these pupil

Figure 26. Pupil images for different head orientations: (a) frontal head position; (b) head turned left; (c) head turned right. It is clear that pupil properties such as size, intensity, and shape vary with face orientation.

Figure 28. Projection of the pose classes (yaw angles from −45° to 45° in 15° steps) onto the first three principal components of the eigen-PFS.


parameters. We build a so-called pupil feature space (PFS) constructed from seven pupil features: the inter-pupil distance, the sizes of the left and right pupils, the intensities of the left and right pupils, and the ellipse ratios of the left and right pupils. To make these features scale invariant, we normalize them by dividing by the corresponding values of the frontal view. Figure 27 shows sample data projections in the PFS, from which we can clearly see five distinctive clusters corresponding to five face orientations (five yaw angles). Note that although we can only plot a 3-D space here, the PFS is constructed from seven features, in which the clusters are even more distinctive. A pose can therefore be determined by the projection of the pupil properties in the PFS.

Figure 27. Face pose clusters in the pupil feature space (distance ratio, intensity ratio, and ellipse ratio axes) for pose angles of 0°, 15°, 30°, 45°, and 60°.

To maximize the separation of the different clusters, we need to find a representation of the PFS in which the different pose classes are farthest apart from each other. A well-known method to achieve this is the principal component analysis (PCA), or eigenspace, algorithm, which finds the principal components of the distribution of poses, i.e. the eigenvectors of the covariance matrix of the set of poses. The eigenvectors are ordered, each accounting for a different amount of the variation among the poses, and each individual pose can be represented exactly as a linear combination of the eigenvectors. Training data are collected to build the eigen-PFS and to store several models representing typical poses, which in our experiments vary between −45° and 45°. Figure 28 shows the distribution of the models in the eigen-PFS, where a 3-D projection is used although the actual dimension of the eigen-PFS is seven. The face orientation of an input face can then be mapped to one of the clusters based on its Euclidean distance to the center of each cluster.
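A minimal numpy sketch of this idea follows, assuming the 7-D normalized pupil feature vectors and their pose labels are already available as arrays; the function names, the number of retained components, and the use of SVD to obtain the principal components are illustrative choices, not the paper's implementation.

```python
import numpy as np

def train_eigen_pfs(X, y, n_components=3):
    """Build an eigen-PFS from normalized 7-D pupil feature vectors.

    X: (N, 7) features (inter-pupil distance, left/right pupil size,
    intensity, ellipse ratio), each divided by its frontal-view value.
    y: (N,) pose labels (e.g. yaw angles).  Returns the mean, the PCA basis,
    and the class centroids in the projected space.
    """
    mean = X.mean(axis=0)
    # Principal components = eigenvectors of the covariance matrix,
    # obtained here via SVD of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:n_components]                       # (k, 7)
    proj = (X - mean) @ basis.T
    centroids = {c: proj[y == c].mean(axis=0) for c in np.unique(y)}
    return mean, basis, centroids

def classify_pose(x, mean, basis, centroids):
    """Map a new pupil feature vector to the pose class whose centroid is
    nearest (Euclidean distance) in the eigen-PFS."""
    p = (x - mean) @ basis.T
    return min(centroids, key=lambda c: np.linalg.norm(p - centroids[c]))
```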

Experimental results

Here we present the results of experiments with real data. The experiments involved a human subject rotating her face left and right to produce different face orientations. Our face pose estimation technique was used to estimate each pose. The face orientation was quantized into seven angles: −45°, −30°, −15°, 0°, 15°, 30°, and 45°. A total of 300 data points were collected for each face orientation, representing the seven face orienta-

Table 1. Overall performance: statistics of the pose estimation output (seven feature ratios, median filtering, leave-15-out training)

Ground truth   Total no. of data   No. of test data   Overall correct estimation   Correctness (%)
-45            300                 15                 15.00                        100.00
-30            300                 15                 14.95                        99.67
-15            300                 15                 14.90                        99.33
0              300                 15                 14.80                        98.67
15             300                 15                 14.45                        96.33
30             300                 15                 15.00                        100.00
45             300                 15                 14.20                        94.67


tions. Each data point comprises seven raw features (inter-pupil distance, sizes of the left and right pupils, intensities of the left and right pupils, and ellipse ratios of the left and right pupils), which are collected and pre-processed with a median filter to remove outliers. The pre-processed data are subsequently normalized by dividing each feature by the corresponding value of the frontal view. The leave-N-out technique was used to train and test the algorithm since not enough training data were available. It works by repeatedly using part of the data for training and the remaining part for testing, until all data have served as both training and test data. In our case, 95% of the data were used for training and 5% for testing in each iteration. The experimental results are summarized in Table 1.
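The evaluation loop can be sketched as follows, reusing the hypothetical `train_eigen_pfs` and `classify_pose` functions from the earlier pose-estimation example; the random shuffling and fold size are illustrative assumptions.

```python
import numpy as np

def leave_n_out_accuracy(X, y, n_test=15, seed=0):
    """Leave-N-out evaluation of the eigen-PFS pose classifier sketched
    earlier: repeatedly hold out `n_test` samples (5% of the data when 300
    samples per pose are available), train on the rest, and classify the
    held-out samples, until every sample has been tested once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    correct = 0
    for start in range(0, len(X), n_test):
        test_idx = order[start:start + n_test]
        train_idx = np.setdiff1d(order, test_idx)
        mean, basis, centroids = train_eigen_pfs(X[train_idx], y[train_idx])
        for i in test_idx:
            correct += classify_pose(X[i], mean, basis, centroids) == y[i]
    return correct / len(X)
```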

To further validate the face pose estimation method, we applied it to monitor the driver's attention while driving. Figure 29 shows the running average face pose

Figure 29. Face orientation monitoring over time: running average side-view percentage vs. time (seconds) over the simulation.

estimation for a period of 6 min. As can be seen, most of the time during this period the face pose is frontal, but there are times when an extended period is spent looking in other directions (left or right), representing inattention. For a real-time demo of the face orientation determination, refer to

http://www.cs.unr.edu/~qiangji/fatigue.html.

Similar results were obtained for the head rotating up and down (head tilt).

Eye-gaze Determination and Tracking

Gaze has the potential to indicate a person's level of vigilance. A fatigued individual tends to have a narrow gaze. Gaze may also reveal one's needs and attention. Gaze estimation is important not only for fatigue detection but also for identifying a person's focus of attention, which can be used in the area of human–computer interaction.

Of the numerous techniques proposed for gaze estimation [31–33], the one proposed by Ebisawa [34] appears very promising and is directly applicable to this project. That technique estimates the gaze direction based on the relative position between the pupil and the glint. Based on an improvement of this technique, we have developed a video-based, contact-free eye-gaze estimation algorithm that can: (1) track and detect the eye position from the face image in real time; (2) estimate the gaze direction by computing the difference between the Cartesian coordinates of the glint center and the pupil center; and (3) map the pixel coordinate difference into gaze.

Pupil and glint detection and tracking

The gaze estimation algorithm consists of three parts: pupil–glint detection and tracking, calibration, and gaze mapping. For this research, the gaze of a driver is quantized into nine areas: front, left, right, up, down, upper left, upper right, lower left, and lower right, as shown in Figure 30. Gaze estimation starts with pupil–glint detection and tracking. For gaze estimation, we continue using the IR illuminator shown in Figure 5. To produce the desired pupil effects, the two rings are turned on and off alternately via the micro-controller described in the pupil detection section, producing the so-called bright and dark pupil effects shown in Figure 7(a) and (b). The pupil looks bright when the

Figure 30. Quantized eye gaze regions: front, left, right, up, down, upper left, upper right, lower left, and lower right.

Figure 31. Relative spatial relationship between the glint and the bright pupil center used to determine the eye-gaze position: (a) bright pupil images; (b) glint images; (c) pupil–glint relationship generated by superimposing the glint on the thresholded bright pupil images. The relative position between glint and pupil determines gaze.


inner ring is turned on, as shown in Figure 7(a), and dark when the outer ring is turned on, as shown in Figure 7(b). Note that the glint appears in both images. Algorithmically, the glint can be detected much more easily from the dark pupil image, since the glint and the pupil appear equally bright and sometimes overlap in the bright pupil image. This is why we need both the dark and the bright pupil images. Unlike pupil tracking, gaze tracking requires a close-up view of the eyes to ensure robust glint and pupil detection and an accurate gaze estimate. The field of view of the camera, therefore, focuses on the region of the face near the eyes.

Given a bright pupil image, the pupil detection and tracking technique described earlier can be applied directly. The location of a pupil in each frame is characterized by its centroid. Only one pupil needs to be tracked since both pupils give the same gaze direction. Given the dark pupil image, the pupil detection and tracking technique can be adapted to detect and track the glint, whose location is specified by its computed center. Figure 31 gives an example of bright pupils (a), dark pupils with glint (b), and the detected pupil and glint (c).

Gaze mapping

Given the relative position between the pupil and the glint, the screen (actual) coordinates of the gaze can be determined via a linear mapping procedure. The conventional approach for gaze mapping uses only the coordinate displacement between the pupil center and the glint position [35, 36] as a pupil–glint vector. The main drawback of this method is that the subject must keep his or her head stationary, or the glint position in the image will change. In practice, it is difficult to keep the head still, and the

existing gaze tracking methods will produce incorrect results if the head moves, even slightly. Head movement must, therefore, be incorporated into the gaze estimation procedure. In this section, we introduce a new gaze estimation procedure that tolerates slight translational head movement.

According to our mapping procedure, the pupil–glintvector is represented by

$$\mathbf{g} = \begin{bmatrix} \Delta x & \Delta y & g_x & g_y & 1 \end{bmatrix}$$

where Δx and Δy are the pupil–glint displacement, and g_x and g_y are the glint image coordinates. Unlike the existing methods, which use only Δx and Δy, our procedure also includes the glint position. This effectively reduces the influence of head movement. The coefficient vector c is represented by

$$\mathbf{c} = \begin{bmatrix} \alpha & \beta & \gamma & \lambda & \theta \end{bmatrix}^{T}$$

Assume the gaze point is located at one of the nine locations on the screen, as illustrated in Figure 30. The pupil–glint vector measured at runtime can then be mapped to the screen locations through the following equation:

$$i = \mathbf{g}\,\mathbf{c} = \alpha\,\Delta x + \beta\,\Delta y + \gamma\,g_x + \lambda\,g_y + \theta$$

where i is the gaze region index, from 1 to 9, representing one of the nine gaze directions. The coefficients α, β, γ, λ, and θ are determined via a simple calibration procedure, as discussed below.

Figure 32. Normal driving case: look front. (a) Gaze distribution under normal driving; the gaze is frontal most of the time. (b) Histogram plot for normal driving.

Calibration procedure. The calibration procedure is very simple and brief. We divided the calibration panel into nine regions to represent the nine gaze directions corresponding to Figure 30. During calibration, the user is asked to fixate his or her gaze on targets 1 to 27. After each target is finished, the system sounds a beeper to remind the user to move his or her gaze to the next target. At each fixation, three sets of pupil–glint vectors are computed and stored, so that a 27 × 5 pupil–glint matrix A is obtained. The transformation from the pupil–glint matrix A to the target vector B is given by

$$A\,\mathbf{c} = B \qquad (8)$$

where A is a 27 × 5 matrix,

$$A = \begin{bmatrix}
\Delta x_1 & \Delta y_1 & g_{x_1} & g_{y_1} & 1 \\
\Delta x_2 & \Delta y_2 & g_{x_2} & g_{y_2} & 1 \\
\Delta x_3 & \Delta y_3 & g_{x_3} & g_{y_3} & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
\Delta x_{26} & \Delta y_{26} & g_{x_{26}} & g_{y_{26}} & 1 \\
\Delta x_{27} & \Delta y_{27} & g_{x_{27}} & g_{y_{27}} & 1
\end{bmatrix}$$

and B is a 27 × 1 vector,

$$B = \begin{bmatrix} 1 & 1 & 1 & 2 & 2 & 2 & 3 & \cdots & 9 & 9 & 9 \end{bmatrix}^{T}$$

The coefficient vector c can be obtained by solving Eqn (8) using the least-squares method.
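For illustration, a minimal NumPy sketch of this calibration and of the runtime mapping is given below. It assumes the 27 calibration measurements are collected in target order (three per gaze region); the function names and the rounding of the continuous index i to the nearest region are assumptions of this sketch.

```python
import numpy as np

def calibrate(pupil_glint_rows, targets):
    """Solve A c = B (Eqn (8)) in the least-squares sense.

    pupil_glint_rows : sequence of 27 rows [dx, dy, gx, gy] measured during
                       calibration (three rows per gaze region, in order).
    targets          : the 27-element target vector B, e.g. [1,1,1,2,2,2,...,9,9,9].
    Returns the coefficient vector c = [alpha, beta, gamma, lambda, theta].
    """
    rows = np.asarray(pupil_glint_rows, dtype=float)
    A = np.column_stack([rows, np.ones(len(rows))])      # append the constant column
    c, *_ = np.linalg.lstsq(A, np.asarray(targets, dtype=float), rcond=None)
    return c

def map_gaze(dx, dy, gx, gy, c):
    """Map a runtime pupil-glint vector to a gaze-region index in 1..9."""
    i = float(np.dot([dx, dy, gx, gy, 1.0], c))          # i = g . c
    return int(min(9, max(1, round(i))))                 # quantize to the nearest region

# The calibration target vector B: three fixations per region, regions 1..9.
B = np.repeat(np.arange(1, 10), 3)
```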

Table 2. Confusion table for gaze estimation with 100 gaze samples

Ground truth     Estimation result (mapping target no.)               Correctness
(target no.)      1    2    3    4    5    6    7    8    9           rate (%)
     1           95    1    1    0    0    0    0    1    2               95
     2            7   84    5    1    3    0    0    0    0               84
     3            0    3   90    0    0    5    0    2    0               90
     4            0    0    1   98    0    0    1    0    0               98
     5            1    0    0    0   94    0    1    4    0               94
     6            0    0    1    0    6   93    0    0    0               93
     7            0    0    0    0    0    1   98    1    0               98
     8            0    0    0    0    0    0    0  100    0              100
     9            1    0    0    0    0    1    0    1   97               97

Experimental Results and Analysis

The gaze tracker currently runs on a Sun Ultra 10 (300 MHz) in near real time (15 frames/s). In operation, the subject faces the camera directly and changes his or her gaze direction after finishing calibration.


Table 2 is the confusion table showing the correctness rate for each direction, based on 100 samples. The results are very promising. Another experiment was conducted to simulate different driving states: normal, drowsy, and inattentive. Two hundred data samples were taken for each case. The gaze distributions and their histogram plots are shown in Figures 32–34. Figure 32 shows that most gaze points are located in region 5, which reflects the normal driving case. Figure 33 shows that most gaze points are located in region 8, which reflects the drowsy driving case. Figure 34 shows that most gaze points are distributed in regions 4–6, which reflects the inattentive driving case.

Figure 33. Drowsy: look down. (a) Gaze distribution for fatigued driving; most gaze points are down. (b) Histogram plot for fatigued driving.

Figure 34. Inattention: look left or right. (a) Gaze distribution for inattentive driving; significant gaze is toward the left and right. (b) Histogram plot for inattentive driving.
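The way these gaze distributions separate the three driving states can be illustrated with a simple majority rule over a window of estimated gaze-region indices. The sketch below is only an illustration of that idea, assuming the region numbering of Figure 30 (5 = frontal, 8 = down, 4 and 6 = left and right) and an arbitrary 50% majority threshold; it is not the decision logic of the actual monitoring system.

```python
from collections import Counter

def driving_state(gaze_regions, ratio=0.5):
    """Classify a window (list) of gaze-region indices (1-9) as 'normal',
    'drowsy', or 'inattentive', mirroring the pattern of Figures 32-34.
    """
    counts = Counter(gaze_regions)
    n = max(1, len(gaze_regions))
    if counts[5] / n >= ratio:                    # gaze mostly frontal (region 5)
        return "normal"
    if counts[8] / n >= ratio:                    # gaze mostly down (region 8)
        return "drowsy"
    if (counts[4] + counts[6]) / n >= ratio:      # gaze mostly left/right (regions 4, 6)
        return "inattentive"
    return "undetermined"
```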

Conclusions

Through the research presented in this paper, we developed a non-intrusive prototype computer vision system for real-time monitoring of a driver's vigilance. We focused on developing the necessary hardware and imaging algorithms that can simultaneously extract multiple visual cues that typically characterize a person's level of fatigue. These visual cues include eyelid movement, gaze, and face orientation. The main components of the system consist of a hardware system for real-time acquisition of video images of the driver and various computer vision algorithms, together with their software implementations, for real-time eye tracking, computation of eyelid-movement parameters, face pose discrimination, and gaze estimation.

Each part of our fatigue monitoring system was tested in a simulated environment with subjects of different ethnic backgrounds, genders, and ages, and under different illumination conditions. The system was found to be very robust, reliable, and accurate. We are now collaborating with Honda to port the code to a PC and to install the system on a vehicle to evaluate its


performance under real driving conditions. Another future task is to develop a new gaze estimation algorithm that tolerates large head movements. This requires improving the calibration procedure and incorporating face pose into gaze estimation to compensate for head movement.

With our active hardware system, we can achieve the following, which is difficult to achieve with conventional passive appearance-based methods:

- fast, real-time eye and face tracking;
- reduced sensitivity to external illumination interference;
- more robust and accurate tracking;
- tolerance of fast head/face movement.

Acknowledgements

The research is partially supported by a grant from the Air Force Office of Scientific Research and a grant from Honda R & D Americas.

References

1. Saito, H., Ishiwaka, T., Sakata, M. & Okabayashi, S. (1994) Applications of driver's line of sight to automobiles – what can driver's eye tell. Proceedings of the 1994 Vehicle Navigation and Information Systems Conference, Yokohama, Japan, August 1994, pp. 21–26.

2. Ueno, H., Kaneda, M. & Tsukino, M. (1994) Development of drowsiness detection system. Proceedings of the 1994 Vehicle Navigation and Information Systems Conference, Yokohama, Japan, August 1994, pp. 15–20.

3. Boverie, S., Leqellec, J.M. & Hirl, A. (1998) Intelligent systems for video monitoring of vehicle cockpit. 1998 International Congress and Exposition ITS: Advanced Controls and Vehicle Navigation Systems, pp. 1–5.

4. Kaneda, M. et al. (1994) Development of a drowsiness warning system. The 11th International Conference on Enhanced Safety of Vehicles, Munich.

5. Onken, R. (1994) Daisy, an adaptive knowledge-based driver monitoring and warning system. Proceedings of the 1994 Vehicle Navigation and Information Systems Conference, Yokohama, Japan, August 1994, pp. 3–10.

6. Feraric, J., Kopf, M. & Onken, R. (1992) Statistical versus neural net approach for driver behavior description and adaptive warning. The 11th European Annual Manual, pp. 429–436.

7. Ishii, T., Hirose, M. & Iwata, H. (1987) Automatic recognition of driver's facial expression by image analysis. Journal of the Society of Automotive Engineers of Japan, 41: 1398–1403.

8. Yammamoto, K. & Higuchi, S. (1992) Development of a drowsiness warning system. Journal of the Society of Automotive Engineers of Japan, 46: 127–133.

9. Smith, P., Shah, M. & da Vitoria Lobo, N. (2000) Monitoring head/eye motion for driver alertness with one camera. The 15th International Conference on Pattern Recognition, Vol. 4, pp. 636–642.

10. Saito, S. (1992) Does fatigue exist in a quantitative measurement of eye movement? Ergonomics, 35: 607–615.

11. Anon. (1999) Perclos and eyetracking: Challenge and opportunity. Technical Report, Applied Science Laboratories, Bedford, MA.

12. Wierville, W.W. (1994) Overview of research on driver drowsiness definition and driver drowsiness detection. ESV, Munich.

13. Dinges, D.F., Mallis, M., Maislin, G. & Powell, J.W. (1998) Evaluation of techniques for ocular measurement as an index of fatigue and the basis for alertness management. Department of Transportation Highway Safety Publication 808 762, April 1998.

14. Anon. (1998) Proximity array sensing system: head position monitor/metric. Advanced Safety Concepts, Inc., Santa Fe, NM 87504.

15. Anon. (1999) Conference on Ocular Measures of Driver Alertness, Washington, DC, April 1999.

16. Grace, R. (1999) A drowsy driver detection system for heavy vehicles. Conference on Ocular Measures of Driver Alertness, April 1999.

17. Cleveland, D. (1999) Unobtrusive eyelid closure and visual of regard measurement system. Conference on Ocular Measures of Driver Alertness, April 1999.

18. Fukuda, J., Adachi, K., Nishida, M. & Akutsu, E. (1995) Development of driver's drowsiness detection technology. Toyota Technical Review, 45: 34–40.

19. Richardson, J.H. (1995) The development of a driver alertness monitoring system. In: Hartley, L. (ed.) Fatigue and Driving: Driver Impairment, Driver Fatigue and Driver Simulation. London: Taylor & Francis.

20. Hutchinson, T.E. (1990) Eye movement detection with improved calibration and speed. U.S. Patent 4,950,069, April.

21. Morimoto, C.H., Koons, D., Amir, A. & Flickner, M. (2000) Pupil detection and tracking using multiple light sources. Image and Vision Computing, 18: 331–335.

22. Kittler, J., Illingworth, J. & Foglein, J. (1985) Threshold selection based on a simple image statistic. Computer Vision, Graphics, and Image Processing, 30: 125–147.

23. Blake, A., Curwen, R. & Zisserman, A. (1993) A framework for spatio-temporal control in the tracking of visual contours. International Journal of Computer Vision, 11: 127–145.

24. Maybeck, P.S. (1979) Stochastic Models, Estimation and Control, Vol. 1. New York: Academic Press, Inc.

25. Wierville, W.W., Ellsworth, L.A. et al. (1994) Research on vehicle based driver status/performance monitoring: development, validation, and refinement of algorithms for detection of driver drowsiness. National Highway Traffic Safety Administration Final Report: DOT HS 808 247.

26. Tsukamoto, A., Lee, C. & Tsuji, S. (1994) Detection and pose estimation of human face with synthesized image models. ICPR-94, pp. 754–757.

27. Gee, A. & Cipolla, R. (1994) Determining the gaze of faces in images. Image and Vision Computing, 30: 639–647.

28. Horprasert, T., Yacoob, Y. & Davis, L. (1996) Computing 3D head orientation from a monocular image. Proceedings of the International Conference on Automatic Face and Gesture Recognition, pp. 242–247.

29. Gong, S., Ong, E. & McKenna, S. (1998) Learning to associate faces across views in vector space of similarities of prototypes. Proceedings of the British Machine Vision Conference, Southampton, UK, 1998, pp. 300–305.

30. Pentland, A., Moghaddam, B. & Starner, T. (1994) View-based and modular eigenspaces for face recognition. CVPR-94, pp. 84–91.

31. Rae, R. & Ritter, H. (1998) Recognition of human head orientation based on artificial neural networks. IEEE Transactions on Neural Networks, 9: 257–265.

32. Baluja, S. & Pomerleau, D. (1994) Non-intrusive gaze tracking using artificial neural networks. Technical Report CMU-CS-94-102, Carnegie Mellon University.

33. Yang, G. & Waibel, A. (1996) A real-time face tracker. Workshop on Applications of Computer Vision, pp. 142–147.

34. Ebisawa, Y. (1989) Unconstrained pupil detection technique using two light sources and the image difference method. Visualization and Intelligent Design in Engineering, pp. 79–89.

35. Hutchinson, T.E. (1988) Eye movement detection with improved calibration and speed. United States Patent 4,950,069.

36. Hutchinson, T.E., Preston White, K., Worthy, J.R., Martin, N., Kelly, C., Lisa, R. & Frey, A. (1989) Human–computer interaction using eye-gaze input. IEEE Transactions on Systems, Man, and Cybernetics, 19: 1527–1533.