
Seventh International Conference on Control, Automation, Robotics and Vision (ICARCV'02), Dec 2002, Singapore

Gaze Tracking Using One Fixed Camera
Wen Gang

Department of Electrical and Computer Engineering, National University of Singapore

Blk E4 #05-03, Engineering Drive 3, Singapore 117576. Email: [email protected]

Abstract
In this paper, a noncontact corneal/pupil reflection scheme using only one B/W camera to track the eye gaze is presented. A small manual focus lens is used in a camera without a pan-and-tilt base. A connected component labeling algorithm is designed to detect the pupils. The gradient information is utilized to find the precise pupil center. The effect of pupil detection precision on screen point estimation precision is also discussed. After the calibration process, the head movement is tracked by a Kalman filter. We experimented with head roll compensation by using geometric information. The whole system runs in real time at video rate (25 frames/second). This pilot study shows that using a fixed camera has its own advantages in dynamic properties and can achieve the same precision as more complex systems do.

1 Introduction
Using eye gaze to control a computer has many advantages. First, the eye can move very quickly in comparison to other parts of the body. Furthermore, when we want to activate an icon we usually look at the icon first, and then we move the mouse cursor to the point of interest and click to confirm the action. Therefore, if we can predict which point the user is looking at on the screen, there will be no need to move the cursor anymore, thereby improving the interface and possibly also reducing fatigue. Lastly, it is more natural to use the eye gaze to get information in human-computer interactions, especially in virtual reality systems.

Many methods have been proposed to estimate the eye gaze direction of a user looking at a point on the screen. Young and Sheena give a detailed description of different gaze tracking techniques [1]. These methods can be divided into two categories: contact and noncontact methods. Among the noncontact methods, Tomono et al. [2] designed a system composed of 3 CCD cameras and 2 near-infrared light sources with different wavelengths. Ebisawa designed a system using two infrared LEDs to detect the pupils [3], and later improved it [4][5]. He implemented a real-time pupil detection system using customized hardware and a pupil brightness stabilization scheme.

In the noncontact schemes mentioned above, usually one camera is installed on a pan-and-tilt base so that the camera and the infrared LEDs point directly into the facial plane. The moving pan-and-tilt base makes the

system complex and distracts the user's attention. These systems either use a galvo mirror assembly that is fast enough to achieve real-time performance, or use pan-and-tilt cameras which are not fast enough. The gaze tracking system described here can do without the pan-and-tilt base and uses only one camera, hence making the system very simple. It also achieves real-time, video-rate processing.

The paper is structured as follows: the hardware configuration and the corneal reflection method for gaze direction estimation are presented in Section 2; precise pupil detection is described in Section 3; pupil tracking appears in Section 4; head roll compensation is introduced in Section 5; applications are contained in Section 6 and finally the conclusion in Section 7.

2 System description
2.1 Hardware configuration

Figure 1 Hardware setup.

An infrared-sensitive B/W CCD camera (JAI CV-M50 IR) is used. This camera is placed at the bottom-center of the screen and points upward to the user's face, as shown in Figure 1. The lens is a 16mm fixed focus lens. LED sets LED1 and LED2 are mounted coaxially with the camera lens and located directly below the screen, on the vertical axis defined by the screen center. The inner circle of LEDs (LED1) is mounted in front of the camera lens to achieve the bright pupil image. The outer circle of LEDs (LED2) is fixed coaxially with the lens but mounted farther from the optical center. The third LED set, LED3, is located at the left-center of the



monitor. An infrared-pass filter (M&K #078) is used to filter out the visible light.

2.2 Theory for gaze direction estimation
The gaze direction is estimated by following the four steps described below.

1) Find pupil center
The common corneal reflection method is to use one or several infrared-sensitive cameras to capture the reflected infrared light coming from the anterior surface of the cornea and the retina [6]. In our configuration, LED1 illuminates at odd fields; LED2 and LED3 illuminate at even fields. The coaxial LED1 will generate a bright pupil and dark iris image (Figure 2(a)) because the infrared light is reflected from the retina. LED2 and LED3 will generate a dark pupil image (Figure 2(b)) because the infrared light emitted into the pupil cannot be reflected back to the camera due to the input angle of the infrared light. These two images are subtracted to get the difference image. After getting the absolute difference image, only a certain ratio of pixels are retained. Then a connected component labeling algorithm (CCLA) is designed to group all the remaining pixels [7]. To increase the program's robustness, the five largest pupil candidates are selected. The candidates which are line-like are eliminated because real pupils are ellipses instead of lines. Since each field has only half the lines of a frame, the ideal pupil image will appear as an ellipse with a height-to-width ratio (HTWR) of 0.5. The candidates whose HTWR is too far from 0.5 are also eliminated and only the three best candidates remain. The three candidates can form three different pairs. The final pair of pupils is found according to three properties: the first property is the difference in size (pixel number); the second property is the difference between the HTWR sum and 1; the third property is the difference in orientation. If a pair has the smallest value in at least two properties, this pair is the final result. Otherwise, the pair with the smallest size difference is selected. One detection result in the absolute difference image is shown in Figure 2(c).

Figure 2 (a) Bright pupil; (b) Dark pupil; (c) Detected pupil.
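As an illustration of the candidate-filtering logic in step 1, the following sketch follows the description above (difference image, thresholding, connected component labeling, HTWR test, pairing). It is a minimal sketch, not the original implementation: the OpenCV calls, the keep_ratio value, the HTWR tolerance, and the reduction of the three pairing properties to the size-difference fallback rule are all assumptions.

```python
import cv2
import numpy as np

def find_pupil_pair(bright, dark, keep_ratio=0.003):
    """Rough sketch of the pupil-pair search described in step 1.

    bright, dark: grayscale fields captured under LED1 and LED2/LED3 illumination.
    keep_ratio:   fraction of the brightest difference pixels to keep (assumed value).
    """
    diff = cv2.absdiff(bright, dark)
    # Keep only the brightest fraction of pixels in the absolute difference image.
    thresh = np.quantile(diff, 1.0 - keep_ratio)
    mask = (diff >= thresh).astype(np.uint8)

    # Connected component labeling groups the remaining pixels; keep the five largest blobs.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    comps = sorted(range(1, n), key=lambda i: stats[i, cv2.CC_STAT_AREA], reverse=True)[:5]

    candidates = []
    for i in comps:
        w, h = stats[i, cv2.CC_STAT_WIDTH], stats[i, cv2.CC_STAT_HEIGHT]
        htwr = h / float(w)
        # Reject line-like blobs and blobs whose height-to-width ratio is far from 0.5
        # (a video field holds only half the lines of a frame).
        if min(w, h) < 3 or abs(htwr - 0.5) > 0.25:
            continue
        ys, xs = np.where(labels == i)
        candidates.append({"center": (xs.mean(), ys.mean()),
                           "area": stats[i, cv2.CC_STAT_AREA], "htwr": htwr})

    candidates = candidates[:3]
    if len(candidates) < 2:
        return None

    # Of the three pairing properties in the text, only the size-difference fallback
    # rule is implemented here: pick the pair with the smallest size difference.
    best = min(((a, b) for i, a in enumerate(candidates) for b in candidates[i + 1:]),
               key=lambda p: abs(p[0]["area"] - p[1]["area"]))
    return best[0]["center"], best[1]["center"]
```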

2) Find glint center
After getting the pupil centers, the glint centers are detected. The glint is the light reflected from the cornea, caused by LED2 and LED3. These glints appear bright in the dark pupil image.* Hence, the dark pupil image is utilized to get the glint centers. First, the brightest pixel is kept, which corresponds to Glint1 as shown in Figure 3, because LED2 is closer to the user than LED3.

Figure 3 View vector schematic.

Second, a small region on the upper left of Glint1 is defined and the brightest pixel in this region is kept. The center of this pixel corresponds to Glint2. Hence the intersection point of the horizontal line passing through Glint1 and the vertical line passing through Glint2 is found for each pupil.

3) Find view vector
A view vector is defined as the vector from the glint center intersection point (O2) to the pupil center (O3), which corresponds to the vector v1 in Figure 3. To illustrate the meaning of the view vector, we first assume that the head is still. Then an eye can only rotate in its socket. The depth variation caused by the eye movement can be ignored when the subject sits far enough from the camera (e.g. 500 mm). Hence the eye movement can be simplified as a planar movement. When the infrared light is reflected from the cornea, it forms a bright and small point in the camera image (glint). In our scheme, Glint1 and Glint2 (Figure 3) are glints in the camera image which are images of the corneal reflection light emitted from LED2 and LED3 (Figure 1), respectively. Considering that LED2 and LED3 are located at the centers of two sides of the screen, the vertical axis defined by LED2 and the horizontal axis defined by LED3 intersect at the center of the screen. Then the intersection point (O2) of the two axes defined by the two glints (Glint1 and Glint2) is the screen center in the camera image, as shown in Figure 3. Therefore, the view vector v1 is a measure of the eye movement relative to the screen center [8]. v1 = 0 when the user looks directly at the center of the screen. The vector v2 from the camera image center (O1) to the screen center in the camera image (O2) is a reflection of the current head position relative to the screen center. v2 = 0 when the head is positioned directly in front of the screen center. Thus, the head movement and the eye movement are separated.

* Refer to the dark pupil image in Figure 2(b).
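The glint search and view-vector computation of steps 2 and 3 can be sketched as follows; the function name, the search-window size, and the array conventions are assumptions made for illustration, not part of the original system.

```python
import numpy as np

def view_vectors(dark, pupil_center, image_center, win=15):
    """Sketch of glint detection and view-vector computation (steps 2 and 3).

    dark:         dark-pupil field (LED2/LED3 illumination) as a 2-D array.
    pupil_center: (x, y) pupil center O3 from step 1.
    image_center: (x, y) camera image center O1.
    win:          assumed size of the search window for Glint2.
    """
    # Glint1: brightest pixel in the dark pupil image (LED2 is closer to the user).
    y1, x1 = np.unravel_index(np.argmax(dark), dark.shape)

    # Glint2: brightest pixel in a small region to the upper left of Glint1.
    region = dark[max(0, y1 - win):y1, max(0, x1 - win):x1]
    dy, dx = np.unravel_index(np.argmax(region), region.shape)
    y2, x2 = max(0, y1 - win) + dy, max(0, x1 - win) + dx

    # O2: intersection of the horizontal line through Glint1 and the vertical line through Glint2.
    o2 = np.array([x2, y1], dtype=float)

    v1 = np.asarray(pupil_center, dtype=float) - o2   # eye movement relative to screen center
    v2 = o2 - np.asarray(image_center, dtype=float)   # head position relative to screen center
    return v1, v2
```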



Since there are two pupils, there will be two view vectors. The average of the two view vectors is used as the final view vector.

4) Map to screen coordinates
The view vector needs to be mapped to screen coordinates. For this, a calibration pattern is designed with nine small squares (30x30 pixels) distributed evenly on the screen (Figure 4(a)).

Figure 4 (a) Calibration image; (b) Test image.

The resolution of the screen is 1024x768 pixels. When the user presses a button while gazing at each of the squares, one screen coordinate (x_s, y_s) and view vector (x_v, y_v) pair is obtained for each square. The transformation from the view vector to screen coordinates is taken to be a pair of simple second-order polynomials:

x_s = a_0 + a_1 x_v + a_2 y_v + a_3 x_v y_v + a_4 x_v^2 + a_5 y_v^2        (1)
y_s = a_6 + a_7 x_v + a_8 y_v + a_9 x_v y_v + a_10 x_v^2 + a_11 y_v^2        (2)

where a_i (i = 0, 1, ..., 11) are the coefficients to be found. As each corresponding point yields two equations, nine points give 18 equations. Since there are only 12 unknowns, the coefficients are solved by the least squares method.
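A minimal sketch of this least-squares fit is shown below, assuming the nine (x_v, y_v)/(x_s, y_s) calibration pairs have already been collected; numpy's lstsq stands in for whatever solver the original system used.

```python
import numpy as np

def fit_gaze_mapping(view_vecs, screen_pts):
    """Fit the second-order polynomial mapping of equations (1) and (2).

    view_vecs:  array of shape (9, 2) with the calibration view vectors (x_v, y_v).
    screen_pts: array of shape (9, 2) with the gazed screen coordinates (x_s, y_s).
    Returns (a_x, a_y), the six coefficients of each polynomial.
    """
    xv, yv = view_vecs[:, 0], view_vecs[:, 1]
    # Design matrix with the six monomials 1, x_v, y_v, x_v*y_v, x_v^2, y_v^2.
    A = np.column_stack([np.ones_like(xv), xv, yv, xv * yv, xv**2, yv**2])
    a_x, *_ = np.linalg.lstsq(A, screen_pts[:, 0], rcond=None)   # a_0 .. a_5
    a_y, *_ = np.linalg.lstsq(A, screen_pts[:, 1], rcond=None)   # a_6 .. a_11
    return a_x, a_y

def map_to_screen(view_vec, a_x, a_y):
    """Apply equations (1) and (2) to a single view vector."""
    xv, yv = view_vec
    m = np.array([1.0, xv, yv, xv * yv, xv**2, yv**2])
    return float(m @ a_x), float(m @ a_y)
```

With nine calibration points each of the two systems is overdetermined (nine equations in six unknowns), which matches the 18-equations/12-unknowns count above.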

3 Algorithm for precise pupil detection
In order to find the center of the pupil, the region representing the pupil is fitted with an ellipse, and the ellipse center is used to approximate the pupil center. However, the dark pixels within the pupil in Figure 2(c), which are caused by the glints from LED2 and LED3, can form concavities in the boundary of the pupil region. This will cause the fitted ellipse to move off-center. In order to correct for this, the points representing the concavities on the raw pupil boundary are found and eliminated from the ellipse fitting procedure. The ellipse center is then found to represent the pupil center more accurately. This is shown in Figure 5. The gray ellipse in Figure 5(b) is the result of fitting using all the boundary points of the region in Figure 5(a). The gray ellipse in Figure 5(c) is the result of using only the boundary points that remain after the concave points are removed.
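The paper does not state how the concave points are detected, so the sketch below uses an assumed criterion (distance from the convex hull of the raw boundary) before fitting the ellipse; it is an illustration of the idea rather than the original algorithm.

```python
import cv2
import numpy as np

def precise_pupil_center(mask):
    """Sketch of the precise pupil-center estimate from Section 3.

    mask: binary image of the detected pupil region, whose boundary may be cut
          into by the glint concavities.
    """
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    boundary = max(contours, key=cv2.contourArea).reshape(-1, 2).astype(np.float32)

    # Assumed concavity test: keep only boundary points that lie close to the convex
    # hull of the region; points pulled inward by the glints are discarded.
    hull = cv2.convexHull(boundary).reshape(-1, 1, 2)
    dists = np.array([abs(cv2.pointPolygonTest(hull, (float(x), float(y)), True))
                      for x, y in boundary])
    keep = boundary[dists < 1.5]            # tolerance in pixels (assumed value)

    # Fit an ellipse to the remaining boundary points; its center approximates the pupil center.
    (cx, cy), axes, angle = cv2.fitEllipse(keep)
    return cx, cy
```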

Figure 5 (a) Test image; (b) Without precise fitting; (c) With precise fitting.

In Figure 5(a) there are concavities in the test image due to the two sets of glints. The fitted ellipse (gray pixels) in Figure 5(b) shifts to the right significantly in this case. However, when the concave points are eliminated, the ellipse in Figure 5(c) is a better fit to the pupil region, and hence represents the pupil center more accurately.

The effect of pupil center estimation was considered with reference to the calibration and test images in Figure 4(a) and (b), respectively. In the experiment, the user's head in the test phase is required to be at the same position as in the calibration phase. The results showed that the average errors in the estimated gaze point on the screen were roughly the same for the two center estimation methods: (-7.7, 16.3) and (15.17, -0.69) pixels, respectively. But the standard deviation of the error using precise detection was much better than without it: (21.8, 34.7) pixels for the precise fitting versus (29.7, 43.8) pixels without it.

4 Pupil tracking
After the calibration process, the mapping from view vector to screen coordinates needs to relax the constraint of a fixed head position. First, the user is allowed to move his head in the facial plane. A Kalman filter is designed to track the head movement by tracking the two pupils [9]. In this system, a constant speed model has been utilized.

4.1 Kalman filter
The Kalman filter solves the general problem of estimating the state x(k) of a discrete-time controlled process that is governed by the linear stochastic difference equation

x(k+1) = F x(k) + G v(k)        (3)

with a measurement z(k) given by

z(k) = H x(k) + w(k)        (4)

where F is the state transition matrix governing the process, H is the measurement matrix relating the measurement vector to the state vector, v(k) is the process noise vector, and w(k) is the measurement noise vector. The noise vectors v(k) and w(k) are assumed to be independent of each other, white, and Gaussian, such that

E[v(j) v(k)^T] = R_v if j = k, and 0 otherwise        (5)
E[w(j) w(k)^T] = R_w if j = k, and 0 otherwise        (6)
E[v(k) w(k)^T] = 0 for all k        (7)

Kalman filtering involves obtaining a least mean square error estimate, x̂(k), of x(k). In essence, the algorithm involves calculating the a priori state


estimate, x̂(k+1|k), and error covariance matrix P(k+1|k), to compute the Kalman gain, K(k+1). We then obtain the a posteriori state estimate, x̂(k+1|k+1), and error covariance matrix P(k+1|k+1). There are four steps in implementing the algorithm.

Step 1. One-step estimate:

x̂(k+1|k) = F x̂(k|k)        (8)
P(k+1|k) = F P(k|k) F^T + G R_v G^T        (9)

Step 2. Compute Kalman gain:

K(k+1) = P(k+1|k) H^T S(k+1)^{-1}        (10)

Step 3. Update estimate:

x̂(k+1|k+1) = x̂(k+1|k) + K(k+1) ν(k+1)        (11)

Step 4. Update error covariance:

P(k+1|k+1) = [I - K(k+1) H] P(k+1|k)        (12)

where

S(k+1) = H P(k+1|k) H^T + R_w        (13)
ν(k+1) = z(k+1) - H x̂(k+1|k)        (14)
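The four steps translate directly into code. The sketch below is a generic constant-velocity filter for one pupil with state (x, y, vx, vy); the forms of F and G shown here are standard constant-velocity choices and are assumptions, since the paper does not list them explicitly.

```python
import numpy as np

def make_constant_velocity_model(dt=0.04):
    """Constant-velocity model for one tracked pupil at video rate (dt = 1/25 s)."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)   # state transition (assumed form)
    G = np.eye(4)                               # noise input taken as identity, so R_v is the
                                                # full 4x4 process noise covariance as in (17)
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)   # only the pupil position is measured
    return F, G, H

def kalman_step(x, P, z, F, G, H, Rv, Rw):
    """One predict/update cycle implementing equations (8)-(14)."""
    x_pred = F @ x                               # (8)  a priori state estimate
    P_pred = F @ P @ F.T + G @ Rv @ G.T          # (9)  a priori error covariance
    S = H @ P_pred @ H.T + Rw                    # (13) innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)          # (10) Kalman gain
    nu = z - H @ x_pred                          # (14) innovation
    x_new = x_pred + K @ nu                      # (11) a posteriori state estimate
    P_new = (np.eye(len(x)) - K @ H) @ P_pred    # (12) a posteriori error covariance
    return x_new, P_new
```

With P(0|0) = I, x̂(0|0) = 0, and the R_v of equation (17) below, this predict/update cycle runs once per frame for each pupil's measured center.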

4.2 Initial conditions of the Kalman filter
In this system, a constant speed model has been utilized. All the initial values are calculated manually from 12 sets of image series, with 50 successive images in each set. The user moves in the facial plane among these image series. The initial matrices for the Kalman filter are listed below.

P(0|0) = I (identity),  x̂(0|0) = (0 0 0 0)^T

H = [ 1 0 0 0
      0 1 0 0 ]

R_v = [  0.00460  -0.00029   0.23035  -0.01459
        -0.00029   0.00026  -0.01459   0.01346
         0.23035  -0.01459  11.51768  -0.72998
        -0.01459   0.01346  -0.72998   0.67307 ]        (17)

4.3 Experiment for prediction with head movement
To evaluate the system's precision with lateral head movement, an experiment was conducted using the same image patterns as shown in Figure 4. The average prediction error was (-13.21, 16.29) pixels and the standard deviation was (24.66, 50.18) pixels, which corresponds to an eye gaze detection precision of (0.799°, 1.705°) in the x and y directions, respectively.

5 Head rotation compensation
To allow the user to use the system naturally, the head rotation needs to be compensated. We have finished the roll compensation; pan/tilt compensation is in progress. For head roll compensation we need to compensate both x_v and y_v.

As we know, getting the screen position from the view vector is a second-order polynomial mapping, as shown in equations (1) and (2). Since the coefficients a_i of the two equations are calculated with view vectors from normal facial images*, the view vector measured in an image with head roll needs to be converted to a vector in the normal facial image. Then the converted view vector can be used in equations (1) and (2) to calculate the screen position the user is looking at. This is the general idea of head rotation compensation.

An experiment was done to record the view vector at different roll degrees. The user is asked to roll his head until his eyes are looking directly at the calibration points (points A-E in Figure 6). In this way, the user's head is roughly at a certain roll degree. Then the user is required to look at the test point P. The view vector is recorded for each roll degree. For one experiment there are five view vectors recorded. The experiment is then performed ten times, so ten sets of data are gathered.

Figure 6 Pattern of calibration points (A-E) and the test point P.

* Normal facial images are facial images taken when the user sits opposite the screen center with the facial plane parallel to the screen.


Since the coordinates of the test point P are known, the view vector for a normal head pose can be calculated from (1) and (2).

5.1 Compensation for x_v
The difference between the calculated x_v and the measured x_v was computed and recorded as Δx, and the ratio

ratio_x = Δx / x_v

was formed for each measurement. A sample graph of the measured x_v with respect to ratio_x is shown in Figure 7.

Figure 7 Measured x_v (pixel) vs. ratio_x.

From this graph, we can see that ratio_x is approximately decreasing with respect to x_v. As we know, the measured x_v is largest with a normal head pose. With a larger roll degree, x_v is smaller and Δx is larger; hence ratio_x increases when x_v decreases. A line was therefore used to approximate each set of data in a least squares sense, and then the average line parameters were calculated. This solves the problem of head roll compensation for x_v.

5.2 Compensation for y_v
Similarly, with ratio_y = Δy / y_v, we found that ratio_y is generally decreasing when y_v increases. Each data set was approximated by using a cubic interpolation method. The approximation polynomial is

y = p_1 x^3 + p_2 x^2 + p_3 x + p_4        (18)

After interpolation for each set of data, approximations at certain y_v values were made. Then the average ratio_y value was calculated by averaging all the approximated values at each y_v value. The result is shown in Figure 8.
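A minimal sketch of the two per-axis fits described above is given below, assuming the Δ/measured ratios have already been collected for each roll degree; numpy's polyfit stands in for the original fitting, and the sign convention used when inverting the ratio is an assumption.

```python
import numpy as np

def fit_roll_compensation(xv_sets, ratiox_sets, yv_sets, ratioy_sets):
    """Fit the head-roll compensation curves of Sections 5.1 and 5.2.

    xv_sets / ratiox_sets: per-experiment arrays of measured x_v and ratio_x = dx / x_v.
    yv_sets / ratioy_sets: per-experiment arrays of measured y_v and ratio_y = dy / y_v.
    Returns the averaged line coefficients for x and cubic coefficients (18) for y.
    """
    # 5.1: least-squares line for each data set, then average the parameters.
    line_avg = np.mean([np.polyfit(xv, rx, 1) for xv, rx in zip(xv_sets, ratiox_sets)], axis=0)

    # 5.2: cubic polynomial y = p1*x^3 + p2*x^2 + p3*x + p4 for each set, then average.
    # (The paper instead averages sampled ratio values and refits one final cubic;
    #  averaging the per-set coefficients is a simplification.)
    cubic_avg = np.mean([np.polyfit(yv, ry, 3) for yv, ry in zip(yv_sets, ratioy_sets)], axis=0)
    return line_avg, cubic_avg

def compensate(view_vec, line_avg, cubic_avg):
    """Convert a view vector measured under head roll back toward the normal-pose frame."""
    xv, yv = view_vec
    ratio_x = np.polyval(line_avg, xv)        # predicted dx / x_v at this x_v
    ratio_y = np.polyval(cubic_avg, yv)       # predicted dy / y_v at this y_v
    # Assuming ratio = (normal - measured) / measured, the normal-pose component is
    # measured * (1 + ratio); the exact sign/denominator convention is an assumption.
    return xv * (1.0 + ratio_x), yv * (1.0 + ratio_y)
```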

Figure 8 Measured y_v (pixel) vs. the average ratio_y.

A final cubic interpolation for the points in Figure 8 was made. The approximating curve is shown in Figure 8 as the solid curve. The ratios along this curve can then be used to compensate the measured y_v.

6 Application
The system was implemented in two applications. The first one was within the Quake game. In this game, we converted the eye gaze information into the rotation of the Quake game scene. That is to say, supposing the user is looking at an object on the left half of the screen, the Quake game will automatically move its scene to the right until that object is positioned in the center of the screen. The whole system can run in real time at video rate without the use of a mouse.

The second application of the eye gaze tracking system was to control a Windows calculator program named Fox Calculator. The calculator was zoomed to occupy the full screen. The size of one calculator button was roughly 100x100 pixels. The eyes were used to move the cursor on the screen. A key on the keyboard was used to simulate the mouse's left button via a DLL file while the program was running. Although there was some jittering of the cursor's position, we showed that the eye gaze system could be used to control Windows programs.

7 Conclusion
In this paper, a new video-based eye gaze detection system was described. Only one camera is used, without a pan-and-tilt base. The system runs at video rate (25 frames/second) and has a precision of (0.799°, 1.705°) in the x and y directions, respectively. An ellipse-fitting method was used to accurately estimate the pupil center. After the calibration process, the user's head movement is tracked by a Kalman filter. The measured view vectors are compensated to the correct view vectors by using the head pose information. Finally, the gaze tracking system was implemented in the Quake game and a Windows calculator.



Acknowledgments
We would like to thank Myron Flickner and Dave Koons of IBM for their brainstorming discussions, and Yoshinobu Ebisawa for providing relevant references and information.

References
[1] L. R. Young and D. Sheena, Methods and designs: survey of eye movement recording methods. Behavior Research Methods & Instrumentation, Vol. 7, No. 5, pp. 397-429, 1975.
[2] A. Tomono, I. Muneo, and Y. Kobayashi, A TV camera system which extracts feature points for non-contact eye movement detection. SPIE Vol. 1194, Optics, Illumination, and Image Sensing for Machine Vision IV, 1989.
[3] Y. Ebisawa and S. Satoh, Effectiveness of pupil area detection technique using two light sources and the image difference method. Proceedings of the 15th Annual International Conference of the IEEE Eng. in Med. & Biol. Soc., pp. 1268-1269, 1993.
[4] Y. Ebisawa, Improved video-based eye-gaze detection method. Proceedings of the IEEE Instrumentation and Measurement Technology Conference, Vol. 2, pp. 963-966, 1994.
[5] Y. Ebisawa, Improved video-based eye-gaze detection method. IEEE Transactions on Instrumentation and Measurement.