Gang, W.; Gaze Tracking Using One Fixed Camera


  • 8/7/2019 Gang, W.; Gaze Tracking Using One Fixed Camera


    Seventh International Conference on Control, Automation, Robotics and Vision (ICARCV'02), Dec 2002, Singapore

    Gaze Tracking Using One Fixed Camera

    Wen Gang

    Department of Electrical and Computer Engineering
    National University of Singapore

    Blk E4 h M5-03, Engineering Drive 3, Singapore 117576
    Email

    Abstract

    In this paper, a noncontact corneal/pupil reflection scheme that uses only one fixed camera to track the eye gaze is presented. A small manual focus lens is used in a camera without a pan-and-tilt base. A connected component labeling algorithm is designed to detect the pupils. The gradient information is utilized to find the precise pupil center. The effect of pupil detection precision on screen point estimation precision is also discussed. After the calibration process, the head movement is tracked by a Kalman filter. We experimented with head roll compensation using geometric information. The whole system can run in real time at video rate (25 frames/second). This pilot study shows that using a fixed camera has its own advantage in dynamic properties and can achieve the same precision as more complex systems do.

    1 Introduction

    Using eye gaze to control a computer has many advantages. First, the eye can move very quickly in comparison to other parts of the body. Furthermore, when we want to activate an icon, we usually look at the icon first, and then we move the mouse cursor to the point of interest and click to confirm the action. Therefore, if we can predict which point the user is looking at on the screen, there will be no need to move the cursor anymore, thereby improving the interface and possibly also reducing fatigue. Lastly, it is more natural to use the eye gaze to get information in human-computer interactions, especially in virtual reality systems.

    Many methods have been proposed to estimate the eye gaze direction of a user looking at a point on the screen. Young and Sheena give a detailed description of different gaze tracking techniques [1]. These methods can be divided into two categories: contact and noncontact methods. Among the noncontact methods, Tomono et al. [2] designed a system composed of 3 CCD cameras and 2 near-infrared light sources with different wavelengths.
    Ebisawa designed a system using two infrared LEDs to detect the pupils [3], and later improved it [4] [5]. He implemented a real-time pupil detection system using customized hardware and a pupil brightness stabilization scheme.

    In the noncontact schemes mentioned above, usually one camera is installed on a pan-and-tilt base so that the camera and the infrared LEDs point directly into the facial plane. The moving pan-and-tilt base makes the system complex and distracts the user's attention. These systems either use a galvo mirror assembly that is fast enough to achieve real-time performance or use pan-and-tilt cameras which are not fast enough.

    The gaze tracking system described here does without the pan-and-tilt base and uses only one camera, hence making the system very simple. It also achieves real-time, video-rate processing.

    The paper is structured as follows: the hardware configuration and the corneal reflection method for gaze direction estimation are presented in Section 2; precise pupil detection is described in Section 3; pupil tracking appears in Section 4; head roll compensation is introduced in Section 5; the application is contained in Section 6; and finally the conclusion is given in Section 7.

    2 System description

    2.1 Hardware configuration

    Figure 1 Hardware setup.

    An infrared-sensitive B/W CCD camera, JAI CV-M50IR, is used. This camera is placed at the bottom-center of the screen and points upward to the user's face, as shown in Figure 1. The lens is a 16 mm fixed focus lens. LED sets LED1 and LED2 are mounted coaxially with the camera lens and located directly below the screen on the vertical axis defined by the screen center. The inner circle of LEDs (LED1) is mounted in front of the camera lens to achieve the bright pupil image. The outer circle of LEDs (LED2) is fixed coaxially with the lens but mounted farther from the optical center. The third LED set LED3 is located at the left-center of the



    monitor. An infrared pass filter M&K #078 is used to filter out the visible light.

    2.2 Theory for gaze direction estimation

    The gaze direction is estimated by following the four steps described below.

    1) Find pupil center

    The common corneal reflection method is to use one or several infrared-sensitive cameras to capture the reflected infrared light coming from the anterior surface of the cornea and the retina [6]. In our configuration, LED1 illuminates at odd fields; LED2 and LED3 illuminate at even fields. The coaxial LED1 will generate a bright pupil and dark iris image (Figure 2(a)) because the infrared light is reflected from the retina. LED2 and LED3 will generate a dark pupil image (Figure 2(b)) because the infrared light emitted into the pupil cannot be reflected out to the camera due to the input angle of the infrared light. These two images are subtracted to get the difference image. After getting the absolute difference image, only a certain ratio of the pixels is retained. Then a connected component labeling algorithm (CCLA) is designed to group all the remaining pixels [7].

    To increase the program's robustness, the five largest pupil candidates are selected. The candidates which are line-like are eliminated because the real pupils are ellipses instead of lines. Since each field has only half the lines in a frame, the ideal pupil image will appear as an ellipse with a height to width ratio (HTWR) of 0.5. The candidates whose HTWR is too far away from 0.5 are also eliminated, and only the three best candidates remain. The three candidates can form three different pairs. The final pair of pupils is found according to three properties. The first property is the difference in size (pixel number). The second property is the difference between the HTWR sum and 1. The third property is the difference in orientation. If a pair has the smallest value in at least two properties, this pair is the final result. Otherwise, the pair with the smallest size difference is selected. One detected [...] in the absolute difference image after [...]

    Figure 2 (a) Bright pupil; (b) Dark pupil; (c) Detected pupil.
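The pair-selection rule of step 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the `Candidate` record, the `htwr_tol` threshold, the choice to rank the "three best" candidates by closeness of HTWR to 0.5, and the orientation encoding in degrees are all hypothetical, since the text does not specify them. The candidate features are assumed to come from the connected component labeling step.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Candidate:
    size: int           # number of pixels in the connected component
    htwr: float         # height-to-width ratio of the component
    orientation: float  # major-axis angle in degrees (assumed encoding)


def select_pupil_pair(candidates, htwr_tol=0.25):
    """Pick the pupil pair; htwr_tol is an assumed threshold, not from the paper."""
    # Drop candidates whose HTWR is too far from the ideal field value of 0.5.
    plausible = [c for c in candidates if abs(c.htwr - 0.5) <= htwr_tol]
    # Keep the three candidates closest to HTWR = 0.5 (assumed ranking rule).
    best = sorted(plausible, key=lambda c: abs(c.htwr - 0.5))[:3]
    pairs = list(combinations(best, 2))
    if not pairs:
        return None

    # The three pair properties named in the text.
    def size_diff(p):
        return abs(p[0].size - p[1].size)

    def htwr_sum_diff(p):
        return abs(p[0].htwr + p[1].htwr - 1.0)

    def orient_diff(p):
        return abs(p[0].orientation - p[1].orientation)

    # Count, for each pair, in how many properties it has the smallest value.
    wins = [0] * len(pairs)
    for prop in (size_diff, htwr_sum_diff, orient_diff):
        values = [prop(p) for p in pairs]
        wins[values.index(min(values))] += 1

    for pair, won in zip(pairs, wins):
        if won >= 2:
            return pair
    # Fallback from the text: the pair with the smallest size difference.
    return min(pairs, key=size_diff)
```

Because there are only three properties, at most one pair can be smallest in two or more of them, so the first match in the loop is the only possible match.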

    2) Find glint center

    After getting the pupil centers, the glint centers are detected. The glint is the light reflected from the cornea, which is caused by LED2 and LED3. These glints appear bright in the dark pupil image. Hence, the dark pupil image is utilized to get the glint centers. First, the brightest pixel is kept, which corresponds to Glint1 as shown in Figure 3, because LED2 is closer to the user than LED3.

    Figure 3 View vector schematic.

    Second, a small region on the upper left of Glint1 is defined and the brightest pixel in this region is kept. The center of this pixel corresponds to Glint2. Hence the intersection point of the horizontal line passing through Glint1 and the vertical line passing through Glint2 is found for each pupil.

    3) Find view vector

    A view vector is defined as the vector from the glint center intersection point (O2) to the pupil center (O3), which corresponds to the vector v1 in Figure 3. To illustrate the meaning of the view vector, we assume first that the head is still. Then an eye can only rotate in its socket. The depth variation caused by the eye movement can be ignored when the subject sits far enough from the camera (e.g. 500). Hence the eye movement can be simplified as a planar movement. When the infrared light is reflected from the cornea, it will form a bright and small point in the camera image (glint). In our scheme, Glint1 and Glint2 (Figure 3) are glints in the camera image which are images of the corneal reflection light emitted from LED2 and LED3 (Figure 1), respectively. Considering that LED2 and LED3 are located at the center of two sides of the screen, the vertical axis defined by LED2 and the horizontal axis defined by LED3 will intersect at the center of the screen. Then the intersection point (O2) of the two axes defined by the two glints (Glint1 and Glint2) is the screen center in the camera image, as shown in Figure 3. Therefore, the view vector v1 is a measure of the eye movement relative to the screen center [8]. v1 = 0 when the user looks directly at the center of the screen. The vector v2 from the camera image center (O1) to the screen center in the camera image (O2) is a reflection of the current head position relative to the screen center. v2 = 0 when the head is positioned directly in front of the screen center. Thus, the head movement and the eye movement are separated.
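The separation of eye movement (v1) and head position (v2) can be illustrated with a short Python sketch. It is a hedged example rather than the author's code: the function name and the (x, y) pixel tuple convention are hypothetical. The sketch takes the vertical axis through Glint1 (from LED2) and the horizontal axis through Glint2 (from LED3), following the axis description in step 3; this pairing is an assumption, because step 2 names the two lines the other way around.

```python
def view_vectors(glint1, glint2, pupil_center, image_center):
    """Return (v1, v2) for one eye; every point is an (x, y) pixel tuple."""
    # O2: vertical axis through Glint1 meets horizontal axis through Glint2
    # (assumed pairing, see the note above the code).
    o2 = (glint1[0], glint2[1])
    # v1: from O2 to the pupil center O3, a measure of eye movement.
    v1 = (pupil_center[0] - o2[0], pupil_center[1] - o2[1])
    # v2: from the camera image center O1 to O2, a measure of head position.
    v2 = (o2[0] - image_center[0], o2[1] - image_center[1])
    return v1, v2


# Example with made-up pixel coordinates; Glint2 lies to the upper left of Glint1.
v1, v2 = view_vectors(glint1=(320, 250), glint2=(310, 245),
                      pupil_center=(323, 241), image_center=(384, 288))
```

With these illustrative inputs, O2 is (320, 245), so v1 is (3, -4) and v2 is (-64, -43).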

    * Refer to the dark pupil image in Figure 2(b).




    Since there are two pupils, there will be two view vectors. The average of the two view vectors is used as the final view vector.

    4) Map to screen coordinates

    The view vector needs to be mapped to screen coordinates. For this, a calibration pattern is designed with nine small squares (30x30 pixels) distributed evenly on the screen (Figure 4(a)).

    Figure 4 (a) Calibration image; (b) Test image.

    The resolution of the screen is 1024x768 pixels. When the user presses a button while gazing at each of the squares, one pair of screen coordinates (x_s, y_s) and view vector (x_v, y_v) is obtained for each square. The transformation from the view vector to screen coordinates is taken to be a pair of simple second-order polynomials:

    x_s = a_0 + a_1 x_v + a_2 y_v + a_3 x_v y_v + a_4 x_v^2 + a_5 y_v^2    (1)
    y_s = a_6 + a_7 x_v + a_8 y_v + a_9 x_v y_v + a_{10} x_v^2 + a_{11} y_v^2    (2)
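Equations (1) and (2) are linear in their coefficients, so the nine calibration points can be fitted by least squares. The following Python sketch is one possible way to do this, under stated assumptions: it solves the normal equations with a small Gaussian elimination routine, which is a choice made here for self-containment and is not described in the paper. All function names are hypothetical. Since x_s depends only on a_0..a_5 and y_s only on a_6..a_11, the 18 equations split into two independent fits of six unknowns each.

```python
def design_row(xv, yv):
    """Second-order polynomial terms shared by equations (1) and (2)."""
    return [1.0, xv, yv, xv * yv, xv * xv, yv * yv]


def solve_linear(a, b):
    """Solve a square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_axis(view_vectors, targets):
    """Least-squares fit of six coefficients via the normal equations."""
    rows = [design_row(xv, yv) for xv, yv in view_vectors]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(6)]
    return solve_linear(ata, atb)


def calibrate(view_vectors, screen_points):
    """Return (a_0..a_5, a_6..a_11) from calibration pairs."""
    ax = fit_axis(view_vectors, [p[0] for p in screen_points])
    ay = fit_axis(view_vectors, [p[1] for p in screen_points])
    return ax, ay


def map_to_screen(coeffs, xv, yv):
    """Apply equations (1) and (2) to one view vector."""
    ax, ay = coeffs
    row = design_row(xv, yv)
    return (sum(a * t for a, t in zip(ax, row)),
            sum(a * t for a, t in zip(ay, row)))
```

A typical use would call `calibrate` once with the nine view vectors and the nine known square centers, then call `map_to_screen` for each new averaged view vector. For poorly conditioned calibration data, a QR-based or SVD-based solver would be a more robust choice than the normal equations.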

    The coefficients a_i (i = 0, 1, 2, ..., 11) need to be found. As each corresponding point yields two equations, nine points give 18 equations. Since there are only 12 unknowns, the coefficients are solved by the least squares method.

    3 Algorithm for precise pupil detection

    In order to find the center of the pupil, the region represents t