
ENGR 9885 Image Processing

Real Time Eye Tracking and Blink Detection Using Low Resolution Web Cam

Project Report

Yassir Nawaz [1], Shiladitya Sircar [1]

December 16, 2002

[1] Faculty of Electrical and Computer Engineering (Memorial University)


Abstract

In recent years, researchers have aimed to develop non-intrusive man-machine interfaces using vision systems in constrained domains. This has led to the emergence of automatic face processing and the identification of visual facial behaviors (such as blinking, smiling and frowning) that are instinctively inherent to every human being, bringing the cognitive sciences closer to the computer sciences.

In this project we investigate and implement a computationally efficient and cost-effective solution for blink detection and real-time eye tracking. We also present a research framework that can be used to investigate various strategies for human-computer interaction. We begin by extracting the human face and its various features from a low-resolution video stream. These extracted features, in particular the eyes, are tracked and monitored in real time to detect movements and variations. These variations then form the basis of various actions or responses. We classify these actions according to the kind of monitoring used, i.e. active or passive.

Active monitoring - used by the human to generate commands for the computer. For example, a blink could be translated into a mouse click.

Passive monitoring - used by the computer to monitor the human subject. For example, blinks can be counted by the computer to track the blink rate.

Lastly, we demonstrate the effectiveness of our technique through experimental results and provide recommendations for future development and improvement.


1 Introduction

Traditional human-computer interaction mechanisms are limited to input and output devices such as keyboards, mice, displays and printers. However, advances in technology have led to the emergence of friendlier and more natural interaction techniques. Speech processing, for example, has made it possible for humans to communicate with their computers verbally, and speech recognition and processing software is widely in use today. These new technologies are taking us closer to the realization of "perceptual intelligence" in machines. Perceptual intelligence is defined as the ability to recognize or distinguish the state of an object or person by observing several key parameters. For example, a software application that can extract a person's face from a web cam image stream in real time can be used to monitor the state of that person. This information can be processed to determine the orientation of the head, the approximate distance from the camera, the location of the eyes and the direction of the eyeballs. All of this information can then be used to determine whether the person sitting in front of the computer is looking towards or away from it, each being a separate state. Such techniques are being explored for applications ranging from the automatic monitoring of pilots during flight to advanced interactive robots that can talk to humans and attract their attention when they look away.

In this project we investigate and implement techniques that can be used to develop intelligent applications based on the monitoring of facial features, in particular the eyes. Effective monitoring of facial features requires their accurate identification and real-time detection of any variations in them; this project therefore focuses primarily on facial feature tracking and variation measurement techniques. We also suggest several applications that can be built using the techniques presented in this report.


2 Feature Point Initialization

In order to track a facial feature, it must first be identified in an image stream. Several techniques were explored for feature point initialization:

2.1 Initialization based on skin segmentation

Of all the facial features, the eyes are of most interest to us, and since the eyes are located within the face, face segmentation is a logical starting point. The face can be segmented based on skin color and then further processed to determine the actual location of the eyes. Image pixels representing skin can have very different RGB values for different people, depending on skin color, illumination etc.; however, their chromatic red and chromatic blue values, defined as

chR = R / (R + G + B)
chB = B / (R + G + B)

are clustered in a limited region of (chR, chB) space and are largely independent of skin color or lighting conditions. Based on their chromatic red and chromatic blue values, pixels are classified as skin or non-skin. The skin segmentation algorithm works as follows:

To train the algorithm, a training set S of skin pixels is obtained. The algorithm depends heavily on this training set: for it to work across skin colors, the set should be extensive and include skin samples from as many ethnic groups as possible. Once the training set has been obtained, an experimental frequency histogram of chR and chB values is built. This histogram can be represented as F(chR(u), chB(u)), u ∈ S. The frequency of observation (chR(u), chB(u)) in the experimental histogram can be used to model the conditional probability P(u ∈ skin | chR(u), chB(u)), i.e. the probability that pixel u is a skin pixel given its chR and chB values. If this probability is above a certain threshold, the pixel is classified as a skin pixel; a minimal sketch of such a classifier is given below. The skin segmentation algorithm was implemented and tested on several images, and the results are shown in Figure 2.1.
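The following sketch illustrates the histogram-based classifier with NumPy. The bin count, the probability threshold and the normalization of the histogram are illustrative choices, not values taken from the project.

import numpy as np

BINS = 64          # resolution of the (chR, chB) histogram (assumed)
THRESHOLD = 0.4    # minimum P(skin | chR, chB) to accept a pixel (assumed)

def chromatic(rgb):
    """Map RGB values to (chR, chB); guard against all-black pixels."""
    s = rgb.sum(axis=-1).clip(min=1)
    return rgb[..., 0] / s, rgb[..., 2] / s

def train_histogram(skin_rgb):
    """Build the experimental frequency histogram F from Nx3 training skin pixels."""
    chr_, chb = chromatic(skin_rgb.astype(np.float64))
    hist, _, _ = np.histogram2d(chr_, chb, bins=BINS, range=[[0, 1], [0, 1]])
    return hist / hist.max()   # scale so the peak frequency maps to probability 1

def classify(image_rgb, hist):
    """Return a boolean skin mask for an HxWx3 RGB image."""
    chr_, chb = chromatic(image_rgb.astype(np.float64))
    i = np.minimum((chr_ * BINS).astype(int), BINS - 1)
    j = np.minimum((chb * BINS).astype(int), BINS - 1)
    return hist[i, j] > THRESHOLD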


Figure 2.1 Skin Segmentation

Figure 2.1 shows an original image and the skin-segmented image. Most of the skin pixels in the face have been segmented, but the exact locations of the eyes cannot be determined from this image. Therefore, while skin segmentation can be used for face detection, it has not been used to initialize the eyes in this project. It could be used to reduce the search space of the subsequent algorithms, but skin segmentation suffers from some serious problems, the major one being the presence of skin-colored backgrounds, especially wooden doors.

2.2 Initialization using Blink Detection

To get an accurate location of the eyes, a blink-detection based eye initialization algorithm is used. This algorithm requires the user to look into the camera and blink without too much head movement. The blink initializes the location of the eyes, which are then tracked in the subsequent images by an eye tracker.

2.2.1 Variance Map

When a person blinks, the intensity of the pixels that represent the eyes changes. This forms the basis of blink detection. A variance map of the change in intensity values of the corresponding pixels in two consecutive frames of an image sequence is created on a pixel-by-pixel basis. The steps are:

• Get the first frame I1, of M×N pixels and 8-bit pixel depth, from the image sequence.


• Initialize a variance map σ² of M×N pixels by setting each pixel to zero.

• Initialize a mean image µ by assigning the pixels of the first image to its corresponding pixels, so that µ1(x, y) = I1(x, y).

• Get the next frame I2 and update the mean image µ and variance map σ² using the recursive formulas given in equations 2.1 and 2.2. These formulas compute the mean and variance on a pixel-by-pixel basis, taking into account the mean and variance of the last j frames:

\mu_{j+1}(x, y) = \frac{j\,\mu_j(x, y) + I_{j+1}(x, y)}{j + 1}    (2.1)

\sigma^2_{j+1}(x, y) = \left(1 - \frac{1}{j}\right)\sigma^2_j(x, y) + (j + 1)\left(\mu_{j+1}(x, y) - \mu_j(x, y)\right)^2    (2.2)

• The variance map is thresholded after it is updated with each new frame. The threshold value used in this project is 255: all pixels with the value 255 remain unchanged and the rest are set to zero. A minimal sketch of this update follows. Figure 2.2 shows two images taken from a blink sequence and the corresponding variance map generated.
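The sketch below implements the recursive update of equations 2.1 and 2.2 with NumPy. The class name and interface are illustrative; frames are assumed to be 8-bit grayscale arrays.

import numpy as np

class VarianceMap:
    def __init__(self, first_frame):
        self.mean = first_frame.astype(np.float64)   # mean image, initialized to I1
        self.var = np.zeros_like(self.mean)          # variance map, initialized to zero
        self.j = 1                                   # number of frames seen so far

    def update(self, frame, threshold=255.0):
        """Fold one new frame in and return the thresholded binary variance map."""
        frame = frame.astype(np.float64)
        new_mean = (self.j * self.mean + frame) / (self.j + 1)                       # eq. 2.1
        self.var = (1.0 - 1.0 / self.j) * self.var \
                   + (self.j + 1) * (new_mean - self.mean) ** 2                      # eq. 2.2
        self.mean = new_mean
        self.j += 1
        return np.where(self.var >= threshold, 255, 0).astype(np.uint8)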

Figure 2.2 Variance map generated by a blink

2.2.2 Blob Analysis

Once a binary variance map has been generated, it is subjected to blob analysis to find the blobs which represent the eyes. Prior to blob analysis, the binary image is morphologically opened to remove pixel noise and very small connected components. The algorithm then scans the entire image and retrieves all the contours, or connected components. Each component is analyzed to determine whether it resembles an eye blob. If the number of potential eye blobs is two, they are further processed to ensure that they are of the right size and shape. If exactly two blobs do not qualify, the variance map is rejected and a new variance map is generated. Depending on head movement or background motion, additional blobs may be generated, so the qualifying criterion is very strict in order to eliminate these unwanted blobs. Figure 2.3 shows the variance map after the removal of small connected components.

Figure 2.3 Removal of pixel noise and small connected components

The statistical measures used to determine a blob's eligibility as an eye blob include its area, centroid, minimum and maximum extent, and compactness. The horizontal and vertical distances between the two blobs must also be within a certain range: in this project a minimum horizontal spacing of 25 pixels and a maximum vertical spacing of 20 pixels is permissible. Blobs are also rejected if they contain more than 200 or fewer than 50 pixels. A sketch of this filtering is given below.

After two blobs have been selected, the horizontal and vertical distances between their centers are calculated to determine the approximate distance of the face from the camera, as well as the orientation of the head from the slope of the line joining them.
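The following sketch applies these criteria using OpenCV's connected-component statistics. The function name, and the use of centroids for the spacing tests, are assumptions.

import itertools
import cv2

def find_eye_blobs(variance_map):
    """Return the centroids of two blobs that satisfy the eye-pair criteria, or None."""
    n, _, stats, centroids = cv2.connectedComponentsWithStats(variance_map, connectivity=8)
    # keep components whose pixel count is plausible for an eye (50-200 pixels)
    candidates = [k for k in range(1, n) if 50 <= stats[k, cv2.CC_STAT_AREA] <= 200]
    for a, b in itertools.combinations(candidates, 2):
        dx = abs(centroids[a][0] - centroids[b][0])
        dy = abs(centroids[a][1] - centroids[b][1])
        if dx >= 25 and dy <= 20:       # horizontal / vertical spacing criteria
            return centroids[a], centroids[b]
    return None                         # reject this variance map; wait for the next blink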


The complete flow diagram of the blink-detection based feature point initialization algorithm is given in Figure 2.4.

Figure 2.4 Flow chart of the Blink detection algorithm


3 Feature Point Tracking

The feature point tracker has been implemented using the Lucas & Kanade optical-flow algorithm. The tracker takes as input two images and the coordinates of the feature points to track in the first image, and returns the coordinates of the same feature points in the second image. The tracker takes a new image in each iteration and returns the location of the desired feature points in that image using the information from the prior image.

3.1 Optical Flow and Lucas Kanade Algorithm

The Lucas & Kanade algorithm is an optical-flow technique that relies on the "constant brightness" assumption, i.e. that the brightness of an object remains the same between successive frames. This assumption is usually valid because frames are shot in rapid succession. Suppose we wish to find the motion of a group of pixels between two images. Ideally, we would like to find motion parameters such that if we apply the motion to the pixels, the brightness, or image intensity, stays the same.

3.1.1 Optical Flow

Let us define the brightness of an image as I(x, y, t) at coordinates x = (x, y) and time t. Now consider a point P which undergoes a small displacement in time dt, such that its new position is (x + dx, y + dy). By the optical flow assumption, the brightness of point P at the new location (x + dx, y + dy) remains unchanged:

I(x, y, t) = I(x + dx, y + dy, t + dt)

Hence

\frac{dI}{dt} = 0

Using the chain rule for differentiation,

\frac{\partial I}{\partial x}\frac{dx}{dt} + \frac{\partial I}{\partial y}\frac{dy}{dt} + \frac{\partial I}{\partial t} = 0

Let u = dx/dt and v = dy/dt, where u = (u, v) is the velocity vector giving the projected velocity of the point P on the image plane. Substituting u and v gives the optical flow constraint equation:

I_x u + I_y v + I_t = 0

Here u and v are the components of the optical flow field in the x and y directions respectively. However, this equation alone does not suffice for the computation of the flow components, and additional constraints are required. Several approaches have been proposed to obtain additional constraints for the solution of the optical flow constraint equation; a basic windowed least-squares solve is sketched below, and the modified Lucas & Kanade approach used in this project is explained in the next section.
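As a minimal illustration (not the project's implementation), the constraint can be solved in the least-squares sense over a small window, assuming NumPy and precomputed gradient patches Ix, Iy and It:

import numpy as np

def lk_flow(Ix, Iy, It):
    """Solve Ix*u + Iy*v + It = 0 in the least-squares sense over one window.

    Ix, Iy, It: spatial and temporal gradient patches of equal shape.
    """
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)   # one constraint row per pixel
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v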

3.1.2 Lucas-Kanade Feature Tracking Using Pyramidal Image Representation

To locate the feature points in an image sequence with sub-pixel accuracy, a modified version of the Lucas and Kanade approach with a pyramidal image representation is employed, as given in [Bouguet 1999]. This has the advantage of giving accurate results even for relatively large movements of the features in an image sequence, and is elaborated below.

Consider a feature point P at location x = (x, y) to be tracked by the Lucas-Kanade approach. The intensity function I(x, y) gives the brightness of P. Assume that P undergoes a small motion, such that its new coordinates in the next image are (x + dx, y + dy). Because the intensity of the feature point P may vary slightly from one image to the next (due to noise, changes in lighting conditions, etc.), a residual function ε, which measures the difference between the intensities I(x, y) and I(x + dx, y + dy), is minimised to compute the new position x + d = (x + dx, y + dy) of the feature point P in the new image:

\varepsilon = \left( I(x, y) - I(x + d_x, y + d_y) \right)^2

Because of the aperture problem (if a window is used such that only one point is visible, the motion of that point cannot be determined; only the sense of the motion is observed), a larger window has to be employed, and hence the notion of a 2D neighbourhood is defined. This larger window is the neighbourhood region R(x) of the point x, and the problem becomes the minimization of the residual function ε over the entire region R. This is expressed as:

\varepsilon = \int_{x \in R(x)} \left( I(x) - I(x + d) \right)^2 dx

In discrete terms, when the width of the integration window is (2w_x + 1) and its height is (2w_y + 1), we have:

\varepsilon = \sum_{x' = x - w_x}^{x + w_x} \; \sum_{y' = y - w_y}^{y + w_y} \left( I(x', y') - I(x' + d_x, y' + d_y) \right)^2

For the pyramid representation, consider two images I and J such that the temporal distance between them is dt. The objective is to find the feature point P, which is at position (x, y) in I and moves to a new position (x + dx, y + dy) in J. The pyramidal representation of I at the zeroth level is I^0, of size n_x × n_y, which is the original image itself. The next levels of I, i.e. I^1, I^2, I^3, ..., are built in a recursive manner: I^1 is computed from I^0, I^2 from I^1, and so on, where the size of the image at level n is one fourth of that at pyramid level n − 1. A similar pyramid representation is built for J. To track P, its coordinates are evaluated for the pyramid at the highest level n: an initial guess g^n = (g^n_x, g^n_y) is assumed (usually g^n_x and g^n_y are initialised to zero), and the displacement d^n = (d^n_x, d^n_y) that minimises the residual function ε^n at level n is computed. This is done as:

\varepsilon^n(d^n) = \varepsilon^n(d^n_x, d^n_y) = \sum_{x' = x - w_x}^{x + w_x} \; \sum_{y' = y - w_y}^{y + w_y} \left( I^n(x', y') - J^n(x' + g^n_x + d^n_x, \, y' + g^n_y + d^n_y) \right)^2

The computed result is propagated down to level n − 1 to form the new initial guess:

g^{n-1} = 2(g^n + d^n)

This process continues until level zero is reached, and d = (dx, dy) is calculated as

d = g^0 + d^0

The solution can be represented in a single equation as:

d = \sum_{i=0}^{n} 2^i d^i

The advantage this technique offers is that each d^i can be kept small, whereas the overall displacement d of the feature point P may be large [Bouguet 1999]. This project requires tracking of the eye feature points, located by the blink detection module, to sub-pixel accuracy; this approach is therefore used, as it is robust to relatively large motion of the head when tracking eye feature points. A minimal sketch of the tracking step follows.
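OpenCV implements the same pyramidal Lucas-Kanade scheme in cv2.calcOpticalFlowPyrLK; the sketch below shows one tracking iteration under the assumption that the points come from the blink-detection stage. The window size and pyramid depth are illustrative values.

import cv2
import numpy as np

def track_points(prev_gray, next_gray, prev_pts):
    """One iteration of pyramidal Lucas-Kanade tracking between two grayscale frames."""
    pts = prev_pts.astype(np.float32).reshape(-1, 1, 2)
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, pts, None,
        winSize=(15, 15),   # integration window (2*wx + 1, 2*wy + 1)
        maxLevel=3,         # number of pyramid levels above level 0
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
    return next_pts.reshape(-1, 2), status.ravel() == 1   # sub-pixel positions, tracked flags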

3.1.3 Selection of good feature points

The choice of feature points can affect the performance of the Lucas & Kanade feature tracker. A number of points around the eye blob could be selected for tracking, and since the tracker examines the neighborhood of each feature point, a good neighborhood is important. It has been determined experimentally that the inner corners of the eyes provide good, robust tracking. However, sideways movement of the head introduces a slight error in the location of these feature points. To counter this problem, another feature point between the eyes and above the nose (i.e. between the two eyebrows) is also tracked. It can be obtained as the midpoint of the line connecting the centers of the two eye blobs. This point is very robust and can tolerate significant sideways movement. Since the objective is to track the eyes, the distance of this point from the inner corners of the eyes is constantly examined to detect and correct any errors due to slight shifts in the feature points representing the inner corners of the eyes.

Figure 3.1 shows images taken from an image stream during tracking of the feature points. The blue rectangles indicate the approximate position of the eyes during motion of the head in various directions.

Figure 3.1 Eye tracking using optical flow

4 Blink Detection

As explained in section 2, blink detection is used to initialize the feature points, i.e. the eyes. The same algorithm is also used for subsequent blink detection, but the region of interest is now narrowed to the two rectangular boxes containing the eyes. The coordinates of the two eye boxes are passed to the blink detection algorithm, which builds the variance map and detects a blink if a significant number of pixels register a variation in the thresholded variance map.

4.1 Voluntary and Involuntary Blinks

Since blinking is a natural phenomenon, we must distinguish between voluntary and involuntary blinks if blinks are to be used for interaction with the computer. For example, to simulate a mouse click with a blink, we must ensure that involuntary blinks are not interpreted as mouse clicks. The logic used to make this distinction is explained below.


A blink sequence can be divided into three types of images: open eyes, then closed eyes, followed again by open eyes, as shown in Figure 4.1. When the eyes close, a variance map is generated. As long as they remain closed, no significant intensity variance is registered in the eye boxes; when they open again, another variance map is generated. A blink sequence therefore generates two variance maps spaced apart in time by the duration of the blink. This time t is measured, and if it is less than a certain threshold the blink is classified as involuntary. For a blink to be interpreted as voluntary in this project, it must last for at least 1 second; a sketch of this timing logic follows.
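A minimal sketch of that timing logic, assuming the variance-map stage emits an event whenever a thresholded map is produced (the class and method names are illustrative):

import time

VOLUNTARY_MIN = 1.0   # seconds; shorter blinks are treated as involuntary

class BlinkClassifier:
    """Pair the two variance-map events of a blink and classify its duration."""
    def __init__(self):
        self._close_time = None   # time of the first event (eyes closing)

    def on_variance_event(self):
        now = time.monotonic()
        if self._close_time is None:
            self._close_time = now            # eyes just closed; wait for reopening
            return None
        duration = now - self._close_time     # blink duration t
        self._close_time = None
        return "voluntary" if duration >= VOLUNTARY_MIN else "involuntary"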

Figure 4.1 Involuntary and voluntary blink detection: the two variance maps of a blink, separated by the blink duration t

5 Eyeball Detection

The eyeball detection algorithm receives the coordinates of the two eye boxes and finds the location of the eyeballs within these boxes. Several techniques were considered for this purpose. Geometrical pattern search and ellipse fitting techniques give very good results, but they are computationally intensive and were rejected in favor of the circular Hough transform. The circular Hough transform is itself computationally intensive, but it can be improved considerably by pre-processing the eye image. This pre-processing reduces the number of pixels that can vote in the Hough transform, making it much faster.


5.1 Pre-processing of the eye image

The purpose of pre-processing is to minimize the number of on-pixels in the binary image on which the Hough transform is computed. An ideal image for the Hough transform would contain only a circle representing the outline of the eyeball; this is difficult to obtain, so an attempt is made to remove unwanted pixels without removing the eyeball pixels. The pre-processing steps are:

5.1.1 Color Filtering

The three-channel RGB image is converted to a single-channel image for edge detection. It has been found experimentally that the green channel gives the best results for edge detection, although the difference from the other channels is not large.

5.1.2 Histogram equalization

The single-channel image is equalized to compensate for different lighting conditions. Since the eye image contains two regions of comparable area and distinct intensity, one representing the eyeball and the other the white area around it, equalization tends to highlight the edges of the eyeball.

5.1.3 Edge Detection

The green-channel image is then subjected to edge detection to obtain an outline of the eyeball. Since the eyeball pixels differ in intensity from the white pixels around them, an edge is readily obtained, although it is partially occluded. A 3×3 vertical Prewitt operator is used for this purpose. A horizontal Prewitt operator was also tried, but since the outlines of the eyes form horizontal edges, extra edges appeared in the edge image. The vertical operator returns the vertical edges, which are sufficient for the Hough transform, as not all boundary pixels are required to obtain the best-fit circle.


5.1.4 Morphological operations

Some morphological operations, i.e. thinning and removal of small connected components, were also considered, but they did not yield much improvement in the accuracy or efficiency of the Hough transform.

The steps explained above are shown in Figure 5.1, and a sketch of the full pre-processing chain follows the figure.

Figure 5.1 Pre-processing of the eye image for the Hough transform: (a) original eye image, (b) green-channel intensity image, (c) equalized image, (d) 3×3 vertical Prewitt operator
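The chain can be sketched with OpenCV and NumPy as below. The Prewitt kernel is the standard one; the edge threshold is an illustrative value, not one from the project.

import cv2
import numpy as np

# 3x3 vertical Prewitt kernel: responds to vertical edges (horizontal intensity changes)
PREWITT_VERTICAL = np.array([[-1, 0, 1],
                             [-1, 0, 1],
                             [-1, 0, 1]], dtype=np.float32)

def preprocess_eye(eye_bgr, edge_thresh=60):
    """Reduce an eye image to a sparse binary edge map for the circular Hough transform."""
    green = eye_bgr[:, :, 1]                    # 5.1.1: green channel (OpenCV stores BGR)
    equalized = cv2.equalizeHist(green)         # 5.1.2: histogram equalization
    edges = cv2.filter2D(equalized.astype(np.float32), -1, PREWITT_VERTICAL)   # 5.1.3
    return (np.abs(edges) > edge_thresh).astype(np.uint8) * 255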

5.2 Hough Transform

The Hough transform can be used to extract various features from an image. The transform takes the geometric equation of the feature to be found and inverts it, so that x and y become constants in the equation, resulting in an inverse function space. The intersections in this space reveal the constants associated with the equations of the features present in the original image.

Consider the example of a line. The equation of a line in slope-intercept form is

y = a.x + b

and its Hough transform can be written as

b = -x.a + y

From this equation a second image, an array of accumulator cells, is obtained. In this image each point of the original image becomes a line. The lines are added to the transform one by one, and every time there is an intersection the value of that cell is incremented by 1. Then, by looking at the values of the cells in the transform, we can determine through which points the most lines cross. The 'a' and 'b' values of these points in the transform give us the required constants, i.e. the slope and intercept.

Figure 5.2.1 Hough transform of a line in slope-intercept form

Instead of the slope-intercept model, the normal model can also be used, with the standard representation

x.cosθ + y.sinθ = ρ

By stepping through the values of θ from −π to π, a number of sinusoidal curves are traced in the (ρ, θ) plane. The accumulator cells with the highest values indicate the ρ and θ of the detected line. If multiple lines are to be found, a threshold can be set in the Hough transform so that any feature whose shape comprises more than a certain number of points is detected.

5.2.1 Circular Hough Transform

The Hough transform can also be applied to the extraction of other geometrical objects such as circles, ellipses etc. The procedure is very similar to the Hough transform of a line, except that the transform space becomes multidimensional, with one dimension per constant in the representation of the object. Consider the equation of a circle, which is the shape of interest to us:

(x - a)^2 + (y - b)^2 = r^2


The above equation can also be written in the parametric form

a = x + r.cosθ
b = y + r.sinθ
θ = 0 .. 2π

Since there are three variables in the transform, the resulting transform space is three-dimensional and contains cones. The figure below shows a 2D and a 3D view of the circular Hough transform.

Figure 5.2.2 Circular Hough transform in 2D and 3D

5.2.2 Pseudocode for Circular Hough transform

To find the eyeball we use the circular Hough transform. The procedure takes each on-pixel of the image and computes the values of a and b according to the above equations, varying the two remaining parameters, the radius r and the angle θ. The range for r depends on the size of the circle we want to extract, and θ is varied from 0 to 2π in small steps. Once the transform T(a, b) for a pixel and radius has been found, the three-dimensional Hough accumulator H is updated. After the transform has been computed, the pixel or pixels with the maximum count give the centre and radius of the circle. The algorithm is given below. Figure 5.2.3 shows the extraction of the eyeball from an eye image using the Hough transform: the image is equalized and the vertical Prewitt operator is applied to get the approximate outline of the eyeball, as explained in section 5.1, and the Hough transform is then performed on this edge image to obtain the centre of the eyeball, as shown in the figure.

Circular Hough transform (the report's pseudocode, made runnable in Python):

import numpy as np

def circular_hough(img, r_max, threshold, theta_step=0.05):
    # img: binary edge image; only non-zero pixels vote.
    X, Y = img.shape
    H = np.zeros((X, Y, r_max), dtype=np.int32)        # 3-D accumulator over (a, b, r)
    thetas = np.arange(0.0, 2.0 * np.pi, theta_step)   # step theta from 0 to 2*pi
    for i in range(X):
        for j in range(Y):
            if img[i, j] > 0:
                for r in range(1, r_max + 1):
                    T = np.zeros((X, Y), dtype=np.int32)              # one vote per (a, b)
                    a = np.round(i + r * np.cos(thetas)).astype(int)  # find a
                    b = np.round(j + r * np.sin(thetas)).astype(int)  # find b
                    ok = (a >= 0) & (a < X) & (b >= 0) & (b < Y)
                    T[a[ok], b[ok]] = 1
                    H[:, :, r - 1] += T
    # Loop through the transform to find candidate centres and radii.
    return [(i, j, r + 1, H[i, j, r])
            for i, j, r in zip(*np.nonzero(H >= threshold))]

The original pseudocode indexed from 1; here arrays are 0-based, so the radius stored at accumulator slice r − 1 is reported as r.


Figure 5.2.3 (a)-(c) Circular Hough transform to locate the eyeball

5.2.3 Trade-offs in the Hough transform

The circular Hough transform is computationally intensive, as explained above. The amount of computation depends on the number of pixels that are on, i.e. allowed to vote in the transform; reducing the number of these pixels vastly improves the transform's efficiency. On the other hand, minimizing these pixels can also remove pixels that are part of the eyeball edge, reducing the chances of accurate detection. The size of the eye image is therefore also very important. In this project a low-resolution camera has been used and the eye image has very few pixels. A camera with higher resolution might give better edge detection and circle fitting, but the computation time of the Hough transform would increase considerably.

Another important factor is the radius of the circle being searched for. If the exact radius is known, we only have to search for one circle; without prior knowledge of the radius, the Hough transform has to go through several iterations to find the best circle. In this project the distance between the camera and the subject's face is not fixed, so the radius of the eyeball may change as the head moves towards or away from the camera. One solution is to make the Hough transform search through a range of radii, but this increases the computation and therefore the time taken by the transform, which is unacceptable. To keep the search space small, the desired circle radius has been made a function of the eye box itself. This assumption is safe because the size of the eye box also grows or shrinks as the face moves towards or away from the camera. The eyeball radius is assumed to be one fourth of the height of the eye box, and the Hough transform searches for circles with radii within ±3 pixels of this estimate, as in the sketch below.
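That heuristic amounts to the following small helper (the function name is illustrative):

def radius_search_range(eye_box_height):
    """Radii to try in the circular Hough transform, tied to the eye-box height."""
    r_est = eye_box_height // 4                   # assumed eyeball radius
    return range(max(1, r_est - 3), r_est + 4)    # within +/-3 pixels of the estimate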

6 Eyeball tracking

The Hough transform can be used to locate the eyeball in each eye image, thus determining its location frame by frame. This technique, however, is computationally intensive and slows down the throughput of the algorithm. The advantage of computing the Hough transform in every frame, on the other hand, is that we do not have to worry about loss of tracking: if the transform fails to find a circle in one image, this does not affect the computation in subsequent images.

Another technique is to use the Lucas & Kanade feature tracker explained in section 3.1 to track the eyeball once its center and radius have been obtained through the circular Hough transform. This vastly improves the speed of the algorithm, but tracking is lost if a feature point representing the eyeball is lost, for example due to a blink or a rapid movement of the eyeball or the eye itself. This problem can be addressed by constantly monitoring the tracked feature points to determine whether tracking has been lost, in which case the Hough transform is used to re-initialize the eyeball location. The best feature points for eyeball tracking are the center of the eyeball and its two outer edges, as shown in Figure 6.1. The relative distances among these three points should remain constant as the eyeball moves within the eye; a significant change in these distances indicates the loss of a feature point, as in the sketch below.
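A minimal sketch of that consistency check, assuming the three points are tracked as 2-D coordinates and ref_dists holds their separations at initialization; the tolerance is an illustrative value:

import numpy as np

def tracking_lost(center, left_edge, right_edge, ref_dists, tol=0.3):
    """True if the tracked eyeball points drifted apart; re-run the Hough stage if so."""
    d = np.array([np.linalg.norm(np.subtract(center, left_edge)),
                  np.linalg.norm(np.subtract(center, right_edge))])
    return bool(np.any(np.abs(d - ref_dists) > tol * np.asarray(ref_dists)))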


Both techniques have been implemented in this project, each with its own advantages and disadvantages as explained above. The user has the option of selecting the tracking criterion according to the application's requirements.

Figure 6.1 Good feature points to track

7 Gaze Estimation

Once the centre of the eye (the iris) has been located, we can estimate the gaze of the human subject by determining the relative position of the iris within the eyebox. Figure 7.1 shows a gaze vector diagram in which the eyebox is drawn along with the eyeball in Cartesian coordinates. B is the bottom-left corner of the eyebox and represents the origin of the diagram. Since we already know the coordinates of points A, C and D, we can easily determine vector A, which we regard as the "gaze vector". The length of the vector and its angle with respect to line BD indicate the direction in which the subject is looking; a sketch is given below.
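A minimal sketch of the planar gaze vector, assuming the iris centre and the eyebox's bottom-left corner B are known in image coordinates and BD is the horizontal bottom edge of the box (the function name is illustrative):

import numpy as np

def gaze_vector(iris_center, box_bottom_left):
    """Length and angle (vs. the bottom edge BD) of the gaze vector from B to the iris."""
    v = np.asarray(iris_center, dtype=float) - np.asarray(box_bottom_left, dtype=float)
    length = np.linalg.norm(v)
    angle = np.degrees(np.arctan2(v[1], v[0]))   # angle relative to the horizontal BD
    return length, angle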

For this project we have assumed that the eyeball moves on a two-dimensional surface; because of the curvature of the eye, however, the length of the gaze vector is not directly proportional to the distance of the iris from the bottom-left corner of the eye. A geometrical model that takes the curvature of the eye into account could be used for more accurate implementations.


Figure 7.1 Gaze Estimation

8 Final Results

Figures 8.1 and 8.2 show various images from two tracking videos. Figure 8.1 demonstrates the tracking of the eyes along with the "between the eyes" feature point; the second circle demonstrates the tracking of the right eyeball during movement of the head in various directions.

Figure 8.2 shows the approximate end points of the eyes in addition to the eyeball. The images indicate successful tracking of the eyes as well as the eyeball for normal motion of the head and eyes. It has been found experimentally that satisfactory tracking results are obtained as long as both eyes are in view of the camera; turning the head sideways by more than 45 degrees results in the loss of tracking for one eye.

The error in eyeball tracking is within 2 pixels, i.e. the centre of the circle obtained through the Hough transform is off from the actual center of the eyeball by one to two pixels, which is acceptable for this project.

The accuracy of blink detection after feature point initialization is over 90%; the blink detection used during initialization is less accurate. This is because the search area then consists of the entire image and the blobs must satisfy strict criteria to be considered eye blobs. Since this detection is performed only once, it does not affect the accuracy or performance of the subsequent steps of the algorithm.


Figure 8.1 Eye tracking

Figure 8.2 Eye tracking


9 Conclusions and Recommendations

The results for blink detection are satisfactory for this project. Initial blink detection could be improved by using the skin segmentation algorithm to locate the face, thus reducing the search space for the blink detection algorithm.

The eye tracker can tolerate a head turn of up to 45 degrees in either direction. This can be further improved by using the "between the eyes" feature point, which can tolerate up to 90 degrees of head rotation and can be used to reinitialize lost feature points from distance measurements to the other, still-tracked eye. The alternative is to reinitialize all the feature points using blink detection.

The Hough transform gives satisfactory results for this project, in which a very low resolution web cam has been used; the images shown above have a resolution of 320×240. If more accuracy is required, i.e. very subtle eyeball movements need to be detected, a higher-resolution web camera can be used. This will however increase the number of pixels processed by the Hough transform, reducing the overall speed; additional hardware may be used to offset the additional computation.

Overall, the eye tracker and blink detector implemented in this project meet our initial goals and expectations. They can be used to build various intelligent applications, some of which are suggested below:

• The tracker and blink detector can be interfaced with a PC mouse so that the cursor can be moved on the screen using eye movements, and voluntary blinks can be used to simulate mouse clicks, enabling hands-free browsing. This application could allow severely disabled individuals who are unable to operate an ordinary mouse to use personal computers.

• Various human monitoring applications measure blink rates to determine several biological parameters. This software can be used to monitor and detect the blink rate of a person automatically.

• This software can form the basis of applications in advanced interactive robots that converse with humans. Such robots must be able to locate a human face and determine where the person is looking in order to determine their response.