ENGR 9885 Image Processing
Real Time Eye Tracking and Blink Detection Using Low Resolution Web Cam
Project Report
Yassir Nawaz1
Shiladitya Sircar1
December 16, 2002
1Faculty of Electrical and Computer Engineering (Memorial University)
Abstract
In recent years, scientists have been aiming to develop non-intrusive man-machine interfaces with vision systems in a constrained domain. This has led to the emergence of the field of automatic face processing and the identification of visual facial behaviors (such as blinking, smiling, frowning etc.), which are instinctively inherent to every human being, thus bringing the cognitive sciences closer to the computer sciences.
In this project we investigate and implement a computationally efficient and cost-effective solution for blink detection and tracking of the eyes in real time. We also present a framework for research, which can be used to investigate various strategies for human-computer interaction. We commence with the extraction of the human face and its various features from a low-resolution video stream. These extracted features, in particular the eyes, are tracked and monitored in real time to detect any movements and variations. These variations then form the basis of various actions or responses. We classify these actions based on the kind of monitoring used, i.e. active and passive.
Active monitoring: used by the human to generate commands for the computer. For example, a blink could be translated into a mouse click.
Passive monitoring: used by the computer to monitor the human subject itself. For example, blinks can be counted by the computer to monitor the blink rate.
Lastly we demonstrate the effectiveness of our technique through
experimental results and provide recommendations for future
development and improvement.
1 Introduction
Traditional human-computer interaction mechanisms are limited to input and output devices such as keyboards, mice, displays and printers. However, advances in technology have led to the emergence of friendlier and more natural interaction techniques. Speech processing, for example, has made it possible for humans to communicate with their computers verbally, and speech recognition and processing software is widely in use today. These new technologies are taking us closer to the realization of "perceptual intelligence" in machines. Perceptual intelligence is defined as the ability to recognize or distinguish a state of an object or person by observing several key parameters. For example, a software application which can extract, in real time, the face of a person from a web cam image stream can be used to monitor the state of that person. This information can be processed to determine the orientation of the head, the approximate distance from the camera, the location of the eyes and the direction of the eyeballs. All this information can then be used to determine whether the person sitting in front of the computer is looking towards or away from the computer, each being a separate state. These techniques are being explored for applications ranging from the automatic monitoring of pilots during flight to advanced interactive robots which would be able to talk to humans and attract their attention if they are looking away.
In this project we investigate and implement techniques that can be used to develop several intelligent applications based on the monitoring of facial features, in particular the eyes. Effective monitoring of facial features requires their accurate identification and real-time detection of any variations in them; therefore in this project we focus primarily on facial feature tracking and variation measurement techniques. We also suggest several applications which can be built using the techniques presented in this report.
2 Feature Point Initialization
In order to track a facial feature it must first be identified in an image stream. Several
techniques were explored for feature point initialization:
2.1 Initialization based on skin segmentation
Of all the facial features, the eyes are of most interest to us, and since the eyes are located within the face, face segmentation is a logical point to start. The face can be segmented based on skin color and then further processed to determine the actual location of the eyes. Image pixels representing skin can have very different RGB values for different people depending on their skin color, illumination etc.; however, their chromatic red and chromatic blue values, defined as
chR = R/(R+G+B)
chB = B/(R+G+B)
are clustered in a limited region of (chR, chB) space and are largely independent of skin color or lighting conditions. Based on their chromatic red and chromatic blue values, pixels are classified into skin and non-skin pixels. The skin segmentation algorithm works as follows:
In order to train the algorithm a training set S of skin pixels is obtained. The algorithm depends heavily on this training set. In order for it to work for various skin colors the training set should be extensive and must include skin samples from as many races as possible. Once the training set has been obtained, an experimental frequency histogram of chR and chB values is computed. This histogram can be represented as F(chR(u), chB(u)), u ∈ S. The frequency of the observation (chR(u), chB(u)) in the experimental histogram can be used to model the conditional probability P(u ∈ skin | chR(u), chB(u)), i.e. the probability of pixel u being a skin pixel given its chR and chB values. If this probability is above a certain threshold the pixel is classified as a skin pixel. The skin segmentation algorithm was implemented and tested on several images and the results are shown below.
Figure 2.1 Skin Segmentation
Figure 2.1 shows an original image and the skin-segmented image. Most of the skin pixels in the face have been segmented; however, the exact locations of the eyes cannot be determined from the above image. Therefore, while skin segmentation can be used for face detection, it has not been used in the initialization of the eyes in this project. It can be used to reduce the search space in the subsequent algorithms; however, skin segmentation suffers from some serious problems, the major one being the presence of a skin-colored background, especially wooden doors.
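The report does not specify its implementation environment, so the following is only a minimal sketch of the histogram-based classifier described above, written in Python with NumPy as an assumption; the bin count and probability threshold are illustrative, not values from the report.

import numpy as np

def train_skin_histogram(skin_pixels, bins=64):
    # skin_pixels: N x 3 array of (R, G, B) values from the training set S
    rgb = skin_pixels.astype(np.float64)
    total = rgb.sum(axis=1) + 1e-6                    # guard against R+G+B = 0
    chR = rgb[:, 0] / total
    chB = rgb[:, 2] / total
    hist, _, _ = np.histogram2d(chR, chB, bins=bins, range=[[0, 1], [0, 1]])
    return hist / hist.sum()      # normalized frequency of (chR, chB) over skin pixels

def segment_skin(image, hist, bins=64, threshold=1e-4):
    # image: H x W x 3 RGB array; returns a boolean skin mask of shape H x W
    rgb = image.reshape(-1, 3).astype(np.float64)
    total = rgb.sum(axis=1) + 1e-6
    r_bin = np.clip((rgb[:, 0] / total * bins).astype(int), 0, bins - 1)
    b_bin = np.clip((rgb[:, 2] / total * bins).astype(int), 0, bins - 1)
    return (hist[r_bin, b_bin] > threshold).reshape(image.shape[:2])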
2.2 Initialization using Blink Detection
In order to get an accurate location of the eyes, a blink-detection-based eye initialization algorithm is used. This algorithm requires the user to look into the camera and blink without too much head movement. This blink initializes the location of the eyes, which are then tracked in the subsequent images by an eye tracker.
2.2.1 Variance Map
When a person blinks, the intensity of the pixels that represent the eyes changes. This forms the basis of blink detection. A variance map of the change in intensity values in the corresponding pixels of two consecutive frames in an image sequence is created on a pixel-by-pixel basis. The steps are:
• Get the first frame I1, of M×N pixels and 8-bit pixel depth, from the image sequence.
• Initialize a variance map σ² of M×N pixels by setting each pixel to zero.
• Initialize a mean image µ by assigning the pixels of the first image to its corresponding pixels, so that
µ1(x,y) = I1(x,y)
• Get the next frame I2 and update the mean image µ and variance map σ² by using the recursive formulas given in equations (2.1) and (2.2). The recursive formulas compute the variance and mean on a pixel-by-pixel basis by taking into account the variance and mean over the last j frames.
µ_{j+1}(x,y) = [ j·µ_j(x,y) + I_{j+1}(x,y) ] / (j + 1)        (2.1)

σ²_{j+1}(x,y) = (1 − 1/j)·σ²_j(x,y) + (j + 1)·(µ_{j+1}(x,y) − µ_j(x,y))²        (2.2)
• The variance map is thresholded after updating it with the new frame. The
threshold value used in this project is 255. All the pixels that have the value of
255 remain unchanged and the rest are set to zero. Figure 2.2 shows two images
taken from a blink sequence and the corresponding variance map generated.
Figure 2.2 Variance map generated by a blink
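A minimal sketch of one recursive update step (equations 2.1 and 2.2), again assuming Python with NumPy; the frames are assumed to be converted to float arrays of the same size, and the threshold of 255 follows the text above.

import numpy as np

def update_variance_map(frame, mu, var, j):
    # frame: new frame I_{j+1}; mu, var: current mean image and variance map;
    # j: number of frames already folded into mu and var (j >= 1)
    frame = frame.astype(np.float64)
    mu_new = (j * mu + frame) / (j + 1)                              # eq. (2.1)
    var_new = (1.0 - 1.0 / j) * var + (j + 1) * (mu_new - mu) ** 2   # eq. (2.2)
    binary = np.where(var_new >= 255, 255, 0).astype(np.uint8)       # threshold at 255
    return mu_new, var_new, binary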
2.2.2 Blob Analysis
Once a binary variance map is generated, it is subjected to blob analysis to find the blobs which represent the eyes. Prior to blob analysis the binary image is opened to remove pixel noise and very small connected components. The algorithm then scans the entire image and retrieves all the contours or connected components. Each component is then analyzed to determine if it resembles an eye blob. If the number of potential eye blobs is two, they are further processed to ensure that they are of the right size and shape. If exactly two blobs do not qualify, the variance map is rejected and a new variance map is generated. Depending on head movement or background motion, additional blobs may be generated, and therefore the qualifying criterion is very rigid in order to eliminate these unwanted blobs. Figure 2.3 shows the variance map after the removal of small connected components.
Figure 2.3 Removal of pixel noise and small connected components
The statistical operations performed on the blobs to determine their eligibility as eye blobs include their area, centroid, minimum and maximum extent, and compactness. The horizontal and vertical distance between two blobs must also be within a certain range. In this project a minimum horizontal spacing of 25 pixels and a maximum vertical spacing of 20 pixels is permissible (as sketched below). Blobs are also rejected if they contain more than 200 or fewer than 50 pixels.
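A minimal sketch of this blob qualification step, assuming Python with OpenCV's connected-component analysis (the report does not name its blob analysis library); the opening and the numeric criteria follow the text above, while the shape and compactness checks are omitted for brevity.

import cv2
import numpy as np

def find_eye_blobs(binary_map):
    # Open the binary variance map to remove pixel noise, then keep connected
    # components that satisfy the project's eye blob criteria.
    opened = cv2.morphologyEx(binary_map, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(opened)
    # label 0 is the background; keep blobs of 50 to 200 pixels
    eyes = [centroids[i] for i in range(1, n)
            if 50 <= stats[i, cv2.CC_STAT_AREA] <= 200]
    if len(eyes) != 2:
        return None                       # reject this variance map
    (x1, y1), (x2, y2) = eyes
    # minimum horizontal spacing 25 px, maximum vertical spacing 20 px
    if abs(x1 - x2) >= 25 and abs(y1 - y2) <= 20:
        return eyes
    return None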
After two blobs have been selected, the horizontal and vertical distance between their centers is calculated to determine the approximate distance of the face from the camera as well as the orientation of the head from the line's slope.
The complete flow diagram for the blink-detection-based feature point initialization algorithm is given in figure 2.4.
Figure 2.4 Flow chart of the Blink detection algorithm
3 Feature Point Tracking
The feature point tracker has been implemented using the Lucas & Kanade optical-flow-based algorithm. The tracker takes as input two images and the coordinates of the feature points to track in the first image. It then returns the coordinates of the same feature points in the second image. The tracker takes a new image in each iteration and returns the location of the desired feature points in that image using the information from the prior image.
3.1 Optical Flow and Lucas Kanade Algorithm
The Lucas & Kanade algorithm is an optical-flow-based technique which relies on the "constant brightness" assumption, i.e. the brightness of an object will remain the same between successive frames. This assumption is usually valid because the frames are shot in rapid succession. Suppose we wish to find the motion of a group of pixels between two images. Ideally we would like to find the motion parameters of the pixels such that if we apply the motion to the pixels, the brightness, or image intensity, will be the same.
3.1.1 Optical Flow
Let us define the brightness of an image as I(x, y, t) at coordinates x = (x, y) at time t.
Now let us consider a point P which undergoes a small displacement in time dt, such that
the new position of P is given as (x + dx, y + dy). By using the assumption of optical
flow the brightness of point P at new location (x + dx, y + dy) remains unchanged.
I(x, y, t) = I(x + dx, y + dy, t+dt)
Hence
dI/dt = 0

Using the chain rule for differentiation:

dI/dt = (∂I/∂x)·(dx/dt) + (∂I/∂y)·(dy/dt) + (∂I/∂t)·(dt/dt) = 0

(∂I/∂x)·(dx/dt) + (∂I/∂y)·(dy/dt) + ∂I/∂t = 0

Let u = dx/dt and v = dy/dt,
where u = (u, v) is a velocity vector, giving the projection velocity of the point P on the
image plane. By substituting the values of u and v the optical flow constraint equation is
obtained.
Ix·u + Iy·v + It = 0
Here u and v are the components of the optical flow field in the x and y directions respectively. However, this equation alone does not suffice for the computation of the flow components, and additional constraints are required. Several approaches have been proposed to obtain additional constraints for the solution of the optical flow constraint equation. A modified version of the Lucas & Kanade approach, which has been used in this project, is explained below.
3.1.2 Lucas-Kanade Feature Tracking Using Pyramidal Image Representation
In order to obtain the location of the feature points in an image sequence to sub-pixel accuracy, a modified version of the Lucas and Kanade approach, with a pyramidal image representation, is employed as given in [Bouguet 1999]. This has the advantage of giving accurate results even for a relatively large movement of the features in an image sequence, and is elaborated below.
Consider a feature point P at a location x = (x, y) to be tracked by the Lucas-Kanade approach. The intensity function I(x, y) gives the brightness of P. Assume that P undergoes a small motion, such that its new coordinates are (x + dx, y + dy) in the next image. Due to slight variation in the intensity of the feature point P (perhaps due to noise or a change in lighting conditions) from the previous image to the next, a residual function ε, which measures the difference between the intensities I(x, y) and I(x + dx, y + dy), is minimised to compute the new position x + d = (x + dx, y + dy) of the feature point P in the new image.

ε = (I(x, y) − I(x + dx, y + dy))²
Because of the aperture problem (if a window is used such that only one point is visible, the motion of that point cannot be determined; only the component of motion along the intensity gradient is observed), a larger window has to be employed, and hence the notion of a 2D neighbourhood is defined. This larger window is the neighbourhood region R(x) of the point x. The problem becomes the minimization of the residual function ε over the entire region R. This is expressed as:
ε = ∫_{x ∈ R(x)} (I(x) − I(x + d))² dx
In discrete terms, when the width of the integration window is (2wx + 1) and height of
the integration window is (2wy + 1), we have:
ε = Σ_{x′ = x−wx}^{x+wx} Σ_{y′ = y−wy}^{y+wy} (I(x′, y′) − I(x′ + dx, y′ + dy))²
For the pyramid representation, consider two images I and J, such that the temporal distance between them is dt. The objective is to find the feature point P, which is at a position (x, y) in I and moves to a new position (x + dx, y + dy) in J. The pyramidal representation of I at the zeroth level is I^0, of size nx × ny, which is the original image itself. The next levels of I, such as I^1, I^2, I^3, …, are built in a recursive manner: I^1 is computed from I^0, I^2 from I^1 and so on, where the size of the image at level n is one-fourth of the image at pyramid level n − 1. A similar pyramid representation is built for J.
For tracking of P, its coordinates are evaluated for the pyramid at the highest level n, an initial guess g^n = (g^n_x, g^n_y) is assumed to minimise the residual function ε^n at level n, and the remaining displacement d^n = (d^n_x, d^n_y) is computed, which gives the displacement of the feature point at level n. Usually g^n_x and g^n_y are initialised to zero. It is done as:
ε^n(d^n) = ε^n(d^n_x, d^n_y) = Σ_{x′ = x−wx}^{x+wx} Σ_{y′ = y−wy}^{y+wy} (I^n(x′, y′) − J^n(x′ + g^n_x + d^n_x, y′ + g^n_y + d^n_y))²
The computed result is propagated down to level n − 1, to form the new initial guess, which is given as:

g^{n−1} = 2(g^n + d^n)
This process is continued, until level zero is reached, and d = (dx, dy) is calculated as
d = g^0 + d^0
The solution can be represented in a single equation as:
d = Σ_{i=0}^{n} 2^i · d^i
The advantage this technique offers is that each d^i can be kept small, whereas the overall displacement d of the feature point P may be large [Bouguet 1999]. This project requires accurate tracking, to sub-pixel accuracy, of the eye feature points located by the blink detection module. Therefore this approach is used, as it is robust to relatively large motion of the head in the context of tracking eye feature points.
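Bouguet's pyramidal tracker is available in OpenCV; the following is a minimal usage sketch in Python (an assumption, as the report does not name its implementation), with the window size and pyramid depth chosen only for illustration.

import cv2
import numpy as np

def track_points(prev_gray, next_gray, points):
    # points: N x 1 x 2 float32 array of feature coordinates in prev_gray
    new_points, status, err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, points, None,
        winSize=(15, 15),   # integration window (2wx+1, 2wy+1); illustrative
        maxLevel=3)         # pyramid levels above level 0; illustrative
    return new_points, status.ravel() == 1   # status marks successfully tracked points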
3.1.3 Selection of good feature points
The choice of feature points can affect the performance of the Lucas & Kanade feature tracker. A number of points around the eye blob can be selected for tracking. Since the tracker examines the neighborhood of the desired feature points, a good neighborhood is important. It has been determined experimentally that the inner corners of the eyes provide good and robust tracking. However, sideways movement of the head results in a slight error in the location of the feature points. In order to counter this problem, another feature point between the eyes and above the nose (i.e. between the two eyebrows) is also tracked. It can be obtained by calculating the midpoint of the line which connects the centers of the two eye blobs. This point is very robust and can tolerate significant sideways movement. Since the objective is to track the eyes, the distance of this point from the inner corner of each eye is constantly examined to detect and correct any errors due to a slight shift in the feature points representing the inner corners of the eyes (a sketch of this check follows).
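A minimal sketch of this consistency check, under the same Python assumption; the 20% drift tolerance is illustrative, not a value from the report.

def between_eyes_point(left_corner, right_corner):
    # Midpoint of the line joining the centers of the two eye blobs.
    return ((left_corner[0] + right_corner[0]) / 2.0,
            (left_corner[1] + right_corner[1]) / 2.0)

def corner_drifted(corner, mid, initial_dist, tol=0.2):
    # Flag an inner eye corner whose distance to the 'between the eyes'
    # point has drifted by more than tol of its initial value.
    d = ((corner[0] - mid[0]) ** 2 + (corner[1] - mid[1]) ** 2) ** 0.5
    return abs(d - initial_dist) > tol * initial_dist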
Figure 3.1 shows images taken from an image stream during the tracking of the feature points. The blue rectangles indicate the approximate position of the eyes during the motion of the head in various directions.
Figure 3.1 Eye tracking using optical flow
4 Blink Detection
As explained in section 2, blink detection has been used to initialize the feature points, i.e. the eyes. The same algorithm is also used for subsequent blink detection; however, the region of interest is now narrowed down to two rectangular boxes which contain the two eyes. The coordinates of the two eye boxes are passed to the blink detection algorithm, which builds the variance map and detects a blink if a significant number of pixels register a variation in the thresholded variance map.
4.1 Voluntary and Involuntary Blinks
Since blinking is a natural phenomenon, we must distinguish between voluntary and involuntary blinks if we want to use blinks for interaction with the computer. For example, if we want to simulate a mouse click using a blink, then we must ensure that involuntary blinks are not interpreted as mouse clicks. The logic used for making this distinction is explained below.
A blink sequence can be divided into three types of images: open eyes, then closed eyes, followed again by open eyes, as shown in figure 4.1. When the eyes close, a variance map is generated. As long as they remain closed, no significant intensity variance is registered in the eye boxes; however, when they open again, another variance map is generated. Therefore a blink sequence generates two variance maps spaced apart in time by the duration of the blink. This time t is measured, and if it is less than a certain threshold the blink is classified as involuntary. For a blink to be interpreted as a voluntary blink in this project, it must last for at least 1 second.
Variance Map ←-- t --→ Variance Map

Figure 4.1 Involuntary and voluntary blink detection
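A minimal sketch of this timing logic, under the same Python assumption; it keys off the wall-clock interval between the two variance maps of a blink, with the 1-second threshold taken from the text above.

import time

class BlinkClassifier:
    MIN_DURATION = 1.0          # seconds; the project's voluntary blink threshold

    def __init__(self):
        self.close_time = None  # time of the eye-closing variance map

    def on_variance_map(self):
        # Call each time a significant variation registers in the eye boxes.
        now = time.time()
        if self.close_time is None:
            self.close_time = now            # first map: eyes just closed
            return None
        duration = now - self.close_time     # second map: eyes reopened
        self.close_time = None
        return 'voluntary' if duration >= self.MIN_DURATION else 'involuntary'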
5 Eyeball Detection
The eyeball detection algorithm receives the coordinates of the two eye boxes and finds the location of the eyeballs within these boxes. Several techniques were considered for the purpose of eyeball detection. Geometrical pattern search and ellipse-fitting techniques provide very good results; however, they are computationally intensive and were rejected in favor of the circular Hough transform. The circular Hough transform is itself computationally intensive; however, it can be sped up considerably by pre-processing the eye image. This pre-processing reduces the number of pixels which can vote in the Hough transform, making it much faster.
5.1 Pre processing of eye image
The purpose of pre-processing is to minimize the number of on-pixels in the binary image on which the Hough transform is computed. An ideal image for the Hough transform would contain only a circle representing the outline of the eyeball; however, such an image is difficult to obtain, so an attempt is made to remove unwanted pixels without removing the eyeball pixels. The pre-processing steps are:
5.1.1 Color Filtering
The three-channel RGB image is converted to a single-channel image for edge detection. It has been found experimentally that the green channel provides the best results for edge detection, even though the difference from the other channels is not large.
5.1.2 Histogram equalization
The single-channel image is equalized to compensate for different lighting conditions. Since the eye image contains two regions of comparable area and distinct intensities, one representing the eyeball and the other the white area around it, equalization tends to highlight the edges of the eyeball.
5.1.3 Edge Detection
The green channel image is then subjected to edge detection to get an outline of the eyeball. Since the eyeball pixels have a different intensity than the white pixels around them, an edge is readily obtained; however, it is partially occluded. A 3 by 3 vertical Prewitt operator is used for this purpose. A horizontal Prewitt operator was also tried, but since the outlines of the eyes form horizontal edges, extra edges appear in the edge image. The vertical operator returns the vertical edges, which are sufficient for the Hough transform, as not all the pixels are required to obtain the best-fit circle.
5.1.4 Morphological operations
Some morphological operations, i.e. thinning and the removal of small connected components, were also considered; however, they yielded little improvement in the accuracy or efficiency of the Hough transform.
The steps explained above are shown in figure 5.1.
Figure 5.1 Pre-processing of the eye image for the Hough transform: (a) original eye image, (b) green channel intensity image, (c) equalized image, (d) 3×3 vertical Prewitt operator.
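A minimal sketch of this three-step pipeline, under the same Python/OpenCV assumption; the binarization threshold at the end is illustrative, as the report does not state one.

import cv2
import numpy as np

def preprocess_eye(eye_bgr):
    green = eye_bgr[:, :, 1]                    # green channel (OpenCV images are BGR)
    equalized = cv2.equalizeHist(green)         # histogram equalization
    prewitt_v = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=np.float32)   # 3x3 vertical Prewitt
    edges = np.abs(cv2.filter2D(equalized.astype(np.float32), -1, prewitt_v))
    # Binarize so only strong vertical edges vote in the Hough transform.
    return (edges > 0.5 * edges.max()).astype(np.uint8) * 255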
5.2 Hough Transform
The Hough transform can be used to extract various features from an image. The transform takes the geometric equation of the feature to be found and inverts it, so that x and y become constants in the equation, resulting in an inverse function space. The intersections in this space reveal the constants associated with the equation of the features discovered in the original image.
Consider the example of a line. The equation of a line in slope-intercept form is given as
y = a.x + b
and its Hough transform can be written as
b = -x.a + y
From the above equation a second image, which is an array of accumulator cells, is obtained. In this image each point in the original image becomes a line. The lines are added to the transform one by one, and every time there is an intersection the value of that cell is incremented by 1. Then, by looking at the values of the cells in the transform, we can determine through which points the most lines cross. The 'a' and 'b' values of these points in the transform give us our required constants, i.e. slope and intercept.
Figure 5.2.1 Hough transform of a line in slope-intercept form
Similarly, instead of the slope-intercept model, the normal model can also be used, with the following standard representation:
x.cosθ + y.sinθ = ρ
By stepping through the values of θ from −π to π, a number of sinusoidal curves can be derived in the ρ-θ plane. The accumulator cells with the highest values will indicate the ρ and θ of the detected line. If multiple lines are to be found, a threshold can be established in the Hough transform to allow any feature with more than a certain number of points comprising its shape to be detected.
5.2.1 Circular Hough Transform
The Hough transform can also be applied to the extraction of other geometrical objects such as circles, ellipses etc. The procedure is very similar to the Hough transform of a line; however, the transform space becomes a multidimensional space depending on the number of constants in the representation of the object. Consider the equation of a circle, which is the shape of interest to us:
(x − a)² + (y − b)² = r²
The above equation can also be written in the following form
a = x + r.cosθ
b = y + r.sinθ
θ = 0.. 2 π
Since there are three variables in the transform, the resulting transform space will be three-dimensional, containing cones. The figure below shows a 2D and a 3D view of the circular Hough transform.
Figure 5.2.2 Circular Hough transform in 2D and 3D
5.2.2 Pseudocode for Circular Hough transform
In order to find the eyeball we use the circular Hough transform. The procedure consists of taking each pixel of the image and finding the values of a and b according to the above equations. However, we must vary two other parameters, i.e. the radius r and θ. The range for r depends on the size of the circle we want to extract, and θ must be varied from 0 to 2π in small steps. Once we find the transform T(a,b) we can update the three-dimensional Hough transform array H. Once the transform has been computed, the pixel or pixels corresponding to the maximum count can be found to get the centre and radius of the circle. The algorithm is illustrated below. Figure 5.2.3 shows the extraction of the eyeball from an eye image using the Hough transform. The image has been equalized and the vertical Prewitt operator has been used to get the approximate outline of the eyeball, as explained in section 5.1. The Hough transform is then performed on this edge image to obtain the centre of the eyeball, as shown in the figure.
Circular Hough Transform Pseudo code

for i = 1 .. X
  for j = 1 .. Y
    if IMG(i, j) > 0                        % only edge (on) pixels vote
      for r = 1 .. Rmax
        T = zeros(X, Y)                     % vote image for this pixel and radius
        for θ = 0 .. 2π
          a = round(i + r.cos θ)            % candidate centre, x coordinate
          b = round(j + r.sin θ)            % candidate centre, y coordinate
          if (a ≥ 1) & (a ≤ X) & (b ≥ 1) & (b ≤ Y)
            T(a, b) = 1
          end if
        end for
        H(:, :, r) = H(:, :, r) + T         % accumulate votes for radius r
      end for
    end if
  end for
end for

% Loop through the transform to find the centre.
for i = 1 .. X
  for j = 1 .. Y
    for r = 1 .. Rmax
      if H(i, j, r) ≥ Threshold
        C = [C; i, j, r, H(i, j, r)]        % record candidate centre and radius
      end if
    end for
  end for
end for
Figure 5.2.3 (a), (b), (c): Circular Hough transform to locate the eyeball
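A compact NumPy sketch of the same voting scheme (an assumption; the report's own implementation is the pseudocode above), with a coarse 64-step θ discretization chosen only for illustration.

import numpy as np

def circular_hough(edge_map, radii):
    # edge_map: binary edge image; radii: iterable of candidate radii
    Y, X = edge_map.shape
    H = np.zeros((Y, X, len(radii)))
    ys, xs = np.nonzero(edge_map)                      # only on-pixels vote
    thetas = np.linspace(0, 2 * np.pi, 64, endpoint=False)
    for k, r in enumerate(radii):
        a = np.round(xs[:, None] + r * np.cos(thetas)).astype(int)
        b = np.round(ys[:, None] + r * np.sin(thetas)).astype(int)
        ok = (a >= 0) & (a < X) & (b >= 0) & (b < Y)
        np.add.at(H[:, :, k], (b[ok], a[ok]), 1)       # cast votes for centres
    return H

def best_circle(H, radii):
    # The accumulator cell with the most votes gives the centre and radius.
    y, x, k = np.unravel_index(np.argmax(H), H.shape)
    return x, y, radii[k]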
5.2.3 Trade offs in Hough transform
The circular Hough transform is computationally intensive, as explained above. The amount of computation depends on the number of pixels which are on, i.e. allowed to vote in the transform. By reducing the number of these pixels, the efficiency of the Hough transform can be vastly improved. On the other hand, minimizing these pixels also removes certain pixels which are part of the eyeball edge, thus reducing the chances of accurate detection. The size of the eye image is therefore also very important. In this project a low-resolution camera has been used and the eye image has very few pixels. A camera with increased resolution might result in better edge detection and circle fitting, but the computation time for the Hough transform would increase considerably.
Another important factor is the radius of the circle that is being searched for. If the exact radius is known we only have to search for one circle; however, in the absence of prior knowledge of the circle's radius, the Hough transform has to go through several iterations to find the best circle. In this project the distance between the camera and the subject's face is not fixed, so the radius of the eyeball may change as the head moves towards or away from the camera. One solution is to make the Hough transform search through a range of radii; however, this increases the computation and therefore the time taken by the Hough transform, which is unacceptable. In order to keep the search space small, the desired circle radius has been made a function of the eye box itself. This assumption is safe because the size of the eye box will also increase or decrease as the face moves towards or away from the camera. The eyeball radius is assumed to be one fourth of the height of the eye box, and the Hough transform searches for circles with radii within ±3 pixels of this radius.
6 Eyeball tracking
The Hough transform can be used to locate the eyeball in each eye image, thus determining the location of the eyeball frame by frame. This technique, however, is computationally intensive and slows down the throughput of the algorithm. The advantage of computing the Hough transform, on the other hand, is that we do not have to worry about the loss of tracking: if the Hough transform fails to find a circle in one image, this will not affect the transform computation in the subsequent images.
Another technique is to use the Lucas & Kanade feature tracker explained in section 3.1 to track the eyeball once its center and radius have been obtained through the circular Hough transform. This vastly improves the speed of the algorithm; however, the tracking will be lost upon the loss of a feature point representing the eyeball. This feature point could be lost due to a blink or a rapid movement of the eyeball or the eye itself. This problem can be addressed by constantly monitoring the tracked feature points to determine if the tracking has been lost, in which case the Hough transform can be used to re-initialize the eyeball location. The best feature points for eyeball tracking are the center of the eyeball and its two outer edges, as shown in figure 6.1. The relative distances among these three points should remain constant as the eyeball moves within the eye. A significant change in these distances indicates the loss of a feature point.
Both techniques have been implemented in this project and have their own advantages and disadvantages, as explained above. The user has the option of selecting the tracking criterion according to the application requirements.
Figure 6.1 Good feature points to track
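A minimal sketch of the loss-of-tracking check described above, under the same Python assumption; the 25% tolerance is illustrative, not a value from the report.

def tracking_lost(center, left_edge, right_edge, d_left0, d_right0, tol=0.25):
    # The distances from the eyeball centre to its two outer edges should stay
    # nearly constant; a large change flags a lost feature point, after which
    # the circular Hough transform re-initializes the eyeball location.
    def dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    return (abs(dist(center, left_edge) - d_left0) > tol * d_left0 or
            abs(dist(center, right_edge) - d_right0) > tol * d_right0)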
7 Gaze Estimation
Once the centre of the eye (iris) has been located, we can estimate the gaze of the human subject by determining the relative position of the iris within the eye box. Figure 7.1 shows a gaze vector diagram where the eye box is shown along with the eyeball in Cartesian coordinates. B is the bottom-left corner of the eye box and represents the origin of the diagram. Since we already know the coordinates of points A, C and D, we can easily determine vector A. We regard vector A as the "gaze vector". The length of the vector and its angle with respect to the line BD indicate the direction in which the subject is looking.
For this project we have assumed that the eyeball moves on a two-dimensional surface; however, due to the curvature of the eye, the length of the "gaze vector" is not directly proportional to the distance of the iris from the bottom-left corner of the eye. A geometrical model which takes the curvature of the eye into account can be used for more accurate implementations.
Figure 7.1 Gaze Estimation
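A minimal sketch of this planar gaze vector computation, under the same Python assumption; following the description above, B is taken as the bottom-left corner of the eye box, D as its bottom-right corner, and A as the vector from B to the iris centre (the exact labeling in figure 7.1 is assumed).

import math

def gaze_vector(iris_center, B, D):
    # Vector A from the eye box origin B to the iris centre, plus its angle
    # relative to the bottom edge BD of the eye box.
    ax, ay = iris_center[0] - B[0], iris_center[1] - B[1]
    dx, dy = D[0] - B[0], D[1] - B[1]
    length = math.hypot(ax, ay)
    angle = math.atan2(ay, ax) - math.atan2(dy, dx)
    return length, angle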
8 Final Results
Figures 8.1 and 8.2 show various images from two tracking videos. Figure 8.1 demonstrates the tracking of the eyes along with the 'between the eyes' feature point. The second circle demonstrates the tracking of the right eyeball during the movement of the head in various directions.
Figure 8.2 shows the approximate end points of the eyes in addition to the eyeball. The images indicate successful tracking of the eyes as well as the eyeball for normal motion of the head and eyes. It has been found experimentally that as long as the two eyes are in view of the camera, satisfactory tracking results are obtained; however, turning the head sideways more than 45 degrees results in the loss of tracking for one eye.
The error in eyeball tracking is within 2 pixels, i.e. the centre of the circle obtained through the Hough transform is off from the actual center of the eyeball by one to two pixels, which is acceptable for this project.
The accuracy of blink detection after feature point initialization is over 90%; however, the blink detection used in initialization is less accurate. This results from the fact that the search area consists of the entire image and the blobs must satisfy strict criteria to be considered eye blobs. Since this detection is performed only once, it does not affect the accuracy and performance of the subsequent steps in the algorithm.
Figure 8.1 Eye tracking
Figure 8.2 Eye tracking
9 Conclusions and Recommendations
The results for blink detection are satisfactory for this project. Initial blink detection can be improved by using the skin segmentation algorithm to locate the face, thus reducing the search space for the blink detection algorithm.
The eye tracker can tolerate up to a 45-degree head turn in either direction. This can be further improved by using the 'between the eyes' feature point. This point can tolerate up to 90 degrees of head rotation and can be used to reinitialize the lost feature points by using the distance measurements from the other, still tracked, eye. The other alternative is to reinitialize all the feature points using blink detection.
The Hough transform gives satisfactory results for this project, where a very low resolution web cam has been used. The images shown above have a resolution of 320×240. If more accuracy is required, i.e. very subtle eyeball movements need to be detected, a high-resolution web camera can be used. This will, however, increase the number of pixels processed by the Hough transform, thus reducing the overall speed. Additional hardware to increase the computing power may be used to offset the additional calculations.
Overall, the eye tracker and blink detector implemented in this project meet our initial goals and expectations. They can be used to build various intelligent applications, some of which are suggested below:
• The tracker and blink detector can be interfaced with a PC mouse so that the cursor can be moved on the screen using eye movements. Voluntary blink detections can be used to simulate mouse clicks, thus enabling hands-free browsing. This application can enable severely disabled individuals, who are unable to operate an ordinary mouse, to use personal computers.
• Various human monitoring applications measure the blink rate of a human to determine several biological parameters. This software can be used to monitor and detect the blink rate of a person automatically.
• This software can form the basis of applications used in advanced interactive robots that have the ability to converse with humans. These robots must have the ability to locate a human face and determine where the person is looking in order to determine their response.