
Webcam-based Accurate Eye-central Localization

Hossain Mahbub Elahi1, Didar Islam2, Imtiaz Ahmed3, Syoji Kobashi4, and Md. Atiqur Rahman Ahad5

1,2,3,5Dept. of Applied Physics, Electronics and Communication Engineering University of Dhaka, Dhaka, Bangladesh.

4Graduate School of Engineering, University of Hyogo, Japan. Email: [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract— This paper presents an experimental procedure for a webcam-based eye tracker, especially for low-power devices. The procedure consists of five steps. The first is background suppression, to reduce the average processing requirement. The second is Haar-cascade feature-based face detection. The third is geometric determination of the eye position. The fourth is detection and tracking of the eye-ball center using the means of gradient vectors. The fifth and final step is to estimate where the user is looking: we simply calculate the percentage of eye movement to determine whether the user is looking left, right, up, or down. The procedure is highly effective, with satisfactory accuracy, and requires little processing power.

Keywords—eye-tracking; face detection; eye-center localization; mean gradient; Haar-cascade

I. INTRODUCTION

This paper brings together common eye-detection and tracking algorithms for use in low-power devices. In computer vision, based on the identification or understanding of eye movement, hand gestures, gait, or actions, we can develop systems for many different applications [1,2]. Eye tracking can help a mobile robot [3] understand and assist a person, or let an intelligent system take an action. Today, gaze-tracking technology has made remarkable progress, and it can have a revolutionary impact on several aspects of human life. Unlike head-mounted or other expensive eye/gaze trackers, we use a simple low-cost webcam as the sensing device. It can therefore be used commercially, for example to help disabled people operate smart devices, and it may also improve human-computer interaction.

Gaze tracking is a next step for Human-Computer Interaction (HCI), and it also helps improve vision in robotics. Most gaze trackers are designed around a head-mounted camera; other common methods use hardwired lenses, or infrared sensors with an infrared light source. The basic advantage of a camera or webcam-based gaze tracker is that it does not require any extra or expensive hardware. Nowadays, most devices already include a camera as a standard component.

In this paper, we present a webcam-based eye-tracking procedure. The means-of-gradients process requires dot products of image gradient vectors: we use a mathematical function that reaches its peak value at the center of the eye. After tracking the eye center, we calculate the percentage of movement to determine whether the eye is looking right or left, up or down. To determine the exact eye position, we also derive a mathematical model.

The paper is organized as follows: Section II presents the methodology. In Section III, we present the experimental details. Results are analyzed in Section IV. Finally, we conclude the paper in Section V.

II. METHODOLOGY

The first step of the overall process is background suppression. When the program starts, it first searches for faces in front of the camera. If none is found, it starts monitoring changes in the image, using a derivative controller; if the change is significant, it checks for a face in the image again (a minimal sketch of this change check is given below). The second step is face detection. For this we use the Viola-Jones algorithm [4]. Although it is a fast and accurate face detection algorithm, it has a severe drawback: its training process requires a complex dataset and a long time, and its accuracy depends heavily on the training dataset [4]. In exchange, it requires little processing power at run time, so we employ this very widely used face detection algorithm.
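As a rough illustration of the background-suppression check, the following Python/OpenCV sketch compares each new frame against a stored primary image; the mean-difference threshold is our own illustrative choice, not a value from the paper.

```python
import cv2

# Illustrative threshold: mean absolute grey-level difference per pixel.
MOTION_THRESH = 8.0

def significant_change(primary_gray, frame_gray, thresh=MOTION_THRESH):
    """Return True if the new frame differs enough from the primary image
    to justify re-running the (more expensive) face detector."""
    diff = cv2.absdiff(primary_gray, frame_gray)
    return float(diff.mean()) > thresh
```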

The third step is to detect the eyes within the face. In this step, we do not use any feature-based eye detection algorithm. We simply exploit the fact that, if the face proportions are normal, the eyes lie in a fixed region of the face. If we select the face region without hair, then the left eye can be found in a rectangle whose upper-left corner is at

(x, y) ≡ (0.13*face_width, 0.25*face_height), with width = 0.35*face_width and height = 0.3*face_height.

These values were selected after a number of tests. For these optimal values, several critical conditions arise (details are presented in the experimental-result section). A sketch of this geometric rule is given below.
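A minimal sketch of the geometric rule, assuming the face rectangle comes from the detector in (x, y, w, h) form; the mirrored right-eye rectangle is our assumption, since the paper states only the left-eye values.

```python
def left_eye_roi(fx, fy, fw, fh):
    # Upper-left corner at (0.13*w, 0.25*h) inside the face box,
    # size 0.35*w x 0.30*h, per the paper's empirical values.
    return (int(fx + 0.13 * fw), int(fy + 0.25 * fh),
            int(0.35 * fw), int(0.30 * fh))

def right_eye_roi(fx, fy, fw, fh):
    # Mirror of the left-eye rectangle about the face's vertical midline
    # (our assumption; not stated explicitly in the paper).
    w = int(0.35 * fw)
    return (int(fx + fw - 0.13 * fw) - w, int(fy + 0.25 * fh),
            w, int(0.30 * fh))
```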

The next step is eye-center detection. There are several proposals for finding the eye-ball center; most of them assume the eye-ball is circular. In fact, the visible eye-ball is not fully circular but semi-circular. Therefore, we use an approach based on the means of gradient vectors in the eye region [5]. Geometrically, this process can provide the center position of any circular object [6]. In this process, one may use the orientation of each gradient vector to draw a line through the whole image and then increase an accumulator bin each time such a line passes through it; the bin value is maximal at the center of the eye. To reduce problems arising from other contributing factors, e.g., floating hair, reflections on glasses, or eyebrows, we apply some preprocessing.

Let c be a possible center and g_i the gradient vector at position x_i. Then the normalized displacement vector d_i should have the same orientation (up to sign) as the gradient g_i (see Fig. 1). Given the vector field of image gradients, we can exploit it by computing the dot products between the normalized displacement vectors (relative to a fixed center) and the gradient vectors g_i. The optimal center c* of a circular object in an image with pixel positions x_i, i ∈ {1, 2, …, N}, is then given by

c* = argmax_c { (1/N) Σ_{i=1}^{N} (d_i^T g_i)^2 },  where d_i = (x_i − c) / ||x_i − c||.   (1)
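The following Python/NumPy sketch evaluates equation (1) by brute force over all candidate centers; it assumes a small (resized) eye image and precomputed gradient maps, as described later in Section III.

```python
import numpy as np

def mean_gradient_center(gx, gy, eps=1e-6):
    """Evaluate equation (1): for every candidate center c, average the
    squared dot products between unit displacement vectors d_i and unit
    gradient vectors g_i, and return the argmax as (cx, cy)."""
    h, w = gx.shape
    mag = np.hypot(gx, gy)
    ux, uy = gx / (mag + eps), gy / (mag + eps)   # unit gradients
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    best_val, best_c = -1.0, (0, 0)
    for cy in range(h):
        for cx in range(w):
            dx, dy = xs - cx, ys - cy
            norm = np.hypot(dx, dy)
            norm[cy, cx] = 1.0                    # avoid division by zero
            dot = (dx * ux + dy * uy) / norm      # d_i^T g_i
            val = float(np.mean(dot ** 2))
            if val > best_val:
                best_val, best_c = val, (cx, cy)
    return best_c
```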


Figure 1. Obtaining the center position of the eye.

The displacement vectors d_i are scaled to unit length in order to give equal weight to all pixel positions. To improve robustness against linear changes in lighting and contrast, the gradient vectors should also be scaled to unit length [5].

Computational complexity can be decreased by considering only gradient vectors with significant magnitude. To obtain the image gradients, we compute the partial derivatives:

g_i = ( ∂I(x_i, y_i)/∂x_i , ∂I(x_i, y_i)/∂y_i )^T

Other methods for computing the image gradients do not change the behavior of the objective function significantly [5]. Under some conditions, the maximum is not well defined, or local maxima lead to wrong center estimates. We therefore incorporate prior knowledge about the eye to increase robustness; a sketch of the gradient computation is given below.
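A minimal sketch of the gradient computation using central differences; the magnitude-threshold heuristic is our assumption, since the paper only says that vectors with insignificant magnitude may be discarded.

```python
import numpy as np

def significant_gradients(gray):
    # Central-difference partial derivatives; np.gradient returns the
    # derivative along rows (y) first, then columns (x).
    gy, gx = np.gradient(gray.astype(np.float64))
    mag = np.hypot(gx, gy)
    # Zero out gradients below a dynamic threshold (illustrative heuristic).
    keep = mag > (mag.mean() + 0.3 * mag.std())
    gx[~keep] = 0.0
    gy[~keep] = 0.0
    return gx, gy
```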

Since the pupil is usually dark compared with the sclera and the skin, we apply a weight w_c to each possible center c, such that dark centers are more likely than bright ones. Integrating this into the objective function leads to

c* = argmax_c { (1/N) Σ_{i=1}^{N} w_c (d_i^T g_i)^2 },   (2)

where w_c is the grey value at position (c_x, c_y) of the input image after it has been smoothed and inverted.
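The weight map can be obtained as sketched below, assuming OpenCV and an 8-bit grey eye image; inside the loop of the earlier equation-(1) sketch, one would then multiply `val` by `w[cy, cx]` to obtain equation (2). The kernel size is illustrative.

```python
import cv2

def center_weights(gray_eye):
    # w_c: smoothed and inverted grey level, so dark candidates (pupil)
    # receive larger weight than bright ones (sclera, skin).
    smoothed = cv2.GaussianBlur(gray_eye, (5, 5), 0)
    return 255 - smoothed
```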

The proposed summation of weighted squared dot products yields accurate results if the image contains the eye. However, when applying the multi-stage scheme, the rough eye regions sometimes also contain other structures such as hair, eyebrows, or glasses. In particular, hair and strong reflections in glasses produce significant image gradients whose orientation differs from that of the pupil and iris gradients, so the eye-center estimate may be wrong. We therefore use a post-processing step to overcome these problems: we apply a threshold on the objective function, based on its maximum value, and remove all remaining values that are connected to one of the image borders. We then take the maximum of the remaining values and use its position as the center estimate. In our experiments, the exact value of this threshold had no significant influence on the center estimates, so we suggest setting it to 90% of the overall maximum. To reduce memory and processing requirements, this process operates on a low-resolution image; it cannot be used on a high-resolution image.
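A sketch of this post-processing step, assuming SciPy for connected-component labelling; the 90% threshold follows the text, everything else is our framing.

```python
import numpy as np
from scipy import ndimage

def postprocess_objective(obj, rel_thresh=0.90):
    """Keep objective values >= 90% of the maximum, discard connected
    regions that touch an image border, and return the argmax position."""
    mask = obj >= rel_thresh * obj.max()
    labels, _ = ndimage.label(mask)
    touching = (set(labels[0, :]) | set(labels[-1, :]) |
                set(labels[:, 0]) | set(labels[:, -1])) - {0}
    for lab in touching:
        mask[labels == lab] = False
    candidate = np.where(mask, obj, -np.inf)
    cy, cx = np.unravel_index(np.argmax(candidate), candidate.shape)
    return cx, cy
```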

The last step is to estimate where someone is looking. Initially, we only determine whether the position is left or right, up or down.

Figure 2. Evaluation of equation (1) for an exemplary pupil with the detected center marked in white (left). The objective function achieves a strong maximum at the center of the pupil; 2-dimensional plot (center) and 3-dimensional plot (right).

To do that, we first find the corner positions of the eye by calculating the image gradient. Next, we calculate the eye-ball center position with respect to the eye corners. If the eye-ball is equidistant from the left and right corners of the eye, it is centered; otherwise, it is to the left or right. Calculating the ratio of the distance between the eye-ball center and one corner of the eye to the distance between the two corners gives a more accurate estimate of the eye-ball position, as sketched below.
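A minimal sketch of this ratio test, using only horizontal positions; the 0.4/0.6 decision band is our illustrative choice, not a value from the paper, and the direction labels ignore camera mirroring.

```python
def horizontal_gaze(center_x, left_corner_x, right_corner_x):
    # Ratio of center-to-left-corner distance over corner-to-corner
    # distance: ~0.5 means looking straight, smaller means left of center.
    r = (center_x - left_corner_x) / float(right_corner_x - left_corner_x)
    if r < 0.4:
        return "left"
    if r > 0.6:
        return "right"
    return "center"
```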

III. EXPERIMENTAL RESULT

Step 1: Read a frame from the image sensor (webcam or digital camera) and convert it to greyscale. Search for a face in the frame using the Haar-cascade algorithm; here we use a 60x60 minimum window size to ignore smaller faces. If no face is detected, use this frame as the primary image, read the next frame, and subtract it from the primary image. If the change is significant, search for a face in the image again; otherwise use the new frame as the primary image.
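Step 1 maps naturally onto OpenCV's Haar-cascade detector; a sketch with the paper's 60x60 minimum window, using OpenCV's stock frontal-face model (the other parameters are common defaults, not values from the paper).

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face(gray):
    # 60x60 minimum window, per Step 1; returns the biggest face or None.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                     minNeighbors=5, minSize=(60, 60))
    if len(faces) == 0:
        return None
    return max(faces, key=lambda f: f[2] * f[3])
```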

Step 2: If there is a face in the image, crop the face region from the frame; otherwise use the frame as the primary image.

Step 3: After cropping the face region, we detect the eye region in the face. We increase image quality using histogram equalization, then pass the image through a Gaussian blur to reduce noise (see the sketch below); impulse noise can be removed by the method of [7]. This improves the performance of eye-ball center detection.
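A minimal sketch of this enhancement step, assuming OpenCV and an 8-bit grey image; the 5x5 Gaussian kernel is our illustrative choice.

```python
import cv2

def enhance_eye(gray_eye):
    # Histogram equalisation for contrast, then Gaussian blur for noise.
    return cv2.GaussianBlur(cv2.equalizeHist(gray_eye), (5, 5), 0)
```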

Step 4: After detecting the eye region, we use the means-of-gradients algorithm to detect the eye-ball center position. Before applying the algorithm, we resize the eye image to a specific, smaller size so that we can reliably find the maximum-value position. In this case, we make the eye region 50 pixels wide and scale the height to keep the original aspect ratio of the eye image. Once the eye-ball center has been detected, in subsequent frames we only track the movement of the eye-ball using sum-of-squared-differences feature tracking [8], as sketched below. This provides stability to the system.
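The resizing rule and the SSD tracking can be sketched as below with OpenCV; tracking a small patch around the detected center via template matching is our reading of the sum-of-squared-differences step.

```python
import cv2

def normalize_eye(gray_eye, target_w=50):
    # Fixed 50-pixel width; height scaled to preserve the aspect ratio.
    h, w = gray_eye.shape
    target_h = max(1, int(round(h * target_w / float(w))))
    return cv2.resize(gray_eye, (target_w, target_h))

def track_ssd(frame_gray, patch):
    # Sum-of-squared-differences matching: the minimum of TM_SQDIFF is
    # the best match; returns its top-left corner in the frame.
    res = cv2.matchTemplate(frame_gray, patch, cv2.TM_SQDIFF)
    _, _, min_loc, _ = cv2.minMaxLoc(res)
    return min_loc
```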

Step 5: We then detect the corners of both eyes. For that task, we convert the eye image into a binary image by thresholding. We calculate the relative position of the eye-ball center with respect to the eye corners, and then estimate whether the person is looking left, right, or center.

Up to face recognition and eye-position detection, the developed system performs very satisfactorily; we have hardly seen any error in face detection. However, after face detection, depending on camera quality and lighting conditions, the eyeball position sometimes cannot be tracked perfectly. In poor lighting, not only the eyeball but the whole eye region goes dark. Poor color depth of the image is another significant factor. During the experiments, we observed an unexplained effect: although we used the same algorithm for both eyes, the left eye showed more detection error than the right eye.

In the case of multiple faces, the system detects the biggest face in the image and processes it. It can also detect partial or occluded faces. We used a low-cost webcam in VGA-quality mode, and still achieved satisfactory results in a real-life environment.


Figure 3. Cropped face with eye-ball center detection.


IV. DISCUSSION

As an accuracy measure for the estimated eye centers, we evaluate the normalized error, which indicates the error obtained by the worse of the two eye estimates. This measure was introduced by Jesorsky et al. [8] and is defined as

e = max(e_l, e_r) / d,

where e_l and e_r are the Euclidean distances between the estimated and the correct left and right eye centers, and d is the distance between the correct eye centers. When analyzing the performance of an eye-localization approach, this measure has the following characteristics:

- e ≤ 0.25 ≈ distance between the eye center and the eye corners,
- e ≤ 0.10 ≈ diameter of the iris, and
- e ≤ 0.05 ≈ diameter of the pupil.

Thus, an estimate with e ≤ 0.05 is accurate to within the pupil, whereas an error approaching e = 0.25 means the estimate may lie as far away as the eye corner, i.e., the localization has effectively failed. A sketch of this measure is given below.
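A minimal sketch of the normalized error, where eye centers are (x, y) tuples and math.dist (Python 3.8+) computes the Euclidean distance.

```python
import math

def normalized_error(est_l, est_r, true_l, true_r):
    # Jesorsky et al.'s worst-eye measure: e = max(e_l, e_r) / d.
    e_l = math.dist(est_l, true_l)
    e_r = math.dist(est_r, true_r)
    d = math.dist(true_l, true_r)
    return max(e_l, e_r) / d
```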

The statistics in Table I were produced from different test subjects; the number of samples taken under each condition is the same.

From the experimental results, we find that the most critical factor is the lighting condition. If light comes from behind, the entire process fails; it works best when light comes from the front. Across different lighting conditions, e.g., daylight and indoor light, the performance is fairly constant, and it only decays when the indoor light is insufficient for the camera. If the camera sensitivity is good, there is hardly any difference between indoor and outdoor lighting.

Across genders, the error in eye-ball center detection varies due to floating hair and the wider eyebrows of females, which shift the maximum-value position.

Across age groups, this method gives a fairly constant error. For kids, errors are relatively higher, because kids have a relatively smaller visible eye region and move unsteadily. For light and dark skin colors, the error hardly fluctuates.

TABLE I. STATISTICS OF EXPERIMENTAL DATA OUTPUT

Condition               | Total Sample | Total Error | Percentage of Error
------------------------|--------------|-------------|--------------------
Light from the Front    | 100          | 3           | 3%
Light from the Side     | 100          | 21          | 21%
Light from the Back     | 100          | 63          | 63%
Gender: Male            | 100          | 3           | 3%
Gender: Female          | 100          | 8           | 8%
Age 14 and below        | 100          | 10          | 10%
Age 15~18               | 100          | 3           | 4%
Age 19~35               | 100          | 5           | 5%
Age 35+                 | 100          | 6           | 6%
With dark skin color    | 100          | 2           | 2%
With bright skin color  | 100          | 3           | 3%
At daylight             | 100          | 3           | 3%
At indoor light         | 100          | 5           | 5%

Figure 4. Output of eye-ball center detection under different conditions.

Figure 5. Cropped eye images under different conditions.


V. CONCLUSION

In this paper, we apply a recently proposed algorithm for eye-ball center detection. The process is computationally relatively inexpensive and invariant to rotation. We find it quite accurate, and it can be used for different applications. Although we use this process in a complete gaze-tracking system, estimating the gaze point requires complex mathematical modeling, which we plan to address in future work. This process can be applied to improve Human-Computer Interaction (HCI); for example, it could enable disabled persons to operate computers and various electrical devices. It may also be useful for building psychological profiles and for more interactive learning methods for kids [9]. Image registration is a challenge that might be necessary in some applications [10]. With further improvement of this process, we may be able to replace other pointing devices with a gaze-pointing device.

REFERENCES

[1] Md. Atiqur Rahman Ahad, Computer Vision and Action Recognition, Atlantis Ambient and Pervasive Intelligence Series, Springer, 2011.

[2] Md. Atiqur Rahman Ahad, Motion History Images for Action Recognition and Understanding, Springer, 2012.

[3] S. Keshri, S. Omkar, A. Singh, V. Jeengar, and M. Yadav, “A Real-time Scheme of Video Stabilization for Mobile Surveillance Robot,” International J. of Computer Vision and Signal Processing, vol. 2, no. 1, pp. 8-16, 2013.

[4] P. Viola and M. Jones, “Robust Real-Time Face Detection,” International J. of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.

[5] F. Timm and E. Barth, “Accurate Eye Centre Localisation by Means of Gradients,” Proc. Int. Conf. on Computer Vision Theory and Applications (VISAPP), pp. 125-130, 2011.

[6] R. Kothari and J. Mitchell, “Detection of eye locations in unconstrained visual images”, Proc. IEEE Intl. Conf. on Image Processing, pp. 519-522, 1996.

[7] P. Patel, B. Jena, B. Majhi, and C. Tripathy, “Fuzzy Based Adaptive Mean Filtering Technique for Removal of Impulse Noise from Images,” International J. of Computer Vision and Signal Processing, vol. 1, pp. 15-21, 2012.

[8] O. Jesorsky, K. Kirchberg, and R. Frischholz, “Robust face detection using the Hausdorff distance,” Proc. 3rd Int. Conf. on Audio- and Video-Based Biometric Person Authentication, Lecture Notes in Computer Science, Springer, pp. 90-95, 2001.

[9] S. K. Singh, D. S. Chauhan, M. Vatsa, and R. Singh, “A Robust Skin Color Based Face Detection Algorithm,” Tamkang J. of Science and Engineering, vol. 6, no. 4, pp. 227-234, 2007.

[10] Y. Wan, P. Duraisamy, M. Alam, and B. Buckles, “Wireless Capsule Endoscopy Segmentation using Global-Constrained Hidden Markov Model and Image Registration,” International J. of Computer Vision and Signal Processing, vol. 2, no. 1, pp. 17-28, 2013.

[11] http://www.tobii.com/ accessed on 27 Sep 2013.

[12] R. Kothari and J. L. Mitchell, “Detection of Eye Location in Unconstrained Visual Images,” Proc. IEEE Int. Conf. on Image Processing, pp. 519-522, 1996.

[13] S. Singh, M. Sharma, and N. Rao, “Robust & Accurate Face Recognition using Histograms,” Proc. Int. Conf. on Emerging Trends in Computer and Image Processing, 2011.

[14] K. Peng, L. Chen, S. Ruan, and G. Kukharev, “A Robust Algorithm for Eye Detection on Gray Intensity Face without Spectacles,” J. of Computer Science and Technology, vol. 5, no. 3, 2005.


Figure 6. Experimental errors for: (a) different lighting conditions, (b) different ages, (c) different genders, (d) different skin colors.
