
UNIVERSITY OF ENGINEERING AND TECHNOLOGY TAXILA

RESEARCH PROJECT PROPOSAL PROFORMA

DEPARTMENT OF COMPUTER ENGINEERING

PhD Student Particulars No.: ………………... Date: …………………...

Name: Muhammad Shahid    Regd No: 08F-UET/PhD-CP-13

Date of registration: …………………    Status: Full Time

Research Supervisor:

Title of Research Project Proposal

Vehicle Future Position Augmentation over Wind Screen: Head-up-Display and Eye

Tracking Framework.

Requested Amount

150,000/-

Tentative Starting Date

January, 2011

Duration

1 Year

The PhD Research Committee, in its meeting held on _____________ at ______________:

[1]. Recommends the approval of the research project proposal

[2]. Rejects the research project proposal

[3]. Recommends the approval with the following suggestions

……………………………………………………………………………………

……………………………………………………………………………………

……………………………………………………………………………………

1. Name: Dr. Tabassam Nawaz Signature: _______________________

(PhD Research Supervisor)

3. Name: Prof. Dr. M. Iram Baig Signature: _______________________

(PhD Research Committee Member)

4. Name: Dr. Samiuddin Signature: _______________________

(PhD Research Committee Member)

5. Name: Dr. Hafiz Adnan Habib Signature: _______________________

(PhD Research Committee Member)


FACULTY RESEARCH PROJECT PROPOSAL SUBMITTED TO DIRECTOR OF ADVANCED STUDIES, RESEARCH AND TECHNOLOGICAL DEVELOPMENT, UNIVERSITY OF ENGINEERING & TECHNOLOGY TAXILA

(COVER PAGE)

Name of Sponsoring Agency/Department/Organization: Grant No: ______________

Faculty: Telecommunication & Information Engineering

Department: Computer Engineering

Area of Specialization: Computer Vision & Image Processing

Title of Research Project Proposal: Vehicle Future Position Augmentation over Wind Screen: Head-up-Display and Eye Tracking Framework

Requested Amount: 150,000/-

Tentative Start Date: January 2011

Duration: 1 Year

Investigator's Name | Designation | Highest Degree | Degree Date | Institution

Principal Investigator: Dr. Tabassam Nawaz | Associate Professor (Chairman SED) | Ph.D. | August 2008 | UET Taxila

Co-Investigator: Muhammad Shahid | PhD Scholar | MSc | March 2007 | NUST Rwp.

Co-Investigator: | | | |

Co-Investigator: | | | |

Signature: _______________ (Principal Investigator)   _______________ Co-Investigator   _______________ Co-Investigator   _______________ Co-Investigator

Endorsements | Chairman of Department | Dean of Faculty
Name | Dr. M. Iram Baig | Prof. Dr. Adeel Akram
Signature with Date | |


CONTENTS

1. TITLE AND PROJECT SUMMARY
2. PROJECT DESCRIPTION
   2.1 Literature Review
   2.2 Project and Program Objectives
   2.3 Project Management Design
   2.4 Organization and Management Plan
   2.5 Work Schedule Plan
   2.6 Budget Description
   2.7 Utilization
3. CITED REFERENCES
4. CURRICULUM VITAE
5. UNDERTAKING


Project Title

Vehicle Future Position Augmentation over Wind Screen: Head-up-Display and Eye Tracking

Framework.

Principal Investigator

Dr. Tabassam Nawaz

Summary of Proposed Work (Use Additional Sheets if Necessary)

This TR&D proposal aims at the development of a novel augmented reality system for vehicle driver assistance. The system provides assistance to the driver for up to twenty-five meters ahead of the vehicle. Two lines are drawn over the windscreen by a tiny projector inside the vehicle. Each line extends over the windscreen so that, to the driver, the lines appear to be physically drawn on the road along either side of the vehicle, matching its width. The knowledge base has pre-stored information about the vehicle width, which can be adjusted for different vehicles.

The technical challenge lies in determining the position and shape of the lines to be projected on the windscreen, as shown in Figure 1. The lines cannot be fixed, because they are the projection of road coordinates onto the driver's eye as the line of sight passes through the windscreen; the lines drawn over the windscreen may be considered a zoomed version of the image formed in the driver's eye. The position of the viewer's/driver's head therefore determines the position of these lines on the windscreen, and a change in face position alters the position of the lines. The position of the face is captured by a camera inside the vehicle focused on the driver.
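The geometric core of this step can be sketched as a line-plane intersection: given the driver's eye position, a road point on the vehicle's future path, and the windscreen approximated as a plane, the point at which the projector must draw is where the line of sight crosses that plane. The following Python sketch illustrates this under those simplifying assumptions (a planar windscreen and a common vehicle-fixed coordinate frame); all names and numbers are illustrative, not values from this proposal.

    import numpy as np

    def windscreen_point(eye, road_point, plane_point, plane_normal):
        # Intersect the driver's line of sight (eye -> road point) with the
        # windscreen, modelled here as a plane; returns the 3-D point at which
        # the projector should draw. All inputs share one vehicle-fixed frame.
        eye = np.asarray(eye, dtype=float)
        direction = np.asarray(road_point, dtype=float) - eye
        n = np.asarray(plane_normal, dtype=float)
        denom = direction.dot(n)
        if abs(denom) < 1e-9:              # line of sight parallel to the screen
            return None
        t = (np.asarray(plane_point, dtype=float) - eye).dot(n) / denom
        return eye + t * direction

    # Illustrative numbers: eye 1.2 m above the road, a lane-edge point 25 m
    # ahead, windscreen approximated by a vertical plane 0.8 m ahead of the eye.
    eye = [0.0, 1.2, 0.0]
    road_pt = [0.9, 0.0, 25.0]             # a sample of the right-hand guide line
    print(windscreen_point(eye, road_pt, [0.0, 1.2, 0.8], [0.0, 0.0, 1.0]))

Sampling many road points along each guide line and intersecting each with the windscreen yields the curve the projector has to draw; as noted above, the result shifts whenever the eye position changes, which is why face and eye tracking are required.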


Figure 1: Proposed Framework

During the last few years, head-up displays have been installed in top-of-the-range vehicles, and this trend has encouraged more automotive manufacturers to include head-up displays in their vehicles. Head-up displays are currently deployed in automobiles to show status information such as speed, fuel level and temperature. This TR&D proposal is a novel application of head-up displays. Such a system can be used directly as a driver assistance system, and it can also be a major extension to driver training systems. Massive deployment of such a system can be expected as head-up displays become common in automobiles.

This TR&D proposal seeks funding for the equipment required for the experimental setup, final software application development, embedded system development and the human resources involved. My PhD dissertation focused on the development of a video-based real-time system, which involved algorithm development, test prototype development, coding and testing of algorithms, and finally publishing the work in IEEE Transactions on Consumer Electronics. I have a good understanding of video and imaging systems and am determined to execute this TR&D project successfully.

2. PROJECT DESCRIPTION

2.1 Literature Review

This section is divided into six parts


2.1.1 Augmented Reality

2.1.2 Display
   2.1.2.1 Video see-through
   2.1.2.2 Optical see-through
   2.1.2.3 Projective

2.1.3 Display positioning
   2.1.3.1 Head-worn
   2.1.3.2 Hand-held
   2.1.3.3 Spatial

2.1.4 Tracking sensors and approaches
   2.1.4.1 Modeling environments
   2.1.4.2 User movement tracking
      2.1.4.2(i) Face Detection
         - Face Detection Challenges
         - Literature Survey
      2.1.4.2(ii) Eye-Gaze Tracking
         - Eye-Gaze Challenges
         - Literature Survey

2.1.5 Coordinate systems
   2.1.5.1 Five Frames of Reference

2.1.6 Camera Calibration
   2.1.6.1 Calibration methods and models
   2.1.6.2 Camera model

2.1.1 Augmented Reality

Augmented Reality (AR) is a visualization paradigm that supplements (typically) vision with computer-generated information superimposed on the user's view [1]. AR is focused on the presentation of information for enhanced situational awareness and perception in a temporally and spatially structured manner, allowing users to view, browse, filter, search and interact with data in the real world [2][3]. Schmalstieg et al. [1] state that AR is the natural progression of cartography (map drawing). AR techniques can also remove (occlude) existing objects from reality; such techniques are commonly known as mediated or diminished reality [4].


Augmenting the real world with computer-generated graphics provides users with an experience not possible in just the real or virtual world. Augmented Reality (AR) resides between virtuality (virtual environments) and reality on Milgram & Kishino's continuum [5].

Figure: Reality-Virtuality (RV) Continuum, courtesy of Milgram [15]

The field of modern AR has existed for well over a decade [4], with origins dating back to the 1960s, when Sutherland (1968) and his students at Harvard University and the University of Utah developed the first head-mounted display system. AR systems are defined by their ability to spatially integrate virtual objects into the physical world in real time, immersing the user in an information-rich, interactive environment [1].

AR can be applied to any human sense, although vision is most commonly augmented. Most applications typically use head-mounted displays (HMDs) to display spatially registered computer-generated images onto the real world for the user to see [6]. AR is useful in any situation where additional information is required. Known application areas of AR include medicine, manufacturing, construction, automotive maintenance, environmental management, tourism, entertainment, architecture, the military, and the aerospace and automobile industries. In many critical applications, such as medical operations and military simulations, it has been found advantageous to conduct simulations within safe augmented environments that are physically accurate and visually realistic.

In AR applications such as automotive or aircraft head-up displays (HUDs), the virtual graphics being displayed are not aligned with the real objects, resulting in a weak link between virtuality and reality. A real sense of the virtual and real worlds meeting is achieved when the graphics are spatially aligned to real-world objects, which requires accurate real-time tracking of the user's position and orientation [7].


The diversity of applications to which AR has been applied makes a consensual definition difficult. Rather than an explicit definition, Azuma [4] defines AR as a computer system which combines real and virtual objects, is interactive in real time, and is registered in two- or three-dimensional space. While this statement is not perfect, it is generally inclusive of all augmented reality derivatives. In the context of applying AR to vehicle future position augmentation over the windscreen, we can adapt this definition to an area of augmented reality in which augmentation occurs through a pre-stored knowledge base about the vehicle dimensions and the determination of the user's position and orientation (the user's location and therefore the surrounding environment). Augmented reality has been identified as providing an excellent user interface for the given application, as it augments the driver's senses with additional information about the vehicle's future position.

2.1.2 Display

Of all the modalities of human sensory input, sight, sound and touch are the senses that AR systems currently address most commonly. Since the focus of the proposed research is limited to sight, this section only covers visual displays.

There are basically three ways to visually present an augmented reality. Closest to virtual reality is video see-through, where the virtual environment is replaced by a video feed of reality and the AR is overlaid upon the digitized images. Another way is optical see-through, which leaves the real-world perception alone and displays only the AR overlay by means of transparent mirrors and lenses. The third approach is to project the AR overlay onto the real objects themselves, resulting in projective displays. True 3D displays for the masses are still far off, although [8] already achieves 1000 dots per second in true 3D free space using plasma in air. The three techniques may be applied at varying distances from the viewer: head-mounted, hand-held and spatial.

2.1.2.1 Video see-through

Besides being the cheapest and easiest to implement, this display technique offers several other advantages: (1) it can mediate or remove objects from reality, including removing or replacing fiducial markers or placeholders with virtual objects; (2) it makes it easy to blend the brightness and contrast of virtual objects with the real environment (evaluating the light conditions of a static outdoor scene is important when the computer-generated content has to blend in smoothly); (3) it allows tracking of head movement for better registration; and (4) it provides the possibility of matching the perception delays of the real and the virtual. Issues in video see-through include a low resolution of reality, a limited field of view, and user disorientation due to parallax (eye offset) caused by the camera being positioned at a distance from the viewer's true eye location, which requires significant adjustment effort from the viewer.

2.1.2.2 Optical see-through

Optical see-through techniques with beam-splitting holographic optical elements (HOEs) may be

applied in head-worn displays, hand-held displays, and spatial setups where the AR overlay is

mirrored either from a planar screen or through a curved screen. These displays not only leave

the real-world resolution intact, they also have the advantage of being cheaper, safer, and

parallax-free (no eye-offset due to camera positioning). Optical techniques are safer because

users can still see when power fails, making this an ideal technique for military and medical

purposes. However, other input devices such as cameras are required for interaction and

registration. Challenges in combining the virtual objects holographically through transparent

mirrors and lenses involve reduced brightness and contrast of both the images and the real-world perception.

2.1.2.3 Projective

These displays have the advantage that they do not require special eye-wear thus accommodating

user’s eyes during focusing, and they can cover large surfaces for a wide field-of-view.

Projection surfaces may range from flat, plain-colored walls to complex scale models. Zhou et

al. [9] list multiple pico projectors that are lightweight and low on power consumption for better

integration. However, as with optical see-through displays, other input devices are required for

(indirect) interaction. Also, in most applications the projectors need to be calibrated each time

the environment or the distance to the projection surface changes (crucial in mobile setups).

Fortunately, calibration may be automated using cameras in e.g. a multi-walled Cave automatic

virtual environment (CAVE) with irregular surfaces [10]. Furthermore, this type of display is

limited to indoor use only due to low brightness and contrast of the projected images. Occlusion

or mediation of objects is also quite poor, but for head-worn projectors this may be improved by

covering surfaces with retro-reflective material. Objects and instruments covered in this material

will reflect the projection directly towards the light source which is close to the viewer’s eyes,

thus not interfering with the projection.


2.1.3 Display positioning

AR displays may be classified into three categories based on their position between the viewer

and the real environment: head-worn, hand-held, and spatial.

2.1.3.1 Head-worn

Visual displays attached to the head include the video/optical see-through head-mounted display

(HMD), virtual retinal display (VRD), and head-mounted projective display (HMPD). A current

drawback of head-worn displays is the fact that they have to connect to graphics computers like

laptops that restrict mobility due to limited battery life.

2.1.3.2 Hand-held

This category includes hand-held video/optical see-through displays as well as hand-held

projectors. Although this category of displays is bulkier than head-worn displays, it is currently

the best work-around to introduce AR to a mass market due to low production costs and ease of

use. For instance, hand-held video see-through AR acting as magnifying glasses may be based

on existing consumer products like mobile phones (Möhring et al.).

2.1.3.3 Spatial

The last category of displays is placed statically within the environment and includes screen-based video see-through displays, spatial optical see-through displays, and projective displays such as the windshield in automobile applications. These techniques lend themselves well to presentations and exhibitions of varying size with limited interaction. Early ways of creating AR are based on conventional screens (computer or television) that show a camera feed with an AR overlay. Such a technique is commonly applied in the world of sports television, where environments such as swimming pools and race tracks are well defined and easy to augment.

Head-up displays (HUDs) in military cockpits, aero and automobile industry are a form of

spatial optical see-through and are expected to become a standard extension for future car

productions to project navigational directions in the windshield.

2.1.4 Tracking sensors and approaches

Before an AR system can display virtual objects into a real environment, the system must be able

(1) to sense the environment and (2) track the viewer’s (relative) movement preferably with six


degrees of freedom (6DOF): three variables (x, y, and z) for position and three angles (yaw,

pitch, and roll) for orientation. There must be some model of the environment to allow tracking

for correct AR registration. Furthermore, most environments have to be prepared before an AR

system is able to track 6DOF movement, but not all tracking techniques work in all

environments. To this day, determining the orientation of a user is still a complex problem with

no single best solution.

2.1.4.1 Modeling environments

Both tracking and registration techniques rely on environmental models. To annotate for instance

external objects on the windshield, an AR system needs to know where they are located with

regard to the user’s current position and field of view. Sometimes the annotations themselves

may be occluded based on the environmental model. For instance, when an annotated building is occluded by other objects, the annotation should point to the non-occluded parts only. Since the vehicle future position overlay uses an optical see-through head-up display technique, environmental modeling is of little concern here.

2.1.4.2 User movement tracking

AR tracking devices must have high accuracy, a wide input variety and bandwidth, and long range. Registration accuracy depends not only on the geometrical model but also on the distance of the objects to be annotated (the vehicle path, in the case of vehicle future position augmentation). The further away an object is, (i) the less impact errors in position tracking have and (ii) the more impact errors in orientation tracking have on the overall mis-registration [18].
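A quick worked example (illustrative numbers only, not measurements from this project) makes the point concrete for the twenty-five-meter working range mentioned in the summary:

    import math

    d = 25.0                       # distance to the annotated vehicle path (m)
    theta = math.radians(1.0)      # 1 degree of orientation (head-pose) error
    pos_err = 0.05                 # 5 cm of position-tracking error

    # An orientation error grows into a lateral offset proportional to distance,
    # while a fixed position error subtends an ever smaller angle at range.
    lateral_offset = d * math.tan(theta)                   # about 0.44 m at 25 m
    angular_error = math.degrees(math.atan(pos_err / d))   # about 0.11 degrees
    print(lateral_offset, angular_error)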

Tracking is usually easier in indoor settings than in outdoor settings as the tracking devices do

not have to be completely mobile and wearable or deal with shock, abuse, weather, etc. Instead

the indoor environment is easily modeled and prepared, and conditions such as lighting and

temperature may be controlled. Currently, unprepared outdoor environments still pose tracking

problems with no single best solution.

In the case of vehicle future position augmentation, user tracking involves a two-step approach: (1) face detection, and (2) eye tracking and gaze estimation.

2.1.4.2(i) Face Detection

This section presents an introduction to the work done by other researchers on the problem of face detection, which is the first step in user detection for the vehicle future position augmentation application.

Face detection is a vital component of several applications: it is the first step of any automated face recognition or eye-gaze tracking system, and it is also used in surveillance and security systems, general object detection, video encoding, facial feature tracking, human tracking, and other industrial domains. The face detection problem in computer vision has been addressed mainly over the last few decades and has been under thorough investigation by many researchers, with the aim of producing efficient face detection algorithms that can detect faces under different lighting conditions, scales, locations, orientations (upright, rotated) and poses (frontal, profile), as well as faces in complex backgrounds. Even today this problem is not fully solved, and more research in this field is still required.

Face detection techniques employ several cues, e.g. skin color for faces in color images and videos, motion for faces in videos, facial/head shape, facial appearance, or a combination of these. The most successful face detection algorithms use appearance-based models. A general face detection process scans an input image at all possible locations and scales with a sub-window, and face detection is posed as classifying the pattern in the sub-window as either face or non-face. The face/non-face classifier is learned from face and non-face training examples using statistical learning methods.

a) Face Detection Challenges

It is particularly hard to build robust classifiers that can detect faces in different imaging situations and face conditions, even though this seems easy for the human visual system. In fact, the object "face" is hard to define because of its large variability, depending on the identity of the person, the lighting conditions, the psychological state of the person, and so on. The main challenge in detecting faces is to find a classifier that can discriminate faces from all other possible images. The first problem is to find a model that can encompass all possible states of faces. The main sources of variability are as follows:

Global face attributes. Some common attributes can be extracted from every face: a face is globally an object that can be approximated by a kind of ellipse, but there are thin faces, rounder faces, and so on. Skin color can also differ greatly from one person to another.

The pose of the face. The position of the person in front of the camera used to acquire the image can totally change the view of the face: the frontal view, the profile view, all intermediate positions, upside down, etc.

The facial expression. Face appearance depends highly on the affective state of the person. The facial features of a smiling face can be far from those of an indifferent or sad one. Faces are non-rigid objects, and this considerably limits the number of applicable detection methods.

Presence of added objects. Faces often carry added objects: glasses, which change one of the main characteristics of the face (the darkness of the eyes), and natural facial features such as a mustache, beard or hair, which can occlude part of the face.

Image conditions. Face appearance varies a lot with the lighting conditions; the type and intensity of illumination and the characteristics of the acquisition system need to be taken into account.

The background composition is one of the main factors explaining the difficulty of face detection. Even if it is quite easy to build systems that detect faces against uniform backgrounds, most applications need to detect faces in any background condition, meaning that the background can be textured and highly variable.

In this context, various approaches have been taken to detect faces in images. There are two main face detection approaches:

Image-based methods, which are built from a set of examples and use a sliding window to perform the detection.

Geometry-based methods, which take into account the geometric particularities of face structures.

b) Literature Survey:

The problem of face detection goes back to the early 70s. At that time the overall focus was on detecting human faces in simple settings, typically a passport-like photo with a uniform background, uniform lighting and a frontal face without in-plane or out-of-plane rotation, and the research on the subject was rather simple. The problem only received wider attention in the early 90s, when more in-depth research began to address the issues of detecting faces in complex backgrounds with different scales and rotation degrees. Statistical methods and neural networks were introduced for face detection in cluttered scenes, and new methods were used to extract features from images, such as active contours, which delineate the outlines of possible objects in complex images, and deformable templates or active shape models and their variations, which can be learned from a training set to identify the locations of pre-labeled feature points and can then be used to test unseen feature points and check whether they lie within the boundary of the deformable template. Several statistical methods can be used to build such models, including principal component analysis (PCA), which helps reduce the dimensionality of the input training data. The main face detection approaches are as follows.

Feature based methods for face detection

Feature-based face detection involves several stages, each depending on the results of the earlier stage. First, a low-level analysis segments image features based on pixel values; the features extracted are in general the edges in the scene. Along with edge detection, gray or color pixel information can be used in the initial analysis of the image. The features obtained from the first stage are analyzed in a second stage to reduce the number of low-level features and to extract features related to facial features, using knowledge of face geometry. Several methods can be applied in this stage. Leung [11] adopted a random search of facial features and mapped them to the relative positions of the facial feature components, using a flexible graph matching technique for facial feature matching. Benn et al. [12] use information about eye shape and density compared to the surrounding areas for face detection. Kroon [13] addresses the issues of using the eyes for face matching. Active shape models, deformable templates and point distribution models, which use labeled training samples to statistically learn the active fitting parameters and limits, can also be used to extract the final set of facial features and determine the final decision about the presence or absence of a face.

View-based methods for face detection

View-based methods attempt to detect the human face without depending on knowledge of its geometry, treating face detection as a pattern recognition problem with two separate classes, faces and non-faces, and implicitly extracting the knowledge embedded in the face to create a proper boundary between face and non-face samples in the two-dimensional pixel space of the image. This approach requires training to produce the final classifier, which is then applied at all positions and all scales, where windows are extracted and possibly re-sampled (re-scaled). This exhaustive search can be sped up by increasing the scan and/or scaling steps to reduce the total number of scanned windows. Algorithms used in the view-based approach include principal component analysis (PCA), neural networks, support vector machines (SVMs) and other statistical methods that can learn from examples.

Neural Network-Based Face Detection

Rowley [14] researched a view-based approach to detect faces in still images and showed that the face detection problem can be effectively solved by using neural networks to detect frontal and non-frontal faces with different poses and rotation degrees. The main problem in using machine learning techniques to learn face detection is that faces in images vary considerably with lighting conditions, pose, occlusion and facial expression, and compensating for these variations is important for the learning process. It is also essential to choose a representative set of non-faces, since the training set could otherwise grow indefinitely, so the training used a bootstrap method [15]. Baluja [16] studied how to detect faces that are rotated in the plane: assuming the current window is a face window, an initial neural network tries to find the rotation angle of the face, the window is then rotated to an upright frontal face image, and another neural network determines the presence or absence of a face.

Hausdorff Distance-Based Face Detection

The Hausdorff distance [17] can be effectively used to measure the similarity of two binary images, and several researchers have used this method. Jesorsky et al. [18] presented a model-based approach to detect faces that uses an edge model intended to represent a wide variety of faces; this model is matched against an edge image of a candidate face computed with the Sobel operator, and a threshold is used, which can be changed to take variable illumination conditions into consideration.

Face detection using a Boosted Cascade of Simple Features

This method detects objects extremely rapidly and is comparable to the best real-time face detection systems. Viola and Jones [19] presented a new image representation called the integral image, which allows fast calculation of the image features used by their detection algorithm. The second step is an algorithm based on AdaBoost, which is trained on the relevant object class to select a minimal set of features that represent the object. The third step introduces a series of classifiers of increasing complexity; this cascade of classifiers rapidly discards most non-objects at early stages, allowing more time to be spent on the final stages to yield higher detection rates and thus improving both accuracy and processing time for the object detection task. Viola and Jones presented a framework for object detection motivated by face detection and reported good real-time face detection results at high frame rates.
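As a concrete illustration of the boosted-cascade approach, the sketch below runs OpenCV's pretrained frontal-face Haar cascade (OpenCV's own Viola-Jones-style implementation, not a re-implementation of the original detector); the input image name is illustrative.

    import cv2

    # Pretrained frontal-face cascade shipped with OpenCV (boosted Haar features).
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    detector = cv2.CascadeClassifier(cascade_path)

    frame = cv2.imread("driver.jpg")                 # illustrative input image
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)                    # reduce illumination variation

    faces = detector.detectMultiScale(
        gray,
        scaleFactor=1.1,     # step between scanned scales of the image pyramid
        minNeighbors=5,      # neighbouring detections required to accept a window
        minSize=(60, 60),    # ignore windows smaller than a plausible face
    )
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)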

Real-time 2D face detection using Color Ratios and K-Means Clustering

Byrd et al. [20] presented an algorithm for detecting faces in real-time MPEG videos. After converting the MPEG stream into a sequence of JPEG files, the application becomes a still-image face detection problem, where two phases are used for detection. The first is a skin-region detection classifier: images passed to this classifier are searched for skin regions, and images that do not contain skin regions are discarded. Images with skin regions are passed to a K-means face segmentation clustering unit for further investigation of the skin regions to match a human face.

Component-based Face Detection

Heisele et al. [21] used two support vector machine classifiers. The first classifier detects the components of the current image window as a potential face; the second classifier checks whether the geometry of these components matches that of a face. In this way, a component-based approach is more robust to occlusion than approaches that depend on the whole face.

FloatBoost Learning and Statistical Face Detection

Li and Zhang [22] proposed a method to improve the performance of AdaBoost that achieves a lower error rate and requires fewer weak classifiers than AdaBoost. AdaBoost learns a sequence of weak classifiers which, when combined, form a strong classifier of higher accuracy, as in the Viola-Jones work, which primarily focused on frontal face views. Their work, based on the AdaBoost learning algorithm, consists mainly of three stages: the first identifies a minimal set of features out of a huge number of candidate features, the second learns weak classifiers to identify each of these features, and the third combines these weak classifiers into a strong classifier. This hierarchy leads to one of the best real-time frontal face detection systems.

Support Vector Regression and Classification Based Multi-view Face Detection and Recognition

Li et al. [23] construct multiple face detectors, each for a specific view of the face, making use of the symmetry of the face. First, support vector regression is applied to estimate the pose of the face; this estimate is crucial for choosing which detection component to apply, improving both accuracy and computation time. Applying a separate detector for each pose yields a better detection rate than methods applied to any pose. Further improvements can be made when applying this algorithm to video by using pose-change smoothing strategies to improve performance on real-time image sequences.

Facial feature detection using Haar classifier

Wilson et al. [24] attempted to improve the performance of the Viola-Jones detection method by reducing the area that needs to be examined and thus the computation time. Instead of using the intensity value of each pixel alone, the method uses the change in contrast between groups of adjacent regions; these regions are rectangles that can be scaled to match Haar-like features and detect objects of different sizes.

2.1.4.2(ii) Eye-Gaze Tracking:

Another, more specific task after locating face positions is to recognize what a person is looking at, commonly known as eye-gaze tracking. There is an obvious difference between the head pose/orientation and the viewing direction of the eyes. A head can face a specific object, a monitor for instance, and follow several activities on the screen, such as the mouse pointer, by moving the eyes while the head remains static. In many application domains, knowledge about the viewing direction of the eyes is more important than the orientation of the head or face, but measurements of the eyes and the head are usually related to each other. A typical example in the automotive context using information about the viewing angle is measuring the duration of distraction while driving and operating the infotainment system at the same time. Another demanding application is the proper perspective projection of a 3-dimensional scene in a virtual reality environment, or highlighting the area within the field of view.

a) Eye-Gaze Detection Challenges:

Eye models need to be sufficiently expressive to account for the large variability in the appearance and dynamics of the eye, while also being sufficiently constrained to be computationally efficient. The appearance of eye regions shares commonalities across race, illumination and viewing angle. On the other hand, a relatively small variation in viewing angle, even for the same subject, can cause significant changes in appearance. Despite active research, eye detection and tracking remain very challenging tasks due to several issues, including (1) occlusion of the eye by the eyelids, (2) the degree of openness of the eye, and (3) variability in size, reflectivity and head pose.

b) Eye-Gaze Survey:

Eye Gaze is defined as the line of sight of a person. It represents a person’s focus of attention.

Eyes and their movements play an important role in expressing a person’s desires, needs,

cognitive processes, emotional states, and interpersonal relations. The importance of eye

movements to the individual's perception of and attention to the visual world is implicitly acknowledged, as they are the means through which we gather the information necessary to negotiate our way through the visual world and identify its properties. Robust, nonintrusive

eye detection and tracking is, therefore, crucial for the development of human-computer

interaction, attentive user interfaces, and understanding human affective states.

Research in eye detection and tracking focuses on two areas: eye localization in the image and

gaze estimation. There are three aspects of eye detection i.e. (1) to detect the existence of eyes,

(2) to accurately interpret eye positions in the images, and (3) to track the detected eyes from

frame to frame in case of video tracking. The eye position is commonly measured using the pupil

or iris center. The detected eyes in the images are used to estimate and track where a person is

looking in 3D, or alternatively, determining the 3D line of sight. This process is called gaze

estimation. In the subsequent discussion, we will use the terms “eye detection” and “gaze

tracking” to differentiate them, where eye detection represents eye localization in the image

while gaze tracking means estimating gaze paths.

Several techniques have been investigated by researchers for effective eye-gaze tracking, including scleral search coil contact lenses, electro-oculography (EOG), video-oculography and head-mounted eye trackers. However, since our concern is limited to video-oculography, i.e. video-based eye trackers, the review is limited to this specific domain of eye and gaze tracking.


Eye Detection

The eye image may be characterized by

the intensity distribution of the pupil(s), iris, and cornea, as well as by their shapes. Ethnicity,

viewing angle, head pose, color, texture, light conditions, the position of the iris within the eye

socket, and the state of the eye (i.e., open/close) are issues that heavily influence the appearance

of the eye. The intended application and available image data lead to different prior eye models.

The prior model representation is often applied at different positions, orientations, and scales to

reject false candidates. Shape-based methods can be subdivided into fixed shape[25][26][27] and

deformable shape[28][29]. The methods are constructed from either the local point features of

the eye and face region or from their contours. The pertinent features may be edges, eye corners,

or points selected based on specific filter responses. The limbus and pupil are commonly used

features. While the shape-based methods use a prior model of eye shape and surrounding

structures, the appearance-based methods rely on models built directly on the appearance of the

eye region. The appearance-based approach (the holistic approach) conceptually relates to

template matching by constructing an image patch model and performing eye detection through

model matching using a similarity measure. The appearance-based methods can be further

divided into intensity and subspace-based methods. The intensity-based methods[30] use the

intensity or filtered intensity image directly as a model, while the subspace methods[31] assume

that the important information of the eye image is defined in a lower dimensional subspace.

Hybrid methods combine feature, shape, and appearance approaches to exploit their respective

benefits.
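A minimal sketch of the intensity-based (template-matching) flavour of appearance-based eye localization is given below: an eye-patch template is slid over the upper half of a detected face region and the best normalized cross-correlation score is taken as the eye position. The function name and the choice of search region are illustrative assumptions, not details of the cited methods.

    import cv2

    def locate_eye(face_gray, eye_template):
        # Appearance-based eye localization by normalized cross-correlation:
        # slide an eye-patch template over the upper half of the face region
        # (where the eyes lie) and keep the best-matching position.
        search = face_gray[: face_gray.shape[0] // 2, :]
        scores = cv2.matchTemplate(search, eye_template, cv2.TM_CCOEFF_NORMED)
        _, best_score, _, top_left = cv2.minMaxLoc(scores)
        h, w = eye_template.shape[:2]
        center = (top_left[0] + w // 2, top_left[1] + h // 2)
        return center, best_score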

Gaze Estimation

The primary task of gaze estimation is to determine user focus or attention. Gaze can be either

the gaze direction or the point of regard (PoR). Gaze modeling consequently focuses on the

relations between the image data and the point of regard/gaze direction. Basic categorizations of


eye movements include saccades and fixations. A fixation occurs when the gaze rests for some

minimum amount of time on a small predefined area, usually within 2-5 degrees of central vision and for at least 80-100 ms. Saccades are fast, jump-like rotations of the eye between

two fixated areas, bringing objects of interest into the central few degrees of the visual field.

Smooth pursuit movements are a further categorization that describe the eye following a moving

object. Saccadic eye movements have been extensively investigated for a wide range of

applications including the detection of fatigue/drowsiness, human vision studies, diagnosing

neurological disorders, and sleep studies. Fixations are often analyzed in vision science,

neuroscience, and psychological studies to determine a person’s focus and level of attention. Eye

positions are restricted to a subset of anatomically possible positions described in Listing’s and

Donder’s laws[32]. According to Donder’s law, gaze direction determines the eye orientation

uniquely and the orientation is furthermore independent of the previous positions of the eye.

Listing’s law describes the valid subset of eye positions as those which can be reached from the

so called primary position through a single rotation about an axis perpendicular to the gaze

direction.
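The fixation/saccade distinction above is commonly operationalized with a dispersion-threshold detector; the sketch below is one minimal version of that idea, with thresholds taken from the 2-5 degree and 80-100 ms figures quoted above. It is an illustration of the concept, not a method prescribed by this proposal.

    import numpy as np

    def detect_fixations(points, times, max_dispersion=2.0, min_duration=100.0):
        # Dispersion-threshold fixation detection: a run of samples is a fixation
        # if it lasts at least `min_duration` ms and its horizontal-plus-vertical
        # spread stays below `max_dispersion` degrees of visual angle; everything
        # between fixations is treated as saccadic movement.
        points, times = np.asarray(points, float), np.asarray(times, float)
        fixations, i, n = [], 0, len(points)

        def dispersion(a, b):
            w = points[a:b + 1]
            return (w[:, 0].max() - w[:, 0].min()) + (w[:, 1].max() - w[:, 1].min())

        while i < n:
            j = i
            while j + 1 < n and times[j] - times[i] < min_duration:
                j += 1
            if times[j] - times[i] < min_duration:
                break                                   # not enough data left
            if dispersion(i, j) <= max_dispersion:
                while j + 1 < n and dispersion(i, j + 1) <= max_dispersion:
                    j += 1                              # grow the fixation window
                fixations.append((times[i], times[j], points[i:j + 1].mean(axis=0)))
                i = j + 1
            else:
                i += 1
        return fixations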

People move their heads when using a gaze tracker. A person's gaze is determined by the head

pose (position and orientation) and eyeball orientation. A person can change gaze direction by

rotating the eyeball (and consequently also the pupil) while keeping the head stationary.

Similarly, a person can change gaze direction by moving the head while keeping the eye

stationary relative to the head. Usually, a person moves the head to a comfortable position before

orienting the eye. Head pose, therefore, determines the coarse-scale gaze direction while the

eyeball orientation determines the local and detailed gaze direction. Gaze estimation therefore

needs to (either directly or implicitly) model both head pose and pupil/iris position. The problem

of ensuring head pose invariance in gaze tracking, such as in the case of vehicle future position

augmentation, is important and constitutes a challenging research topic. Head pose invariance

may be obtained through various hardware configurations and prior knowledge of the geometry

and cameras. Information on head pose is rarely used directly in the gaze models. It is more

common to incorporate it implicitly either through the mapping function or through the use of

glints (reflections on the cornea).

All gaze estimation methods need to determine a set of parameters through calibration including:


- camera calibration: determining intrinsic camera parameters;

- geometric calibration: determining the relative locations and orientations of the different units in the setup, such as the camera, light sources and monitor;

- personal calibration: estimating cornea curvature and the angular offset between the visual and optical axes; and

- gaze mapping calibration: determining the parameters of the eye-gaze mapping functions (a least-squares sketch of this step follows below).
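A common way to realize the gaze mapping step above is to have the user fixate a set of known on-screen targets and fit a low-order polynomial from measured pupil-glint vectors to target coordinates by least squares. The sketch below assumes at least six calibration points; the function names and the second-order monomial basis are illustrative choices, not requirements of this proposal.

    import numpy as np

    def fit_gaze_mapping(pupil_glint, targets):
        # Fit a second-order polynomial mapping from pupil-glint difference
        # vectors (x, y) to known calibration targets; needs >= 6 points.
        pupil_glint = np.asarray(pupil_glint, float)
        targets = np.asarray(targets, float)
        x, y = pupil_glint[:, 0], pupil_glint[:, 1]
        A = np.column_stack([np.ones_like(x), x, y, x * y, x ** 2, y ** 2])
        coeffs, *_ = np.linalg.lstsq(A, targets, rcond=None)   # (6, 2) matrix
        return coeffs

    def map_gaze(coeffs, vec):
        # Apply the fitted mapping to a new pupil-glint vector.
        x, y = vec
        features = np.array([1.0, x, y, x * y, x ** 2, y ** 2])
        return features @ coeffs        # estimated point of regard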

A system where the camera parameters and geometry are known is termed fully calibrated.

Desirable attributes of a gaze tracker include minimal intrusiveness and obstruction, allowing

for free head movements while maintaining high accuracy, easy and flexible setup, and low cost.

A more detailed description of eye tracking preferences is elaborated by Scott and Findlay [33].

Until recently, the standard eye tracking systems were intrusive, requiring, for example, a

reflective white dot placed directly onto the eye or attaching a number of electrodes around the

eye. Use of headrests, bite-bars, or making the eye tracker head mounted were common

approaches to accommodate significant head movements. Head movements are typically tracked

using either a magnetic head tracker, another camera, or additional illuminators. Head and eye

information is fused to produce gaze estimates. Compared to the early systems, video-based gaze

trackers have now evolved to the point where the user is allowed much more freedom of head

movements while maintaining good accuracy (0.5 degree or better). As reviewed in this section,

recent studies show that using specific reflections from the cornea (glints) makes gaze tracking relatively easy and enables stable, head-pose-invariant gaze estimation.

2.1.5 Coordinate Systems:

a) Raster oriented uses row and column coordinates starting at [0, 0] from the top left.

b) Cartesian coordinate frame with [0, 0] at the lower left.


c) Cartesian coordinate frame with [0, 0] at the image center.

d) Relationship of pixel center point [x, y] to area element sampled in array element I[i, j]
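As a small illustration of how conventions (a) and (c) above relate, the following sketch converts between raster coordinates and a centred Cartesian frame (names are illustrative):

    def raster_to_centered(row, col, height, width):
        # Raster frame: origin at the top-left, row grows downward.
        # Centered Cartesian frame: origin at the image centre, y grows upward.
        x = col - (width - 1) / 2.0
        y = (height - 1) / 2.0 - row
        return x, y

    def centered_to_raster(x, y, height, width):
        col = x + (width - 1) / 2.0
        row = (height - 1) / 2.0 - y
        return row, col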

2.1.5.1 Five Frames of Reference

Reference frames are needed in order to do either qualitative or quantitative analysis of 3D

scenes. Five frames of reference are commonly found for general problems in 3D scene analysis.

Figure: Five coordinate frames for 3D analysis

i) Pixel Coordinate Frame I:

Many things about a scene can be determined by analysis of the image in terms of only rows and

columns. Using only image I, however, and no other information, we cannot determine which

object is actually larger in 3D or whether or not objects are on a collision course.

ii) Object Coordinate Frame O:

An object coordinate frame is used to model ideal objects in both computer graphics and

computer vision. The object coordinate frame is needed in order to inspect an object, e.g. to

check if a particular hole is in a proper position relative to other holes or corners.

iii) Camera Coordinate Frame C:

The camera coordinate frame is often needed for “egocentric” (camera centric) view; e.g. to

represent whether or not an object is just in front of the sensor, moving away etc.

iv) Real Image Coordinate Frame F:

Camera coordinates are real numbers, usually in the same units as the world coordinates say

inches or mm. 3D points project to the real image plane at coordinates [ ] where f is the focal

Page 24: MSc Research Proposal Proforma.doc

length and are related to the pixel size and pixel position of the optical axis in the image.

v) World Coordinate Frame W:

The world coordinate frame is needed to relate objects in 3D. E.g. to determine whether or not a

runner is far off a base or if the runner and second baseman will collide.

2.1.6 Camera Calibration

Camera calibration, in the context of three-dimensional machine vision, is the process of

determining the internal camera geometric and optical characteristics (intrinsic parameters),

and/or the 3-D position and orientation of the camera frame relative to a certain world coordinate

system. For applications that need to infer 3-D information from 2-D image coordinates, e.g.

automatic assembling, three dimensional metrology, robot calibration, and vehicle guidance, it is

essential to calibrate the camera either on-line or off-line. In many cases, the overall system

performance strongly depends on the accuracy of the camera calibration. Camera calibration has

been widely studied in photogrammetry and computer vision. As a result, several calibration

techniques have been proposed. According to Weng et al. these techniques can be classified into

three categories:

nonlinear minimization

These techniques typically produce very accurate calibration results, but they are

computationally intensive.

closed-form solutions

Closed-form solutions are obtained simply by solving a set of linear equations, but the result is

not as accurate as can be achieved with nonlinear methods. A trade-off between the accuracy and

the computational cost is up to the requirements of the application.

two-step methods

Two-step methods involve a direct solution for most of the camera parameters and an iterative

solution for the other parameters. Explicit camera calibration means the process of computing

the physical camera parameters, like the image center, focal length, position and orientation, etc.

Most of the calibration procedures proposed in the literature belong to this category. However, in

some specific cases, the physical explanation for the camera parameters is not required. An

implicit calibration procedure can then be used where the parameters do not have any physical

meaning.


Usually, the problem of camera calibration is to compute the camera parameters based on a

number of control points whose mutual positions are known and whose image positions are

measured. The control point structure may be three-dimensional or coplanar. In the case of a 3-D

structure, the camera parameters can be solved based on a single image. However, an accurate 3-

D target that covers the entire object space is quite difficult to manufacture and maintain.

Coplanar targets are more convenient, but not all camera parameters can be solved based on a

single view. Thus, some constraints for the parameters must be applied. Another possibility is to

capture several images from different viewpoints and solve all intrinsic and extrinsic parameters

simultaneously. However, problems may exist with highly correlated parameters. An approach

that differs considerably from the traditional calibration methods is self-calibration, where the

camera parameters are estimated based on an unknown scene structure, and only a reference

distance is needed to scale the parameters into correct units. Several calibration procedures can

be found from the literature, but typically the errors characteristic of these procedures are not

much considered. There are various error sources that contribute to the calibration procedure.

Typically, these errors are inherent in the image formation process. Some of them are observable

as a systematic component of the fitting residual, but there also are certain errors that cannot be

determined from the residual.
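For the coplanar, multi-view case discussed above, one practical realization is OpenCV's planar checkerboard calibration; the sketch below is a minimal version of it, where the grid size, square size and image folder are illustrative assumptions:

    import glob
    import cv2
    import numpy as np

    pattern = (9, 6)                 # inner corners of the checkerboard
    square = 0.025                   # square size in metres (25 mm)
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

    obj_points, img_points, image_size = [], [], None
    for path in glob.glob("calib/*.jpg"):            # several views of the target
        gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
        found, corners = cv2.findChessboardCorners(gray, pattern)
        if found:
            obj_points.append(objp)
            img_points.append(corners)
            image_size = gray.shape[::-1]

    # Returns the RMS reprojection error, the intrinsic matrix, the lens
    # distortion coefficients and the per-view extrinsic parameters.
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, image_size, None, None)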

2.1.6.1 Calibration Methods and Models

The nature of the application and the required accuracy can dictate which of two basic

underlying functional models should be adopted:

A camera model based on perspective projection, where the implication is that the interior orientation (IO) is stable (at least for a given focal length setting) and that all departures from collinearity, linear and non-linear, can be accommodated. This collinearity equation-based model generally requires five or more point correspondences within a multi-image network and, due to its non-linear nature, requires approximate parameter values for the least-squares bundle adjustment in which the calibration parameters are recovered.

A projective camera model supporting projective rather than Euclidean scene

reconstruction. Such a model, characterized by the Essential matrix and Fundamental matrix

models, can accommodate variable and unknown focal lengths, but needs a minimum of 6 - 8

point correspondences to facilitate a linear solution, which is invariably quite unstable.

Nonlinear image coordinate perturbations such as lens distortion are not easily dealt with in


such models.

Further criteria can also be used to classify camera calibration methods:

Implicit versus explicit models. The photogrammetric approach, with its explicit physically

interpretable calibration model, is contrasted against implicit models used in structure from

motion algorithms which correct image point positions in accordance with alignment

requirements of a real projective mapping.

Methods using 3D rather than planar point arrays. Whereas some CV methods and

photogrammetric self-calibration can handle both cases – with appropriate network

geometry – models such as the Essential matrix cannot accommodate planar point arrays.

Point-based versus line-based methods. Point-based methods are more popular in

photogrammetry, with the only line-based approach of note, namely plumb-line

calibration, yielding parameters of lens distortion, but not of IO.

A more specific classification can be made according to the parameter estimation and

optimization technique employed:

Linear techniques are quite simple and fast, but generally cannot handle lens distortion and

need a control point array of known coordinates. They can include closed-form solutions,

but usually simplify the camera model, leading to low accuracy results. The well-known

DLT, which is essentially equivalent to an Essential matrix approach, exemplifies such a

technique.

Non-linear techniques such as the extended collinearity equation model, which forms the basis of the self-calibrating bundle adjustment, are most familiar to

photogrammetrists. A rigorous and accurate modeling of the camera IO and lens distortion

parameters is provided through an iterative least-squares estimation process.

A combination of linear and non-linear techniques where a linear method is employed to

recover initial approximations for the parameters, after which the orientation and

calibration are iteratively refined. This two-stage approach has in most respects been

superseded for accurate camera calibration by the bundle adjustment formulation above,

which is also implicitly a two-stage process.

2.1.6.2 Camera Model

The object and camera coordinates are typically given in a right-handed Cartesian coordinate


system. This is illustrated in the figure below, where the camera projection is approximated with a pinhole model. The pinhole model is based on the principle of collinearity, where each point in the object

space is projected by a straight line through the projection center into the image plane. The

origin of the camera coordinate system is in the projection center at the location (X0, Y0, Z0)

with respect to the object coordinate system, and the z-axis of the camera frame is perpendicular

to the image plane. The rotation is represented using Euler angles ω, φ and κ that define a

sequence of three elementary rotations around the x-, y-, z-axis, respectively. The rotations are

performed clockwise, first around the x-axis, then the y-axis that is already once rotated, and

finally around the z-axis that is twice rotated during the previous stages.

Figure: Object and camera coordinate systems
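The rotation and projection just described can be sketched as follows; axis and sign conventions differ between photogrammetric texts, so this is one consistent choice for illustration rather than the exact parameterization to be used in the project:

    import numpy as np

    def rotation_matrix(omega, phi, kappa):
        # Sequence of elementary rotations about the x-, y- and z-axis
        # (angles in radians), composed in the omega, phi, kappa order.
        Rx = np.array([[1, 0, 0],
                       [0, np.cos(omega), -np.sin(omega)],
                       [0, np.sin(omega),  np.cos(omega)]])
        Ry = np.array([[ np.cos(phi), 0, np.sin(phi)],
                       [0, 1, 0],
                       [-np.sin(phi), 0, np.cos(phi)]])
        Rz = np.array([[np.cos(kappa), -np.sin(kappa), 0],
                       [np.sin(kappa),  np.cos(kappa), 0],
                       [0, 0, 1]])
        return Rz @ Ry @ Rx

    def project(point_w, projection_center, R, f):
        # Pinhole (collinearity) projection: transform an object-space point
        # into camera coordinates, then project onto the image plane at focal
        # length f.
        Xc, Yc, Zc = R @ (np.asarray(point_w, float) - np.asarray(projection_center, float))
        return f * Xc / Zc, f * Yc / Zc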

2.2 Project and Program Objectives

(Describe the project aims in terms of theoretical, experimental and socio-

economic aspects. Clearly lay down major and minor milestones to be achieved

during the execution of the project. List down these objectives in explicit terms)

2.3 Project Management Design

(Narrate the total duration of the project. Then divide the total period into various

phases, such as;

Phase | Activity Description | Duration
1 | Literature Review | 2 Months
2 | Simulation Setup Design | 2 Months
3 | Development of Dataset | 1 Month
4 | Algorithm Coding & Result Formulation | 4 Months
5 | Publication & Report Writing | 3 Months

2.4 Organization and Management

Human Resource Required:

1. Principal Investigator

2. Co-Principal Investigator

2.5 Project Work Schedule

[Work schedule chart: Phases 1-5, with durations as given in Section 2.3]

2.6 Project Detailed Budget

2.6.1 Human Resources

Senior Investigators (list by name) | Level of Effort (No. of Months, Part Time) | Rate/Month | Total (Rs)

Principal Investigator | --- | --- | ---

Co-Investigator | --- | --- | ---

Research Assistants and Other Personnel (put number in brackets):

a. ( ) Postgraduate Students ---
b. ( ) Undergraduate Students ---
c. ( ) Lab Technicians ---
d. ( ) Other Professionals ---
e. ( ) Secretaries & Labourers ---

2.6.2 Capital Expenses

Experimental Equipment

(Provide Details)

1. IR Camera: Rs. 20,000

2. Pico Projector: Rs. 55,000

3. TI-Based Video Processing Board: Rs. 75,000

2.6.3 Materials and Supplies

a. Materials (Specify)

b. Supplies

c. Stationery __________________________________________

d. Others _____________________________________________

2.6.4 Books, Research Literature

a. Books & Literature (Journal Papers) ____________________ Rs. ………………

b. Internet Services Charges ______________________________ Rs. ………………

2.6.5 Other Expenses

a. Field Work/Collection of Data__________________________Rs. ………………

b. Travel _____________________________________________ Rs. ………………

c. Publication Charges ____________________________________ Rs. ………………

TOTAL PROJECT COST: Rs. 150,000/-

2.7 Utilization

The proposed TR&D project has four major motivations:

1. To enhance the functionality of driver training and simulation systems in a real-time environment.

2. To assist drivers in real-time driving situations by displaying real-time information.

3. To train drivers to maintain their lane with maximum safety.

4. To improve overall safety for drivers, vehicles and pedestrians.

Driver training and simulation systems are mainly indoor-based systems. This framework will help in conducting training in real time, so that trainees receive assistance while actually driving a vehicle on the road.

Drivers will receive real-time assistance, since they can see their vehicle's future occupancy of the road area projected on the windscreen. In parallel, the system will help novice drivers gain maturity gradually. The system also has major utility in crowded areas, where pedestrians and other vehicles are very close.

3. CITED REFERENCES

[1] Schmalstieg, D and Reitmayr, G (2007) ‘The World as a User Interface: AR for Ubiquitous

Computing’, In Gartner, G., Cartwright, W., and Peterson, M. (eds.), Location Based Services

and TeleCartography, Springer, pp. 369-392

[2] Behringer, R (1999) ‘Registration for Outdoor Augmented Reality Applications Using Computer

Vision Techniques and Hybrid Sensors’, In Proceedings of IEEE Virtual Reality, IEEE

Computer Society, pp. 244-251.

[3] Ledermann, F and Schmalstieg, D (2005) ‘APRIL: A High-Level Framework for Creating

Augmented Reality Presentations’, In IEEE Virtual Reality, pp. 187-194.

[4] Azuma, R, Baillot, Y, Behringer, R, Feiner, S et al. (2001) ‘Recent Advances in Augmented

Reality’, IEEE Computer Graphics and Applications, 21(6), pp. 34-47.

[5] Milgram, P and Kishino, F (1994) ‘A taxonomy of mixed reality visual displays’, IEICE

Transactions on Information Systems, E77-D(12), pp. 1321-1329.

[6] Reitmayr, G (2004) ‘On software design for augmented reality’, Vienna University of Technology.

[7] Dix, A, Finlay, J, Adowd, G and Beale, R (1998) Human-Computer Interaction, 2nd ed. Prentice

Hall.

[8] D. Schmalstieg, A. Fuhrmann, and G. Hesina. Bridging multiple user interface dimensions with

augmented reality. In [3], pp. 20–29.

[9] F. Zhou, H. B.-L. Duh, and M. Billinghurst. Trends in augmented reality tracking, interaction and

display: A review of ten years of ISMAR. In M. A. Livingston, O. Bimber, and H. Saito, editors,

ISMAR’08: Proc. 7th Int’l Symp. on Mixed and Augmented Reality, pp. 193–202, Cambridge,

UK, Sep. 15-18 2008. IEEE CS Press. ISBN 978-1-4244-2840-3.

[10] J. Rekimoto. Transvision: A hand-held augmented reality system for collaborative design. In

VSMM’96: Proc. Virtual Systems and Multimedia, pp. 85–90, Gifu, Japan, 1996. Int’l Soc.


Virtual Systems and Multimedia

[11] Leung, T. K., Burl, M. C., and Perona, P. (1995). Finding faces in cluttered scenes using random

labeled graph matching. In Proceedings of the Fifth international Conference on Computer

Vision (June 20 - 23, 1995). ICCV. IEEE Computer Society, Washington, DC, p.637.

[12] Benn, D. E., Nixon, M. S. and Carter, J. N. (1999) Extending Concentricity Analysis by

Deformable Templates for Improved Eye Extraction. In: Proceedings of the Second International

Conference on Audio- and Video-Based Biometric Person Authentication.

[13] Kroon, B., Hanjalic, A., and Maas, S. M. 2008. Eye localization for face matching: is it always

useful and under what conditions?. In Proceedings of the 2008 international Conference on

Content-Based Image and Video Retrieval (Niagara Falls, Canada, July 07 - 09, 2008). CIVR '08.

ACM, New York, NY, 379-388.

[14] H. A. Rowley, S. Baluja, and T. Kanade, “Neural network-based face detection”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):23–38, 1998.

[15] Sung, K. and Poggio, T. (1998), Example-based Learning for View-based Human Face Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 1, January 1998.

[16] Baluja, S. (1997). Face detection with in-plane rotation: Early concepts and preliminary results, Justsystem Pittsburgh Research Center, online available at: http://www.cs.cmu.edu/~baluja/papers/baluja.face.in.plane.ps.gz

[17] Huttenlocher, D. P., Klanderman, G. A. and Rucklidge, W. J. (1993), Comparing Images Using the Hausdorff Distance, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18] Jesorsky, O., Kirchberg, K. and Frischholz, R. (2001), Robust Face Detection Using the Hausdorff Distance, In Proc. Third International Conference on Audio- and Video-based Biometric Person Authentication, Springer, Halmstad, Sweden, 6–8 June 2001, Lecture Notes in Computer Science, LNCS-2091, pp. 90–95.

[19] Viola, P. Jones, M. (2004), Rapid Object Detection Using a Boosted Cascade of Simple Features,

Mitsubishi Electric Research Laboratories, IEEE Computer Society Conference on Computer

Vision and Pattern Recognition, May 2004, [Online]: http://www.merl.com/reports/docs/TR2004-

043.pdf [10/03/2008]

[20] Byrd, R. and Balaji, R. 2006. Real time 2-D face detection using color ratios and K-mean

clustering. In Proceedings of the 44th Annual Southeast Regional Conference (Melbourne,


Florida, March 10 - 12, 2006). ACM-SE 44. ACM, New York, NY, 644-648.

[21] Heisele, B. et al. (2002) Component-based Face Detection, [online], Available:

http://cbcl.mit.edu/projects/cbcl/publications/ps/cvpr2001-1.pdf [02-Oct-2008].

[22] Li, Z. and Zhang, Z. (2004), FloatBoost Learning and Statistical Face Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 9, September 2004.

[23] Li, Y., Gong, S. and Liddell, H. (2000), Support vector regression and classification based multi-view face detection and recognition, In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, 2000, pp. 300–305.

[24] Wilson, P. I. and Fernandez, J. (2006). Facial feature detection using Haar classifiers. J. Comput.

Small Coll. 21, 4 (Apr. 2006), 127-133.

[25] D.W. Hansen and A.E.C. Pece, “Eye Tracking in the Wild,” Computer Vision and Image

Understanding, vol. 98, no. 1, pp. 182-210, Apr. 2005.

[26] J. Daugman, “The Importance of Being Random: Statistical Principles of Iris Recognition,”

Pattern Recognition, vol. 36, no. 2, pp. 279-291, 2003.

[27] K.-N. Kim and R.S. Ramakrishna, “Vision-Based Eye-Gaze Tracking for Human Computer

Interface,” Proc. IEEE Int’l Conf. Systems, Man, and Cybernetics, 1999.

[28] J.P. Ivins and J. Porrill, “A Deformable Model of the Human Iris for Measuring Small 3-

Dimensional Eye Movements,” Machine Vision and Applications, vol. 11, no. 1, pp. 42-51, 1998

[29] A. Yuille, P. Hallinan, and D. Cohen, “Feature Extraction from Faces Using Deformable

Templates,” Int’l J. Computer Vision, vol. 8, no. 2, pp. 99-111, 1992

[30] P.W. Hallinan, “Recognizing Human Eyes,” Geometric Methods in Computer Vision, pp. 212-

226, SPIE, 1991.

[31] J. Huang and H. Wechsler, “Eye Detection Using Optimal Wavelet Packets and Radial Basis

Functions (RBFs),” Int’l J. Pattern Recognition and Artificial Intelligence, vol. 13, no. 7, 1999.

[32] D. Tweed and T. Vilis, “Geometric Relations of Eye Position and Velocity Vectors during

Saccades,” Vision Research, vol. 30, no. 1, pp. 111-127, 1990.

[33] D. Scott and J. Findlay, “Visual Search, Eye Movements and Display Units,” human factors

report, 1993.

UNDERTAKING

I/We hereby agree to undertake the research work of the project with the details given below.

Title of Research Project Proposal

Vehicle Future Position Augmentation over Wind Screen: Head-up-Display and Eye Tracking

Framework

Name of Sponsoring Agency/Department/Organization:

---------------------------------------------------------------------------------------------------------------------

Awarded to me/us vide Notification/Grant No: ……………………… Dated: ……………………

I/We also undertake to submit and present reports on the progress of the work and audited

statements of the approved expenditures incurred.

At the termination of the project, I/We will submit the final report on the research project to the competent authority/sponsoring agency.

Further, I/We have not submitted this project to any other sponsoring agency for funding.

                      Principal Investigator                Co-Investigator

Name                  Dr. Tabassam Nawaz                    Mr. Muhammad Shahid

Designation           Associate Professor                   PhD Scholar

Department            Software Engineering Department,      Computer Engineering Department,
                      UET Taxila                            UET Taxila

Signature with date

Endorsements          Chairman of Department                Dean of Faculty

Name                  Dr. M. Iram Baig                      Prof. Dr. Adeel Akram

Signature with date
