
DIMENSIONALITY REDUCTION TECHNIQUES FOR FACE RECOGNITION

    A MAJOR PROJECT REPORT

Submitted by

ADITYA RITESH B090609EC
ATUL KUMAR B090739EC
BHARAT RAJ MEENA B090780EC
GAURAV SINGH B090559EC
GOURAB RAY B090402EC
PRABHAT PRAKASH VERMA B090829EC

In partial fulfillment for the award of the Degree of

BACHELOR OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION ENGINEERING

Under the guidance of DR. PRAVEEN SANKARAN

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
NATIONAL INSTITUTE OF TECHNOLOGY CALICUT
NIT CAMPUS PO, CALICUT, KERALA, INDIA 673601.

    APRIL 2013


    ACKNOWLEDGEMENT

At the outset, we are indeed very grateful to Dr. G. Abhilash, Project Coordinator, for allowing us to undertake the project and for his useful guidance. We would also like to extend our heartfelt gratitude to Dr. Praveen Sankaran, our Project Guide, for guiding us throughout the project and, with his deep knowledge in the field, providing a correct orientation to our ideas, which led to the successful accomplishment of the project. Next, we would like to thank Mr. R. Suresh, Mr. Ameer P. M. and Mr. V. Sakthivel for evaluating our reports and giving us the necessary feedback. Finally, we also thank the ECE Department, NIT Calicut, for providing us the necessary facilities and resources. For all the mistakes that remain, the blame is entirely ours.


    DECLARATION

We hereby declare that this submission is our own work and that, to the best of our knowledge and belief, it contains no material previously published or written by another person, nor material which has been accepted for the award of any other degree or diploma of the university or other institute of higher learning, except where due acknowledgement has been made in the text.

    Date:

    Name Roll no. Signature

    ADITYA RITESH B090609EC

    ATUL KUMAR B090739EC

    BHARAT RAJ MEENA B090780EC

    GAURAV SINGH B090559EC

    GOURAB RAY B090402EC

    PRABHAT PRAKASH VERMA B090829EC


    CERTIFICATE

This is to certify that the MAJOR PROJECT entitled Dimensionality Reduction Techniques For Face Recognition, submitted by Aditya Ritesh B090609EC, Atul Kumar B090739EC, Bharat Raj Meena B090780EC, Gaurav Singh B090559EC, Gourab Ray B090402EC and Prabhat Prakash Verma B090829EC to the National Institute of Technology Calicut towards partial fulfillment of the requirements for the award of the Degree of Bachelor of Technology in Electronics and Communication Engineering, is a bona fide record of the work carried out by them under my supervision and guidance.

    Signed by MAJOR PROJECT Supervisor

Dr. Praveen Sankaran

    Assistant Professor ECED NIT Calicut

    Place:

    Date :

    Signature of Head of the Department

    (Office seal )


    Copyright, 2013, by Aditya Ritesh B090609EC

    Atul Kumar B090739EC

    Bharat Raj Meena B090780EC

    Gaurav Singh B090559EC

    Gourab Ray B090402EC

    Prabhat Prakash Verma B090829EC, All Rights Reserved


    TABLE OF CONTENTS

List of Figures

    CHAPTERS

1 Introduction
1.1 Background
1.2 Motivation
1.3 Problem Statement
1.4 Objectives
1.5 Literature review
1.6 Outline

2 Dimensionality Reduction
2.1 Curse of Dimensionality
2.2 A demonstration
2.3 Dimensionality reduction

3 Linear Dimensionality Reduction Techniques
3.1 Introduction
3.2 Statistics involved
3.2.1 Mean image from the face database
3.2.2 Covariance matrix from the face database
3.3 Principal Component Analysis
3.3.1 Mathematical Analysis of PCA
3.3.2 Dimensionality Reduction
3.3.3 Calculation of eigenvalues and eigenvectors
3.3.4 Formation of a Feature Vector
3.3.5 Derivation of a new data set
3.3.6 Eigen Faces


3.4 Linear Discriminant Analysis
3.4.1 Mathematical Analysis of LDA
3.5 Independent Component Analysis (ICA)
3.5.1 Mathematical analysis of ICA
3.5.2 ICA Algorithm

4 Non-Linear Dimensionality Reduction Techniques
4.1 Kernel Methods for face recognition
4.1.1 Kernel Principal Component Analysis
4.1.1.1 Algorithm
4.1.1.2 Dimensionality Reduction and Feature Extraction
4.1.2 Kernel Fisher Analysis
4.1.2.1 Algorithm

5 Face recognition using Gabor Wavelets
5.1 Wavelets
5.2 Gabor Filters
5.2.1 Extracting features with Gabor filters

6 Distance Measures, ROC and ORL database
6.1 Distance measures for Face Recognition
6.1.1 Euclidean distance
6.1.2 Mahalanobis distance
6.1.3 Cosine Similarity
6.1.4 City Block distance
6.1.5 Chessboard Distance
6.2 ROC
6.3 ORL Database
6.4 Summary

7 Real Time Face Recognition System using OpenCV
7.1 Face Detection
7.2 Preprocessing Facial Images
7.3 Face Recognition


8 Method of Implementation
8.1 PCA and LDA
8.2 ICA
8.3 KPCA
8.4 Gabor PCA
8.5 Gabor LDA

9 Results and Observations
9.1 PCA
9.2 LDA
9.3 ICA
9.4 Kernel PCA and Kernel LDA
9.5 Gabor Filter
9.6 Gabor PCA and Gabor LDA
9.7 Real time Face Recognition System

10 Conclusion

BIBLIOGRAPHY


    ABSTRACT

DIMENSIONALITY REDUCTION TECHNIQUES FOR FACE RECOGNITION

Aditya Ritesh B090609EC
Atul Kumar B090739EC
Bharat Raj Meena B090780EC
Gaurav Singh B090559EC
Gourab Ray B090402EC
Prabhat Prakash Verma B090829EC

National Institute of Technology Calicut, 2013

    Project Guide: Dr. Praveen Sankaran

Data dimensionality reduction algorithms try to project high dimensional input data to a lower-dimensional space, providing better data classification ability. In this work, we aim to study various data dimensionality reduction algorithms with specific application to face recognition techniques. Our main aim is to treat the problem from a purely mathematical or computational point of view. The input of a face recognition system is always an image or video stream, which is then projected from the original vector space to a carefully chosen subspace. The next step, feature extraction, involves obtaining relevant facial features from the data. We implement linear (PCA, LDA), non-linear and wavelet based algorithms and carry out a comparative study under different circumstances simulated with standard face databases. The output is an identification or verification of the subject or subjects that appear in the image or video. In an identification task, the system reports an identity from a database.


    LIST OF FIGURES

2.1 curse of dimensionality
2.2 1 dimension
2.3 2 dimensions
2.4 3 dimensions
3.1 Linear subspace technique general algorithm
3.2 Matrix representation of N images present in the database
3.3 PCA classification of given data: (a) worse classification of data, (b) the best classification of data
3.4 Principal component analysis algorithm for face recognition
3.5 Fisher's linear discriminant algorithm for face recognition
3.6 Flow chart of linear discriminant analysis algorithm
3.7 ICA algorithm
4.8 Explanation of KPCA
4.9 KDA
6.1 Typical Receiver Operating Characteristic Curve
6.2 Sample images of a single person from ORL database
8.1 Schematic of the Face Recognizer
8.2 Blind Source Separation model
8.3 Finding statistically independent basis images
9.1 Mean Face
9.2 Eigenfaces
9.3 Identifying similar faces
9.4 Performance of PCA based Face Recognition with ORL Face Database
9.5 ROC for PCA
9.6 Recognition rate vs number of eigen faces for PCA
9.7 Recognition rate vs number of training images for PCA
9.8 Fisher faces
9.9 Performance of LDA based Face Recognition with ORL Face Database
9.10 ROC for LDA
9.11 Recognition rate vs number of training images for PCA
9.12 Source images
9.13 Aligned faces - features extracted
9.14 Mixed images
9.15 Independent Components (Estimated Sources)
9.16 ROC for KPCA
9.17 ROC for KLDA
9.18 Magnitude response with no downsampling
9.19 Magnitude response with downsampling factor 64
9.20 ROC for Gabor PCA
9.21 ROC for Gabor LDA
9.22 OpenCV implementation


    CHAPTER 1

    INTRODUCTION

    1.1 BACKGROUND

During the past decades, advances in data collection and storage capabilities have led to an information overload in most sciences. Researchers working in domains as diverse as engineering, astronomy, biology, remote sensing, economics and consumer transactions face larger and larger observations and simulations on a daily basis. Such data sets, in contrast with the smaller, more traditional data sets that have been studied extensively in the past, present new challenges in data analysis. Traditional statistical methods break down partly because of the increase in the number of observations, but mostly because of the increase in the number of variables associated with each observation. This is also seen in the case of face recognition: as the resolution of images has increased drastically, it has become difficult to perform pattern recognition on these images.

Face recognition is one of the most efficient biometric techniques used for identifying humans from a database. Biometrics use the physical and behavioral properties of a human. The face plays a primary role in identifying a person, and it is also the most easily remembered part of a human. This skill is very robust, because we can identify a person even after many years, despite aging and under different lighting and viewing conditions. Biometric techniques like fingerprint, iris and signature need some cooperation from the person for identification; the major advantage of face recognition is that it does not need any physical cooperation from the person at the time of recognition.

    1.2 MOTIVATION

Definitely, the wide applicability of face recognition is the main motivating factor. Face recognition plays an important role in various applications (e.g. computer vision, image processing and pattern recognition). The ideal computer vision models work more or less like human vision. In computer vision related applications, face recognition is useful for taking decisions based on the information present in the images or video. There are many computer vision based applications of image analysis, like recognizing and tracking humans in public and private areas. Even in future driver assistance systems, driver face observation will play an important role.

In many security related applications like banking, border checks, etc., person identification and verification is one of the main issues; in these applications persons must be recognized or identified. Also, face recognition could be employed in intelligent PCs, for instance to automatically bring up a user's individual desktop environment. Even more advanced, when employed in a gesture and facial expression recognition environment, it could turn personal computers into personalized computers able to interact with the user on a level higher than just mouse clicks and key strokes. Such recognition would make possible intelligent man-machine interfaces and, in the future, intelligent interaction with robots.

A face recognition system is also very useful in criminal identification. In this application, the images of criminals can be stored in the face recognition system database. In recognition algorithms based on matching methods, image acquisition is one of the important tasks: the image of a person must be taken directly from a digital camera or from a video sequence such that it contains the maximum possible information about that person. The images must be taken quickly, with a small resolution or size, in order to speed up the algorithms; high resolution images take much more time to recognize. The matching algorithms then compare the acquired image with the images in the database to identify the criminal. In real time face recognition, the system must analyze the images and recognize the person very fast. The face recognition system only recognizes the persons stored in the database.

As such, face recognition is an extensive and ever evolving research area, with a wide range of algorithms for dimensionality reduction. All of this motivated us to carry out this project.

    1.3 PROBLEM STATEMENT

While developing a face recognition system, the performance depends upon the given input data or image [6]. However, there are general problems faced in real time face recognition, like illumination, head pose, facial expressions, etc. Under varying environmental conditions, such as lighting and background variation, the images change; we cannot extract the correct features and so the recognition rate is lower. Similarly, with different facial expressions like smiling or tearing, and in different poses, it is difficult to recognize the person.

As such, an efficient method is required to represent the face image. Deriving a discriminative and compact representation of a face pattern is of paramount importance for the success of any face recognition approach. This is where the various dimensionality reduction methods come into play. Overall, the problem is one of choosing the best method that gives an efficient image representation and, in turn, good recognition.

    1.4 OBJECTIVES

    Summing up, the overall objectives of the project are:

Study of different linear techniques (PCA and LDA), non-linear methods (KPCA, KLDA) and wavelet based methods (Gabor).

Implementation of PCA, LDA, KPCA, KLDA and Gabor based face recognition systems in Matlab.

Comparison of the verification rates of the different algorithms by varying parameters like the number of eigen faces, the number of fisher faces, the number of images in the training set and the distance metric used.

Development of wavelet based linear techniques (Gabor) which have very high verification rates, and their comparison with the other techniques.

Computation of Receiver Operating Characteristics for the different algorithms.

Implementation of a Real-Time Face Recognition System (PCA based) using OpenCV functions.

    1.5 LITERATURE REVIEW

Worldwide, progressive efforts are being made for the efficient storage and retrieval of images using different techniques. Several techniques exist to tackle the curse of dimensionality, of which some are linear methods and others are non-linear; PCA, LDA and LPP are some popular linear methods, while the non-linear methods include ISOMAP and Eigenmaps. These techniques make it possible to use the facial images of a person to authenticate him into a secure system, for criminal identification and for passport verification. Face recognition approaches for still images can be broadly categorized into holistic methods and feature based methods. Holistic methods use the whole face region as the raw input to a recognition system; one of the most widely used representations of the face region is eigenpictures, which are based on principal component analysis. In feature-based (structural) matching methods, local features such as the eyes, nose, and mouth are first extracted and their locations and local statistics (geometric and/or appearance) are fed into a structural classifier. There is a third kind of method, known as hybrid methods. Just as the human perception system uses both local features and the whole face region to recognize a face, a machine recognition system should use both; one can argue that these methods could potentially offer the best of the two types of methods.

The usage of Gabor filters for face recognition is presented in depth by Vitomir Struc, Rok Gajek and Nikola Paveisic in their paper, Principal Gabor Filters for Face Recognition.

    1.6 OUTLINE

    Chapter 2: Why are we going for dimensionality reduction?

    Chapter 3: Linear techniques for dimensionality reduction.

    Chapter 4: Non-linear techniques for dimensionality reduction.

    Chapter 5: Face recognition using Gabor Wavelets.

Chapter 6: Distance Measures, ROC and ORL database.

Chapter 7: Real Time Face Recognition System using OpenCV.

Chapter 8: Method of Implementation.

    Chapter 9: Results and Observations.

    Chapter 10: Conclusion.


    CHAPTER 2

    DIMENSIONALITY REDUCTION

    2.1 CURSE OF DIMENSIONALITY

Putting it simply, dimensionality reduction is the process of reducing the number of random variables under consideration. The number of examples needed to accurately estimate a function grows exponentially with the dimensionality; this is called the curse of dimensionality. In practice, the curse of dimensionality means that, for a given sample size, there is a maximum number of features above which the performance of a classifier will degrade rather than improve. In most cases, the information that is lost by discarding some features is compensated for by a more accurate mapping in the lower-dimensional space.

    2.2 A DEMONSTRATION

Consider three types of objects, shown in figure 2.2, that have to be classified based on the value of a single feature. A simple procedure would be to:

a) Divide the feature space into uniform bins,
b) Compute the ratio of examples for each class in each bin, and
c) For a new example, find its bin and choose the predominant class in that bin.

We decide to start with one feature and divide the real line into 3 bins. Notice that there exists a lot of overlap between the classes, so to improve discrimination, we decide to incorporate a second feature.

    Figure 2.1: curse of dimensionality


    Figure 2.2: 1 dimension

    Figure 2.3: 2 dimensions

Moving to two dimensions increases the number of bins from 3 to $3^2 = 9$. So, which should we maintain constant?

a) The density of examples per bin? This increases the number of required examples from 9 to 27.
b) The total number of examples? This results in a 2D scatter plot that is very sparse.

Moving to three features:

a) The number of bins grows to $3^3 = 27$.
b) To maintain the initial density of examples, the number of required examples grows to 81.
c) For the same number of examples, the 3D scatter plot is almost empty.

Figure 2.4: 3 dimensions

So, there is an exponential growth with dimensionality in the number of examples required to accurately estimate a function.
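This bin-counting argument is easy to check numerically. The short Python sketch below is ours and purely illustrative (the report's own experiments were done in MATLAB); it just reproduces the 3, 9, 27 bin counts and the 9, 27, 81 example counts quoted above.

```python
# Illustrative check of the bin-counting argument: 3 bins per feature and a
# constant density of 3 examples per bin give 3**d bins and 3 * 3**d examples.
bins_per_feature = 3
examples_per_bin = 3

for d in (1, 2, 3):
    bins = bins_per_feature ** d              # 3, 9, 27 bins
    examples = examples_per_bin * bins        # 9, 27, 81 examples needed
    print(f"{d} feature(s): {bins} bins, {examples} examples for constant density")
```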

    2.3 DIMENSIONALITY REDUCTION

There are two approaches to perform dimensionality reduction from N to M dimensions (M less than N):

a) Feature selection: choosing a subset of all the features.
b) Feature extraction: creating new features by combining the existing ones.

In either case, the goal is to find a low-dimensional representation of the data that preserves (most of) the information or structure in the data. In the following chapters we look into various linear and non-linear techniques.


    CHAPTER 3

LINEAR DIMENSIONALITY REDUCTION TECHNIQUES

    3.1 INTRODUCTION

Several techniques exist to tackle the curse of dimensionality, of which some are linear methods and others are nonlinear. For reasons of computational and conceptual simplicity, the low-dimensional representation is often sought as a linear transformation of the original data; that is, each component of the representation is a linear combination of the original variables. Such a representation seems to capture the essential structure of the data in many applications, including feature extraction and signal separation.

Thus, to reduce the dimensionality and the redundancy without sacrificing accuracy, linear subspace techniques are useful. They take only the important features from the data set; if we select inadequate features, the accuracy will be reduced, so we need to acquire full knowledge about the similarities and differences in the data. In our project, linear techniques are implemented and some statistical methods like mean and covariance are used to reduce the redundancy. In these methods, the average image of all the persons (and also of each person) in the database is calculated, and each image is translated to the average face by subtracting the average image from each face image.

Some well-known linear transformation methods include:

    Principal Component Analysis (PCA)

    Linear Discriminant Analysis (LDA)

    Independent Component Analysis (ICA)

    3.2 STATISTICS INVOLVED

Statistical measurements like mean and covariance are used. Statistics are based on analyzing a large amount of data in terms of the relationships between the variables in the data set. Each technique uses a train database and a test database: the train database contains different images of the same person, and the test database contains one image per person. Each technique calculates a basis vector by using some statistical properties.

Take an image of size $L \times M$; the pixel array can be represented as a point or vector in an $LM$-dimensional space, the image space. If we apply face recognition techniques directly in this image space, it is computationally expensive because of the matching algorithms; further, the number of parameters or features used grows exponentially (sometimes exceeding the number of images).

The basis vector is of high dimension, so we need to reduce its dimensionality. After forming the basis vector, the feature vector is calculated by projecting the train database images onto the basis vector. Matching is then done using distance measures. In our project, two data sets are taken: the first set is for training and the second is for testing. The training set contains many folders, depending upon the selection of the database; each folder contains different pose-varying images of the same person. The test set contains one image per person, and these test images are in poses different from the train set images. The basic working of the linear technique for face recognition is shown in figure 3.1.

Figure 3.1: Linear subspace technique general algorithm

Many faces contain similar features, because the face generally has a smooth and regular texture; in frontal view, the appearances of the eyes, nose, mouth, etc. are similar across many faces. So we classify the similar and different features among all the features. First we take an image from the training set, i.e. from the image space, and convert it into a column vector; we do the same for all images in the train set. For example, suppose our training set contains $N$ images and the size of each image is $L \times M$.

We convert each image into a column vector of size $LM \times 1$. After converting all images into column vectors, we append all the columns. This forms a matrix called the data matrix $X$, with size $LM \times N$, as shown in figure 3.2. $LM \times N$ means a very high dimensionality, so we have to reduce this dimensionality by using linear subspace techniques. After forming the data matrix, we calculate the most used statistical measures (mean and covariance).

Figure 3.2: Matrix representation of N images present in the database
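The report's implementation is in MATLAB; as an illustration only, the NumPy sketch below (the function name build_data_matrix is ours) shows the step just described: flattening N images of size L×M into LM×1 columns and appending them into the LM×N data matrix X.

```python
import numpy as np

def build_data_matrix(images):
    """Flatten each LxM image into an LMx1 column vector and append the
    columns to form the LM x N data matrix X described in section 3.2."""
    columns = [img.reshape(-1, 1).astype(np.float64) for img in images]
    return np.hstack(columns)

# Toy usage with random arrays standing in for face images (ORL images are 112x92).
rng = np.random.default_rng(0)
faces = [rng.integers(0, 256, size=(112, 92)) for _ in range(10)]
X = build_data_matrix(faces)
print(X.shape)   # (10304, 10), i.e. LM x N
```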

    3.2.1 Mean image from the face database

The mean of the random vector $x$ is calculated from:

$m = E[x]$    (3.1)

where $E[x]$ is the expected value of the argument $x$, and $x$ is a random sample corresponding to a column vector of the data matrix. Using the columns $x_i$ of the data matrix, the mean is computed as:

$m = \frac{1}{N} \sum_{i=1}^{N} x_i$    (3.2)

where $N$ is the number of images in the training set. $m$ represents the mean vector of the data matrix; when converted from a column vector back to a matrix, it is the mean face of the training set.

    3.2.2 Covariance matrix from the face database

Covariance measures the linear relationship between two variables, so a single covariance value involves two dimensions. If a data set has more than two dimensions, there are many different covariance values, one for each pair of dimensions. For example, for a three-dimensional data set with dimensions $x$, $y$, $z$, we calculate the covariance of $x, y$, the covariance of $y, z$ and the covariance of $z, x$. The covariance matrix $C$ is the matrix containing all these covariance values as its entries. A high covariance value indicates high redundancy and a low value indicates low redundancy.

The covariance matrix $C$ of the random vector $x$ is calculated using equation 3.3 or 3.4:

$C = E[(x - m)(x - m)^T]$    (3.3)

$C = \frac{1}{N} \sum_{i=1}^{N} (x_i - m)(x_i - m)^T$    (3.4)

$C = A A^T$    (3.5)

where $A$ is the mean-subtracted data matrix formed using the technique given in section 3.2.1. The size of $A$ is $LM \times N$, while the size of $C$ is $LM \times LM$, which is very large; calculating the covariance matrix directly from equation 3.5 therefore requires a huge amount of memory and is not practical. Let us instead consider the matrix $L$:

$L = A^T A$    (3.6)

The dimension of $L$ is $N \times N$, which is much smaller than the dimension of $C$.
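The NumPy sketch below (our illustration, not the report's MATLAB code) puts equations 3.2, 3.5 and 3.6 together: it computes the mean face, mean-centres the data to obtain A, and diagonalizes the small N×N matrix L = AᵀA; multiplying its eigenvectors by A then yields eigenvectors of C = AAᵀ, which is the usual way equation 3.6 is exploited in practice.

```python
import numpy as np

def pca_basis(X, num_components):
    """Mean face and leading eigenvectors of C = A A^T, computed via the small
    N x N matrix L = A^T A (eq. 3.6) instead of the huge LM x LM matrix C."""
    m = X.mean(axis=1, keepdims=True)            # mean face, eq. (3.2)
    A = X - m                                    # mean-centred data matrix
    L = A.T @ A                                  # N x N, eq. (3.6)
    eigvals, V = np.linalg.eigh(L)               # eigenpairs of L (ascending order)
    order = np.argsort(eigvals)[::-1][:num_components]
    U = A @ V[:, order]                          # if L v = lam v, then C (A v) = lam (A v)
    U /= np.linalg.norm(U, axis=0)               # normalise each basis vector
    return m, U, eigvals[order]

# Usage with the data matrix X from the previous sketch:
# m, U, lam = pca_basis(X, num_components=5)
```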

    3.3 PRINCIPAL COMPONENT ANALYSIS

PCA is a primary technique and is regarded as the theoretical foundation of many dimension reduction techniques. It seeks a linear projection that best fits a data set in the least-squares sense and has been widely used for feature extraction in pattern classification due to its computational and analytical simplicity [13]. It is also called the Karhunen-Loeve transform (KLT) [12, 21]. It uses the covariance or correlation matrix of the given data, the basics of which were mentioned earlier under the statistics involved.


    3.3.1 Mathematical Analysis of PCA

Let $x$ be an $m$-dimensional random vector with $E[x] = 0$; in practical data the mean may not be zero, in which case the mean is subtracted from each sample and a zero-mean random vector is obtained.

Let $q$ be an $m$-dimensional unit vector and let $A$ be the projection of $x$ on $q$:

$A = x^T q = q^T x$, subject to $\|q\| = 1$.

In the projected space, $E[A] = q^T E[x] = 0$, and

$\sigma^2 = E[A^2] = E[(q^T x)(x^T q)] = q^T E[x x^T]\, q$.

$E[x x^T]$ is the correlation matrix, denoted by $R$, so $E[A^2] = q^T R q$. Note that $R$ is symmetric, i.e. $R^T = R$. We now look for the extremal values of $E[A^2]$ over the proper choices of $q$. Define the function

$\psi(q) = q^T R q = \sigma^2$    (3.7)

At an extremal point, slightly varying $q$ leaves the function unchanged:

$\psi(q + \delta q) = \psi(q)$    (3.8)

Expanding,

$\psi(q + \delta q) = (q + \delta q)^T R (q + \delta q) = q^T R q + 2(\delta q)^T R q + (\delta q)^T R\, \delta q$.

The term $(\delta q)^T R\, \delta q$ is not significant since $\delta q$ is very small, so we can write

$\psi(q + \delta q) = q^T R q + 2(\delta q)^T R q$    (3.9)

Using equations 3.7, 3.8 and 3.9 it can be inferred that

$(\delta q)^T R q = 0$    (3.10)

We have changed $q$ to $q + \delta q$, but $q + \delta q$ must still be a unit vector:

$\|q + \delta q\| = 1$, i.e. $(q + \delta q)^T (q + \delta q) = 1$,

$q^T q + (\delta q)^T q + q^T \delta q + (\delta q)^T (\delta q) = 1$.

Again neglecting $(\delta q)^T (\delta q)$ and using $q^T q = 1$ and $(\delta q)^T q = q^T \delta q$,

$(\delta q)^T q = 0$    (3.11)

So $\delta q$ is orthogonal to $q$: only a change in the direction of $q$ is possible. From equations 3.10 and 3.11 we infer that there is a way to combine both terms; a scale factor $\lambda$ is introduced in equation 3.11 so that the dimensions match, and the two equations are combined as

$(\delta q)^T R q - \lambda (\delta q)^T q = 0$, i.e. $(\delta q)^T [R q - \lambda q] = 0$.

Since $\delta q$ is not equal to zero, we must have

$R q = \lambda q$    (3.12)

Given a matrix $R$, only certain combinations of $q$ and $\lambda$ satisfy relation 3.12: $\lambda$ is an eigenvalue of $R$, and there are $m$ solutions $\lambda_1, \lambda_2, \ldots, \lambda_m$ with corresponding eigenvectors $q_1, q_2, \ldots, q_m$:

$R q_j = \lambda_j q_j, \quad j = 1, 2, \ldots, m$    (3.13)

Place the eigenvalues in decreasing order, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$, where $\lambda_1 = \lambda_{max}$, and define the matrix

$Q = [q_1, q_2, \ldots, q_m]$.

In compact form, equation 3.13 can be written as

$R Q = Q \Lambda$    (3.14)

where $\Lambda = \mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_m]$. The matrix $Q$ is orthogonal, satisfying $q_i^T q_j = 1$ if $i = j$ and $0$ otherwise; therefore $Q^T Q = I$, which gives $Q^T = Q^{-1}$, and using equation 3.14 we can write

$Q^T R Q = \Lambda$    (3.15)

In expanded form, $q_j^T R q_k = \lambda_j$ for $k = j$ and $0$ otherwise, so from equation 3.7

$\psi(q_j) = \lambda_j, \quad j = 1, 2, \ldots, m$    (3.16)

Pre-multiplying equation 3.15 by $Q$ and post-multiplying by $Q^T$, we get

$R = \lambda_1 q_1 q_1^T + \lambda_2 q_2 q_2^T + \cdots + \lambda_m q_m q_m^T$    (3.17)

Eliminating the terms with low values of $\lambda_j$ therefore means eliminating the terms with low variance. There are $m$ possible solutions for the unit vector $q$; let $a_j$ be the projection of $x$ on $q_j$, i.e. the projection along the principal directions:

$a_j = q_j^T x = x^T q_j, \quad j = 1, 2, \ldots, m$    (3.18)

These $a_j$ are called the principal components. Equation 3.18 represents the analysis, i.e. the derivation of the principal components from the input vector $x$. From these $a_j$ we should be able to recover $x$; define the projection vector

$a = [a_1, a_2, \ldots, a_m]^T = [x^T q_1, x^T q_2, \ldots, x^T q_m]^T = Q^T x$.

Pre-multiplying both sides by $Q$ gives $Q a = x$, that is,

$x = a_1 q_1 + a_2 q_2 + \cdots + a_m q_m$    (3.19)

Equation 3.19 represents the synthesis equation; the $q_j$ act as basis vectors for the synthesis. Here we are forming basis vectors out of the random process itself. The reduction of the dimensionality from $m$ to $l < m$ components is taken up in the next subsection.

    3.3.2 Dimensionality Reduction

For reducing the dimensionality of the input vector we need to take the $l$ ($l < m$) largest eigenvalues of the correlation matrix; the reconstructed signal $\hat{x}$ is then given by:

$\hat{x} = a_1 q_1 + a_2 q_2 + \cdots + a_l q_l$

At the encoder side we have

$\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_l \end{bmatrix} = \begin{bmatrix} q_1^T \\ q_2^T \\ \vdots \\ q_l^T \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}$    (3.21)

It can be seen that the dimension of the input vector is reduced from $R^m$ to $R^l$.

The total variance of the $m$ components of $x$ is given by

$\sum_{i=1}^{m} \sigma_i^2 = \sum_{i=1}^{m} \lambda_i$,

the total variance of the $l$ retained components is given by

$\sum_{i=1}^{l} \sigma_i^2 = \sum_{i=1}^{l} \lambda_i$,

and the total variance of the discarded $m - l$ components is given by

$\sum_{i=l+1}^{m} \sigma_i^2 = \sum_{i=l+1}^{m} \lambda_i$.

Figure 3.3: PCA classification of given data (a) Worse classification of data (b) The best classification of data

    3.3.3 Calculation of eigenvalues and eigenvectors

Eigenvectors and eigenvalues give important information regarding our data. The eigenvectors give the uncorrelated variables, which are called principal components. The first principal component describes the highest amount of variation [4].

The eigenvalues of the covariance matrix describe the variance of the corresponding principal components, i.e. the first principal component exhibits the highest amount of variation, the second principal component exhibits the second highest amount of variation, and so on. Figure 3.3 shows a worse selection of principal components and the best selection of principal components.


    3.3.4 Formation of a Feature Vector

After calculating the eigenvectors of the covariance matrix, the dimensionality reduction takes place. Here we do not consider all the eigenvectors as principal components; we arrange all the eigenvalues in descending order and take the first few highest eigenvalues and their corresponding eigenvectors. These eigenvectors $e_1, e_2$ and so on are the principal components, as shown in equation 3.22:

$W_{pca} = [e_1\ e_2\ \ldots\ e_n]$    (3.22)

We neglect or ignore the remaining, less significant eigenvalues and their corresponding eigenvectors; these neglected eigenvalues represent a very small information loss [16]. The principal component axes pass through the mean values. With these principal components (eigenvectors) we form a matrix called the feature vector (also called the eigenspace) [21, 4].

    3.3.5 Derivation of a new data set

To derive a new data set with reduced dimensionality, we take the transpose of the feature vector matrix (so that each row of the matrix represents an eigenvector) and project this matrix onto the original data set with the mean subtracted. The data set is then formed with a new representation called the face space. Based on the covariance matrix we try to minimize the mean square error between the feature vectors and their projections [2]. This transformation has a few important properties. The first property is that the mean of the new data set is zero and its covariance matrix is a diagonal matrix containing elements equal to the eigenvalues of the covariance matrix of the original data set. The second property is that the covariance matrix of the original data set is real and symmetric (an image contains only real values), so its eigenvalues are real and the corresponding eigenvectors form an orthonormal basis. This orthonormality is expressed in equation 3.23:

$W^{-1} = W^T$    (3.23)

Using this property, the original data set can be reconstructed from the new data [4]. Here also we do not use all eigenvectors; we take only the eigenvectors corresponding to the highest eigenvalues.
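A short NumPy sketch of sections 3.3.4 and 3.3.5 (ours; the function names are illustrative and the report's own code is in MATLAB): the feature vector W_pca holds the retained eigenvectors, every mean-subtracted face is projected onto it to obtain the face-space representation, and the orthonormality of equation 3.23 makes the reconstruction a simple transpose.

```python
import numpy as np

def to_face_space(X, m, W_pca):
    """Project mean-subtracted faces onto the eigenvector basis (eq. 3.22).
    X: LM x N data matrix, m: LM x 1 mean face, W_pca: LM x k basis."""
    return W_pca.T @ (X - m)                     # k x N matrix of coefficients

def from_face_space(coeffs, m, W_pca):
    """Approximate reconstruction; since the basis is orthonormal (eq. 3.23),
    the inverse transform is simply W_pca itself."""
    return W_pca @ coeffs + m

# Hypothetical usage with the quantities from the earlier sketch:
# Y_train = to_face_space(X, m, U)
# y_test  = to_face_space(x_test.reshape(-1, 1), m, U)
# best    = np.argmin(np.linalg.norm(Y_train - y_test, axis=0))   # closest training face
```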


    Figure 3.4: Principal component analysis algorithm for face recognition

    3.3.6 Eigen Faces

Eigenfaces are a set of eigenvectors used in the computer vision problem of human face recognition. Eigenfaces assume a ghostly appearance. They refer to an appearance-based approach to face recognition that seeks to capture the variation in a collection of face images and use this information to encode and compare images of individual faces in a holistic manner. Specifically, the eigenfaces are the principal components of a distribution of faces, or equivalently, the eigenvectors of the covariance matrix of the set of face images, where an image of size $M \times M$ is considered as a point in an $M^2$-dimensional space.

Eigenfaces are mostly used to:

1) Extract the relevant facial information, which may or may not be directly related to the human intuition of face features such as the eyes, nose, and lips. One way to do so is to capture the statistical variation between face images.

2) Represent face images efficiently. To reduce the computation and space complexity, each face image can be represented using a small number of dimensions.

The eigenfaces may be considered as a set of features which characterize the global variation among face images. Each face image is then approximated using a subset of the eigenfaces, those associated with the largest eigenvalues; these features account for the most variance in the training set. In the language of information theory, we want to extract the relevant information in a face image, encode it as efficiently as possible, and compare one face with a database of models encoded similarly. A simple approach to extracting the information contained in an image is to somehow capture the variations in a collection of face images, and independently encode and compare individual face images [21].


Figure 3.5: Fisher's linear discriminant algorithm for face recognition

    3.4 LINEAR DISCRIMINANT ANALYSIS

The objective of LDA is to perform dimensionality reduction while preserving as much of the class discriminatory information as possible. It seeks to find directions along which the classes are best separated, and does so by taking into consideration not only the scatter within classes but also the scatter between classes. It is also more capable of distinguishing image variation due to identity from variation due to other sources such as illumination and expression [17].

Classical Linear Discriminant Analysis (LDA) is also called Fisher's Linear Discriminant (FLD); the method was developed by Ronald Fisher in 1936. In this method, the training and test sets are projected into the same subspace and the similarities between these data sets are identified. The Fisher's linear discriminant algorithm is explained in figure 3.5.

LDA gives a better classification of data sets when compared to principal component analysis. The main differences between principal component analysis and linear discriminant analysis are:

LDA uses class information, while PCA pays no attention to class information.

PCA takes the complete data set as one entity. PCA compresses information efficiently, but not the discriminatory information.

In PCA, the shape and location of the data set change due to the translation of the original space into a new space, whereas LDA only tries to separate the classes by drawing a decision region between the given classes (it does not change the location of the data set).

Figure 3.6: Flow chart of linear discriminant analysis algorithm

    3.4.1 Mathematical Analysis of LDA

The flow chart of the linear discriminant analysis algorithm is shown in figure 3.6.


Suppose there are $C$ classes. Let $\mu_i$ be the mean vector of class $i$, $i = 1, 2, \ldots, C$, let $M_i$ be the number of samples within class $i$, and let $M = \sum_{i=1}^{C} M_i$ be the total number of samples.

The within-class scatter matrix is given by:

$S_W = \sum_{i=1}^{C} \sum_{j=1}^{M_i} (y_j - \mu_i)(y_j - \mu_i)^T$

The between-class scatter matrix is:

$S_B = \sum_{i=1}^{C} (\mu_i - \mu)(\mu_i - \mu)^T, \quad \mu = \frac{1}{C} \sum_{i=1}^{C} \mu_i$

LDA computes a transformation that maximizes the between-class scatter while minimizing the within-class scatter.

If $S_W$ is nonsingular, the optimal projection $W_{lda}$ is chosen so as to maximize the ratio of the determinant of the between-class scatter matrix of the projected samples to the determinant of the within-class scatter matrix of the projected samples:

$W_{lda} = \arg\max_W \frac{|W^T S_B W|}{|W^T S_W W|} = [w_1, w_2, \ldots, w_m]$

where the upper bound on $m$ is $C - 1$, and the $w_i$ are the generalized eigenvectors of $S_B$ and $S_W$ corresponding to the set of decreasing generalized eigenvalues:

$S_B w_i = \lambda_i S_W w_i$

In practice, however, $S_W$ is often singular, since the data are image vectors with large dimensionality while the size of the data set is much smaller ($M \ll N$).

To alleviate this problem [15, 11], we can perform two projections:

1. PCA is first applied to the data set to reduce its dimensionality.

2. LDA is then applied to further reduce the dimensionality to $C - 1$.


In the final step of the calculation the optimal transform matrix is given by

$W_{opt} = W_{lda} \cdot W_{pca}$

PCA reduces the dimension of the feature space first, and applying LDA on this reduced space lowers the dimensionality further. This technique is called subspace LDA.
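The subspace-LDA procedure above can be sketched in a few lines of NumPy/SciPy (our illustration, assuming the PCA-projected training features are already available; the report's own code is in MATLAB). The generalized eigenproblem S_B w = λ S_W w is solved with scipy.linalg.eigh, with a small ridge added to S_W to keep it invertible.

```python
import numpy as np
from scipy.linalg import eigh

def lda_basis(Y, labels, num_components):
    """Subspace LDA sketch: Y is a k x N matrix of PCA-projected training faces,
    labels is a length-N array of class ids. Returns W_lda (k x num_components)."""
    labels = np.asarray(labels)
    mu = Y.mean(axis=1, keepdims=True)
    k = Y.shape[0]
    S_W = np.zeros((k, k))
    S_B = np.zeros((k, k))
    for c in np.unique(labels):
        Yc = Y[:, labels == c]
        mu_c = Yc.mean(axis=1, keepdims=True)
        S_W += (Yc - mu_c) @ (Yc - mu_c).T            # within-class scatter
        S_B += (mu_c - mu) @ (mu_c - mu).T            # between-class scatter (as in the text)
    eigvals, W = eigh(S_B, S_W + 1e-6 * np.eye(k))    # generalized eigenproblem
    order = np.argsort(eigvals)[::-1][:num_components]   # at most C-1 useful directions
    return W[:, order]

# Hypothetical usage: W_lda = lda_basis(Y_train, train_labels, num_components=9)
# The combined transform then plays the role of W_opt, i.e. features = W_lda.T @ Y_train.
```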

    3.5 INDEPENDENT COMPONENT ANALYSIS (ICA)

Independent component analysis (ICA) is a statistical method whose goal is to decompose multivariate data into a linear sum of non-orthogonal basis vectors with coefficients (encoding variables, latent variables, hidden variables) that are statistically independent. The independent components are latent variables, meaning that they cannot be directly observed; the mixing matrix is also assumed to be unknown. The concept of ICA may be seen as an extension of Principal Component Analysis, which only imposes independence up to second order and consequently defines directions that are orthogonal. Applications of ICA include data compression, detection and localization of sources, and blind identification and deconvolution [7].

    3.5.1 Mathematical analysis of ICA

Let us denote the observed random vector $X = [X_1, X_2, \ldots, X_m]^T$, whose $m$ elements are mixtures of the $m$ independent elements of a random vector $S = [S_1, S_2, \ldots, S_m]^T$, given by:

$X = AS$

where $A$ represents an $m \times m$ mixing matrix. The goal of ICA is to find the unmixing matrix $W$ (i.e. the inverse of $A$) that will give $Y$, the best possible approximation of $S$:

$Y = WX$

1) Minimization of mutual information

The conditional entropy of $Y$ given $X$ measures the average uncertainty remaining about $y$ when $x$ is known, and is:

$H(Y|X) = -\sum P(x, y) \log_2 P(y|x)$

The mutual information between $Y$ and $X$ is:

$I(Y, X) = H(Y) + H(X) - H(X, Y) = H(Y) - H(Y|X)$

Entropy can be seen as a measure of uncertainty. By having an algorithm that seeks to minimize mutual information, we are searching for components that are maximally independent. Maximizing the joint entropy consists of maximizing the individual entropies while minimizing the mutual information.

2) Maximum likelihood estimation

It is possible to formulate the likelihood directly in the noise-free ICA model and then estimate the model by a maximum likelihood method:

$L = \sum_{t=1}^{T} \sum_{i} \log f_i(w_i^T x(t)) + T \log|\det W|$

where the $f_i$ are the density functions of the $s_i$ (here assumed to be known), and $x(t)$, $t = 1, \ldots, T$, are the realizations of $x$.

3) Infomax algorithm

The Infomax algorithm computes the unmixing matrix $W$ iteratively:

1. Initialize $W(0)$ (e.g. randomly).

2. $W(t+1) = W(t) + \eta\,(I - f(Y)Y^T)\,W(t)$, where $t$ represents a given adaptation step, $\eta$ is the learning rate, and $f(Y)$ is a nonlinear function usually chosen according to the type of distribution (generally exponential) [3, 7].

Figure 3.7: ICA algorithm
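The Infomax update in step 2 is straightforward to write out. The NumPy sketch below is our illustration of that single rule; the learning rate, the tanh nonlinearity and the toy sources are illustrative choices, not prescriptions from the report.

```python
import numpy as np

def infomax_step(W, X, eta=0.01):
    """One update W(t+1) = W(t) + eta * (I - f(Y) Y^T) W(t) with Y = W X;
    f is taken as tanh here and f(Y) Y^T is averaged over the samples."""
    Y = W @ X
    n = W.shape[0]
    return W + eta * (np.eye(n) - np.tanh(Y) @ Y.T / X.shape[1]) @ W

# Toy usage: unmix two independent super-Gaussian sources from a random mixture.
rng = np.random.default_rng(1)
S = rng.laplace(size=(2, 2000))                 # two independent sources
X = rng.normal(size=(2, 2)) @ S                 # unknown mixing, X = A S
W = rng.normal(size=(2, 2))                     # initialize W(0) randomly
for _ in range(2000):
    W = infomax_step(W, X)
Y_est = W @ X                                   # estimated sources Y = W X
```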

    3.5.2 ICA Algorithm

The ICA algorithm is shown in figure 3.7. We obtain the overall transformation matrix using

$W_{ica} = W_{pca} W_k$    (3.24)

The size of $W_{ica}$ is $n \times N$, where $n$ is the number of pixels in the image and $N$ is the total number of images in the data set. The test images are mean-centered by subtracting the mean image $m$ of the data matrix $X$ and then projected onto the new subspace $W_{ica}$:

$Y = W_{ica}^T X$    (3.25)

We compare the projections of the test images and training images and find the best match using an appropriate classifier.


    CHAPTER 4

    NON-LINEAR DIMENSIONALITY REDUCTION TECHNIQUES

Like the linear subspace techniques, these are also appearance-based techniques. A spatially sampled image representation can be fully nonlinear. The nonlinear nature is differentiated by inner nonlinearity in the data or nonlinearity due to the choice of parameters. We must keep in mind that the mathematics related to nonlinear techniques is not applicable to all types of data. There are many nonlinear techniques for principal manifolds; one such method is nonlinear PCA.

    4.1 KERNEL METHODS FOR FACE RECOGNITION:

So far, we have seen that the PCA and LDA methods for face recognition have demonstrated their success. However, in both of them, the representation is based on second-order statistics of the image set, without considering higher-order statistical dependencies among three or more pixels. Now we move to the kernel methods, Kernel PCA and Kernel Fisher Analysis, which capture such higher-order correlations [25]. Presently, we give the detailed algorithms of these methods and implement them.

    4.1.1 Kernel Principal Component Analysis:

Kernel PCA is a nonlinear form of Principal Component Analysis which efficiently computes principal components in a higher-dimensional feature space via a nonlinear mapping.

In some high-dimensional feature space F (bottom right of figure 4.8), we perform linear PCA, just as PCA is performed in the input space (top). Since F is nonlinearly related to the input space, the contour lines of constant projections onto the principal eigenvector (drawn as an arrow) become nonlinear in the input space [9].

    4.1.1.1 Algorithm:

1. Considering a set of $M$ centered observations $x_k$, $k$ varying from 1 to $M$, we first compute the kernel (dot product) matrix

$K_{ij} = k(x_i, x_j)$

2. Next, we solve the eigenvalue problem by diagonalizing $K$, normalizing the eigenvector expansion coefficients $\alpha^k$ by requiring

$\lambda_k (\alpha^k \cdot \alpha^k) = 1$

3. Finally, we compute the projections onto the eigenvectors: for a pattern $x$, the $n$-th kernel principal component is

$(k_{PC})_n = (V^n \cdot \Phi(x)) = \sum_i \alpha_i^n\, k(x_i, x)$

Figure 4.8: Explanation of KPCA

    4.1.1.2 Dimensionality Reduction and Feature Extraction:

In KPCA, we can exceed the input dimensionality. Suppose that the number of observations M exceeds the input dimensionality N. Linear PCA, even when it is based on the M×M dot product matrix, can find at most N nonzero eigenvalues; in contrast, kernel PCA can find up to M nonzero eigenvalues [24].
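A small NumPy sketch of the KPCA steps above (our illustration, not the report's MATLAB code): an RBF kernel is assumed, and the kernel matrix is centred in feature space before diagonalization, a step the algorithm summary leaves implicit for centered observations.

```python
import numpy as np

def kernel_pca(X, num_components, gamma=1e-3):
    """Kernel PCA sketch with an RBF kernel k(x, y) = exp(-gamma ||x - y||^2).
    X: M x d matrix of observations (one row per sample). Returns the projections
    of the training samples onto the leading kernel principal components."""
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))   # step 1
    M = K.shape[0]
    one = np.full((M, M), 1.0 / M)
    Kc = K - one @ K - K @ one + one @ K @ one       # centre the data in feature space
    eigvals, alphas = np.linalg.eigh(Kc)             # step 2: diagonalize K
    order = np.argsort(eigvals)[::-1][:num_components]
    eigvals, alphas = eigvals[order], alphas[:, order]
    alphas /= np.sqrt(np.maximum(eigvals, 1e-12))    # normalization lam * (a . a) = 1
    return Kc @ alphas                               # step 3: sum_i alpha_i^n k(x_i, x)

# Illustrative usage: features = kernel_pca(train_vectors, num_components=40)
```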

    4.1.2 Kernel Fisher Analysis:

Kernel Fisher Analysis (KFA), also known as generalized discriminant analysis (GDA) and kernel discriminant analysis, is a kernelized version of linear discriminant analysis [24]. Using the kernel trick, LDA is implicitly performed in a new feature space, which allows non-linear mappings to be learned. KPCA and GDA are based on exactly the same optimization criteria as their linear counterparts, PCA and LDA.

Figure 4.9: KDA

The algorithm generalizes the strengths of D-LDA and the kernel techniques. Following the SVM paradigm, the algorithm first nonlinearly maps the original input space to an implicit high-dimensional feature space, where the distribution of face patterns is hoped to be linearized and simplified. Then, a new variant of the D-LDA method is introduced to effectively solve the SSS problem and derive a set of optimal discriminant basis vectors in the feature space. The kernel machines provide an elegant way of designing nonlinear algorithms by reducing them to linear ones in some high-dimensional feature space nonlinearly related to the input sample space.

    4.1.2.1 Algorithm:

1. Calculate the kernel matrix $K$, as above: $K_{ij} = k(x_i, x_j)$.

2. Next, the objective is to find a transformation $\varphi$, based on the optimization of certain separability criteria, which produces a mapping $y_i = \varphi(x_i)$ that leads to an enhanced separability of the different face objects.

3. Calculate $\Phi_b^T \Phi_b$:

$\Phi_b^T \Phi_b = \frac{1}{L}\, B \left( A_{LC}^T K A_{LC} - \frac{1}{L}(A_{LC}^T K\, 1_{LC}) - \frac{1}{L}(1_{LC}^T K A_{LC}) + \frac{1}{L^2}(1_{LC}^T K\, 1_{LC}) \right) B$

where $B = \mathrm{diag}[\sqrt{c_1}, \ldots, \sqrt{c_C}]$, $1_{LC}$ is an $L \times C$ matrix with all terms equal to one, $A_{LC} = \mathrm{diag}[a_{c_1}, \ldots, a_{c_C}]$ is an $L \times C$ block diagonal matrix, and $a_{c_i}$ is a $C_i \times 1$ vector with all terms equal to $\frac{1}{C_i}$.

4. Find $E_m$ and $\Lambda_b$ from $\Phi_b^T \Phi_b$.

5. Calculate $U^T S_{WTH} U$ using the equation:

$U^T S_{WTH} U = (E_m \Lambda_b^{-1/2})^T (\Phi_b^T S_{WTH} \Phi_b)(E_m \Lambda_b^{-1/2})$

6. Calculate the $M \times L$ matrix

$\Theta = \frac{1}{L}\,(E_m \Lambda_b^{-1/2} \cdot P \cdot \Lambda_w^{-1/2})^T \left( B \left( A_{LC}^T - \frac{1}{L}\, 1_{LC}^T \right) \right)$

7. For an input pattern $z$, calculate its kernel matrix $\nu(z)$ with respect to the training samples.

8. The optimal discriminant feature representation of $z$ can then be obtained as $y = \Theta \cdot \nu(z)$.


    CHAPTER 5

    FACE RECOGNITION USING GABOR WAVELETS

    5.1 WAVELETS

Wavelets are mathematical functions that cut up data into different frequency components and then study each component with a resolution matched to its scale. Over traditional Fourier methods, they have advantages in analyzing physical situations where the signal contains discontinuities and sharp spikes [18]. Wavelets were developed independently in the fields of mathematics, quantum physics, electrical engineering, etc., and interchanges between these fields during the last ten years have led to many new wavelets. We will use Gabor wavelets to implement our face recognition system.

    5.2 GABOR FILTERS

From a face image, Gabor filters are capable of deriving multi-orientational information at several scales, with the derived information being of a local nature. The amount of data in the Gabor face representation is commonly reduced to a more manageable size by exploiting various downsampling, feature selection and subspace projection techniques before it is finally fed to a classifier. Gabor filters are among the most popular tools for facial feature extraction. Their use in automatic face recognition systems is motivated by two major factors: their computational properties and their biological relevance. A 2D Gabor filter in the spatial domain is defined by the following expression:

$\psi_{u,v}(x, y) = \frac{f_u^2}{\pi \gamma \eta}\, e^{-\left(\frac{f_u^2}{\gamma^2} x'^2 + \frac{f_u^2}{\eta^2} y'^2\right)}\, e^{j 2\pi f_u x'}$

where $x' = x\cos\theta_v + y\sin\theta_v$, $y' = -x\sin\theta_v + y\cos\theta_v$, $f_u = f_{max}/(\sqrt{2})^u$ and $\theta_v = v\pi/8$.

Hence, in a way, Gabor filters represent Gaussian kernel functions modulated by a complex plane wave whose center frequency and orientation are defined by $f_u$ and $\theta_v$, respectively, with the parameters $\gamma$ and $\eta$ determining the ratio between the center frequency and the size of the Gaussian envelope [14].


5.2.1 Extracting features with Gabor filters

Consider a grey-scale face image defined on a grid of size a \times b, denoted by I(x, y), and let \psi_{u,v}(x, y) represent a Gabor filter determined by the parameters f_u and \theta_v. The filtering operation with the Gabor filter can then be written as

G_{u,v}(x, y) = I(x, y) * \psi_{u,v}(x, y),

where G_{u,v}(x, y) denotes the complex convolution result, which can be decomposed into a real and an imaginary part:

E_{u,v}(x, y) = Re[G_{u,v}(x, y)], \quad O_{u,v}(x, y) = Im[G_{u,v}(x, y)].

Both the phase \phi_{u,v}(x, y) as well as the magnitude A_{u,v}(x, y) filter responses can be computed, based on the decomposed filtering result, as

A_{u,v}(x, y) = \sqrt{E_{u,v}^2(x, y) + O_{u,v}^2(x, y)}, \quad \phi_{u,v}(x, y) = \arctan(O_{u,v}(x, y)/E_{u,v}(x, y)).

Since the computed phase responses vary significantly even for spatial locations only a few pixels apart, Gabor phase features are considered unstable and are usually discarded. The magnitude responses, on the other hand, vary slowly with the spatial position, and are thus the preferred choice when deriving Gabor-filter-based features [26].

In our experiment, we use a simple rectangular sampling grid with 256 nodes for the initial dimensionality reduction, and (a) PCA or (b) LDA for the subspace projection of the feature vector built by concatenating the downsampled magnitude responses.

To derive the Gabor face representation from a given face image I(x, y), the Gabor magnitude responses for the entire bank of 40 Gabor filters are commonly computed first. However, since each of the responses has the same dimensionality as the input image, this procedure results in an inflation of the original pixel space to 40 times its initial size. To cope with this problem, the magnitude responses are typically downsampled using either a simple rectangular sampling grid or some kind of feature selection scheme. Nevertheless, even after the downsampling, any face representation constructed, for example, by a concatenation of the downsampled magnitude responses still resides in a high-dimensional space. Hence, we use a subspace projection technique, such as principal component analysis or linear discriminant analysis, to further reduce the data's dimensionality.
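The sketch below is a hedged Python illustration of the feature-extraction chain just described (filtering, magnitude, rectangular downsampling, concatenation). It assumes the gabor_kernel helper from the previous sketch and an example downsampling step of 8.

    import numpy as np
    from scipy.signal import fftconvolve

    def gabor_magnitude_features(image, bank, step=8):
        # Filter `image` with every kernel in `bank`, keep only the magnitude
        # responses, downsample them on a rectangular grid and concatenate.
        feats = []
        for kernel in bank:
            response = fftconvolve(image, kernel, mode="same")  # complex G_{u,v}(x, y)
            magnitude = np.abs(response)                        # A_{u,v}(x, y); phase discarded
            feats.append(magnitude[::step, ::step].ravel())     # rectangular sampling grid
        return np.concatenate(feats)

    # Usage (assuming `bank` is the list of 40 kernels from the previous sketch):
    # x = gabor_magnitude_features(face_image, bank, step=8)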


    CHAPTER 6

DISTANCE MEASURES, ROC AND ORL DATABASE

    6.1 DISTANCE MEASURES FOR FACE RECOGNITION

To find the best-matched image, we have several distance measures: Euclidean distance, Mahalanobis distance, city block, cosine and chessboard. Feature vectors are formed from the extracted features of the training database. We then calculate the feature vector of the test image and match it against each of the training-set feature vectors using the chosen distance measure.

The training-set feature vector with the least distance gives the best match to the test image. Let us take a training set of N images and calculate the feature matrix Y from these images, i.e. Y consists of N column vectors of size K \times 1, denoted y_1, y_2, \ldots, y_N. The feature vector of the test image is y_{tst}. The distance d between y_i and y_{tst} is then calculated using the various distance measures described below.

    6.1.1 Euclidean distance

The Euclidean distance is a commonly used distance measure in many applications. It gives the shortest distance between two vectors and, in two dimensions, is the same as the Pythagorean relation. The squared Euclidean distance between the two feature vectors (y_{tst}, y_i) is

d^2 = (y_{tst} - y_i)^T (y_{tst} - y_i)   (6.1)

The Euclidean distance is sensitive both to adding a constant to the vectors and to multiplying them by some factor.
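Equation (6.1) in code, as a minimal illustration (the vector values are placeholders):

    import numpy as np

    def euclidean_sq(y_tst, y_i):
        # squared Euclidean distance d^2 = (y_tst - y_i)^T (y_tst - y_i)
        diff = y_tst - y_i
        return float(diff @ diff)

    print(euclidean_sq(np.array([1.0, 2.0]), np.array([4.0, 6.0])))  # 25.0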

    6.1.2 Mahalanobis distance

The Mahalanobis distance comes from the multivariate Gaussian probability density function,

p(x) = (2\pi)^{-d/2} |C|^{-1/2} \exp\left( -\tfrac{1}{2}(x - m)^T C^{-1}(x - m) \right)   (6.2)

where (x - m)^T C^{-1}(x - m) is called the squared Mahalanobis distance, which is very important in characterizing the distribution, and C is the estimated


covariance matrix of y, whose observations are the y_i. The Mahalanobis distance between the two feature vectors y_{tst} and y_i is given by the following equation:

d^2 = (y_{tst} - y_i)^T C^{-1} (y_{tst} - y_i)   (6.3)
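A sketch of equation (6.3): the covariance matrix C is estimated from the training feature vectors (stored here as rows of Y), and the small regularization term added to C is an assumption made only to keep it invertible.

    import numpy as np

    def mahalanobis_sq(y_tst, y_i, C_inv):
        # squared Mahalanobis distance d^2 = (y_tst - y_i)^T C^{-1} (y_tst - y_i)
        diff = y_tst - y_i
        return float(diff @ C_inv @ diff)

    Y = np.random.rand(40, 10)                                # N = 40 training feature vectors
    C = np.cov(Y, rowvar=False) + 1e-6 * np.eye(Y.shape[1])   # regularized covariance estimate
    C_inv = np.linalg.inv(C)
    print(mahalanobis_sq(Y[0], Y[1], C_inv))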

    6.1.3 Cosine Similarity

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. Cosine similarity is particularly used in positive space, where the outcome is neatly bounded in [0, 1].

Given the two feature vectors y_{tst} and y_i, the cosine similarity cos(\theta) is represented using a dot product and magnitudes as

d = \cos(\theta) = \frac{y_{tst} \cdot y_i}{\|y_{tst}\| \, \|y_i\|}   (6.4)

The resulting similarity ranges from -1, meaning exactly opposite, to 1, meaning exactly the same, with 0 usually indicating independence and in-between values indicating intermediate similarity or dissimilarity.
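Equation (6.4) in code, again only as an illustration:

    import numpy as np

    def cosine_similarity(y_tst, y_i):
        # d = cos(theta) = (y_tst . y_i) / (||y_tst|| ||y_i||)
        return float(y_tst @ y_i / (np.linalg.norm(y_tst) * np.linalg.norm(y_i)))

    print(cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # ~0.707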

    6.1.4 City Block distance

The city block distance is also known as the Manhattan distance. This metric assumes that in going from one pixel to the other it is only possible to travel directly along pixel grid lines; diagonal moves are not allowed. Therefore the city block distance is given by

d_1(y_{tst}, y_i) = \|y_{tst} - y_i\|_1 = \sum_{j=1}^{n} |y_{tst,j} - y_{i,j}|   (6.5)

The city block distance depends on the rotation of the coordinate system, but does not depend on its reflection about a coordinate axis or its translation.
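Equation (6.5) in code:

    import numpy as np

    def city_block(y_tst, y_i):
        # Manhattan / city block distance: sum of absolute coordinate differences
        return float(np.sum(np.abs(y_tst - y_i)))

    print(city_block(np.array([1.0, 2.0]), np.array([4.0, 6.0])))  # 7.0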

    6.1.5 Chessboard Distance

    This metric assumes that you can make moves on the pixel grid as if you were a Kingmaking moves in chess, i.e. a diagonal move counts the same as a horizontal move.This means that the metric is given by:

d_{chess} = \max_j |y_{tst,j} - y_{i,j}|   (6.6)


    The last two metrics are usually much faster to compute than the Euclideanmetric and so are sometimes used where speed is critical but accuracy is not tooimportant.
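Equation (6.6) in code; like the city block distance it needs only absolute differences, which is one reason these two metrics are cheap to evaluate:

    import numpy as np

    def chessboard(y_tst, y_i):
        # Chebyshev / chessboard distance: largest absolute coordinate difference
        return float(np.max(np.abs(y_tst - y_i)))

    print(chessboard(np.array([1.0, 2.0]), np.array([4.0, 6.0])))  # 4.0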

    6.2 ROC

Face recognition can be classified into face identification and face verification. In face identification, the identities of all enrolled images are stored, and the system identifies a test image by comparing it against the images in the database and returning the closest match.

In face verification, template images are stored in the database. A test image, together with a claimed identity, is presented to the system, and its feature vector is compared with the feature vectors of the images present in the database.

The Receiver Operating Characteristic (ROC) curve is plotted using the verification rate (the rate at which persons are recognized correctly) and the false acceptance rate (the rate at which wrong persons are accepted), obtained by varying the threshold value. These two rates should be balanced according to the application.

For a verification ROC, the camera takes an image of a person's face, which may or may not be stored in the database, and the individual makes a claim to an identity.

A linear subspace technique is used to calculate the feature vectors of the stored images and the feature vector of the acquired image. The acquired feature vector is compared with the stored feature vector of the claimed identity using a distance measure. If the distance is lower than the threshold, the system decides that the acquired image belongs to the claimed person; if the distance is greater than the threshold, the claim is rejected.

One type of error in verification is accepting a wrong person as the correct person: an individual makes a false claim to an identity, yet the distance falls below the threshold and the claim is accepted. The rate at which this occurs over all individuals is called the False Acceptance Rate. The second type of error occurs when an individual makes a genuine claim but the distance is higher than the threshold, so the system decides the individual is not the claimed person even though he is. The rate at which this occurs is called the False Reject Rate. Subtracting the false reject rate from 1 gives the probability of verification.


    Figure 6.1: Typical Receiver Operating Characteristic Curve

A typical ROC curve is shown in Figure 6.1.
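The FAR / verification-rate trade-off described above can be traced by sweeping the threshold over genuine and impostor distance scores. The Python sketch below illustrates only the idea; it is not the PhD-toolbox routine used in Chapter 9, and the score vectors are random placeholders.

    import numpy as np

    def roc_points(genuine_d, impostor_d, n_points=100):
        # genuine_d: distances for correct identity claims (should fall below the threshold)
        # impostor_d: distances for false claims (should fall above the threshold)
        genuine_d, impostor_d = np.asarray(genuine_d), np.asarray(impostor_d)
        thresholds = np.linspace(0.0, max(genuine_d.max(), impostor_d.max()), n_points)
        far = np.array([(impostor_d <= t).mean() for t in thresholds])  # false acceptance rate
        ver = np.array([(genuine_d <= t).mean() for t in thresholds])   # 1 - false reject rate
        return far, ver

    far, ver = roc_points(np.random.rand(200) * 0.5, 0.3 + np.random.rand(200) * 0.7)
    print(far[:3], ver[:3])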

    6.3 ORL DATABASE

The ORL face database contains a set of face images taken at the AT&T Laboratories. The laboratory was founded in 1986 at Cambridge as the Olivetti Research Laboratory, famously known as ORL. The face images were taken between April 1992 and April 1994 for a face recognition project carried out in collaboration with the Speech, Vision and Robotics Group of the Cambridge University Engineering Department.

The ORL database contains images of 40 persons or subjects, with 10 images per person showing different facial expressions (open/closed eyes, smiling/not smiling), varying lighting conditions and facial details (with/without glasses, slight rotation of the face). Some images were taken at different times. All images were taken in a frontal position against a dark homogeneous background, with small variations in the background grey level. Figure 6.2 shows a sample set of images of one person with different facial expressions. Each image in the database is 92x112 pixels in size and is a greyscale image whose pixel values range from 0 to 255, i.e. 256 grey levels per pixel.


    Figure 6.2: Sample images of a single person from ORL database

    6.4 SUMMARY

Different distance measures have been defined and compared. The best-performing distance measure among them is used to compute the match between the stored training images and a given test image.


    CHAPTER 7

    REAL TIME FACE RECOGNITION SYSTEM USING OPENCV

    Face Recognition generally involves two stages:

Face Detection, where a photo is searched to find any face in it

Face Recognition, where that detected and processed face is compared to a database of known faces, to decide who that person is.

    7.1 FACE DETECTION

As mentioned above, the first stage in Face Recognition is Face Detection. The OpenCV library makes it fairly easy to detect a frontal face in an image using its Haar Cascade Face Detector (also known as the Viola-Jones method). The function cvHaarDetectObjects in OpenCV performs the actual face detection [10]; a sketch of this detection step is given after the classifier list below.

For frontal face detection, we can choose one of the Haar cascade classifiers that come with OpenCV (in the data\haarcascades\ folder):

haarcascade_frontalface_default.xml

haarcascade_frontalface_alt.xml

haarcascade_frontalface_alt2.xml

haarcascade_frontalface_alt_tree.xml
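The report's real-time system calls the OpenCV C API (cvHaarDetectObjects). As a hedged illustration of the same detection step, the sketch below uses OpenCV's Python interface; the cascade path, the input file name and the detection parameters are assumptions for the example.

    import cv2

    cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")  # assumed path
    image = cv2.imread("test.jpg")                                          # hypothetical photo
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    for (x, y, w, h) in faces:                       # one rectangle per detected face
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    print("faces found:", len(faces))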

    7.2 PREPROCESSING FACIAL IMAGES

It is extremely important to apply various image pre-processing techniques to standardize the images that we supply to a face recognition system. Most face recognition algorithms are extremely sensitive to lighting conditions: if the system was trained to recognize a person in a dark room, it probably won't recognize them in a bright room. This problem is referred to as illumination dependence. There are also many other issues, such as the face needing to be in a very consistent position within the image (for example, the eyes being at the same pixel coordinates),


consistent size, rotation angle, hair and makeup, emotion (smiling, angry, etc.) and position of lights (to the left or above, etc.). This is why it is so important to apply good image preprocessing filters before performing face recognition.

Histogram equalization [1] is a very simple method of automatically standardizing the brightness and contrast of facial images.
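A minimal sketch of this preprocessing step using OpenCV's Python interface; the file names and the 100x100 crop size are assumptions for the example.

    import cv2

    face = cv2.imread("detected_face.png", cv2.IMREAD_GRAYSCALE)  # hypothetical cropped face
    face = cv2.resize(face, (100, 100))     # consistent size before recognition
    face = cv2.equalizeHist(face)           # standardize brightness and contrast
    cv2.imwrite("preprocessed_face.png", face)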

    7.3 FACE RECOGNITION

Now that we have a pre-processed facial image, we can perform eigenfaces (PCA) for face recognition. OpenCV comes with the function cvEigenDecomposite(), which performs the PCA operation; however, we need a database (training set) of images for it to know how to recognize each of the people. So we should collect a group of preprocessed facial images of each person we want to recognize. For example, to recognize someone from a class of 10 students, we could store 20 photos of each person, for a total of 200 preprocessed facial images of the same size (say 100x100 pixels).

The eigenfaces method has been thoroughly explained in Chapter 2. It is very easy to use a webcam stream as input to the face recognition system instead of a file list: we simply grab frames from a camera instead of from a file and run until the user wants to quit, instead of running until the file list has run out. OpenCV provides the cvCreateCameraCapture() function (also known as cvCaptureFromCAM()) for this. Grabbing frames from a webcam can be implemented easily using this function [10], as sketched below.
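The original C listing is not reproduced here; the following is a hedged Python sketch of the same frame-grabbing loop, in which cv2.VideoCapture plays the role of cvCreateCameraCapture().

    import cv2

    cap = cv2.VideoCapture(0)                  # open the default webcam
    while True:
        ok, frame = cap.read()                 # grab one frame per iteration
        if not ok:
            break
        cv2.imshow("camera", frame)            # the frame would be passed on to detection/recognition
        if cv2.waitKey(1) & 0xFF == ord("q"):  # run until the user quits
            break
    cap.release()
    cv2.destroyAllWindows()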

Once a face has been captured, the feature vector of this test image is compared with the feature vectors of the training set using a nearest neighbour classifier; the training image with the least Euclidean distance is declared the match.


    CHAPTER 8

    METHOD OF IMPLEMENTATION

    8.1 PCA AND LDA

    The whole recognition process involves two steps:

    Initialization Process

    Recognition Process

In the training phase we have to extract a feature vector from each image in the training set. Let a training image of person A have a pixel resolution of M \times N. In order to extract the PCA features of this image, we first convert it into a pixel vector by concatenating all of its rows into a single vector [3]. The length of this pixel vector is M \cdot N, which is very large, so we use the PCA algorithm to reduce its dimensionality; the new reduced feature vector has a dimensionality d \ll M \cdot N. For each training image i, the feature vector y_i is calculated and stored. In the recognition (testing) phase, a test image is given and its feature vector y_{tst} is calculated using PCA. To identify the test image, the similarity between y_{tst} and all the feature vectors stored in the training set is found by measuring the Euclidean distance of y_{tst} to each y_i. The feature vector with the minimum Euclidean distance is selected, and the corresponding training image is said to match the test image.

For increasing the class separability, we project the feature vectors into a new subspace called the Fisher space, to make use of the class information. This method is called LDA.

Figure 8.1 shows the whole procedure in a compact form.
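A compact Python sketch of the training and matching procedure described above (eigenface projection followed by a Euclidean nearest-neighbour search) is given below. It is an illustration only, not the project's MATLAB code; the image sizes and the number of retained components are assumed example values.

    import numpy as np

    def train_pca(train_images, d=50):
        # train_images: matrix with one vectorized face per row
        mean = train_images.mean(axis=0)
        A = train_images - mean
        vals, vecs = np.linalg.eigh(A @ A.T)             # eigenvectors of the small N x N matrix
        order = np.argsort(vals)[::-1][:d]
        eigenfaces = A.T @ vecs[:, order]                # map back to image space
        eigenfaces /= np.linalg.norm(eigenfaces, axis=0)
        features = A @ eigenfaces                        # one d-dimensional feature row per image
        return mean, eigenfaces, features

    def match(test_image, mean, eigenfaces, features):
        y_tst = (test_image - mean) @ eigenfaces
        dists = np.sum((features - y_tst) ** 2, axis=1)  # squared Euclidean distances
        return int(np.argmin(dists))                     # index of the best-matching training image

    train = np.random.rand(160, 92 * 112)                # e.g. 4 images for each of 40 subjects
    mean, eigenfaces, feats = train_pca(train, d=50)
    print(match(train[3], mean, eigenfaces, feats))      # 3: a training image matches itself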

    8.2 ICA

We performed ICA on the image set under two architectures. Architecture I treated the images as random variables and the pixels as outcomes, whereas Architecture II treated the pixels as random variables and the images as outcomes, as illustrated in Figures 8.2 and 8.3.


    Figure 8.1: Schematic of the Face Recognizer

    Figure 8.2: Blind Source Separation model

    8.3 KPCA

We follow the same implementation strategy as in PCA, except that in KPCA we first have to transform the nonlinear input vector space to a space in which PCA can be applied linearly. Polynomial and Gaussian kernels were used for this transformation.
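A hedged Python sketch of the kernel PCA step (Gaussian kernel, centred kernel matrix, eigen-decomposition); the kernel width and the number of components are assumed example values.

    import numpy as np

    def kpca_features(X, n_components=20, sigma=1.0):
        # Project the rows of X onto the leading kernel principal components.
        sq = np.sum(X ** 2, axis=1)
        K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / (2 * sigma ** 2))
        L = K.shape[0]
        one = np.full((L, L), 1.0 / L)
        Kc = K - one @ K - K @ one + one @ K @ one       # centre the kernel matrix in feature space
        vals, vecs = np.linalg.eigh(Kc)
        order = np.argsort(vals)[::-1][:n_components]
        alphas = vecs[:, order] / np.sqrt(np.maximum(vals[order], 1e-12))  # normalized coefficients
        return Kc @ alphas                               # nonlinear features of the training samples

    feats = kpca_features(np.random.rand(100, 30), n_components=20, sigma=0.5)
    print(feats.shape)  # (100, 20)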

    8.4 GABOR PCA

We propose a method that uses the Gabor filter responses as the input to PCA, instead of the raw face image, to overcome its shortcomings. Using the Gabor filter responses as the input vector, the sensitivity to rotation and illumination can be reduced.


    Figure 8.3: Finding statistically independent basis images

If we use M gallery images, an (N \times 40) by M matrix A can be constructed, and the eigenvalues and eigenvectors can be calculated from the ensemble matrix AA^T. From the eigenvalues, we can select the effective Gabor filter responses and construct the eigenspace with the appropriate number of eigenvectors. The training set of face images is then projected into this eigenspace, and the testing set of face images is projected likewise.

    8.5 GABOR LDA

First, discriminant vectors are computed using LDA from the given training images. The function of the discriminant vectors is two-fold. First, the discriminant vectors are used as a transform matrix, and LDA features are extracted by projecting the gray-level images onto them. Second, the discriminant vectors are used to select discriminant pixels, the number of which is much smaller than that of the whole image. Gabor features are extracted only at these discriminant pixels. Then, applying LDA to the Gabor features, one obtains reduced Gabor-LDA features. Finally, a combined classifier is formed based on these two types of LDA features; a generic sketch of the LDA step is given below.
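The following Python sketch is a generic illustration of the LDA step (between-class and within-class scatter, then a generalized eigen-problem); in this chapter it would be applied to the reduced Gabor features, and the regularization added to S_w is an assumption.

    import numpy as np

    def lda_projections(X, labels, n_components):
        # Return discriminant vectors maximizing between-class versus within-class scatter.
        classes = np.unique(labels)
        mean_all = X.mean(axis=0)
        d = X.shape[1]
        Sw = np.zeros((d, d))
        Sb = np.zeros((d, d))
        for c in classes:
            Xc = X[labels == c]
            mc = Xc.mean(axis=0)
            Sw += (Xc - mc).T @ (Xc - mc)
            diff = (mc - mean_all)[:, None]
            Sb += Xc.shape[0] * (diff @ diff.T)
        Sw += 1e-6 * np.eye(d)                           # small regularization (assumed)
        vals, vecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
        order = np.argsort(vals.real)[::-1][:n_components]
        return vecs[:, order].real                       # columns are the discriminant vectors

    X = np.random.rand(120, 40)                          # e.g. reduced Gabor features
    labels = np.repeat(np.arange(40), 3)                 # 40 classes, 3 samples per class
    W = lda_projections(X, labels, n_components=39)      # at most C - 1 useful directions
    print((X @ W).shape)                                 # (120, 39)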


    CHAPTER 9

    RESULTS AND OBSERVATIONS

The algorithms were implemented in MATLAB. The performance of the different techniques was evaluated on the ORL database. The images in the database were divided into training and testing sets: the first 4 images of each person were used for training and the rest for testing. A nearest neighbour classifier was used with different distance metrics, and recognition rates and ROC curves were plotted for each of them.

    9.1 PCA

The ORL Database of Faces was used to test the face recognition algorithm. There are ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open/closed eyes, smiling/not smiling) and facial details.

The mean face of all 400 images is shown in Figure 9.1. The eigenfaces generated using PCA are shown in Figure 9.2. An image was selected from the database and the face recognition algorithm was applied to recognize it from the database; the results obtained are shown in Figure 9.3. The number above each image represents the Euclidean distance between the test image and the image from the database. The first image has zero Euclidean distance because it is the same as the test image, and the other eight are the most similar to it.

For calculating the ROC, we compute the False Accept Rate, which is the probability that the system incorrectly matches the input image with images stored in the database, and the

    Figure 9.1: Mean Face


    Figure 9.2: Eigenfaces

    Figure 9.3: Identifying similar faces

False Rejection Rate, which is the ratio of the number of correct persons rejected to the total number of persons in the database. A function from the PhD toolbox [19] is used to generate the ROC curve data from genuine (client) and impostor matching scores. The function takes either two or three input arguments: the first is a vector of genuine matching scores (i.e., the client scores), the second is a vector of impostor matching scores, and the third is the number of points (i.e., the resolution) at which to compute the ROC curve data.

The ROC curve is given in Fig. 9.4. The receiver operating characteristic for PCA with different distance metrics is shown in Fig. 9.5. We see that PCA gave better verification rates with the Mahalanobis distance.

The graph of recognition rate versus the number of eigenvectors retained is shown in Fig. 9.6.

Eigenvectors capture the uncorrelated variation present in the images; the eigenvectors with the highest eigenvalues contain more information than those with the lowest. In dimensionality reduction we discard the information associated with some of the eigenvectors, and this information loss reduces the recognition rate.


    Figure 9.4: Performance of PCA based Face Recognition with ORL Face Database

The graph of recognition rate versus the number of training images used is shown in Fig. 9.7. The recognition rate increases up to a certain extent as the number of training images is increased. Although the face recognition results were acceptable, a system using only eigenfaces might not be suitable for a real-time system; it needs to be more robust and use more discriminant features.

    9.2 LDA

The first 4 Fisher faces obtained are shown in Figure 9.8.

The ROC curve is shown in Fig. 9.9, and the receiver operating characteristic for LDA is shown in Fig. 9.10. We see that LDA gives better performance than PCA because it uses the additional class information. Again, better verification rates were obtained using the Mahalanobis distance metric.

The graph of recognition rate versus the number of training images used is shown


    Figure 9.5: ROC for PCA

in Fig. 9.11.

    9.3 ICA

The ICA results are shown in the following figures: the source images (Fig. 9.12), the aligned faces with features extracted (Fig. 9.13), the mixed images (Fig. 9.14) and the independent components, i.e. the estimated sources (Fig. 9.15).

    9.4 KERNEL PCA AND KERNEL LDA

The ROC for KPCA is shown in Fig. 9.16.

A Gaussian kernel of the form k(x, y) = \exp(-\|x - y\|^2 / 2\sigma^2) was used to extract the nonlinear features from the face. The value of the variance was chosen to be 0.05, and different ROCs were plotted for the different distance metrics; it was found that the Mahalanobis distance metric gave the best result.

The ROC for KLDA is shown in Fig. 9.17.


    Figure 9.6: Recognition rate vs number of eigen faces for PCA

    9.5 GABOR FILTER

Face images after Gabor filtering with no downsampling are shown in Figure 9.18; a downsampling factor of 64 results in Figure 9.19.

    9.6 GABOR PCA AND GABOR LDA

It is difficult for these linear techniques to classify nonlinear features in the data, so for nonlinear features wavelets are used. This work considers the Gabor wavelet method to capture the nonlinear features in the data: the Gabor wavelet transform is applied before the linear subspace technique for nonlinear feature extraction.

The ROC for Gabor PCA is shown in Fig. 9.20 and the ROC for Gabor LDA in Fig. 9.21. From these ROC diagrams, we see that the accuracy of the Gabor wavelet based face recognition system is much higher than that of the other linear and nonlinear techniques studied and implemented.

    9.7 REAL TIME FACE RECOGNITION SYSTEM

The C++ program was compiled using gcc, and face detection and recognition were performed on the image captured from the webcam, as shown in Figure 9.22.


    Figure 9.7: Recognition rate vs number of training images for PCA

    Figure 9.8: Fisher faces

Figure 9.9: Performance of LDA based Face Recognition with ORL Face Database


    Figure 9.10: ROC for LDA

    Figure 9.11: Recognition rate vs number of training images for PCA


    Figure 9.12: Source images

    Figure 9.13: Aligned faces-features extracted

    Figure 9.14: Mixed images


    Figure 9.15: Independent Components (Estimated Sources)

    Figure 9.16: ROC for KPCA


    Figure 9.17: ROC for KLDA

    Figure 9.18: Magnitude response with no downsampling


    Figure 9.19: Magnitude response with downsampling factor 64

    Figure 9.20: ROC for Gabor PCA


    Figure 9.21: ROC for Gabor LDA

Figure 9.22: OpenCV implementation


    CHAPTER 10

    CONCLUSION

In this project, we looked into various linear and non-linear techniques for dimensionality reduction with special emphasis on face recognition. The ROC characteristics of the various methods were computed and plotted. Let us summarize the results:

Amongst the linear techniques, LDA is better for large databases. However, for a task with very high-dimensional data, the traditional LDA algorithm encounters several difficulties, hence we implemented it after reducing the dimensions with PCA. LDA is primarily used to reduce the number of features to a more manageable number before classification.

Besides the ROC analysis, during the study of PCA and LDA we observed that as the number of eigenfaces retained is increased, the recognition rate increases. However, the feature vector size also increases, so there is a trade-off between dimensionality reduction and recognition rate.

ICA generalizes PCA and, like PCA, has proven a useful tool for reducing data dimensionality. It extracts features from natural scenes and can be adopted for image change detection. In addition, ICA is sensitive to lines and edges of varying thickness in images, and the ICA coefficients lead to efficient reduction of Gaussian noise.

We also saw that in many practical cases linear methods are not suitable. LDA and PCA can be extended for use in non-linear classification via the kernel method. Here, the original observations are effectively mapped, through a non-linear transformation, into a higher-dimensional space. Linear classification in this space is then equivalent to non-linear classification in the original space.

Analyzing the kernel methods of dimensionality reduction, we find that Gaussian kernel PCA/LDA succeeded in revealing more complicated structures in the data than their linear counterparts, and achieved much lower classification error rates, as is evident from the ROCs.

Analysis was also done based on the various distance measures. In all cases, the methods implemented with the Mahalanobis distance give better results.


We used Gabor wavelets to extract the nonlinear features in the images and subsequently applied PCA and LDA. The ROC curves show considerable improvement in recognition rates. So, of all the methods we used, wavelet-based face recognition techniques give the best results.

Finally, we implemented a real-time face recognition system in which the detected faces are recognized against a database of known faces.


    BIBLIOGRAPHY

[1] Sapana Shrikrishna Bagade and Vijaya K Shandilya. Use of histogram equalization in image processing for image enhancement. International Journal of Software Engineering Research and Practices, 1(2):6-10, 2011.

[2] Robert J Baron. Mechanisms of human facial recognition. International Journal of Man-Machine Studies, 15(2):137-178, 1981.

[3] Marian Stewart Bartlett, Javier R Movellan, and Terrence J Sejnowski. Face recognition by independent component analysis. Neural Networks, IEEE Transactions on, 13(6):1450-1464, 2002.

[4] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman. Eigenfaces vs. fisherfaces: recognition using class specific linear projection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 19(7):711-720, July 1997.

[5] Anthony J Bell and Terrence J Sejnowski. The independent components of natural scenes are edge filters. Vision Research, 37(23):3327, 1997.

[6] Kevin Bowyer and P Jonathon Phillips. Empirical evaluation techniques in computer vision. IEEE Computer Society Press, 1998.

[7] Pierre Comon. Independent component analysis. Higher-Order Statistics, pages 29-38, 1992.

[8] Bruce A Draper, Kyungim Baek, Marian Stewart Bartlett, and J Ross Beveridge. Recognizing faces with PCA and ICA. Computer Vision and Image Understanding, 91(1):115-137, 2003.

[9] H.M. Ebied. Feature extraction using PCA and kernel-PCA for face recognition. In Informatics and Systems (INFOS), 2012 8th International Conference on, pages MM-72-MM-77, May 2012.

[10] Shervin Emami. Introduction to face detection and face recognition. http://www.shervinemami.info/faceRecognition.html. Accessed January 12, 2013.

[11] Kamran Etemad and Rama Chellappa. Discriminant analysis for recognition of human face images. JOSA A, 14(8):1724-1733, 1997.
