
ECE 738 Course Project

Face Recognition: A Literature Survey

Lin, Wei-Yang

9022090964

May 16, 2003



1 Introduction

Face Recognition Technology (FRT) is a research area spanning several disciplines, such as image processing, pattern recognition, computer vision, and neural networks. There are many applications of FRT, as shown in Table 1; they range from matching photographs to real-time matching of surveillance video. Depending on the specific application, FRT has different levels of difficulty and requires a wide range of techniques. A 1995 review paper by Chellappa et al. [1] gives a thorough survey of FRT as of that time, but FRT has continued to evolve rapidly in the past few years.

In this report, I provide a brief review of the most recent developments in FRT.

Table 1: Applications of face recognition technology [1].

2 The Challenges in FRT

Though many FRT systems have been proposed, robust face recognition is still difficult. The recent FERET test [5] has revealed that there are at least two major challenges:

The illumination variation problem

The pose variation problem

Either one or both of these problems can cause serious performance degradation in most existing systems. Unfortunately, they arise in many real-world applications, such as surveillance video. In the following, I discuss some existing solutions to these problems.

The general face recognition problem can be formulated as follows: given a single image or a sequence of images, recognize the person in the image using a database. Solving the problem consists of the following steps: 1) face detection, 2) face normalization, 3) database query.

2.1 The illumination problem

Images of the same face appear different due to changes in lighting. If the change induced by illumination is larger than the difference between individuals, a system will not be able to recognize the input image. To handle the illumination problem, researchers have proposed various methods. It has been suggested that one can reduce the variation by discarding the most significant eigenfaces, and it is verified in [18] that discarding the first few eigenfaces works reasonably well. However, this degrades system performance for input images taken under frontal illumination.
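As an illustration of this heuristic, here is a minimal Python/NumPy sketch (the function and parameter names are mine, not from [18]) that builds eigenfaces from vectorized training faces and drops the first few components before projecting:

```python
import numpy as np

def eigenface_features(train, test, n_keep=20, n_skip=3):
    """Project faces onto eigenfaces, skipping the first n_skip components,
    which tend to capture illumination rather than identity (cf. [18]).
    train, test: (n_images, n_pixels) arrays of vectorized face images."""
    mean = train.mean(axis=0)
    # Rows of Vt are the eigenfaces (principal directions of the face set)
    _, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
    basis = Vt[n_skip:n_skip + n_keep]
    return (train - mean) @ basis.T, (test - mean) @ basis.T
```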

In [19], different image representations and distance measures are evaluated. One important conclusion of that paper is that none of these methods is by itself sufficient to overcome illumination variations. More recently, a new image comparison method was proposed by Jacobs et al. [20]; however, this measure is not strictly illumination-invariant, because it changes for a pair of images of the same object when the illumination changes.

An illumination subspace for a person has been constructed in [21, 22] for a fixed viewpoint; under that fixed viewpoint, the recognition result can be illumination-invariant. One drawback of this method is that many images per person are needed to construct the basis images of the illumination subspace.

In [23], the authors suggest using Principal Component Analysis (PCA) to solve a parametric shape-from-shading (SFS) problem. Their idea is quite simple: reconstruct the 3D face surface from a single image using computer vision techniques, then compute the frontal view image under frontal illumination. Very good results are demonstrated; I explain their approach in detail later. In practice, however, there are many issues in reconstructing a 3D surface from a single image.

I will discuss two important illumination-invariant FRTs in the following sections.

2.2 The pose problem

System performance drops significantly when pose variations are present in the input images. Basically, existing solutions can be divided into three types: 1) multiple images per person are required in both the training stage and the recognition stage; 2) multiple images per person are used in the training stage, but only one database image per person is available in the recognition stage; 3) single-image-based methods. The second type is the most popular.


Multiple image approaches An illumination-based image synthesis method [24] has been proposed for handling both the pose and illumination problems. This method relies on the illumination cone to deal with illumination variation. For variations due to rotation, it needs to completely resolve the generalized bas-relief (GBR) ambiguity when reconstructing the 3D surface.

Hybrid approaches Many algorithms of this type have been proposed; it is probably the most practical class of solutions so far. Three representative methods are reviewed in this report: 1) the linear class based method [25], 2) the graph matching based method [26], and 3) the view-based eigenface method [27]. The image synthesis method in [25] is based on the assumption of linear 3D object classes and the extension of linearity to images. In [26], a robust face recognition scheme based on EBGM is proposed; they demonstrate substantial improvement in face recognition under rotation, and their method is fully automatic, including face localization, landmark detection, and the graph matching scheme. The drawback of this method is the requirement of accurate landmark localization, which is not easy when illumination variations are present. The popular eigenface approach [28] has been modified to achieve pose invariance [27] by constructing eigenfaces for each pose. More recently, a general framework called the bilinear model has been proposed [41]. The methods in this category have two common drawbacks: 1) they need many images per person to cover the possible poses, and 2) the illumination problem is treated separately from the pose problem.


Single Image Based Approaches Gabor wavelet based feature extraction has been proposed for face recognition [38] and is robust to small-angle rotation. There are many papers on invariant features in the computer vision literature, but little of that work addresses applying such features to face recognition; recent work in [39] sheds some light in this direction. For synthesizing face images under different lighting or expressions, 3D facial models have been explored in [40], but due to their complexity and computational cost it is hard to apply this technology to face recognition.

3 The State of the Art

In the following sections, I will discuss some recent research work in face recognition.

3.1 Applying Shape-from-Shading (SFS) to Face Recognition

The basic idea of SFS is to infer the 3D surface of an object from the shading information in an image. In order to infer such information, we need to assume a reflectance model under which the given image is generated from the 3D object. Many illumination models are available; among them, the Lambertian model is the most popular and has been used extensively in the computer vision community for the SFS problem [42]. By its nature, SFS is an ill-posed problem in general: different surfaces can produce the same image, so the reconstructed 3D surface cannot be relied on to synthesize images under a different lighting angle. Fortunately, theoretical advances make the SFS problem well-posed under certain conditions. The key equation in the SFS problem is the following image irradiance equation [43]:

$$E(x, y) = R(p, q),$$

where $E(x, y)$ is the image, $R$ is the reflectance map, and $p = \partial z/\partial x$, $q = \partial z/\partial y$ are the shape gradients (partial derivatives of the depth map $z$). With the assumption of a Lambertian surface and a single, distant light source with direction $(p_s, q_s, 1)$ and albedo $\rho$, the equation can be written as

$$E(x, y) = \rho\,\frac{1 + p\,p_s + q\,q_s}{\sqrt{1 + p^2 + q^2}\,\sqrt{1 + p_s^2 + q_s^2}}$$

or, equivalently, $E(x, y) = \rho\,\mathbf{n}\cdot\mathbf{s}$, the inner product of the unit surface normal and the unit light direction.

Since an SFS algorithm provides face shape information, the illumination and pose problems can be solved simultaneously. For example, we can solve the illumination problem by rendering a prototype image Ip from a given input image I. This can be done in two steps: 1) apply the SFS algorithm to obtain (p, q); 2) generate the prototype image Ip under frontal lighting (lighting angle = 0).
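A minimal sketch of step 2, assuming the gradients (p, q) have already been recovered (Python/NumPy; the function and argument names are illustrative, not the authors' code):

```python
import numpy as np

def render_lambertian(p, q, light=(0.0, 0.0, 1.0), albedo=1.0):
    """Render a Lambertian image from shape gradient maps (p, q) under a unit
    light vector; light=(0, 0, 1) gives the frontal-illumination prototype."""
    lx, ly, lz = light
    # Unnormalized surface normal at each pixel is (-p, -q, 1)
    shading = (-p * lx - q * ly + lz) / np.sqrt(1.0 + p**2 + q**2)
    return albedo * np.clip(shading, 0.0, None)  # zero out attached shadows
```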

To evaluate some existing SFS algorithms, Zhao [13] applies several of them to 1) synthetic face images generated under the Lambertian model with constant albedo, and 2) real face images. The experimental results show that these algorithms are not good enough on real face images to yield a significant improvement in face recognition. The reason is that a face is composed of materials with different reflecting properties (cheek skin, lip skin, eyes, etc.); hence, the Lambertian model with constant albedo cannot provide a good approximation. Zhao et al. [3] develop a symmetric SFS algorithm using the Lambertian model with varying albedo ρ(x, y) as a better alternative. With the aid of a generic 3D head model, they shorten the two-step procedure of obtaining the prototype image (1. input image to shape via SFS, 2. shape to prototype image) to one step: input image to prototype image directly.


Their algorithm has been applied to more than 150 face images from the Yale University and Weizmann databases. The results clearly indicate the superior quality of the prototype images rendered by their method. They also conduct three experiments to evaluate the influence on recognition performance when their algorithm is combined with existing FRT. The first experiment demonstrates the improvement in recognition performance from using the new illumination-invariant measure they define; the results are shown in Tables 2 and 3. The second experiment shows that using the rendered prototype images instead of the original input images can significantly improve existing FRT such as PCA and LDA; the results are shown in Table 4, where P denotes prototype image. Finally, in the third experiment they demonstrate that the recognition rate of subspace LDA can be improved.

Database    Image Measure    Gradient Measure    Illumination-Invariant Measure
Yale        68.3%            78.3%               83.3%
Weizmann    86.5%            97.9%               81.3%

Table 2: Recognition performance using three different measures (one image per person).

Database    Image Measure    Gradient Measure    Illumination-Invariant Measure
Yale        78.3%            88.3%               90.0%
Weizmann    72.9%            96.9%               87.9%

Table 3: Recognition performance using three different measures (two images per person).

Database    PCA      LDA      P-PCA    P-LDA
Yale        71.7%    88.3%    90.0%    95.0%
Weizmann    97.9%    100%     95.8%    98.9%

Table 4: Recognition performance with/without using prototype images (P denotes prototype image).

3.2 Applying Illumination Cone to Face Recognition

Earlier work showed that the images of an object under an arbitrary combination of light sources form a convex cone in image space. This cone, called the illumination cone, can be constructed from as few as three images. Figure 1 demonstrates the process of constructing the illumination cone: Figure 1a shows seven original images with different illumination used in estimating the illumination cone; Figure 1b shows the basis images of the illumination cone, which can be used to generate images under arbitrary illumination conditions; and Figure 1c shows images synthesized from the illumination cone of one face.

The reconstructed 3D face surface and the illumination cone can be combined to synthesize images under different illumination and pose. In [14], Georghiades et al. use prior knowledge about the shape of the face to resolve the generalized bas-relief (GBR) ambiguity [15]. Once the GBR parameters are calculated, it is a simple matter to render synthetic images under different illumination and pose. Figure 2 shows the reconstructed face surfaces, and Figure 3 shows the synthetic images of a face under different pose and illumination. Note that these images are generated from the seven training images in Figure 1a, in which the pose is fixed and there is only small variation in illumination; in contrast, the synthetic images exhibit large variation in both pose and illumination. They performed two sets of recognition experiments. The first experiment, in which only illumination varies while pose remains fixed, was designed to compare other recognition algorithms against the illumination cone method. There are a total of 450 images (45 illumination conditions × 10 faces), divided into four groups (12°, 25°, 50°, and 77°) according to the angle between the light source and the camera axis. Table 5 shows the results.

Here, cones-attached means that the illumination cone was constructed without cast shadows, and cones-cast means that the reconstructed face surface was used to determine cast shadows. Notice that the cone subspace approximation has the same performance as the original illumination cone.

Figure 1: The process of constructing the illumination cone: (a) original images, (b) basis images, (c) synthesized images [14].

Figure 2: The reconstructed face surfaces of 10 persons [14].

Figure 3: Synthesized images under different pose and illumination.

Table 5: Error rate under different illumination while pose is fixed.


Figure 4: Error rate under different pose and illumination.

In the second experiment, they evaluate recognition performance under variation in both pose and illumination. There are a total of 4,050 images (9 poses × 45 illumination conditions × 10 faces); Figure 4 shows the results. Their algorithm has a very low error rate for all poses except under the extreme lighting conditions.

We can draw the following conclusions from their experimental results: 1) we can achieve pose/illumination-invariant recognition using a small number of images with fixed pose and slightly different illumination, and 2) the images of a face under variable illumination can be well approximated by a low-dimensional subspace.

3.3 Linear Object Classes Method

Consider the problem of recognizing a face under a different pose or expression when only one picture is given. The human visual system is certainly able to perform this task, possibly because we exploit prior information about how face images transform. The idea here is therefore to learn image transformations from examples and then apply them to a new face image in order to synthesize virtual views that can be used in an existing face recognition system. Poggio and Vetter [25] introduce a technique for generating artificial new images of an object. Their work is based on the idea of linear object classes: 3D objects whose 3D shape can be represented as a linear combination of a small number of prototype objects. Thus, if the example set consists of frontal and rotated view images, we can synthesize the rotated view of a given input image.
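A minimal sketch of this idea (Python/NumPy; the names are illustrative): decompose the new frontal face over the frontal prototypes by least squares, then apply the same coefficients to the rotated prototypes. Note that the full method of [25] operates on separated shape and texture vectors rather than raw pixels.

```python
import numpy as np

def synthesize_rotated_view(frontal_protos, rotated_protos, new_frontal):
    """Linear object classes sketch: express a new frontal face as a linear
    combination of frontal prototypes, then apply the same coefficients to
    the rotated prototypes. Columns of the prototype matrices are vectorized
    example faces in the two orientations."""
    alpha, *_ = np.linalg.lstsq(frontal_protos, new_frontal, rcond=None)
    return rotated_protos @ alpha
```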

For human-made objects, which often consist of cuboids, cylinders, or other geometric primitives, the assumption of linear object classes seems natural. However, in the case of faces, it is not clear how many examples are sufficient. They test their approach on a set of 50 faces, each given in two orientations (22.5° and 0°). In their experiment, one face is chosen as the test face, and the other 49 faces are used as examples. In Figure 5, each test face is shown on the upper left and the synthesized image on the lower right; the true rotated test face is shown on the lower left. On the upper right, they also show the synthesis of the test face from the 49 examples in the test orientation; this reconstruction should be understood as the projection of the test face into the subspace spanned by the other 49 examples. The results are not perfect, but considering the small size of the example set, the reconstruction is quite good. In general, the similarity of the reconstruction to the input test face suggests that an example set of a few hundred faces may be sufficient to construct a huge variety of different faces. We can conclude that the linear object class approach may be a satisfactory approximation, even for objects as complex as faces.

Figure 5: Each test face is rotated using 49 faces as examples (not shown), and the result is marked as output. For comparison only, the true rotated test face is shown on the lower left (this face was not used in the computation) [25].

Therefore, given only a single face image, we are able to generate additional synthetic face images under different view angles. For the face recognition task, these synthetic images can be used to handle pose variation. This approach does not need any depth information, so the difficult step of generating 3D models can be avoided.

3.4 View-Based Eigenspace

The eigenface technique of Turk and Pentland [28] was generalized to the view-based eigenspace technique for handling pose variation [27]. This extension accounts for variation in pose and leads to a more robust recognition system.


They formulate the problem of face recognition under different poses as follows: given N individuals under M different poses, one can build a "view-based" set of M separate eigenspaces, each capturing the variation of the N individuals in a common pose. In the view-based approach, the first step is to determine the pose of the input face image by selecting the eigenspace which best describes it. This is accomplished by calculating the Euclidean distance between the input image and its projection in each eigenspace; the eigenspace yielding the smallest distance is the one whose pose is most similar to the input image. Once the proper eigenspace is determined, the input image is coded using the eigenfaces of that space and then recognized.
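A minimal sketch of the eigenspace-selection step (Python/NumPy; names are illustrative); the selected index then determines which per-pose eigenface basis is used for coding and recognition:

```python
import numpy as np

def select_pose_eigenspace(x, spaces):
    """Pick the view-based eigenspace that best reconstructs image vector x.
    spaces is a list of (mean, basis) pairs; each basis has orthonormal
    eigenfaces for one pose as its rows."""
    def residual(mean, basis):
        d = x - mean
        return np.linalg.norm(d - basis.T @ (basis @ d))  # distance to subspace
    return int(np.argmin([residual(m, B) for m, B in spaces]))
```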

They evaluated the view-based approach with 189 images: 21 people with 9 poses each, evenly spaced from -90° to +90° along the horizontal plane. Two different test methodologies were used to judge recognition performance. In the first series of experiments, interpolation performance was tested by training on a subset of the available views {±90°, ±45°, 0°} and testing on the intermediate views {±68°, ±23°}; the average recognition rate was 90% for the view-based method. A second series of experiments tested extrapolation performance by training on a range of available views (e.g., -90° to +45°) and testing on views outside the training range (e.g., +68°, +90°). For testing poses separated by ±23° from the training range, the average recognition rate was 83% for the view-based method.

3.5 Curvature-Based Face Recognition

In [7], the author uses surface curvature to perform face recognition. This is an appealing idea, since the curvature at a point on a surface is invariant under variations of viewpoint and illumination. In this approach, a rotating laser scanner produces data of high enough resolution that accurate curvature calculations can be made. Face segmentation can be based on the sign of the Gaussian curvature, which distinguishes two surface types: convex/concave and saddle regions. The surface feature extraction includes not just curvature signs but also the principal curvatures, principal directions, umbilic points, and extrema of both principal curvatures. The maximum and minimum curvatures at a point are the principal curvatures, and the directions associated with them are the principal directions; both are given by the eigenvalues and eigenvectors of the shape matrix. The product of the two principal curvatures is the Gaussian curvature, and the mean curvature is the mean of the two principal curvatures.

In practice, because these curvature calculations involve second-order partial derivatives, they are extremely sensitive to noise, and a smoothing filter is required before calculating curvature. The dilemma is how to choose an appropriate smoothing level: if the smoothing level is too low, the second derivatives amplify noise to the point that the curvature measurements are useless; on the other hand, over-smoothing modifies the very surface features we are trying to measure. In their implementation, they precompute curvature values at several different levels of smoothing, use the curvature maps from a low smoothing level to establish the locations of features, and then use prior knowledge of face structure to select the curvature values from the precomputed set. (This appears to be done manually.) An example of principal curvature maps is given in Figure 6.

Segmentation is fairly straightforward. Using the signs of the Gaussian curvature K and the mean curvature H, the face surface can be divided into four kinds of regions: K > 0, H > 0 is convex; K > 0, H < 0 is concave; K < 0, H > 0 is a saddle with positive mean curvature; and K < 0, H < 0 is a saddle with negative mean curvature. The boundaries of these regions are the parabolic curves, where the Gaussian curvature is zero. Figure 7 shows an example.

The author also discusses the calculation of surface descriptors, trying to extract as much information as possible from the range data so that this information is as distinctive as the individual.

Figure 6: Principal curvature (a) and principal direction (c) of the maximum curvature; principal curvature (b) and principal direction (d) of the minimum curvature.
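As a sketch of the computation just described, assuming a depth map on a regular grid (Python with NumPy/SciPy; the smoothing parameter sigma plays the role discussed above, and the sign labels follow the convention used in Figure 7):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def curvature_segmentation(z, sigma=2.0):
    """Gaussian (K) and mean (H) curvature of a depth map z(x, y), computed
    after a Gaussian pre-filter to tame the noise amplified by the second
    derivatives. Also returns a label map from the signs of K and H:
    0 convex, 1 concave, 2 saddle with H > 0, 3 saddle with H < 0."""
    z = gaussian_filter(z, sigma)
    zy, zx = np.gradient(z)          # first derivatives
    zxy, zxx = np.gradient(zx)       # second derivatives
    zyy, _ = np.gradient(zy)
    # Monge-patch formulas for a surface (x, y, z(x, y))
    g = 1.0 + zx**2 + zy**2
    K = (zxx * zyy - zxy**2) / g**2
    H = ((1 + zy**2) * zxx - 2 * zx * zy * zxy + (1 + zx**2) * zyy) / (2 * g**1.5)
    labels = np.where(K > 0, np.where(H > 0, 0, 1), np.where(H > 0, 2, 3))
    return K, H, labels
```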


Figure 7: Segmentation of three faces by the sign of Gaussian and mean curvature: concave (black), convex (white), saddle with positive mean curvature (light gray), and saddle with negative mean curvature (dark gray).

With such a rich set of information available, there are many ways to construct a comparison strategy. The author uses feature extraction and template matching to perform face recognition. In the experiment, the test set consists of 8 faces with 3 views each; for each face there are two versions without expression and one with expression. The experimental results show that 97% of the comparisons are correct.

In my opinion, the advantages of the curvature-based technique are: 1) it solves the pose and illumination variation problems at the same time, and 2) there is a great deal of information in the curvature map that has not yet been taken advantage of, and it may be possible to find an efficient way to exploit it.

However, there are some inherent problems with this approach: 1) A laser range finder system is much more expensive than a camera, and the technique cannot be applied to existing image databases; this makes it unattractive whenever another choice is available. 2) Even if the range finder were not an issue, the computational cost is high and the curvature calculation is very sensitive to noise. If we applied principal component analysis to the range data instead, the error rate would probably be similar while the computational complexity would be much lower. 3) One could construct the 3D face surface from 2D images instead of an expensive range finder, and many algorithms are available to do this, but curvature cannot then be computed reliably from the reconstructed surface: as mentioned earlier, the curvature calculation involves second derivatives of the surface, and only high-resolution data, such as that from a laser range finder, makes accurate curvature calculation possible.

3.6 3D Model-Based Face Recognition

To reduce system cost, Beumier and Acheroy [8] choose a 3D acquisition system consisting of a standard CCD camera and structured light, based on the projection of a known light pattern. The pattern deformation encodes the depth information of the object. 3D surface reconstruction is done by stripe detection and labeling; from each point of a stripe and its label, triangulation allows X, Y, Z estimation. This process is very fast while offering sufficient resolution for recognition purposes.

There are 120 persons in their experiment; each one is captured in three shots, corresponding to a central view, limited left/right rotation, and up/down rotation. The automatic database uses an automatic program to extract the 3D information of each individual; in the manual database, the 3D extraction process was refined by clicking initial points in the deformed pattern.

With the 3D reconstruction, they looked for characteristics that reduce the 3D data to a set of features that could be easily and quickly compared. But the nose turned out to be the only robust feature obtainable with limited effort, so they gave up on feature extraction and considered global matching of the face surface.

Fifteen profiles are extracted by intersecting the face surface with parallel planes spaced 1 cm apart. A distance measure called the profile distance is defined to quantify the difference between 3D surfaces. This approach is slow: about 1 second to compare two face surfaces. In order to speed up the algorithm, they tried using only the central profile and two lateral profiles in the comparison. ROC curves are shown in Figure 10 to illustrate the influence of the comparison strategy: in the central/lateral profile comparison, error rate is sacrificed (from 3.5% to 6.2%) to gain speed in the surface comparison. In the left part of Figure 10, manual refinement gives better recognition performance, which tells us that there is room to improve the automatic 3D acquisition system.

Figure 8: (a) Projection of a known light pattern; (b) analysis of the pattern deformation.

Figure 9: Reconstructed face surface.
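A rough sketch of such a profile comparison (Python/NumPy; the exact profile distance defined in [8] is not reproduced in the text, so this mean absolute depth difference after resampling and centering is only a stand-in):

```python
import numpy as np

def profile_distance(p1, p2):
    """Compare two depth profiles: resample both to a common length, remove
    their means, and take the mean absolute difference."""
    n = min(len(p1), len(p2))
    a = np.interp(np.linspace(0, 1, n), np.linspace(0, 1, len(p1)), p1)
    b = np.interp(np.linspace(0, 1, n), np.linspace(0, 1, len(p2)), p2)
    return float(np.mean(np.abs((a - a.mean()) - (b - b.mean()))))
```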

Advantages: 1) The additional cost is only the projector and pattern slide. 2) Switching the slide on and off allows acquiring both the 2D image and the 3D information; the fusion of 2D and 3D information can increase recognition performance. 3) The projector illumination reduces the influence of ambient light. 4) 3D reconstruction and profile comparison can avoid pose variation.


Problems: 1) Automatic 3D reconstruction is not good enough; an obvious improvement can be obtained by manual refinement. 2) Profile matching is a computationally expensive task. In face authentication this is not an issue, but in face recognition against a large database the speed would be terribly slow.

Figure 10: ROC curves of 3D face surface recognition, with 15 profiles (left) and with central/lateral profiles (right).

3.7 Elastic Bunch Graph Matching

In [26], the Gabor wavelet transform is used to extract face features so that recognition performance is robust to variation in pose. Here, I first introduce some of the terminology they use and then discuss how they build the face recognition system.

Each feature point on the face is transformed with a family of Gabor wavelets. The set of Gabor wavelets consists of 5 different spatial frequencies and 8 orientations; therefore, one feature point has 40 corresponding Gabor wavelet coefficients. A jet is defined as the set of Gabor wavelet coefficients for one feature point. It can be written as $\mathcal{J} = \{J_j = a_j e^{i\phi_j},\; j = 1, \dots, 40\}$, where $a_j$ is the magnitude and $\phi_j$ the phase of the j-th coefficient.
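A common choice for comparing two jets, used in EBGM, is the normalized dot product of the coefficient magnitudes; a minimal sketch (Python/NumPy):

```python
import numpy as np

def jet_similarity(j1, j2):
    """Magnitude-based similarity of two jets (complex vectors of 40 Gabor
    coefficients): the normalized dot product of their magnitudes."""
    a1, a2 = np.abs(j1), np.abs(j2)
    return float(a1 @ a2 / (np.linalg.norm(a1) * np.linalg.norm(a2)))
```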


A labeled graph G representing a face consists of N nodes connected by E edges. The nodes are located at feature points called fiducial points; for example, the pupils, the corners of the mouth, and the tip of the nose are all fiducial points. The nodes are labeled with jets. Graphs for different head poses differ in geometry and local features; to be able to compare graphs of different poses, they manually define pointers to associate corresponding nodes in different graphs.

In order to extract graphs automatically for a new face, they need a general representation of faces that covers a wide range of possible variations in appearance. This representative set has a stack-like structure, called the face bunch graph (FBG) (see Figure 11).

Figure 11: The face bunch graph serves as a general representation of faces. Each stack of discs represents a jet; one can choose the best matching jet from the bunch of jets attached to a single node.


A set of jets referring to one fiducial point is called a bunch. An eye bunch, for instance, may include jets from closed, open, female, and male eyes, etc., to cover the possible variations. The face bunch graph is given the same structure as the individual graphs. In searching for fiducial points in a new face image, the procedure described below selects the best fitting jet from the bunch dedicated to each fiducial point. The first set of graphs is generated manually. Initially, when the FBG contains only a few faces, it is necessary to check the matching results; once the FBG is rich enough (approximately 70 graphs), the matching results are quite good.

Matching an FBG to a new image is done by maximizing the graph similarity between the image graph and the FBG of the same pose. For an image graph $G^I$ with nodes $n = 1, \dots, N$ and edges $e = 1, \dots, E$, and an FBG $B$ with model graphs $m = 1, \dots, M$, the similarity is defined as

$$S_B(G^I, B) = \frac{1}{N}\sum_{n} \max_{m} S_\phi\!\left(J_n^I, J_n^{B_m}\right) - \frac{\lambda}{E}\sum_{e} \frac{\left(\Delta\vec{x}_e^{\,I} - \Delta\vec{x}_e^{\,B}\right)^2}{\left(\Delta\vec{x}_e^{\,B}\right)^2},$$

where $\lambda$ controls the relative importance of the jets and the metric structure, $J_n$ are the jets at node $n$, and $\Delta\vec{x}_e$ are the distance vectors labeling edge $e$. Since the FBG provides several jets for each fiducial point, the best one is selected and used for comparison; the best fitting jets serve as local experts for the new image.
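A sketch of this similarity, reusing jet_similarity from the sketch above (Python/NumPy; the data layout is illustrative):

```python
import numpy as np

def fbg_similarity(image_jets, bunches, image_edges, fbg_edges, lam=1.0):
    """Graph similarity: mean best-jet similarity over nodes minus a weighted
    relative edge-distortion term. image_jets: N jets; bunches: N lists of
    candidate jets (one bunch per node); *_edges: (E, 2) arrays of edge vectors."""
    node_term = np.mean([max(jet_similarity(j, jm) for jm in bunch)
                         for j, bunch in zip(image_jets, bunches)])
    d = image_edges - fbg_edges
    edge_term = np.mean(np.sum(d**2, axis=1) / np.sum(fbg_edges**2, axis=1))
    return node_term - lam * edge_term
```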

They use the FERET database to test their system. First, the size and location of the face are determined and the face image is normalized in size; in this step several FBGs of different sizes are required, and the best fitting one is used for size estimation. In the FERET database, each image has a label indicating the pose, so there is no need to estimate pose, but pose could be estimated automatically in a similar way as size.

After extracting model graphs from the gallery images, recognition is performed by comparing an image graph to all model graphs and selecting the one with the highest similarity value. A comparison against a gallery of 250 individuals takes less than one second. The poses used here are: neutral frontal view (fa), frontal view with different expression (fb), half-profile right (hr) or left (hl), and profile right (pr) or left (pl). Recognition results are shown in Table 6.

The recognition rate is high for frontal against frontal images (first row), due to the fact that two frontal views show only little variation. The recognition rate is still high for right profile against left profile (third row). When comparing left and right half-profiles, the recognition rate drops dramatically (second row); the likely reason is variation in rotation angle, since visual inspection shows that the rotation angle may differ by up to 30°. When comparing frontal views or profiles against half-profiles, a further reduction in recognition rate is observed.

From the experimental results, it is obvious that Gabor wavelet coefficients are not invariant under rotation; before performing recognition, one still needs to estimate the pose and find the corresponding FBG.

Table 6: Recognition results for cross-runs between different galleries. The combinations in the four bottom rows are due to the fact that not all poses were available for all persons. The table also shows how often the correct face is identified as rank one and how often it is among the top ten.


3.8 Face recognition from 2D and 3D images

Very few research groups focus on face recognition from both 2D and 3D face images; almost all existing systems rely on a single type of face information, either 2D images or 3D range data. 3D range data can compensate for the lack of depth information in a 2D image, and 3D shape is also invariant under variation of illumination. Therefore, integrating 2D and 3D information is a promising way to improve recognition performance.

In [29], corresponding 3D range data and a 2D image are obtained at the same time and then used to perform face recognition. Their approach consists of the following steps:

Find feature points In order to extract both 2D and 3D features, two types of feature points are defined. One type is in 3D and is described by the point signature; the other is in 2D and is represented by Gabor wavelet responses. Four 3D feature points and ten 2D feature points are selected, as shown in Figure 12.

Figure 12: 2D feature points are shown by "." and 3D feature points are shown by "x".

The Gabor wavelet function has tunable orientation and frequency; it can therefore be configured to extract a specific band of frequency components from an image, which makes it robust against translation, distortion, and scaling. Here, they use a widely used version of Gabor wavelets [26]: for each feature point there is a set of Gabor wavelets consisting of 5 different frequencies and 8 orientations. Localizing a 2D feature point is accomplished with the following steps.

1. For each pixel k in the search range, calculate the corresponding jet Jk.

2. Compute the similarity between Jk and all the jets in the corresponding bunch.

3. Within the search area, the pixel with the highest similarity value is chosen as the location of the feature point.

In this paper, the point signature is chosen to describe feature points in 3D. Its main idea is summarized here: for a given point, we place a sphere centered at that point. The intersection of the sphere with the face surface is a 3D curve C. A plane P is defined as the plane tangential to the surface at the given point, and the projection of C onto P forms a new curve C'. The signed distance between C and C', as a function of the angle θ around the point, is called the point signature. The point signature is invariant under rotation and translation. The definition of the point signature is also illustrated in Figure 13.

Figure 13: The definition of the point signature.
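A rough sketch of computing such a signature from a point cloud (Python/NumPy; approximating the tangent plane by a plane fitted to the ring of points at distance ~radius is my simplification, not the exact construction used in [29]):

```python
import numpy as np

def point_signature(points, center, radius, n_bins=36, tol=0.05):
    """Approximate point signature: take surface points at distance ~radius
    from center, fit a plane to that ring (via SVD), and bin the signed
    distance to the plane by angle (36 bins = 10-degree sampling)."""
    d = np.linalg.norm(points - center, axis=1)
    ring = points[np.abs(d - radius) < tol * radius]
    c = ring.mean(axis=0)
    _, _, Vt = np.linalg.svd(ring - c)
    u, v, normal = Vt[0], Vt[1], Vt[2]   # in-plane axes and plane normal
    heights = (ring - c) @ normal        # signed distance to the fitted plane
    angles = np.arctan2((ring - c) @ v, (ring - c) @ u)
    bins = ((angles + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    return np.array([heights[bins == k].mean() if np.any(bins == k) else 0.0
                     for k in range(n_bins)])
```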


Based on principal component analysis, four eigenspaces are constructed for the four 3D feature points, so there are eigen point signatures for each 3D feature point. Localizing a 3D feature point follows similar steps to localizing a 2D feature point:

1. For each point in the search range, calculate the corresponding point signature.

2. Approximate the point signature by a linear combination of the eigen point signatures.

3. Compute the error between the real point signature and the approximated point signature.

4. Within the search area, the point with the smallest error is chosen as the location of the feature point.

Face recognition Here, each face is expressed in terms of a shape vector and a texture vector. The shape vector consists of the point signatures of the four 3D feature points and their eight neighboring points; each point signature is sampled at equal intervals of 10°. The texture vector consists of the Gabor wavelet coefficients of the ten 2D feature points. Different weights are assigned to different feature points based on their detection accuracy.

With the shape vectors and texture vectors of the training faces, a texture subspace and a shape subspace can be constructed. For a new face image, the shape vector and texture vector are projected onto the shape subspace and texture subspace respectively, giving projection coefficient vectors $\alpha_s$ and $\alpha_t$. The feature vector $v$ for this face is then formed from $\alpha_s$ and $\alpha_t$ as a weighted combination, $v = \left(w_s\,\alpha_s,\; w_t\,\alpha_t\right)$, where the weights $w_s$ and $w_t$ reflect the detection accuracy of the underlying feature points.
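A minimal sketch of this fusion step (Python/NumPy; a single pair of global weights stands in for the per-feature-point weighting of [29]):

```python
import numpy as np

def fused_feature(shape_vec, tex_vec, shape_basis, tex_basis, w_shape=0.5):
    """Project the shape and texture vectors onto their PCA subspaces and
    concatenate the weighted projection coefficients into one feature vector."""
    a_s = shape_basis @ shape_vec   # shape projection coefficients
    a_t = tex_basis @ tex_vec       # texture projection coefficients
    return np.concatenate([w_shape * a_s, (1.0 - w_shape) * a_t])
```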


With the feature vector, the remaining task is simply to choose a classifier with a low error rate. In this paper, two classifiers are used and compared: one based on the maximal similarity value, the other based on a support vector machine.
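A sketch of the SVM branch using scikit-learn (the kernel choice is an assumption; the paper's SVM settings are not given in the text reviewed here):

```python
import numpy as np
from sklearn.svm import SVC

def train_identity_svm(features, labels):
    """Fit a multi-class SVM on the fused feature vectors; scikit-learn's SVC
    handles multi-class problems via one-vs-one internally."""
    clf = SVC(kernel="linear")  # kernel is an assumption, not from [29]
    clf.fit(np.asarray(features), np.asarray(labels))
    return clf
```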

Experiment

Their experiment was carried out using face images of 50 individuals. Each person provides six facial images with viewpoint and expression variation. Some of these images are chosen as training images and the rest are used as test images, so the training set and the test set are disjoint.

For a new test image, after localizing the feature points, a 36×9×4-dimensional shape vector and a 40×10-dimensional texture vector are calculated. These two vectors are projected into the corresponding subspaces, and the projection coefficients are concatenated to form an integrated feature vector.

In order to evaluate recognition performance, two test cases are performed with different features, different numbers of principal components, and different classifiers.

Case 1: Comparison of recognition performance with different features: point signature (PS), Gabor coefficients (GC), and PS+GC. Figure 14(a) shows the recognition rate versus subspace dimension for the different chosen features. The results confirm their assumption that the combination of 2D and 3D information can improve recognition performance.

Case 2: Comparison of the recognition performance of different classifiers: the similarity function and the support vector machine. Figures 14(b), (c), and (d) show the recognition rate versus subspace dimension for the different classifiers; the result in (b) is obtained using the point signature as the feature, (c) using the Gabor coefficients, and (d) using PS+GC. With the SVM as the classifier, a higher recognition rate is obtained in all three cases.

Figure 14: Recognition rate vs. subspace dimension: (a) different chosen features with the same classifier; (b) chosen feature: PS; (c) chosen feature: GC; (d) chosen feature: PS+GC.

3.9 Pose estimation from a single image

Generally, a face recognition problem can be divided into two major parts: normalization and recognition. In normalization, we need to estimate the size, illumination, expression, and pose of the face from the given image and then transform the input image into a normalized format that can be handled by the recognition algorithm. How to estimate pose accurately and efficiently is therefore an important problem in face recognition, and solving it is a key step in building a robust face recognition system.

Ji [10] proposes a new approach for estimating the 3D pose of a face from a single image. He assumes that the shape of the face can be approximated by an ellipse; the pose of the face can then be expressed in terms of the yaw, pitch, and roll angles of the ellipse (see Figures 15 and 16). His system consists of three major parts: pupil detection, face detection, and pose estimation.


Figure 15: The face shape can be approximated by an ellipse.

Figure 16: The pose of the face can be expressed in terms of yaw, pitch, and roll angles.

In the following sections, I will discuss how his system works.

Pupil detection In order to find the pupils easily, an IR light source is used in his system. Under IR illumination, the pupils appear much brighter than the other parts of the face; therefore, pupil detection can be done easily by setting a threshold. This is a robust operation and can be done in real time.
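A minimal sketch of such a threshold-based detector (Python with NumPy/SciPy; the threshold value is illustrative):

```python
import numpy as np
from scipy.ndimage import label, center_of_mass

def detect_pupils(ir_image, thresh=200):
    """Bright-pupil detection under IR: threshold the image, label connected
    components, and return the centroids (row, col) of the two largest blobs."""
    mask = ir_image > thresh
    lbl, n = label(mask)
    if n < 2:
        return []
    sizes = np.bincount(lbl.ravel())[1:]          # blob sizes, background dropped
    top2 = (np.argsort(sizes)[-2:] + 1).tolist()  # labels of the two largest blobs
    return [center_of_mass(mask, lbl, i) for i in top2]
```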

Face detection Face detection is somewhat more difficult because the face may have a significantly different shape under different poses. In his system, face detection consists of the following steps: 1) find the approximate location of the face based on the locations of the pupils; 2) perform a constrained sum-of-gradients maximization procedure to search for the exact face location.

In the second step, he applies a technique proposed by Birchfield [30]. Birchfield's technique is aimed at detecting the head rather than the face; to detect the face, he puts constraints on the gradient maximization procedure. The constraints he exploits are size, location, and symmetry. Specifically, he draws a line between the detected pupils, as shown in Figure 17; the distance between the pupils and their locations are used to constrain the size and location of the ellipse, which should minimally and symmetrically enclose the two detected pupils. Results of face detection are shown in Figure 18.

Figure 17: Face ellipse parameterization.

Figure 18: Results of face detection at different poses.


Pose estimation With the pupil locations, the roll angle can be easily calculated from the following equation:

$$\gamma = \arctan\!\left(\frac{y_2 - y_1}{x_2 - x_1}\right),$$

where $(x_1, y_1)$ and $(x_2, y_2)$ are the locations of the two pupils. The basic idea for estimating the yaw and pitch angles is to figure out the correspondence between the 2D ellipse and the 3D ellipse; the details are not discussed in this report, since the derivation would take too much space.
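For completeness, the roll computation is a one-liner (Python; the coordinate convention is assumed to be image x to the right and y down):

```python
import math

def roll_angle_deg(pupil1, pupil2):
    """Roll angle of the face: the angle of the inter-pupil line with the
    image horizontal, from two (x, y) pupil locations, in degrees."""
    (x1, y1), (x2, y2) = pupil1, pupil2
    return math.degrees(math.atan2(y2 - y1, x2 - x1))
```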

In order to evaluate his pose estimation algorithm, artificial ellipses are used to verify the accuracy of the estimation results. Experimental results are shown in Figure 19.

Figure 19: Pose estimation for the synthetic ellipses.

Actual ellipse images with different pitches and yaws are used to further investigate the performance of his algorithm. The computed rotation angle is shown below each image in Figure 20; the estimation results are consistent with the perceived orientations.


Figure 20: The estimated orientations of actual ellipses. The number below each image is the estimated pitch or yaw angle.

Figure 21: Pose estimation on a face image sequence.

The ultimate test is to study its performance on human faces. For this purpose, image sequences of faces with different pitches and yaws were captured; ellipse detection is performed on each image, and the detected ellipse is used to estimate the pose. Experimental results are shown in Figure 21.

Using his algorithm, we can find the location of the face, track it, and estimate its pose; in other words, his algorithm provides almost everything we need to build a face recognition system. Better still, we can recognize faces not only from images but also from video.


With the pose estimated, we can simply apply hybrid-mode face recognition, where multiple images are available for each person: the closest pose in the database is found using the estimated pose, and then the eigenface approach is used to perform recognition. As mentioned before, there are two major drawbacks: 1) it needs many images to cover the possible views, and 2) it cannot handle illumination variation. But this type of face recognition technique is probably the most practical method up to now.

4 Discussion and Conclusion

In this report, a variety of face recognition technologies are studied. Basically, they can be divided into two major groups. In the first group, since eigenfaces are an efficient method for face recognition under fixed illumination and pose, most recent work focuses on how to compensate for variations in pose and illumination; SFS, the illumination cone, and the linear object class method go in this direction. In the second group, instead of using eigenfaces, other methods try to avoid the illumination and pose problems at the start of their algorithms; the curvature-based method and face bunch graph matching belong to this category.

Below I give a concise summary, with conclusions, in the same order in which the methods appear in this report.

SFS has great potential for real-world applications, since it requires only one input image and one registered image per person. But the SFS problem is still hard to solve: current methods have high computational complexity, and the re-rendered images have poor quality. Many issues remain to be solved before it can be applied to commercial or law enforcement applications.

The illumination cone method shows improved recognition performance under variations of pose and illumination. One problem is that it requires many images per person to construct the subspace; for commercial applications, the use of many images per person is often infeasible due to cost. The other issue is the computational cost when the database contains a large number of images: if recognition takes too long on a large database, further development is discouraged.

The linear object class method is inspired by the way humans look at the world, and it provides a new angle on the face recognition problem. In [25], they successfully transform an image from half-profile to frontal view given 49 examples. One possible direction for future work is to cover more variation in pose or illumination by extending their method.

The view-based eigenspace method constructs a subspace for each pose. The experiments show that performance degrades if the input image has a different pose from the database; in other words, one needs to construct many subspaces in order to achieve a high recognition rate under arbitrary pose. This makes the method impractical for designing a fully pose-invariant face recognition system. But for recognizing faces from mug shots, in which only three poses exist, it is probably the most feasible solution so far.

Curvature-based face recognition does not suffer from illumination variation, since it recognizes faces from range data; however, the cost of implementing the system is higher than for recognition systems that use a camera to acquire data. The other issue mentioned by the author is that the curvature calculation is sensitive to noise. In short, this method has high computational complexity and high cost; even though it seems a good idea for handling the pose and illumination problems, few people have tried this method in the past few years.

The 3D model-based method performs face recognition from the 3D face surface without curvature calculation. The face surface is constructed by projecting a pattern onto the face, so the additional cost of the system is only the projector and pattern slide. Their system provides an example of a low-cost face recognition system using 3D information. One issue is that their surface reconstruction algorithm needs manual adjustment to achieve a better recognition rate; a fully automatic reconstruction algorithm is possible future work in this direction.

The face bunch graph method uses the Gabor wavelet transform to extract face features and then collects the variations of the feature points to form the face bunch graph. The primary advantage of this method is that the features extracted by Gabor wavelets are invariant to a certain degree, so this is an efficient way to design a robust face recognition system.

Most existing face recognition algorithms are based on either 2D images or 3D range data; combining 2D and 3D information can further improve recognition performance. Here, the authors use point signatures to extract 3D features and the Gabor wavelet transform to extract 2D features. The feature vector is defined as the weighted combination of the point signatures and the Gabor coefficients; with this feature vector, one can perform recognition using any classifier that provides a high recognition rate.

Recognizing faces from video is probably the most difficult problem: one must be able to track the location of the face, estimate its pose, and then recognize it. The method in [10] provides the infrastructure for recognizing faces from video: face tracking and pose estimation can be done with it, and it can be expanded into a face recognition system by simply adding the view-based eigenspace algorithm.

5 References

[1] R. Chellappa, C.L. Wilson, and S. Sirohey, "Human and machine recognition of faces: a survey," Proceedings of the IEEE, vol. 83, pp. 705-740, 1995.

[3] W.Y. Zhao and R. Chellappa, "3D model enhanced face recognition," Proc. 2000 International Conference on Image Processing, vol. 3, pp. 50-53, 2000.

[4] W.Y. Zhao and R. Chellappa, "SFS based view synthesis for robust face recognition," Proc. Fourth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 285-292, 2000.

[5] P.J. Phillips, H. Moon, S.A. Rizvi, and P.J. Rauss, "The FERET evaluation methodology for face-recognition algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 10, pp. 1090-1104, Oct. 2000.

[6] R. Chellappa, C.L. Wilson, and S. Sirohey, "Human and machine recognition of faces: a survey," Proceedings of the IEEE, vol. 83, no. 5, pp. 705-741, May 1995.

[7] G. Gordon, "Face recognition from depth maps and surface curvature," Proc. SPIE, Geometric Methods in Computer Vision, vol. 1570, San Diego, July 1991.

[8] C. Beumier and M. Acheroy, "Automatic 3D face authentication," Image and Vision Computing, vol. 18, no. 4, pp. 315-321, Feb. 1999.

[9] A.Md. Haider and T. Kaneko, "Automated 3D-2D projective registration of human facial images using edge features," accepted for publication in the International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI).

[10] Q. Ji, "3D face pose estimation and tracking from a monocular camera," Image and Vision Computing, vol. 20, no. 7, pp. 499-511, 2002.

[11] S. Sakamoto, R. Ishiyama, and J. Tajima, "3D model-based face recognition system with robustness against illumination changes," NEC Research and Development, vol. 43, pp. 15-19, 2002.

[12] P.N. Belhumeur and D. Kriegman, "What is the set of images of an object under all possible illumination conditions?" International Journal of Computer Vision, vol. 28, no. 3, pp. 245-260, 1998.

[13] W. Zhao, "Robust Image Based 3D Face Recognition," Ph.D. thesis, University of Maryland, 1999.

[14] A.S. Georghiades, P.N. Belhumeur, and D.J. Kriegman, "From few to many: illumination cone models for face recognition under variable lighting and pose," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643-660, June 2001.

[15] P.N. Belhumeur, D.J. Kriegman, and A.L. Yuille, "The bas-relief ambiguity," Proc. 1997 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1060-1066, June 1997.

[18] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, July 1997.

[19] Y. Adini, Y. Moses, and S. Ullman, "Face recognition: the problem of compensating for changes in illumination direction," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 721-732, July 1997.

[20] D.W. Jacobs, P.N. Belhumeur, and R. Basri, "Comparing images under variable illumination," Proc. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 610-617, June 1998.

[21] P.N. Belhumeur and D.J. Kriegman, "What is the set of images of an object under all possible lighting conditions?" Proc. 1996 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '96), pp. 270-277, June 1996.

[22] P.W. Hallinan, "A low-dimensional representation of human faces for arbitrary lighting conditions," Proc. 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '94), pp. 995-999, June 1994.

[23] J.J. Atick, P.A. Griffin, and A.N. Redlich, "Statistical approach to shape from shading: reconstruction of 3D face surfaces from single 2D images," Neural Computation, vol. 8, pp. 1321-1340, 1996.

[24] A.S. Georghiades, P.N. Belhumeur, and D.J. Kriegman, "Illumination-based image synthesis: creating novel images of human faces under differing pose and lighting," Proc. IEEE Workshop on Multi-View Modeling and Analysis of Visual Scenes (MVIEW '99), pp. 47-54, 1999.

[25] T. Vetter and T. Poggio, "Linear object classes and image synthesis from a single example image," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 733-742, July 1997.

[26] L. Wiskott, J.-M. Fellous, N. Krüger, and C. von der Malsburg, "Face recognition by elastic bunch graph matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 775-779, July 1997.

[27] A. Pentland, B. Moghaddam, and T. Starner, "View-based and modular eigenspaces for face recognition," Proc. 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '94), pp. 84-91, June 1994.

[28] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, Winter 1991.

[29] Y. Wang, C.-S. Chua, Y.-K. Ho, and Y. Ren, "Integrated 2D and 3D images for face recognition," Proc. 11th International Conference on Image Analysis and Processing, pp. 48-53, Sept. 2001.

[30] S. Birchfield, "Elliptical head tracking using intensity gradients and color histograms," Proc. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 232-237, June 1998.

[38] B.S. Manjunath, R. Chellappa, and C. von der Malsburg, "A feature based approach to face recognition," Proc. 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '92), pp. 373-378, June 1992.

[39] R. Alferez and Y.-F. Wang, "Geometric and illumination invariants for object recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 6, pp. 505-536, June 1999.

[40] T. Akimoto, Y. Suenaga, and R.S. Wallace, "Automatic creation of 3D facial models," IEEE Computer Graphics and Applications, vol. 13, no. 5, pp. 16-22, Sept. 1993.

[41] W.T. Freeman and J.B. Tenenbaum, "Learning bilinear models for two-factor problems in vision," Proc. 1997 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 554-560, June 1997.

[42] S.K. Nayar, K. Ikeuchi, and T. Kanade, "Surface reflection: physical and geometrical perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 7, pp. 611-634, July 1991.

[43] B.K.P. Horn and M.J. Brooks, Shape from Shading, Cambridge, MA: MIT Press, 1989.
