
Using Eye Reflections for Face Recognition Under Varying Illumination

Ko Nishino† Peter N. Belhumeur‡ Shree K. Nayar‡

†Dept. of Computer Science, Drexel University, Philadelphia, PA 19104
‡Dept. of Computer Science, Columbia University, New York, NY 10027
[email protected]   {belhumeur,nayar}@cs.columbia.edu

Abstract

Face recognition under varying illumination remains a challenging problem. Much progress has been made toward a solution through methods that require multiple gallery images of each subject under varying illumination. Yet for many applications, this requirement is too severe. In this paper, we propose a novel method that requires only a single gallery image per subject taken under unknown lighting. The method builds upon two contributions. We first estimate the lighting from its reflection in the eyes. This allows us to explicitly recover the illumination in the single gallery images as well as the probe image. Next, we exploit the local linearity of face appearance variation across different people. We represent the gallery images as locally linear montages of images of many different faces taken under the same lighting (bootstrap images). Then, we transfer the estimated combination of bootstrap images to synthesize each subject's face under the probe lighting to accomplish recognition. Finally, we show through tests on the CMU PIE database that we can achieve better recognition results using our lighting estimation method and locally linear montages than the current state-of-the-art.

1. Introduction

The effect of illumination variation on the appearance of faces is known to be one of the major obstacles for face recognition [17]. This is due to the fact that lighting variation often results in larger intra-personal face appearance variation than inter-personal variation [12].

Many previous approaches have shown that good recognition rates can be achieved even under extreme lighting [20, 2, 6, 10, 16, 25, 24]. However, most of these methods require multiple gallery (or training) images per subject. Each face has to be sampled under many different light directions such that the space of images under all possible lighting can be well reconstructed. For many applications this requirement is not practical. Rather, one would desire to accomplish recognition using only a single image per subject. This would not only allow recognition on existing single-image databases but also simplify enrollment procedures for future databases.

If the single gallery images are all the information we have, we would have to somehow extract an illumination invariant feature from each gallery image as well as the probe (or test) image to accomplish recognition. However, it has been shown that such discriminative illumination invariant features do not exist [5]. One can also take an illumination normalization approach by deriving intrinsic images of each image [7, 26]. However, recovering intrinsic images from a single image is an inherently ill-posed problem and requires restrictive assumptions.

Alternatively, in addition to the single gallery images, one can collect relevant information from other people's faces. For instance, a set of 3-D laser scans of many different faces can be used to compute a statistical generative model of face geometry and photometry [4, 27, 23]. Then the generative face model can be fit to the images to estimate identity-dependent parameter values for recognition. However, such an approach is computationally expensive and difficult to scale since it requires 3-D information of many people.

Furthermore, while [4] reports excellent results on face recognition under varying illumination using single gallery images, it is only tested on images taken with ambient lighting in addition to a single point source¹. It has been shown in [10] that face recognition under a single point light source is significantly more difficult than under multiple light sources². Yet face recognition under a single point source is also important, since we frequently encounter such cases, for instance, in images taken outdoors on a sunny day or in images taken indoors with the subject close to a dominant compact source.

Although it may be impractical to capture many images of each person whom we would like to recognize, it is easy to prepare an image set in which the faces of many different people are captured under many different lighting conditions. In fact, publicly available face databases, which we normally use to test face recognition algorithms, can readily be used to provide general information on face appearance variation under varying illumination. Let us refer to these images as the bootstrap images [21]. Note that the people in the bootstrap set are not the same individuals as those in the gallery. In fact, we use an entirely different database for the bootstrap set. Thus, we have to prepare such a data set only once for any set of gallery images.

¹ Note that the CMU PIE database [22] contains two sets of images taken under varying illumination ("illum" and "lights"). One set of images ("illum") is taken with single flashes while the other set ("lights") also has the room light turned on. We run our experiments on the "illum" set.

² In [11] it is shown that, for the same data set we use, the error rate almost halves every time another point source is lit in the image and reaches almost zero with 11 point light sources. Similar results are reported on the Yale Face database B [6] in [10].

In this paper, we focus on face recognition under varying illumination using a single gallery image per subject taken under unknown lighting and a bootstrap set. Using these image sets, the task is to identify the person in the probe image taken under unknown illumination. Shashua et al. [21] tackle this problem by deriving an illumination invariant image termed the quotient image. However, their derivation relies on restrictive assumptions, namely that faces exhibit pure Lambertian reflection and that different faces share exactly the same surface normal distribution. Zhou et al. [28] formulate this problem as a bilinear estimation of lighting and identity in each gallery and probe image. However, bilinear estimation is prone to an inherent ambiguity and thus difficult to solve. To our knowledge, [28] reports the best recognition results in this face recognition scenario. However, we will show in this paper that the recognition rates can be significantly improved.

We propose a novel method that builds upon two contributions to accomplish face recognition in this setup. The first contribution is the use of lighting information reflected in the eyes for face recognition. Recently, it was shown in [14] that lighting information of the scene in an image can be estimated from the eye captured in the image. We show that, for images in which the eyes are open, we are able to reliably estimate the lighting for each of the given single gallery images and the probe image. The explicit knowledge of lighting allows us to develop the recognition algorithm described below. It can also benefit many other face recognition methods that require the lighting to be estimated [4, 27, 23].

The second contribution is a simple yet effective recognition algorithm which exploits the local linearity of face appearance. It has been shown in the past that one person's face can be well represented as a linear combination of other people's faces [25, 24]. We take this idea one step further by exploiting the spatially localized linearity in the appearance variation of different faces taken under the same lighting. We show that by simply subdividing the images with a regular grid and by representing each individual subimage with the bootstrap images, we are able to better synthesize the appearance of one person's face under a novel (probe) lighting from a single exemplar image. The work of [16] exploits the spatially localized linearity of face appearance variation using modular eigenspaces in order to accomplish face recognition based on face parts such as the mouth and the nose. We, on the other hand, exploit the local linearity in order to synthesize the appearance of the entire face with non-linear photometric effects such as highlights and cast-shadows.

Figure 1: Left: One of the face images we used in our experiments. The image is cropped (the original size is 640×480). The estimated eye limbuses are overlaid in red. Right: The environment map computed from the left eye. The intensity values are color coded, with red indicating high and blue low. The estimated point source direction is depicted by a cyan cross.

In particular, for each single gallery image, we first estimate the lighting from the eyes. Using the estimated gallery lighting, we synthesize images of each person in the bootstrap set taken under the same lighting by a simple linear combination. Then, we represent the gallery image as a locally linear montage of these bootstrap images taken under the same lighting. We do this by subdividing the images with a regular grid and computing linear coefficients for each subimage individually. In the recognition stage, given the probe image, we estimate the lighting from the eye and also prepare bootstrap images taken under the probe lighting. Then, by reusing the linear coefficients computed for each subimage of the gallery image, we synthesize an image of each potential identity taken under the probe lighting. These reconstructed images are compared to the probe image for recognition. We refer to this method as the locally linear montage method.

By exploiting these two factors, namely the lighting information recovered from the eyes and the local linearity of face appearance, we can significantly increase the accuracy of face recognition under varying illumination using single gallery images. We report recognition results on the CMU PIE database [22] using the Yale Face database B [6] as the bootstrap set. Compared to the result of the state-of-the-art algorithm reported on the same data [28], we reduce the error rate by more than 80%.

2. Estimating Lighting from the Eye

Recently, it was shown in [14] that lighting information of the scene in an image can be estimated from an eye captured in the image. It was shown that even from low resolution images, such as standard VGA images, one can recover an environment map of the scene which covers almost the entire frontal hemisphere. The computed environment map can be used as the illumination distribution of the scene.

In general, face recognition requires detecting the eye and mouth locations in the image as a pre-processing step to normalize the sizes of the faces. In our experiments, we assume that this pre-processing is already done with a face detection algorithm and we are provided with the rough locations of the eyes. Using the provided eye location as the initial estimate of the center of the eye, we use the algorithm described in [15] to fit an ellipse to the limbus (the contour of the cornea) in the image. The initial estimate of the ellipse parameters is set to a circle of pre-determined radius (8 pixels) for all images.

Figure 2: Average estimates over 68 people of the 12 different point source directions computed from the eyes. The symbol f# denotes the light source (flash) number.

Figure 1 shows one of the face images used in our experiments and its estimated environment map. In the computed environment map, we thresholded the intensity values and computed the centroid of the remaining pixels, which was used as the point source direction (see Figure 1). In the images of the CMU PIE database, since flashes were used as the point sources, the reflection in the cornea was often a short line rather than a point.
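To make the thresholding-and-centroid step concrete, here is a minimal sketch of how a point source direction could be read off a recovered environment map. The paper only states that the intensity values are thresholded and the centroid of the remaining pixels is taken; the function name, the fractional threshold, the intensity weighting, and the 0–180° index-to-angle mapping (matching the axes in Figure 1) are our illustrative assumptions.

```python
import numpy as np

def point_source_direction(env_map, thresh_frac=0.9):
    """Estimate a single point-source direction from an environment map.

    env_map: 2-D array indexed by (polar, azimuth) over the frontal
    hemisphere, as recovered from the cornea.
    Returns (polar_deg, azimuth_deg).
    """
    h, w = env_map.shape
    # Keep only the brightest pixels; a flash shows up as a compact blob.
    mask = env_map >= thresh_frac * env_map.max()
    polar_idx, azimuth_idx = np.nonzero(mask)
    # Intensity-weighted centroid of the surviving pixels.
    weights = env_map[mask]
    polar = (polar_idx * weights).sum() / weights.sum()
    azimuth = (azimuth_idx * weights).sum() / weights.sum()
    # Map pixel indices to angles, assuming a 0-180 degree parameterization.
    return polar / h * 180.0, azimuth / w * 180.0
```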

We evaluated the accuracy of the point source direction estimates on the 68 × 12 = 816 images we used in the CMU PIE database. We were able to estimate the light source direction from at least one of the eyes in 96.4% (786 of 816) of the images. In the remaining 3.6% of the images, the eyes were closed or nearly closed. As we discuss later, for such images, we can fall back to other methods to estimate the lighting and use it in the algorithm we describe in Section 4. Figure 2 shows the average estimates of the 12 different point source directions. Figure 3 shows the histograms of errors in the estimated polar and azimuth angles of the point source directions. The RMS errors in polar and azimuth angles were 2.4° and 3.5°, respectively. The errors in polar angles were larger than those in the azimuth angles. This is mainly due to the fact that the eyelids often partially occlude the limbus, and hence the estimate of the corneal orientation becomes less accurate in the polar direction.

Figure 3: Histograms of polar and azimuth angle errors of the estimated point source directions. The RMS errors in polar and azimuth angles were 2.4° and 3.5°, respectively.

These results show that, in images with open eyes, we can estimate the lighting with very good accuracy. While the lighting information estimated from the eyes may prove useful in many different face recognition algorithms [4, 27, 23], in this paper, we introduce a new recognition algorithm designed to explicitly exploit the known lighting.

3. Linear Reconstruction

It has been shown in the past that one person's face can be well represented as a linear combination of other people's faces [25, 24]. Since we have already estimated the lighting for each gallery image and probe image from the eyes, we can represent each image as a linear combination of other people's images in the bootstrap set taken under the same lighting, as follows.

Once we estimate the lighting for each gallery image or the probe image from the eyes, we are able to represent the estimated lighting as a linear combination of the lighting sampled in the bootstrap set. Let us denote the estimated point source direction in either the gallery image or the probe image with a 3×1 vector s. Similarly, we will denote the point source directions sampled in the bootstrap set with s_j (j = 1, ..., K). Then we can compute the lighting coefficients c = {c_1, ..., c_K}, which are the linear coefficients of the s_j for s. Since lighting is generally frontal in face images, it is natural to assume c_j ≥ 0. Hence we solve a non-negative minimization problem to estimate the lighting coefficients:

$$\mathbf{c} = \arg\min_{\mathbf{c}} \Bigl|\, \mathbf{s} - \sum_{j=1}^{K} c_j \mathbf{s}_j \Bigr|_{L_2}, \qquad (1)$$

subject to c_j ≥ 0. If we have complex lighting, for instance consisting of area light sources, we can estimate and represent it as a set of point sources by sampling the environment map computed from the eye. Then we can compute the corresponding lighting coefficients for each point source using Eq. (1).
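Equation (1) is a small non-negative least squares problem for which off-the-shelf solvers exist. A minimal sketch, assuming the bootstrap directions are stacked as columns of a 3×K matrix (the function and variable names are ours):

```python
import numpy as np
from scipy.optimize import nnls

def lighting_coefficients(s, S):
    """Solve Eq. (1): non-negative coefficients expressing the estimated
    source direction s as a combination of the bootstrap directions.

    s : (3,) point source direction estimated from the eyes
    S : (3, K) bootstrap source directions s_j, one per column
    """
    c, _residual = nnls(S, s)  # Lawson-Hanson NNLS enforces c_j >= 0
    return c
```

With the Yale Face database B as the bootstrap set, K = 64 and S would hold the 64 flash directions.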

Once we compute the lighting coefficients c, we can use them to synthesize face images of the people in the bootstrap set under the gallery/probe lighting s. Let us denote the bootstrap images by x_b^{i,j}, where j (j = 1, ..., K) represents one point source direction sampled in the bootstrap set and i (i = 1, ..., M) is the identity of the person in the bootstrap set. In our experiments, we use the Yale Face database B [6] as the bootstrap set, in which we have M = 10 and K = 64. Then, we can synthesize bootstrap images under the given lighting s as

$$\mathbf{x}_b^{i,\hat{\mathbf{s}}} = \sum_{j=1}^{K} c_j\, \mathbf{x}_b^{i,j}, \qquad (2)$$

where ŝ is the estimate of s using c in Eq. (1). Note that we put the light direction ŝ in the superscript so that it is easy to see that the left hand side is a bootstrap image of person i taken under the lighting ŝ.

Then, we can estimate the linear coefficients w to represent the gallery/probe image x with these bootstrap images by solving a simple linear estimation problem:

$$\mathbf{w} = \arg\min_{\mathbf{w}} \Bigl|\, \mathbf{x} - \sum_{i=1}^{M} w_i\, \mathbf{x}_b^{i,\hat{\mathbf{s}}} \Bigr|_{L_2}. \qquad (3)$$

Here again, we assume that the linear coefficients are non-negative and solve Eq. (3) subject to w_i ≥ 0 (i = 1, ..., M). Also, we normalize each image such that |x| = 1 to factor out the brightness variation in lighting among images and differences in the overall reflectivities of the different faces.
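Eqs. (2) and (3) amount to one weighted sum over the bootstrap stack followed by another non-negative least squares fit. A sketch under the paper's setting (M identities, K lightings, images flattened to n pixels); the array layout and helper names are our assumptions:

```python
import numpy as np
from scipy.optimize import nnls

def synthesize_bootstrap(X_b, c):
    """Eq. (2): relight every bootstrap identity to the estimated lighting.

    X_b : (M, K, n) bootstrap images, M identities x K source directions
    c   : (K,) lighting coefficients from Eq. (1)
    Returns (M, n): one synthesized image per bootstrap identity.
    """
    return np.tensordot(X_b, c, axes=([1], [0]))

def identity_coefficients(x, X_s):
    """Eq. (3): non-negative coefficients representing image x with the
    relit bootstrap images X_s (M, n). Images are normalized to unit
    energy first, as the paper does, to factor out brightness and
    overall reflectivity differences."""
    x = x / np.linalg.norm(x)
    X_s = X_s / np.linalg.norm(X_s, axis=1, keepdims=True)
    w, _residual = nnls(X_s.T, x)  # columns of X_s.T = bootstrap identities
    return w
```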

Let us refer to the linear coefficient set w as the identity coefficients, since it is unique to each individual. Zhou et al. [28] also estimate lighting coefficients and identity coefficients to represent the gallery image and the probe image as a linear combination of bootstrap images. However, they simultaneously estimate both coefficient sets in a bilinear formulation assuming pure Lambertian reflection. It is, however, well known that bilinear estimation suffers from ambiguities [9]. It has been shown that lighting estimation from shading in images suffers from the bas-relief ambiguity [3]. Especially for face images, since Lambertian reflection acts as a low-pass filter in the frequency domain [1, 18], simultaneous estimation of both the identity and lighting in Eqs. (1) and (3) is prone to local minima. In fact, as we will demonstrate later, even when the probe image and the gallery image are exactly the same, it is not guaranteed that the bilinear approach will estimate the same identity and lighting coefficients. Note that we do not suffer from such ambiguities, since the lighting is explicitly recovered from the eyes and used to estimate the lighting coefficients separately from the identity coefficients.

Since the identity coefficients are independent of the illumination, we can reuse the identity coefficients w_g^k computed from the gallery image of person k to synthesize his/her face image under the probe lighting s_p:

$$\mathbf{x}_p^k = \sum_{i=1}^{M} w_g^{i,k}\, \mathbf{x}_b^{i,\hat{\mathbf{s}}_p}. \qquad (4)$$

Here the probe lighting s_p is computed from the eye and its estimate ŝ_p is computed using Eq. (1). Once we reconstruct each person's face image under the probe lighting (the reconstructed gallery image), we can compare them to the actual probe image to identify the person. We will describe this method and its related assumptions in detail in the following section.
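Given the helpers above, Eq. (4) and the final comparison reduce to a dot product and a distance; a brief sketch (names ours):

```python
import numpy as np

def reconstruct_under_probe(w_g, X_sp):
    """Eq. (4): synthesize a gallery person under the probe lighting by
    reusing the identity coefficients w_g (M,) computed under the gallery
    lighting with the bootstrap images X_sp (M, n) relit by Eq. (2)."""
    return w_g @ X_sp

# Recognition then picks argmin_k || x_p - x_p^k ||_2 over gallery identities.
```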

Figure 4: Two example results of the linear reconstruction method described in Section 3. In each example, the left image is the probe image and the right image is the reconstructed gallery image (the reconstructed image of the person in the gallery image under the lighting in the probe image). Although the overall appearance is similar to the probe image, the reconstructed gallery image does not include the details of the face.

This reconstruction approach using Eq. (4) allows us to assess the quality of subsequent recognition by looking at how well each individual's face appearance is synthesized under the probe lighting. Figure 4 shows two probe images and the same persons' reconstructed gallery images. Although the reconstructed gallery images do capture the overall appearance of each person's face, they are very smooth and the details of each individual's face are lacking. In order to accomplish recognition with high accuracy, we need to reconstruct the images with finer details: details that capture each individual's facial characteristics as well as the correct illumination effects, including cast-shadows and highlights.

4. Locally Linear Reconstruction

In order to see why the linear reconstruction method fails to reconstruct the appearance of each individual under the probe lighting well, let us first take a geometrical view of the method.

Each image in the gallery, probe, and bootstrap set can be considered as a unique point in R^n. Then, computing identity coefficients w using Eq. (3) corresponds to projecting the gallery image onto the hyperplane formed by the bootstrap images taken under the same lighting (the gallery lighting) and computing its relative coordinates with respect to the bootstrap image points on this hyperplane. The relative coordinates of this gallery image projection are then used to reconstruct a point on the hyperplane formed by the bootstrap images taken under the probe lighting; see Eq. (4). The underlying claim made here is that the image points of faces taken under the same lighting can be well approximated with a linear hyperplane, and that the relative position of one image point with respect to other image points on this hyperplane is invariant to the lighting change.

Clearly, this assumption does not hold for images of faces. Because of non-linear appearance variations due to specular reflections and cast-shadows, we cannot expect that lighting variations will preserve the relative coordinate system of image points of different faces. However, we can make simple modifications to the way we compute the relative coordinates such that they are well preserved under illumination variations.

Figure 5: Example results of varying the amount of subdivision of the images used to reconstruct the gallery images under the probe lighting. The original image size was 100×80 and we compared the reconstruction errors for 6 different levels of subdivision (subimage sizes, height × width: 100×80, 50×40, 20×20, 10×8, 5×4, and 2×2). The RMS error of gray level values of the reconstructed images decreases as the images are subdivided into smaller subimages.

4.1. Spatially Local Linearity

While images of one person's face taken under varying illumination have been shown to lie close to an extremely low-dimensional subspace [6, 8, 1, 18], face images of different people taken under the same lighting do not exhibit the same behavior [24, 25]. Therefore, approximating the gallery/probe image using the hyperplane formed by the set of bootstrap images would simply not yield a good reconstruction. Once again, consider Figure 4. With only a modest number of bootstrap images it is difficult to represent the fine details in face images, details that are crucial for recognition.

However, if we observe locally in the spatial domain of face images, we can expect the intensity variation to be of much lower dimension compared to the variation of the entire image. In other words, while the appearance variation of the entire face across different people is highly non-linear, local image regions exhibit spatially localized linearity because of similar geometric and photometric properties³ [13]. In [16], this spatially local linearity of face appearance variation across different people has been exploited for constructing low-dimensional subspaces of facial parts such as the nose and mouth. Here, we exploit this local linearity to reduce the dimensionality of the subspace formed by all image points in which we compute the relative coordinates. In other words, we can make the linear hyperplane approximation of the image points of faces taken under the same lighting work better by reducing the dimensionality of the subspace in which those image points lie.

³ In [21], the authors assume pure Lambertian reflection and that the surface normals are exactly the same among different people and only the albedo varies. Note that we are not making such restrictive assumptions.

Figure 6: Example results of varying the number of nearest neighbors used to reconstruct the gallery images under the probe lighting. The RMS error of gray level values of the reconstructed images decreases when only neighboring bootstrap images are used (10 corresponds to using all images). The optimal number of nearest neighbors was 2 for this specific example.

Since the faces are pre-aligned, we can simply subdivide the images into rectangular subimages with a regular grid and compute identity coefficients for each individual subimage. In other words, we first represent the gallery image as a montage of bootstrap images taken under the same lighting. Then, we use the same combination with the bootstrap images taken under the probe lighting to reconstruct a montage of the same person under the probe lighting. Figure 5 shows example results of varying the degree to which the images are subdivided to reconstruct the gallery images under the probe lighting. In this example, the gallery lighting was f8 and the probe lighting was f15 (see Figure 2 for lighting directions). The RMS errors of gray level values were computed over 68 people (note that each image is normalized such that the total energy is 1). The original image size was 100×80 and we compared the reconstruction errors for 6 different levels of subdivision. As can be seen in the results, the smaller the subimages become, the better we can reconstruct the image. In the example shown in Figure 5, the best result was achieved when the images were subdivided into 2×2 subimages. Note that, for different combinations of lighting, the optimal size of subimages might vary slightly. In our experiments, we use a fixed size (2×2) for all tests.
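Concretely, the per-patch fit just repeats Eq. (3) on every grid cell. A minimal sketch under the paper's 100×80 image / 2×2 patch setting; the data layout and function names are our assumptions:

```python
import numpy as np
from scipy.optimize import nnls

def patchwise_coefficients(x, X_s, img_shape=(100, 80), patch=(2, 2)):
    """Per-patch version of Eq. (3): subdivide the aligned image with a
    regular grid and fit non-negative identity coefficients for each
    subimage independently.

    x   : (H*W,) gallery image, X_s : (M, H*W) relit bootstrap images.
    Returns a dict mapping patch index (r, c) -> (M,) coefficients.
    """
    H, W = img_shape
    ph, pw = patch
    img = x.reshape(H, W)
    boots = X_s.reshape(-1, H, W)
    coeffs = {}
    for r in range(0, H, ph):
        for c in range(0, W, pw):
            b = img[r:r + ph, c:c + pw].ravel()
            A = boots[:, r:r + ph, c:c + pw].reshape(len(boots), -1).T
            coeffs[(r // ph, c // pw)], _ = nnls(A, b)
    return coeffs
```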

4.2. Reconstruction from Neighbors

Recently, in the context of manifold embedding, Saul et al. [19] showed that one can effectively compute a low-dimensional embedding of image points in high-dimensional space by assuming that nearby points remain nearby. In our case, we would expect that faces that are close to one another in the image space have more similar shape and photometric properties than faces that are far apart. As a result, we expect that the faces nearest to a given face in the image space under one lighting will better represent that face under another lighting. To this end, we follow [19] and only use neighboring image points to represent an image point of interest in R^n (or R^n, where n is the number of pixels in each subimage).

Instead of using all bootstrap images taken under the same lighting to compute the identity coefficients using Eq. (3), we find the q nearest neighbors to the gallery image and use only those bootstrap images to compute the identity coefficients⁴. Then we use the same q people's face images taken under the probe lighting to reconstruct the gallery image under the probe lighting from the computed identity coefficients using Eq. (4).

⁴ We use the Euclidean distance for measuring distances between the images.
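Selecting the neighbors is a plain Euclidean nearest-neighbor query against the relit bootstrap images; a small sketch (names ours, q = 3 as in the paper's experiments):

```python
import numpy as np

def nearest_bootstrap_identities(x, X_s, q=3):
    """Return indices of the q bootstrap identities whose relit images
    X_s (M, n) are closest to image x (n,) in Euclidean distance."""
    d = np.linalg.norm(X_s - x[None, :], axis=1)
    return np.argsort(d)[:q]
```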

Figure 6 shows example results of varying the number of nearest neighbors used to reconstruct probe images. In this example, the gallery lighting was f8 and the probe lighting was f15 (see Figure 2 for lighting directions). The images were subdivided into 2×2 subimages. The RMS errors of gray level values were computed over 68 probe images (note that each image is normalized such that the total energy is 1). Since the bootstrap set (Yale database B) has 10 people, 10 nearest neighbors corresponds to using all bootstrap images under the same lighting. As can be seen, using only 2 nearest neighbors yields the best reconstruction.

The optimal number of nearest neighbors depends on how densely the image points are sampled as well as on the dimensionality of the subspace that all the image points form. As a result, it is difficult to automatically determine the number of nearest neighbors to be used; this remains future work. In our experiments, we use q = 3, which we found to yield the best results.

4.3. Locally Linear Montage

Let us now summarize the face recognition method we propose. We will refer to the proposed algorithm as the locally linear montage method, where locally linear refers both to spatial locality within the image and to locality in the image space. Also, remember that we first need to explicitly estimate the lighting in order to use this method, which we do from the eyes in the images.

Given a set of gallery images x_g and bootstrap images X_b = {x_b^{i,j} | i = 1, ..., M, j = 1, ..., K}, we would like to identify the person in the probe image x_p. Note that the gallery images need not be taken under the same lighting.

1. For each gallery image x_g^k:

   (a) Estimate the lighting s_g^k from the eye.

   (b) Compute the lighting coefficients c_g^k using Eq. (1).

   (c) Synthesize bootstrap images taken under lighting s_g^k as X_b^{ŝ_g^k} using Eq. (2).

   (d) Subdivide both the gallery image and the bootstrap images into rectangular patches: x_g^{k,m}, X_b^{ŝ_g^k,m}, where m is the patch number.

   (e) Compute the identity coefficients w^{k,m} using Eq. (3) for each rectangular patch m with the q nearest neighbor bootstrap images.

2. Given the probe image x_p:

   (a) Estimate the lighting s_p from the eye.

   (b) Compute the lighting coefficients c_p using Eq. (1).

   (c) Synthesize bootstrap images taken under lighting s_p as X_b^{ŝ_p} using Eq. (2).

   (d) Subdivide the bootstrap images taken under the probe lighting X_b^{ŝ_p} into rectangular patches X_b^{ŝ_p,m}.

3. For each person k in the gallery image set, reconstruct the image under the probe lighting x_p^k with w^{k,m} and X_b^{ŝ_p,m} using Eq. (4) for each patch.

4. Compare every reconstructed image x_p^k with the probe image x_p. The identity k that gives the smallest error in L2 is the estimate of the identity.

Note that step 1 is an off-line procedure and the actual recognition process (steps 2 to 4) only involves simple multiplications and additions, which can potentially be implemented to run in real time.
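Putting steps 2 through 4 together, the on-line stage can be sketched as follows, reusing the helper functions from the earlier sketches. For brevity this version fits each patch against all M bootstrap identities rather than the q nearest neighbors, and all names and the fixed 100×80 / 2×2 layout are our assumptions:

```python
import numpy as np

def recognize(x_p, gallery_coeffs, X_b, S_boot, s_p, img_shape=(100, 80)):
    """Steps 2-4: relight the bootstrap set to the probe lighting,
    rebuild every gallery identity patch by patch, and return the
    identity of the closest montage. gallery_coeffs[k] maps patch
    (r, c) -> per-patch identity coefficients w^{k,m} (computed off-line
    in step 1 via patchwise_coefficients)."""
    H, W = img_shape
    c_p = lighting_coefficients(s_p, S_boot)   # Eq. (1): probe lighting
    X_sp = synthesize_bootstrap(X_b, c_p)      # Eq. (2): relit bootstrap set
    boots = X_sp.reshape(-1, H, W)
    best, best_err = None, np.inf
    for k, coeffs in gallery_coeffs.items():
        x_k = np.zeros(H * W)
        img = x_k.reshape(H, W)                # view: writes go into x_k
        for (r, c), w in coeffs.items():       # Eq. (4), patch by patch
            img[2*r:2*r+2, 2*c:2*c+2] = np.tensordot(
                w, boots[:, 2*r:2*r+2, 2*c:2*c+2], axes=1)
        err = np.linalg.norm(x_p - x_k)        # step 4: L2 comparison
        if err < best_err:
            best, best_err = k, err
    return best
```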

5. Experimental Results

In order to conduct a comprehensive evaluation and a fair comparison with the state-of-the-art method of Zhou et al. [28], we evaluated our algorithm on the same setup used in [28]. We used the Yale Face database B [6], which has 10 people taken under 64 different point source directions, as the bootstrap images. Following [28], we ran recognition tests on the 12 light source subset of the CMU PIE database, which has 68 people [22]. As in [28], we tested our algorithm on all possible combinations of gallery lighting and probe lighting.

We ran three recognition methods on the data set: the bilinear estimation method [28], the global linear reconstruction method described in Section 3, and the locally linear montage method given in Section 4. The table shown in Figure 7 shows the recognition rates for all combinations of gallery and probe lightings⁵.

⁵ The images for which we could not estimate the lighting from the eyes (3.4% of the images) were not used in the experiments shown in Figure 7 for a fair and clear comparison. However, in practice, we can fall back to the bilinear estimation in [28] to estimate the lighting for such images and use it in our locally linear montage method. The results did not change with this hybrid method (92.8%), while the bilinear method run on all images also did not change (67.4%).

[Figure 7 table: recognition rates for every combination of gallery lighting (columns) and probe lighting (rows); the numeric entries did not survive extraction.]

Figure 7: Recognition results of bilinear estimation [28] / global linear reconstruction (Section 3) / locally linear montage. The locally linear montage method achieves an over 25% better recognition rate than the bilinear estimation method.

On the diagonal of the results, one can see that the bilinear estimation method does not achieve 100% recognition rates while the other two methods do. These are the cases in which exactly the same images are used for both the gallery and the probe. We ran the bilinear alternating minimization described in [28] with random initial estimates, since there is no legitimate reason to use a constant initial value for either the identity or the lighting coefficients. These results show that the ambiguity between lighting and identity in the bilinear estimation is difficult to resolve. The other two methods, which explicitly estimate the lighting from the eyes, do not suffer from such ambiguity.

As shown in Figure 7, once the lighting estimated from the eyes is used (global linear reconstruction), the overall recognition rate increases by 6% compared to the bilinear method. Furthermore, by exploiting the local linearity of face appearance taken under the same lighting in the locally linear montage method, we achieve a 94% recognition rate. In [28], the authors increase the number of people in the bootstrap images to 100 by using synthetic face images rendered from the Blanz and Vetter 3-D Face database [4]. This yields a 93% overall recognition rate, about the same as what we can achieve with only 10 people in the bootstrap set. We expect that the locally linear montage method can scale better.

Figure 8: Reconstruction results of the three different methods. First row: probe images; second row: bilinear reconstruction [28]; third row: global linear reconstruction (Section 3); fourth row: locally linear montage. The locally linear montages capture both the details of each individual face and the illumination effects, including shading, cast shadows, and highlights.

Figure 8 shows several reconstructed gallery images under a specific probe lighting. By comparing the third row (global linear reconstructions) with the second row (bilinear reconstructions), one can see that the overall shading on the face is more accurate once the lighting information recovered from the eyes is used. This can be observed by looking at the sharpness of the shadow boundaries cast by the noses. In the locally linear montages (the fourth row), the facial features of each individual are recovered in detail, as well as the non-linear effects of highlights and cast-shadows.

The locally linear montages are blocky since each individual subimage is currently handled independently. In order to achieve even better reconstruction, we would need to exploit the spatial correlation of face appearance variation among different people. For instance, instead of subdividing the images with a regular grid and handling each subimage independently, we would have to simultaneously estimate both the optimal spatial partitioning/segmentation and the identity coefficients such that the reconstruction errors are minimized. However, such an optimization is extremely difficult to solve. We are currently investigating efficient algorithms which will yield sub-optimal but effective solutions to circumvent this problem.

6. Conclusion

In this paper, we first showed that the lighting information recovered from the eyes in face images can be of significant use for face recognition under varying illumination. Next, we showed that the explicit information of lighting allows us to exploit the local linearity of face appearances of different people taken under the same lighting. By reconstructing the gallery image as a locally linear montage of bootstrap images taken under the probe lighting, using the combinations computed for the single gallery image, we are able to better synthesize each person's facial details and non-linear illumination effects. We showed that these two contributions result in significantly better recognition rates through a comprehensive test on face recognition with varying illumination from single gallery images.

Acknowledgments

This work was conducted at the Columbia Vision and Graphics Center of Columbia University while the first author was affiliated with Columbia University as a postdoctoral research scientist. This work was supported in part by NSF IIS 00-85864 and NSF IIS 03-08185.

References

[1] R. Basri and D.W. Jacobs. Lambertian Reflectance and Linear Subspaces. IEEE TPAMI, 25(2):218–233, 2003.
[2] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman. Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE TPAMI, 19(7):711–720, 1997.
[3] P.N. Belhumeur, D.J. Kriegman, and A.L. Yuille. The Bas-Relief Ambiguity. IJCV, 35(1):33–44, 1999.
[4] V. Blanz and T. Vetter. Face Recognition Based on Fitting a 3D Morphable Model. IEEE TPAMI, 25:1063–1074, 2003.
[5] H.F. Chen, P.N. Belhumeur, and D.W. Jacobs. In Search of Illumination Invariants. In IEEE CVPR, volume 2, pages 254–261, 2000.
[6] A.S. Georghiades, P.N. Belhumeur, and D.J. Kriegman. From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose. IEEE TPAMI, 23(6):643–660, 2001.
[7] R. Gross and V. Brajovic. An Image Processing Algorithm for Illumination Invariant Face Recognition. In Int'l Conf. on Audio- and Video-Based Biometric Person Authentication, pages 10–18, 2003.
[8] P. Hallinan. A Low-Dimensional Representation of Human Faces for Arbitrary Lighting Conditions. In IEEE CVPR, pages 995–999, 1994.
[9] J.J. Koenderink and A.J. van Doorn. The Generic Bilinear Calibration-Estimation Problem. IJCV, 23(3):217–234, 1997.
[10] K-C. Lee, J. Ho, and D. Kriegman. Nine Points of Light: Acquiring Subspaces for Face Recognition under Variable Lighting. In IEEE CVPR, pages 519–526, 2001.
[11] K-C. Lee, J. Ho, and D. Kriegman. Acquiring Linear Subspaces for Face Recognition under Variable Lighting. IEEE TPAMI, 27(5):684–698, May 2005.
[12] Y. Moses, Y. Adini, and S. Ullman. Face Recognition: The Problem of Compensating for Changes in Illumination Direction. In ECCV, pages 286–296, 1994.
[13] S.K. Nayar, P.N. Belhumeur, and T.E. Boult. Lighting Sensitive Display. ACM TOG, 23(4):963–979, 2004.
[14] K. Nishino and S.K. Nayar. Eyes for Relighting. ACM TOG, 23(3):704–711, 2004.
[15] K. Nishino and S.K. Nayar. The World in an Eye. In IEEE CVPR, volume I, pages 444–451, 2004.
[16] A. Pentland, B. Moghaddam, and T. Starner. View-Based and Modular Eigenspaces for Face Recognition. In IEEE CVPR, pages 84–91, 1994.
[17] P.J. Phillips. Human Identification Technical Challenges. In ICIP, volume 1, pages 49–52, 2002.
[18] R. Ramamoorthi. A Signal-Processing Framework for Reflection. ACM TOG, 23(4):1004–1042, 2004.
[19] L.K. Saul and S.T. Roweis. Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds. JMLR, 4:119–155, Dec. 2003.
[20] G. Shakhnarovich and B. Moghaddam. Face Recognition in Subspaces. In Handbook of Face Recognition. Springer-Verlag, 2004.
[21] A. Shashua and T. Riklin-Raviv. The Quotient Image: Class Based Re-rendering and Recognition With Varying Illumination. IEEE TPAMI, 23(2):129–139, 2001.
[22] T. Sim, S. Baker, and M. Bsat. The CMU Pose, Illumination, and Expression Database. IEEE TPAMI, 25(12):1615–1618, 2003.
[23] T. Sim and T. Kanade. Combining Models and Exemplars for Face Recognition: An Illumination Example. In CVPR Workshop on Models versus Exemplars in Computer Vision, 2001.
[24] L. Sirovich and M. Kirby. Low-Dimensional Procedure for the Characterization of Human Faces. JOSA A, 4:519–524, 1987.
[25] M. Turk and A. Pentland. Eigenfaces for Recognition. Journal of Cognitive Neuroscience, 3(1):71–86, 1991.
[26] H. Wang, S.Z. Li, and Y. Wang. Generalized Quotient Image. In IEEE CVPR, volume 2, pages 498–505, 2004.
[27] L. Zhang and D. Samaras. Face Recognition Under Variable Lighting using Harmonic Image Exemplars. In IEEE CVPR, volume I, pages 19–25, 2003.
[28] S. Zhou and R. Chellappa. Rank Constrained Recognition from Unknown Illuminations. In IEEE AMFG, 2003.
