Recognizability assessment of facial images for automated teller machine applications

Jae Kyu Suhr a, Sungmin Eum b, Ho Gi Jung c, Gen Li b, Gahyun Kim b, Jaihie Kim b,*

a Research Institute of Automotive Electronics and Control, Hanyang University, Seoul, Republic of Korea
b Biometrics Engineering Research Center, School of Electrical and Electronic Engineering, Yonsei University, Seoul, Republic of Korea
c School of Mechanical Engineering, Hanyang University, Seoul, Republic of Korea

Pattern Recognition 45 (2012) 1899–1914. doi:10.1016/j.patcog.2011.11.014

* Corresponding author. Tel.: +82 2 2123 2869. E-mail addresses: [email protected] (J.K. Suhr), [email protected] (S. Eum), [email protected] (H.G. Jung), [email protected] (G. Li), [email protected] (G. Kim), [email protected] (J. Kim).

Article info

Article history:

Received 17 March 2011

Received in revised form 4 November 2011

Accepted 18 November 2011
Available online 1 December 2011

Keywords:

Recognizability assessment

Facial image

Automated teller machine (ATM)

Facial occlusion

Component-based face detection

Abstract

Crimes related to automated teller machines (ATMs) have increased as a result of the recent popularity of the devices. One of the most practical approaches for preventing such crimes is the installation of cameras in ATMs to capture the facial images of users for follow-up criminal investigations. However, this approach is vulnerable in cases where a criminal's face is occluded. Therefore, this paper proposes a system which assesses the recognizability of facial images of ATM users to determine whether their faces are severely occluded. The proposed system uses a component-based face candidate generation and verification approach to handle various facial postures and acceptable partial occlusions. Element techniques are implemented via grayscale image-based methods, which are robust against illumination conditions compared to the skin color detection approach. A system architecture for achieving both high performance and cost-efficiency is proposed to make the system applicable to practical ATM environments. In the experiments, the feasibility of the proposed system was evaluated using a large-scale facial occlusion database consisting of 3168 image sequences covering 21 facial occlusions, 8 illumination conditions, and 2 acquisition scenarios. Based on the results, we drew up guidelines for recognizability assessment systems for ATM applications.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Recently, automated teller machines (ATMs) have become widely used, both by customers choosing not to go to banks and by banks trying to lessen the workload of their tellers [1,2]. On the other hand, crimes related to ATMs have also increased due to the absence of human operators [3]. Extensive research has been carried out to prevent these crimes, and it can be divided into three categories. The first category is concerned with the protection of an ATM user's banking information. The system in [4] warns users of the presence of loiterers behind them. The system in [5] detects illegal objects attached to ATMs, such as card reproducers and cameras used to obtain card and password information. The system in [6] diversifies the methods for entering passwords in order to prevent criminals from looking over the shoulders of ATM users. The second category focuses on the detection of physical damage to an ATM itself. The system in [4] introduces a method which uses a motion sensor to detect cases where criminals try to destroy or move ATMs. The last category aims to prevent financial transactions made by inappropriate methods or users. The systems in [5,7] detect forged notes in ATM environments. The systems in [5,7–10] identify cardholders via biometrics (face, iris, fingerprint or finger vein) in order to prevent criminals from using stolen cards. The systems in [4,11–20] try to record facial images of ATM users for follow-up criminal investigations.

Of these approaches, the system which records an ATM user's facial image [4,11–21] is widely used in practical situations since it inhibits potential criminals from using ATMs inappropriately, the facial images acquired are useful for criminal investigation, and it requires only a single conventional camera and storage. However, this system is highly vulnerable to criminals who occlude their faces, many of whom take advantage of this weakness [12–21]. Therefore, it is necessary to develop a system which can determine whether recognizable facial images are acquired by the cameras in ATMs. Such a system would be useful and practical for increasing the security level of ATMs. That is, the system first checks the recognizability of the facial images, and then accepts the user's requests only if the facial images are classified as recognizable. Otherwise, all the requested financial transactions are rejected.

Detecting a recognizable facial image in ATM environments has been researched in [12–21], and the methods can be categorized into three approaches: typical attack detection, frontal bare face detection, and skin color detection. The first category is the typical attack detection approach [12–14]. This approach tries to detect specific objects commonly used by criminals to conceal their faces. Wen et al. introduced a mask detection method that checks facial features based on Gabor filter responses [12]. Wen et al. also proposed a helmet detection method using a circular arc detector [13]. Min et al. presented a scarf detection method which utilizes Gabor wavelet-based features and a support vector machine (SVM) classifier [14].

The second category is the frontal bare face detection approach [15–18]. This approach attempts to find frontal bare faces using a holistic face detector, and then determines facial occlusion based on the detection results. Yoon and Kee [15] and Kim et al. [16] proposed an occluded face detection method which applies a principal component analysis (PCA) feature extractor and an SVM classifier to the upper and lower parts of facial images. Choi and Kim [17,18] presented a facial occlusion discrimination method which utilizes modified census transform-based detectors and a neural network-based classifier to search for frontal upright faces and facial features inside heuristically determined regions.

The last category is the skin color detection approach [19–21]. This approach searches for facial regions using motion edge detection or background subtraction, and then determines facial occlusion based on skin color analysis. Lin and Liu [19] proposed a method which locates an ATM user's face by fitting an ellipse to motion edges and determines facial occlusion by calculating the ratio of the skin color area. Kim et al. [20] proposed a method which detects a face by applying a B-spline-based active contour to the motion edge and checks facial occlusion by analyzing the skin color ratios in the eye and mouth regions. Dong and Soh [21] suggested a method which finds a face based on background subtraction and skin color detection. It examines facial occlusion using the aspect ratio of the skin color foreground region and the relative locations of facial features.

To develop a practical system, we analyzed what a system which assesses the recognizability of facial images for ATM applications must provide and derived the following four requirements. A practical system should (1) be able to handle the various attacks which can be presented in real situations, (2) be able to deal with various facial postures and acceptable partial occlusions for user convenience, (3) be robust to various illumination conditions, and (4) have a reasonable computational complexity so that it can operate in real time. Considering the previous approaches in terms of these requirements, the typical attack detection approach does not satisfy the first requirement because it can only handle a few specific attack situations, such as when the assailant is wearing a mask, helmet, or scarf. The frontal bare face detection approach fails to meet the second requirement since it can only accept frontal bare faces while rejecting others with various facial postures or acceptable partial occlusions. The skin color detection approach does not fulfill the third requirement due to its instability under various illuminations [22,23].

Based on these analyses, we propose a novel and practical system which can be applied to real ATM environments in the following ways. Firstly, this paper proposes a recognizability assessment system (rather than an occlusion detection system) using component-based face candidate generation and verification methodologies. This system is able to handle various occlusion attacks, facial postures and acceptable partial occlusions. As the element techniques are implemented using grayscale image-based methods, it is robust to illumination conditions compared with the skin color-based approach. Secondly, this paper proposes a system architecture which first conducts the recognizability assessment inside the regions of interest produced by holistic face detectors (specialized for frequently occurring postures) before dealing with the whole image region. This architecture achieves both high performance and cost-efficiency, which makes the system applicable to practical ATM environments. The balance of performance and computational cost is critical in a resource-limited real-time embedded system such as an ATM [24]. Lastly, and differing from previous works which were tested with small databases acquired in restricted environments, the feasibility of the proposed system was evaluated using a large-scale facial occlusion database which reflects practical ATM situations. This database consists of 3168 image sequences (each image sequence contains 5 images) including 21 facial occlusions, 8 illumination conditions, and 2 acquisition scenarios. The comparison of the proposed method with previous works is summarized in Table 1.

The rest of this paper is organized as follows. An overview of the proposed system is presented in Section 2. Face candidate generation and verification methodologies are described in Section 3. Holistic face detection for establishing regions of interest is explained in Section 4. Database acquisition and performance evaluation are presented in Section 5. Finally, in Section 6, we conclude the paper with a summary and some suggestions for future work.

2. Overview of the proposed method

To undertake a recognizability assessment of facial images, it is necessary to have a definition of a recognizable facial image.

Table 1. Comparison of the proposed method with previous works.

Typical attack detection
  Pros: efficient for dealing with a few frequent occlusion attacks; low computational cost.
  Cons: unable to handle the various types of occlusions in real situations.

Frontal bare face detection
  Pros: effective for faces in strictly intrusive conditions; able to obtain the most recognizable facial images.
  Cons: unable to handle various facial postures and acceptable partial occlusions.

Skin color detection
  Pros: robustly obtains face regions in various facial postures; low computational cost.
  Cons: sensitive to illumination conditions; difficult to distinguish the face from other skin areas and objects with similar color.

Proposed method
  Pros: robust against various facial postures and acceptable partial occlusions; insensitive to illumination conditions.
  Cons: slightly high computational cost.


In this paper, the recognizability of a facial image is defined based on knowledge about face recognition by humans, since the facial images classified as recognizable by the proposed method will most likely be used in follow-up criminal investigations conducted by humans. We define a facial image as recognizable when a mouth and at least one eye are visible. Our definition is based on the visibility of facial components, since it is well known that the most salient parts for face recognition by humans are, in order of importance, the eyes, mouth, and nose [25,26]. Among the four most salient facial components (two eyes, mouth, and nose), one eye and the nose are dropped from the recognizability test. Firstly, one eye is dropped since the shape of one eye is predictable by observing the other visible eye, due to the bilateral symmetry of human faces [27,28], and one eye can often be occluded in natural situations, mainly by an eye patch, hair or a hat. Secondly, the nose is dropped from the recognizability test because its importance for face recognition by humans is lower than those of the eyes and mouth [25,26], and criminals seldom occlude their noses alone but usually occlude their noses along with their mouths. This definition can reduce the false rejections which may occur when acceptable partial occlusions, such as a single eye occlusion caused by an eye patch, hair or a hat, are present. This reduction of false rejections is expected to reduce complaints from ATM users who have no intention of committing a crime. Fig. 1 shows example facial images acquired in ATM environments. In this figure, (a) and (b) are recognizable and non-recognizable facial images according to our definition, respectively. These example images are cropped from our facial occlusion database, which will be introduced in the experimental section.
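The definition above reduces to a simple decision rule over component visibility. A minimal sketch, assuming hypothetical boolean inputs that would come from the component verifiers described later:

```python
def is_recognizable(mouth_visible, left_eye_visible, right_eye_visible):
    """Recognizability rule used in this paper: the mouth and at least
    one eye must be visible (i.e., verified as real facial components)."""
    return mouth_visible and (left_eye_visible or right_eye_visible)
```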

To detect a recognizable facial image which satisfies our definition, the proposed system utilizes three techniques: component-based face candidate generation, component-based face candidate verification, and holistic face detection. The first technique is component-based face candidate generation. This method generates face candidates by combining pre-detected mouth and eye regions. After that, it selects appropriate candidates which satisfy pre-determined geometric constraints. This approach is more robust to different facial postures and various acceptable partial occlusions compared to the holistic face detection approach [29]. The second technique is component-based face candidate verification. The proposed system does not verify the recognizability of facial images by using the whole facial region but by using the regions of the facial components. This verification scheme is more robust to the various acceptable partial occlusions which frequently occur on a subject's forehead, chin or cheek. The last technique is holistic face detection. Detecting small facial components in a large image by searching various locations and scales requires high computational costs. Therefore, component-based face candidate generation and verification are first applied to the regions of interest (ROIs) found by holistic face detectors specialized for the frontal and inclined faces which are likely to occur in ATM environments, and then the same process is applied to the whole image region if the former procedure fails. This approach reduces the average computation time.

Fig. 2 shows the overall architecture of the proposed system, which consists of two searching approaches: the ROI search and the whole image search. In the ROI search, the recognizability assessment based on component-based face candidate generation and verification is conducted inside the restricted regions produced by the ROI establishment procedure after acquiring each image. The whole image search is activated when no recognizable face is found by the ROI search. In the whole image search, the recognizability assessment reuses the images acquired during the ROI search while expanding its search range to the whole image region without establishing any ROIs. The ROI search is computationally more efficient since it searches only inside the ROIs. On the other hand, the whole image search can handle more variations of facial postures and acceptable partial occlusions because its search range is not limited by the holistic face detection results. In order to combine these advantages and find a recognizable facial image more quickly, the computationally more efficient approach (ROI search) is placed in the first trial and the higher detection rate approach (whole image search) in the second trial, as shown in Fig. 2. Since the purpose of this system is to obtain a recognizable facial image, the system terminates immediately and accepts the user's request as soon as at least one recognizable facial image is found.
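The two-stage control flow can be summarized in a short sketch. The helper callables below (ROI establishment, candidate generation, verification) are hypothetical placeholders standing in for the element techniques of Sections 3 and 4, and the image sequence is assumed to be available up front for simplicity; this is an illustrative outline, not the authors' implementation.

```python
def assess_recognizability(frames, establish_rois, generate_candidates, verify):
    """Accept (return True) as soon as one recognizable facial image is found."""
    # First trial: ROI search on every acquired image (computationally cheap).
    for img in frames:
        for roi in establish_rois(img):
            if any(verify(img, c) for c in generate_candidates(img, roi)):
                return True
    # Second trial: whole image search reusing the same frames (higher detection
    # rate, higher cost); verifier thresholds are set more tightly here to
    # suppress the additional false candidates.
    for img in frames:
        if any(verify(img, c, strict=True) for c in generate_candidates(img, None)):
            return True
    return False  # reject the requested transaction
```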

3. Face candidate generation and verification

The proposed system assesses the recognizability of facial images by using component-based face candidate generation and verification methods. The component-based face candidate generation method produces face candidates using facial component regions and their geometric constraints. After that, the recognizability of each face candidate is assessed by verifying the facial component regions which support the generated face candidates.

3.1. Facial component detection

Before performing the component-based face candidate generation method, facial components such as a mouth and eyes should be found. To accomplish this task, the proposed system utilizes the Viola-Jones general object detector [30], which is one of the most widely used detectors for various object classes such as faces [31], facial components [32,33] and pedestrians [34] because of its high detection rate [35], computational efficiency [36] and the availability of an open source implementation [37].

Fig. 1. Example facial images acquired in ATM environments. (a) and (b) show recognizable and non-recognizable facial images according to our definition, respectively.

The Viola-Jones method selects the most discriminative Haar wavelet features, and builds a strong classifier by combining multiple features using the AdaBoost learning algorithm. This method uses two approaches to improve detection performance and speed. One is a rejection cascade and the other is an integral image. The rejection cascade constructed from strong classifiers quickly rejects most negative examples in its initial layers, and only difficult examples are dealt with by the later layers. The integral image makes the computation of the Haar wavelet feature response the same through all scales. Since the Viola-Jones method detects an object by sliding a window over various locations and scales, its computational cost increases greatly when the initial scale of the object is much smaller than the size of the target image.
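For reference, the open-source implementation mentioned above is exposed in OpenCV as a cascade classifier; the snippet below is a minimal usage sketch. The cascade file and image path are illustrative (OpenCV's stock frontal-face cascade stands in for the mouth and eye cascades trained in this work), and the scale factor roughly mirrors the parameter discussed in Section 4.

```python
import cv2

# Load a pre-trained Haar cascade shipped with OpenCV (a stand-in for the
# detectors trained in this paper).
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("atm_frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical frame

# Sliding-window detection over locations and scales; scaleFactor and minSize
# largely determine how many windows are evaluated, i.e. the computational cost.
boxes = cascade.detectMultiScale(img, scaleFactor=1.25, minNeighbors=3,
                                 minSize=(20, 20))
for (x, y, w, h) in boxes:
    print("detection at (%d, %d), size %dx%d" % (x, y, w, h))
```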

The Viola-Jones method has been used for detecting various facial components such as the mouth, nose and eyes [32,33], and a quantitative evaluation of such detectors was recently presented in [33]. We evaluated the mouth and eye detectors (available in [38]) which showed the best performances in [33]. The labels of the used detectors in [33] are M, LE, and RE for the mouth, left eye, and right eye, respectively. These detectors locate mouths and eyes well, but produce many false positives, especially inside the facial region, as shown in Fig. 3(a)–(c). These three figures show the detection results of the mouth, left eye and right eye detectors in [33], respectively. The false positives inside the facial region can degrade a facial component-based recognizability assessment system since it could falsely accept users who cover their mouths by misinterpreting a nose as a mouth. Therefore, we require newly trained detectors for the mouth, left eye and right eye. To overcome the drawbacks of the previous detectors, we added mouth-free and eye-free face images to the negative training samples, as well as face-free images, when training the mouth and eye detectors, respectively. Fig. 3(d)–(f) shows the results produced by the newly trained mouth, left eye and right eye detectors, respectively. It can be noticed that the newly trained detectors generate fewer false positives in the facial region compared with the previous detectors proven to give the best performances in [33]. Detailed descriptions of the training and tests will be presented in the experimental section.

3.2. Component-based face candidate generation

Once the mouth and eye regions are detected via the Viola-Jones facial component detectors, this system generates face candidates using the component-based method. Face candidates are generated by combining a mouth and an eye. Final candidates are then selected to satisfy the pre-determined geometric constraints. This method is described in detail in Fig. 4. In this figure, (a) is a mean face model where the red square and the inverted triangle indicate the face area and the connections of three facial feature points (mouth and two eyes), respectively. This model is produced from facial images of 180 people where each face includes 80 feature points. Green points and blue circles are feature points from each person and their mean locations, respectively. Details of this face database are described in [39]. The component-based face candidate generator first detects the mouth and eyes, and then produces initial face candidates by combining a mouth and an eye. This method calculates the scale, translation and rotation between two detected facial feature points (the centers of the detected mouth and eye regions) and the corresponding points in the mean face model via Procrustes analysis [40], and then the face area in the mean face model (the red square in Fig. 4(a)) is transformed into the image. In Fig. 4(b), it is assumed that two left eyes (LE1 and LE2, red dotted rectangles), one right eye (RE1, green dotted rectangle) and two mouths (M1 and M2, blue dotted rectangles) are detected. Fig. 4(c)–(h) with magenta squares show the face candidates generated from the mouth-eye combinations (LE1, M1), (LE1, M2), (RE1, M1), (RE1, M2), (LE2, M1), and (LE2, M2), respectively.

Fig. 2. Overall architecture of the proposed system.

Fig. 3. Comparison of the facial component detection results. (a)–(c) are the results of the mouth, left eye and right eye detectors in [33], respectively. (d)–(f) are the results of the newly trained mouth, left eye and right eye detectors, respectively.

The final face candidates are selected by filtering the initial ones via pre-determined geometric constraints restricted by the ATM environment and the arrangement of facial features. We use the following size and orientation constraints: (1) face size, (2) size ratios among the face, mouth and eye, and (3) the in-plane rotation angle of the face. Firstly, the range of face sizes is determined based on the configuration of the ATM camera. This paper considers an off-the-shelf ATM [41] widely used in Korean banks, a detailed description of which will be presented in the experimental section. In the configuration of this ATM, the distance between a user's face and the camera is in the range of 30 cm to 100 cm, as the user must approach the ATM in order to use its touch panel. The camera used for the experiment has a resolution of 640×480 pixels and a 98°×74° field of view (with a wide-angle lens) in the horizontal and vertical directions. If we assume that a user's face area (the red square in Fig. 4(a)) is about 20×20 cm and is projected near the image center of the camera, then the size of face images will be approximately in the range of 240×240 pixels down to 75×75 pixels when the camera follows the equidistance projection model, the most common projection model for a wide field-of-view camera [42]. Secondly, the size ratio parameters are chosen based on the mean face model shown in Fig. 4(a). In this model, the ratio between the face and mouth widths is 2.6 and the ratio between the face and eye widths is 3.5. We allow a 50% margin on the minimum and maximum parameters of these ratios to cover facial posture variations. Lastly, the range of the in-plane rotation angle is simply set to +30° to −30° by assuming that users tend not to roll their heads too much. These loose constraints may produce some false candidates, but they will be rejected in the face candidate verification step in Section 3.3, so we try not to lose true face candidates in this process. If these geometric constraints are applied to the face candidates in Fig. 4(c)–(h), only (c) and (e) are classified as reasonable candidates. Fig. 4(d) and (f) will be filtered out because the face size, the size ratio between face and eye, and the size ratio between face and mouth do not satisfy the pre-defined criteria. While the face shown in Fig. 4(g) will be discarded due to its large in-plane rotation, the face in Fig. 4(h) will be removed because the size ratio between face and eye does not satisfy the pre-determined criterion.
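The generation step above amounts to fitting a similarity transform (scale, rotation, translation) from two mean-model points to a detected mouth-eye pair and gating the result with the size and rotation constraints. The sketch below illustrates two of those gates; the mean-model coordinates and face width are made-up placeholder values, not the values of the 80-point model in [39].

```python
import numpy as np

def similarity_from_two_points(src, dst):
    """Two-point Procrustes fit: scale, rotation angle, rotation matrix and
    translation mapping the two model points src onto the detected points dst."""
    v_src, v_dst = src[1] - src[0], dst[1] - dst[0]
    scale = np.linalg.norm(v_dst) / np.linalg.norm(v_src)
    angle = np.arctan2(v_dst[1], v_dst[0]) - np.arctan2(v_src[1], v_src[0])
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    t = dst[0] - scale * (R @ src[0])
    return scale, angle, R, t

# Placeholder mean-model geometry (pixels): eye center, mouth center, face width.
MODEL_EYE = np.array([35.0, 40.0])
MODEL_MOUTH = np.array([50.0, 85.0])
MODEL_FACE_WIDTH = 100.0

def make_candidate(eye_center, mouth_center):
    """Return (face_width, roll_deg) if the pair passes the face-size and
    in-plane rotation gates described in the text, otherwise None."""
    scale, angle, _, _ = similarity_from_two_points(
        np.stack([MODEL_EYE, MODEL_MOUTH]),
        np.stack([np.asarray(eye_center, float), np.asarray(mouth_center, float)]))
    face_width, roll_deg = scale * MODEL_FACE_WIDTH, np.degrees(angle)
    if not (75.0 <= face_width <= 240.0):   # face size range from the ATM geometry
        return None
    if abs(roll_deg) > 30.0:                # in-plane rotation limit
        return None
    return face_width, roll_deg
```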

Since the final face candidates are generated from a mouth and an eye, two candidates can be produced for the same face region, as shown in Fig. 4(c) and (e). Therefore, two faces generated from the same mouth region and different eye regions are merged via Procrustes analysis of three point correspondences (one mouth and two eyes). In this procedure, we do not remove the two face candidates supported by two facial components after the merging process, because true face candidates should not be deleted by a false merging. Fig. 5 shows the final face candidates generated from Fig. 4(b). In this figure, (a) with a cyan square shows the merged face candidate of (b) and (c).

Fig. 4. Component-based face candidate generation. (a) and (b) are a mean face model and detected facial components, respectively. (c)–(h) are initial face candidates generated using (a) and (b). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Example results of the component-based face candidate generation are shown in Fig. 6. To present the details, the face candidate regions generated from the large original images were cropped in this figure. Blue, red and green dotted lines indicate the mouth, left eye and right eye, respectively, and magenta and cyan squares are face candidates supported by two and three facial components, respectively. It can be seen that faces with various postures and acceptable partial occlusions are successfully found by the component-based face candidate generation procedure. Further examples with different postures and occlusions will be presented in the experimental section.

3.3. Component-based face candidate verification

After generating the face candidates, we need to determine whether there are any recognizable facial images among those candidates. For this procedure, the system utilizes a component-based face candidate verification approach which determines recognizability not from the whole facial region but from the facial component regions. This component-based method is robust against the various acceptable partial occlusions which frequently occur on a subject's forehead, chin or cheek due to an eye patch, hair or a hat. Thus, it causes the system to produce fewer false rejections for users who have no intention of committing a crime. According to the definition of recognizable facial images introduced in Section 2, the component-based face candidate verification procedure determines a facial image as recognizable when both a mouth region and at least one eye region are verified as being real. In this process, we give priority to the face candidates supported by three facial components (a mouth and two eyes), as such candidates are more likely to be real compared to those supported by two facial components. This causes the system to quickly accept bare faces so that it can reduce the computational cost in acceptance cases.

To verify the facial components, the facial component regions are rotated to make them upright according to the in-plane rotation of the face candidate, and resized to a pre-determined image resolution. After that, the intensity values of each facial component region are normalized via the Z-score normalization technique [43] for illumination compensation. Finally, the normalized facial component regions are verified using a principal component analysis (PCA) feature extractor [44,45] and a support vector machine (SVM) classifier [46,47]. Detailed descriptions of the training and tests will be presented in the experimental section.
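A minimal sketch of such a verifier using scikit-learn is given below. The patch size, sample counts and data are placeholders; only the processing order (Z-score normalization, PCA projection, SVM decision) follows the description above, with the 45-dimensional feature size anticipated from Section 5.2.2. The real verifiers are trained on the component samples described there.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def zscore(patches):
    """Per-patch Z-score normalization of flattened grayscale patches."""
    mean = patches.mean(axis=1, keepdims=True)
    std = patches.std(axis=1, keepdims=True) + 1e-8
    return (patches - mean) / std

# Placeholder training data: flattened, upright-rotated, resized component
# patches (rows) with labels 1 = real component, 0 = false positive.
rng = np.random.default_rng(0)
X_train = zscore(rng.random((400, 24 * 16)))
y_train = rng.integers(0, 2, 400)

pca = PCA(n_components=45).fit(X_train)            # 45-dimensional features
svm = SVC(kernel="rbf").fit(pca.transform(X_train), y_train)

def verify_component(patch_1d):
    """Return True if the normalized patch is classified as a real component."""
    feat = pca.transform(zscore(patch_1d.reshape(1, -1)))
    return bool(svm.predict(feat)[0] == 1)
```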

4. Region of interest establishment

The recognizability assessment performed using component-based face candidate generation and verification is suited to our definition of a recognizable facial image, and is robust to facial postures and acceptable partial occlusions. However, its computational cost (especially for the facial component detection process) is too high to be applied to the whole image region. This is a fundamental drawback of a sliding window approach such as the Viola-Jones method [30], which detects a small object by sliding a window over various locations and scales. This drawback makes the computational costs of the facial component detectors quite high because the initial sizes of the facial components are much smaller than the size of the original target image. If it is assumed that the initial object size, scaling factor, and initial displacement of the detection window are 20×15 pixels, 1.25 and 2 pixels, respectively, Fig. 7 shows the number of windows that should be scanned by a detector for various original target image sizes. The numbers in the figure are calculated based on the method presented in [31]. It can be noticed that the number of windows rapidly increases as the target image gets larger. The numbers in this figure are nearly proportional to the computational costs of the detectors. This problem of computational cost is critical for a resource-limited real-time embedded system such as an ATM [24]. To overcome this problem, the proposed system first establishes regions of interest (ROIs) via holistic face detectors, and then applies component-based face candidate generation and verification to these regions before dealing with the whole images, as mentioned when describing the proposed system architecture in Section 2. The computational cost of the holistic face detection procedure for establishing ROIs is much smaller than that of the facial component detection procedure since the initial size of the face is much larger than those of the facial components.

Fig. 5. Final face candidates generated from Fig. 4(b). (a) is a merged face candidate of (b) and (c).

Fig. 6. Example results of the component-based face candidate generation. Blue, red and green dotted lines indicate the mouth, left eye and right eye, respectively, and magenta and cyan squares are face candidates supported by two and three facial components, respectively. Face candidate regions generated from the large original images were cropped to show the details. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
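To make the scaling argument concrete, the sketch below counts detection windows for a given image size under the stated parameters (20×15 initial window, scale factor 1.25, 2-pixel step). The counting scheme is an assumption on our part and may differ in detail from the calculation of [31], but it reproduces the qualitative growth shown in Fig. 7.

```python
def num_windows(img_w, img_h, win_w=20, win_h=15, scale=1.25, step=2):
    """Approximate number of windows a sliding-window detector scans."""
    total, s = 0, 1.0
    while round(win_w * s) <= img_w and round(win_h * s) <= img_h:
        w, h = round(win_w * s), round(win_h * s)
        nx = (img_w - w) // step + 1          # horizontal placements
        ny = (img_h - h) // step + 1          # vertical placements
        total += nx * ny
        s *= scale                            # next pyramid scale
    return total

# Whole 640x480 frame versus a much smaller ROI around a detected face.
print(num_windows(640, 480))
print(num_windows(160, 120))
```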

In this paper, Viola-Jones holistic face detectors [31] are chosen to establish the ROIs. Firstly, we use the holistic frontal face detector which shows the best performance in [33]; the label of this detector in [33] is FA1. Fig. 8(a)–(e) show the detection results of this detector for five facial postures: frontal, panning toward the left and right, and tilting up and down, respectively. This frontal face detector works properly for the first four facial postures but tends to fail with the inclined faces in Fig. 8(e). However, such inclined faces frequently occur when using the ATM model in [41], since users must look down at the touch panel, which is parallel to the ground plane, as shown in Fig. 9. This shape is typical of recent ATMs in Korea because it prevents criminals from looking at passwords over an ATM user's shoulder, compared to models whose touch panels are placed perpendicular to the ground. To overcome this difficulty, we constructed an additional new holistic face detector suited to inclined faces, since a single detector for all facial postures showed less accurate results in [48]. Fig. 8(f) shows the detection results of the newly trained inclined face detector, which works better for inclined faces. The two holistic face detectors (one for frontal faces and the other for inclined faces) are separately applied to the down-sampled images, and the detected ROIs are then merged into one if they are sufficiently overlapped.

5. Experiments

5.1. Database acquisition

To evaluate the proposed system in a practical manner, we constructed a large-scale facial occlusion database which reflects real ATM environments. This database consists of 2688 image sequences including 21 facial occlusions, 8 illumination conditions, 4 subjects, and 2 acquisition scenarios, and each situation was repeated twice. First, an ATM model was built resembling an ATM widely used in Korean banks [41], as shown in Fig. 9. The resolution and field of view of the camera attached to the ATM model are 640×480 pixels and 98°×74° in the horizontal and vertical directions, respectively. This is one of the conventional low-priced cameras used in real ATMs.

Fig. 7. Number of windows to be scanned by a detector with various original target image sizes.

Fig. 8. Example results of the holistic face detection for various facial postures. (a)–(e) are the results of the frontal face detector, and (f) shows the results of the inclined face detector.


To acquire a more realistic database for ATM applications, we designed two scenarios for image sequence acquisition. One is a non-intrusive scenario and the other is an intrusive scenario. In the non-intrusive scenario, the camera acquires an image sequence while a user enters a card and selects menus, as shown in Fig. 10(a). This period is chosen because the card slot on the ATM model is located just below the camera, ensuring that it obtains frontal face images as often as possible. In the intrusive scenario, image sequences were taken after the ATM system made an announcement saying, "Please, look at the camera", as shown in Fig. 10(b). The total acquisition period takes approximately 2–3 s. Fig. 11(a) and (b) show example image sequences taken under the non-intrusive and intrusive scenarios, respectively. In this figure, the numbers at the top-left corners indicate the acquisition order.

To diversify the facial occlusion conditions, we included 21 different facial occlusions in the ATM database, comprising 10 acceptance cases and 11 rejection cases. The acceptance cases contain bare faces and acceptable partial occlusions with a hand, an eye patch, normal glasses, a mobile phone, a headset, a mustache, Band-Aids, a baseball cap, and hair. The rejection cases include severe occlusions involving a hand, sunglasses, a thick beard (mouth and chin occluded), a mask, a muffler, a collar, a baseball cap (both eyes occluded), a motorcycle helmet, and long hair. The rejection cases also contain situations where the user hides, turns their head, or looks down. During database acquisition, we attempted to introduce variations even within the same occluding object. For instance, the database includes different types of mustaches, masks, mufflers, collars, baseball caps, normal glasses, sunglasses, and wigs. In this database, false mustaches and beards were used. Example images of the acceptance and rejection cases are shown in Figs. 12 and 13, respectively.

The database also contains eight different illumination conditions. These illumination variations were produced by changing the directions of extra lights installed at the level of the user's face, with the ordinary ceiling lights fixed, as shown in Fig. 14. Example images taken under the 8 different illumination conditions are presented in Fig. 15. In this figure, the numbers at the top-left corners correspond to the types of illumination conditions in Fig. 14. It can be seen that the brightness of the facial regions differs, while the brightness of the whole image regions appears similar across conditions because the automatic gain control (AGC) function was turned on during image acquisition to ensure the most realistic situations.

The final database consists of 2688 image sequences including 21 occlusion cases (10 acceptance cases and 11 rejection cases), 8 illumination conditions, 4 subjects, 2 acquisition scenarios, and 2 trials for each situation. Each image sequence consists of five images. A detailed description of the database in terms of the various occlusion cases is presented in Table 2. For both the non-intrusive and intrusive databases, each occlusion case has 64 image sequences (8 illuminations × 4 subjects × 2 trials), and there are 640 image sequences for the acceptance cases and 704 image sequences for the rejection cases.

5.2. Evaluation of element techniques

5.2.1. Facial component detection

As discussed in Section 3.1, facial component detectors for the mouth, left eye and right eye were newly trained via the Viola-Jones object detection framework [30]. In our training procedure, mouth-free face images and eye-free face images were added to the negative training samples, as well as face-free images, when learning the mouth and eye detectors, respectively. To obtain these training samples, we used publicly available face databases including AR [49], CASPEAL-R1 [50], CMU-PIE [51], FERET [52], GEORGIA TECH [53], POINTING [54], INDIAN [55], VALID [56], and CVL [57], which include various facial postures, illuminations, and expressions. From these databases, 6500 mouth images and 2000 mouth-free face images were extracted to train the mouth detector, and 7000 left eye images and 2000 eye-free face images were extracted to train the eye detector. Also, 1500 face-free images obtained from [58] were added to the negative images of both detectors. The right eye detector was built by flipping the Haar wavelet features of the left eye detector. This approach has been successfully used for multi-view face detectors [48]. The eye detectors are separated into left and right eyes because a single detector covering both eyes showed a poorer performance compared to the separated eye detectors in [33], an aspect which is also discussed in [48]. These training procedures were conducted via the open source implementation in [37]. The newly trained mouth and eye detectors were evaluated using 1227 images selected from the intrusive ATM database described in Section 5.1. In this experiment, a facial component is considered to have been detected if the root of the Jaccard index (a commonly used segmentation metric) [59] calculated from the detected region and the ground truth region is larger than a pre-determined threshold (0.7). Detection of the opposite eye is not considered a false detection, as in [33]. The detection rates of the newly trained left and right eye detectors are 79.8% and 71.7%, respectively, with 4.6 and 3.9 false detections per image, respectively. The left and right eye detectors in [33] (LE and RE) show detection rates of 45.3% and 40.9%, with 6.5 and 5.5 false detections per image on the same test images, respectively. The detection rates of the eye detectors are thus dramatically increased while the false detections are reduced. In the case of the mouth detector, the newly trained one shows a 69.3% detection rate and 1.2 false detections per image, whereas the mouth detector in [33] (M) gives a detection rate of 63.3% and 13.2 false detections per image. The newly trained mouth detector significantly decreases false detections while somewhat improving the detection rate.

Fig. 9. An ATM model in (a) side-view and (b) top-view.

Fig. 10. Scenarios for image sequence acquisition: (a) non-intrusive scenario, (b) intrusive scenario.
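For reference, the matching rule used in the evaluation above (square root of the Jaccard index between the detected and ground-truth boxes above 0.7) can be written in a few lines; the (x, y, width, height) box format is an assumption for illustration.

```python
import math

def jaccard(a, b):
    """Jaccard index (intersection over union) of two boxes (x, y, w, h)."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def is_correct_detection(detected, ground_truth, threshold=0.7):
    """Detection counts as correct when sqrt(Jaccard index) exceeds the threshold."""
    return math.sqrt(jaccard(detected, ground_truth)) > threshold

# Example: a detection shifted by a few pixels from its ground-truth box.
print(is_correct_detection((100, 100, 40, 30), (104, 102, 40, 30)))
```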

5.2.2. Facial component verification

The detected face regions are verified using their facial component regions. Feature vectors are extracted from the eye and mouth regions via the principal component analysis (PCA) feature extractor [44,45] and classified by a support vector machine (SVM) classifier [46,47]. To train the mouth verifier, mouth image samples were acquired by applying the newly trained mouth detector to the public face databases [49–57] and to face-free images (obtained from [58,60] and the Internet). The images used to train the facial component detectors were excluded from this procedure. After applying the mouth detector, we gathered the detected regions and manually designated positive and negative samples: true and false positives of the mouth detector were treated as positive and negative samples for the mouth verifier training procedure, respectively. The sample data acquisition procedure for the eye verifier is exactly the same except that the eye detectors were used instead of the mouth detector. As a result, 3000 positive and 3000 negative samples for training and 5000 positive and 5000 negative samples for testing were collected for each of the mouth and eye verifiers. The samples obtained were resized and normalized via Z-score normalization [43]. After that, feature vectors were extracted by PCA and the verifiers were trained via an SVM classifier. The feature vector dimensions of the mouth and eye verifiers were experimentally set to 45, and a radial basis function (RBF) kernel was chosen for the SVM classifier. Through these procedures, we achieved 7.2% and 8.4% equal error rates (EERs) for the mouth and eye verifiers, respectively.

Fig. 11. Examples of image sequences acquired with (a) the non-intrusive scenario and (b) the intrusive scenario. The numbers at the top-left corners indicate the acquisition orders.

Fig. 12. Example images of the acceptance cases.
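As a reminder of how an equal error rate is obtained from verifier scores, the sketch below locates the operating point where the false accept and false reject rates coincide. The scores are synthetic placeholders, not the verifier outputs of this paper.

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
# Synthetic verifier scores: higher means "more likely a real component".
genuine = rng.normal(1.0, 1.0, 5000)
impostor = rng.normal(-1.0, 1.0, 5000)
scores = np.concatenate([genuine, impostor])
labels = np.concatenate([np.ones(5000), np.zeros(5000)])

fpr, tpr, thresholds = roc_curve(labels, scores)
fnr = 1.0 - tpr
i = int(np.argmin(np.abs(fpr - fnr)))        # point where FAR is closest to FRR
eer = (fpr[i] + fnr[i]) / 2.0
print("EER ~= %.2f%% at threshold %.3f" % (100 * eer, thresholds[i]))
```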

5.2.3. Holistic face detection

The proposed system uses holistic face detectors specialized for frontal and inclined faces to establish the regions of interest (ROIs). To detect frontal faces, we chose the detector which shows the best performance in a recently published survey paper [33]; this detector is labeled FA1 in [33]. The inclined face detector was newly trained since the frontal face detector gives poor results for this facial posture, which frequently occurs in our ATM camera configuration. To train this detector, 8000 positive samples were collected from the CASPEAL-R1 face database [50] and 3000 face-free images from [58]. When acquiring the positive samples, we applied small in-plane rotations to the original positive samples to create variations. This manipulation was originally used in [48]. The performances of the frontal and inclined face detectors were evaluated using 1227 images selected from the intrusive ATM database (the same image set used to test the component detectors). In this experiment, a face detector is determined to succeed if the detected regions include all mouth and eye regions, since the detector is used to restrict the search range in this system. As a result, the detection rates of the frontal and inclined face detectors are 71.3% and 51.3%, respectively, and if the two face detectors are combined, the detection rate increases to 79.8%. When the two detectors are combined, the search range of the facial component detectors is reduced by 91.0%, including the falsely detected regions.

5.3. Performance evaluation of the whole system

Fig. 13. Example images of the rejection cases.

Fig. 14. Different illumination conditions for database acquisition.

Table 2. Description of the database. Each occlusion case was recorded from 4 subjects; image sequence counts are given as non-intrusive / intrusive scenario.

No.  Occluding object (occluded or visible regions)       Accept/reject   Image sequences
1    Bare face (no occlusion)                              Accept          64 / 64
2    Hand or eye patch (single eye occluded)               Accept          64 / 64
3    Hand (cheek occluded)                                 Accept          64 / 64
4    Normal glasses (both eyes and mouth visible)          Accept          64 / 64
5    Mobile phone (cheek and single ear occluded)          Accept          64 / 64
6    Headset (both ears occluded)                          Accept          64 / 64
7    Mustache (both eyes and mouth visible)                Accept          64 / 64
8    Band-Aids (forehead, cheek or chin occluded)          Accept          64 / 64
9    Baseball cap (more than one eye and mouth visible)    Accept          64 / 64
10   Wig (more than one eye and mouth visible)             Accept          64 / 64
11   Hand (both eyes occluded)                             Reject          64 / 64
12   Hand (mouth, cheek and chin occluded)                 Reject          64 / 64
13   Sunglasses (both eyes occluded)                       Reject          64 / 64
14   Beard (mouth, cheek and chin occluded)                Reject          64 / 64
15   Mask (mouth, cheek and chin occluded)                 Reject          64 / 64
16   Muffler (mouth, cheek and chin occluded)              Reject          64 / 64
17   Collar (mouth, cheek and chin occluded)               Reject          64 / 64
18   Baseball cap (both eyes occluded)                     Reject          64 / 64
19   Helmet (whole face occluded except eyes)              Reject          64 / 64
20   Wig (both eyes occluded)                              Reject          64 / 64
21   Hiding or looking back (whole face occluded)          Reject          64 / 64
Total: 10 acceptance cases, 640 / 640; 11 rejection cases, 704 / 704; both cases, 1344 / 1344.

Fig. 15. Example images taken under 8 different illumination conditions. Numbers at the top-left corners correspond to the types of illumination conditions in Fig. 14.

Fig. 16. ROC curves of four different approaches in (a) a non-intrusive database and (b) an intrusive database: ROI search + whole image search (proposed system), whole image search, ROI search (frontal & inclined face detectors), and ROI search (only frontal face detector). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


To present the performance of the proposed system, we depicted the receiver operating characteristic (ROC) curves, which show both the false accept rate (FAR) and the true accept rate (TAR). Fig. 16(a) and (b) are the ROC curves of four different approaches in the non-intrusive and intrusive scenarios, respectively. These curves were obtained by changing the threshold values of the facial component verifiers. In this figure, the red dotted line indicates the performance of the proposed system, which combines the ROI search and the whole image search via the architecture described in Section 2 with Fig. 2. The green dash-dot line indicates the performance when using only the whole image search. The blue dashed and magenta solid lines indicate the performances when using the ROI search with and without an inclined face detector in the holistic face detection procedure, respectively. In the proposed method, the threshold values of the facial component verifiers in the whole image search were set more tightly than those in the ROI search, since the whole image search can produce more false face candidates than the ROI search.

In Fig. 16, it can be seen that the proposed system shows the best performance in both the non-intrusive and intrusive scenarios when comparing equal error rates (EERs). The EERs of the proposed system in these two scenarios are 8.3% and 3.0%, respectively. In the case of the non-intrusive scenario, as shown in Fig. 16(a), the proposed system outperforms the ROI search regardless of the inclusion of an inclined face detector. This performance gain can be interpreted as evidence that the proposed method handles more of the variations caused by combinations of facial postures and acceptable partial occlusions, whereas the holistic face detectors in the ROI search exhibit limitations when dealing with these variations. The proposed method also shows a slightly better performance than the whole image search. This is because the facial component verifiers of the whole image search are set more tightly when it is used as a part of the proposed system, as mentioned previously, so the proposed system produces fewer false positives than the whole image search alone. In the case of the intrusive scenario shown in Fig. 16(b), the four approaches provide almost the same performance. This is because at least one frontal face is included in each image sequence acquired under this intrusive scenario. The ROI search shows slightly better performance than the whole image search because the whole image search tends to produce a greater number of false acceptances than the ROI search.

Table 3 shows the processing times of the four approaches depicted in Fig. 16. This result was obtained using a Core2Duo 2.12 GHz CPU with 1 GB memory.

Table 3
Average processing time of four approaches.

                                                     Non-intrusive database (s)   Intrusive database (s)
Approach                                             Accepted     Rejected        Accepted     Rejected
                                                     sequence     sequence        sequence     sequence
ROI search + Whole image search (proposed system)    0.83         6.47            0.62         6.48
Whole image search                                   2.32         4.09            2.09         4.09
ROI search (frontal & inclined face detectors)       0.38         2.38            0.51         2.39
ROI search (only frontal face detector)              0.32         2.30            0.46         2.31

Table 4
Average processing time of each module for one image.

Module name                                                                                                                    Processing time (s)
ROI establishment (image down-sampling + frontal and inclined face detection)                                                  0.162
Component-based face candidate generation in ROI (facial component detection in ROI + filtering via geometric constraints)     0.052
Component-based face candidate verification in ROI (eye and mouth verification in ROI)                                          0.005
Component-based face candidate generation in whole image (facial component detection in whole image + filtering via geometric constraints)   0.822
Component-based face candidate verification in whole image (eye and mouth verification in whole image)                          0.007

Table 5
Performance of the proposed system for various occlusions.

No.  Occluding object (occluded or visible regions)         Accept/Reject   Accuracy (%)
                                                                             Non-intrusive   Intrusive
                                                                             scenario        scenario
1    Bare face (no occlusion)                               Accept          100.0           100.0
2    Hand or eye patch (single eye occluded)                Accept           93.8            98.4
3    Hand (cheek occluded)                                  Accept           90.6            95.3
4    Normal glasses (both eyes and mouth visible)           Accept           92.2           100.0
5    Mobile phone (cheek and single ear occluded)           Accept           78.1            85.9
6    Headset (both ears occluded)                           Accept          100.0           100.0
7    Mustache (both eyes and mouth visible)                 Accept           84.4            93.8
8    Band-Aids (forehead, cheek or chin occluded)           Accept           93.8           100.0
9    Baseball cap (more than one eye and mouth visible)     Accept           87.5           100.0
10   Wig (more than one eye and mouth visible)              Accept           96.9            96.9
11   Hand (both eyes occluded)                              Reject           93.8            98.4
12   Hand (mouth, cheek and chin occluded)                  Reject           78.1            84.4
13   Sunglasses (both eyes occluded)                        Reject           78.1            96.8
14   Beard (mouth, cheek and chin occluded)                 Reject          100.0           100.0
15   Mask (mouth, cheek and chin occluded)                  Reject           96.9            95.3
16   Muffler (mouth, cheek and chin occluded)               Reject           95.3            96.9
17   Collar (mouth, cheek and chin occluded)                Reject           90.6            96.9
18   Baseball cap (both eyes occluded)                      Reject           92.2            98.4
19   Helmet (whole face occluded except eyes)               Reject           87.5           100.0
20   Wig (both eyes occluded)                               Reject          100.0           100.0
21   Hiding or looking back (whole face occluded)           Reject           96.9           100.0
Avg. True accept rate (TAR) for 10 accept cases             -                91.7            97.0
     True reject rate (TRR) for 11 reject cases             -                91.8            97.0


This is almost the same specification as the computing unit in an off-the-shelf ATM; for instance, the Comnet-9000DM in [41] uses a Core2Duo 2.2 GHz CPU with 1 GB memory. To present more realistic results, we included the image acquisition interval (0.5 s per image) when measuring the computing time, as specified in the proposed system architecture in Fig. 2. For the accepted image sequences, the proposed system runs approximately three times faster than the whole image search. This is because the proposed system activates the computationally heavy whole image search only when the ROI search fails to find a recognizable facial image. The ROI search is faster than the proposed system because it searches for a recognizable facial image only inside the ROIs designated by the holistic face detection procedure. For the rejected image sequences, the proposed system requires a higher computational cost than the others, since all the images in a sequence must be checked by both the ROI search and the whole image search before the sequence is rejected. However, the computing time of the rejected case is less important than that of the accepted case in real ATM situations, because rejected cases, which are usually caused by intentional attacks, occur far less frequently than accepted cases.
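The following minimal Python sketch illustrates one plausible reading of this two-stage behavior, assuming hypothetical roi_search and whole_image_search functions as stand-ins for the paper's detectors. It is not the exact control flow of Fig. 2, but it shows why accepted sequences tend to exit early and cheaply while rejected sequences pay for both stages.

# Sketch of the two-stage assessment loop described above: the cheap ROI
# search runs first, and the expensive whole-image search is invoked only
# if the ROI search never finds a recognizable face in the sequence.
# roi_search / whole_image_search are hypothetical stand-ins, not the
# paper's implementation.
from typing import Callable, Iterable

def assess_sequence(frames: Iterable,
                    roi_search: Callable[[object], bool],
                    whole_image_search: Callable[[object], bool]) -> bool:
    """Return True (accept) as soon as any frame yields a recognizable face."""
    frames = list(frames)
    # Stage 1: ROI search only (fast path, dominates accepted sequences).
    for frame in frames:
        if roi_search(frame):
            return True
    # Stage 2: whole-image search, reached only when stage 1 fails,
    # which is why rejected sequences cost the most processing time.
    for frame in frames:
        if whole_image_search(frame):
            return True
    return False

# toy usage with stand-in detectors that accept any frame equal to "face"
frames = ["noise", "face", "noise"]
print(assess_sequence(frames,
                      roi_search=lambda f: f == "face",
                      whole_image_search=lambda f: f == "face"))  # True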

To show the processing time of the proposed method in more detail, we divided it into several modules and calculated the average processing time of each module using 2688 images (one image from each sequence in Table 2). The average processing time of each module for one image is presented in Table 4. The component-based face candidate generation in the whole image is the most time-consuming module, since it searches for the small regions of facial components in large images.

Fig. 17. Example images of (a) successful cases and (b) failed cases. Blue, red and green dotted lines indicate the mouth, left eye and right eye, respectively, and magenta and cyan squares are face regions supported by two and three facial components, respectively. Face regions generated from large original images were cropped to show the details. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

By considering the performances and computational costs of the four approaches, we can draw up the following guidelines for recognizability assessment systems in ATM applications. (1) When user cooperation is not guaranteed, the proposed system is recommended over using either the ROI search or the whole image search alone. (2) If computing power is severely limited, it is recommended that the ROI search be used while urging user cooperation. (3) If the ROI search must be used in cases where user cooperation is not guaranteed, its performance can be enhanced by adding holistic face detectors specialized for frequently occurring postures, such as an inclined face detector.

Table 5 presents the detailed performance of the proposed system for the 21 facial occlusions at the equal error operating point, and example results of the recognizability assessment are shown in Fig. 17. To present the details, face regions produced from the large original images (shown in Figs. 12 and 13) were cropped in this figure. Table 5 and Fig. 17(a) show that the proposed system correctly finds recognizable facial images under various facial postures and acceptable partial occlusions. Among the acceptance cases, the performance of the proposed system is slightly degraded for the mobile phone and mustache. In the mobile phone case, several sequences include faces with severe in-plane rotations, for which component detection failed. In the mustache case, the mouth detector occasionally gives poor detection results due to a thick black mustache above the upper lip. Typical false acceptances are shown in Fig. 17(b). These failures occurred because the facial component detectors and verifiers have difficulty correctly distinguishing local mouth and eye regions. For example, strong horizontal edges on fingers, masks, and collars can locally look similar to a mouth, as shown in the first row of Fig. 17(b).

Table 6
Performance of the proposed system for various illuminations.

                                      Illumination condition
DB                        Accuracy    1      2      3      4      5      6      7      8
Non-intrusive database    TAR (%)     91.3   92.5   92.5   92.5   91.3   88.8   92.5   92.5
                          TRR (%)     88.6   92.0   90.9   87.5   92.0   94.3   94.3   94.3
Intrusive database        TAR (%)     96.3   96.3   96.3   96.3   97.5   97.5   98.8   97.5
                          TRR (%)     100.0  95.5   96.6   96.6   98.9   94.3   98.9   95.5

Table 7
Description of extra database.

No.  Occluding object (occluded or visible regions)         Accept/Reject   No. of subjects   No. of image sequences
                                                                                               Non-intrusive   Intrusive
                                                                                               scenario        scenario
1    Bare face (no occlusion)                               Accept          15                30              30
2    Normal glasses (both eyes and mouth visible)           Accept          15                30              30
3    Baseball cap (more than one eye and mouth visible)     Accept          15                30              30
4    Eye patch (single eye occluded)                        Accept          15                30              30
5    Sunglasses (both eyes occluded)                        Reject          15                30              30
6    Mask (mouth, cheek and chin occluded)                  Reject          15                30              30
7    Baseball cap (both eyes occluded)                      Reject          15                30              30
8    Helmet (whole face occluded except eyes)               Reject          15                30              30
Total  4 acceptance cases                                   -               15                120             120
       4 rejection cases                                    -               15                120             120
       both cases                                           -               15                240             240

Table 8
Performance of proposed system for extra database.

No.  Occluding object (occluded or visible regions)         Accept/Reject   Accuracy (%)
                                                                             Non-intrusive   Intrusive
                                                                             scenario        scenario
1    Bare face (no occlusion)                               Accept           96.7           100.0
2    Normal glasses (both eyes and mouth visible)           Accept           90.0           100.0
3    Baseball cap (more than one eye and mouth visible)     Accept           90.0           100.0
4    Eye patch (single eye occluded)                        Accept          100.0            96.7
5    Sunglasses (both eyes occluded)                        Reject           90.0           100.0
6    Mask (mouth, cheek and chin occluded)                  Reject          100.0           100.0
7    Baseball cap (both eyes occluded)                      Reject           90.0           100.0
8    Helmet (whole face occluded except eyes)               Reject           96.7            96.7
Avg. True accept rate (TAR) for 4 accept cases              -                94.2            99.2
     True reject rate (TRR) for 4 reject cases              -                94.2            99.2



Table 6 shows the true accept rates (TARs) and true reject rates (TRRs) of the proposed system under 8 different illumination conditions at the equal error operating point. The proposed system shows consistently good performance across this table, so it can be said that the system is robust against illumination conditions. This is because the system does not rely on a skin color detection approach, which is easily affected by illumination conditions [22,23], but instead uses grayscale image-based pattern recognition methods.
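As a rough illustration of this design choice, the sketch below runs Viola-Jones-style Haar-cascade detectors on a grayscale image with OpenCV, the kind of component detector the reference list points to [37,38]. The stock cascade files, parameter values, and the histogram equalization step are assumptions for illustration only; they are not the detectors or preprocessing trained and used by the authors.

# Minimal sketch: grayscale Haar-cascade component detection (assumed stock
# OpenCV cascades, not the detectors trained by the authors).
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def detect_components(bgr_image):
    # Work on the grayscale image only; no skin-color segmentation,
    # so the result does not depend on color shifts under different lighting.
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)  # mild illumination normalization
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    results = []
    for (x, y, w, h) in faces:
        roi = gray[y:y + h, x:x + w]
        eyes = eye_cascade.detectMultiScale(roi, scaleFactor=1.1, minNeighbors=5)
        results.append({"face": (x, y, w, h), "eyes_found": len(eyes)})
    return results

if __name__ == "__main__":
    image = cv2.imread("atm_frame.jpg")  # hypothetical input frame path
    if image is not None:
        print(detect_components(image))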

Since the database in Table 2 consists of a small number of subjects (although it covers various facial occlusions), we acquired an extra database with 15 additional subjects who were not included in the previous database. This extra database consists of 480 image sequences covering 8 typical occlusion cases (4 acceptance cases and 4 rejection cases), 2 acquisition scenarios, and 2 trials. Since the insensitivity of the proposed method to various illumination conditions had already been demonstrated in Table 6, only the normal illumination condition (#1 in Fig. 14) was included in the extra database. A detailed description of the extra database in terms of the occlusion cases is presented in Table 7, and the performance of the proposed method on this database at the equal error operating point is presented in Table 8. It can easily be seen that the performance of the proposed method for the 15 subjects in the extra database is similar to (even about 2% higher than) its performance for the 4 subjects in the previous database.

6. Conclusion and future work

This paper proposes a system that assesses the recognizability of facial images for automated teller machine (ATM) applications. The proposed system achieves robustness against facial postures and acceptable partial occlusions by using a component-based approach, and insensitivity to illumination conditions by using grayscale image-based methods. Its feasibility in both non-intrusive and intrusive scenarios was demonstrated with a large-scale facial occlusion database. Based on the experimental results, several guidelines were drawn up for recognizability assessment systems in ATM environments. In future work, we will develop a method to prevent spoofing attacks that use 2D or 3D facial masks.

Acknowledgment

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2011-0015321).

References

[1] A. Hutchinson, The top 50 inventions of the past 50 years, Popular Mechanics (2005).
[2] R. Lemos, J. Timmis, M. Ayara, S. Forrest, Immune-inspired adaptable error detection for automated teller machines, IEEE Transactions on Systems, Man, and Cybernetics — Part C 37 (5) (2007) 873–886.
[3] ATM Crime: Overview of the European situation and golden rules on how to avoid it, European Network and Information Security Agency (2009).
[4] Y. Tang, Z. He, Y. Chen, J. Wu, ATM intelligent surveillance based on omni-directional vision, in: Proceedings of World Congress on Computer Science and Information Engineering, 2009, pp. 660–664.
[5] H. Sako, T. Watanabe, H. Nagayoshi, T. Kagehiro, Self-defence-technologies for automated teller machines, in: Proceedings of International Machine Vision and Image Processing Conference, 2007, pp. 177–184.
[6] C.S. Kim, M.K. Lee, Secure and user friendly PIN entry method, in: Proceedings of International Conference on Consumer Electronics, 2010, pp. 203–204.
[7] H. Sako, T. Miyatake, Image-recognition technologies towards advanced automated teller machines, in: Proceedings of International Conference on Pattern Recognition, 2004, pp. 282–285.
[8] S. Prabhakar, S. Pankanti, A.K. Jain, Biometric recognition: security and privacy concerns, IEEE Security & Privacy 1 (2) (2003) 33–42.
[9] G. Graevenitz, Biometric authentication in relation to payment systems and ATMs, Datenschutz und Datensicherheit 31 (9) (2007) 681–683.
[10] M. Negin, T.A. Chmielewski, M. Salganicoff, U.M. Seelen, P.L. Venetainer, G.G. Zhang, An iris biometric system for public and personal use, IEEE Computer Magazine 33 (2) (2000) 70–75.
[11] L. Duan, X. Yu, Q. Tian, Q. Sun, Face pose analysis from MPEG compressed video for surveillance applications, in: Proceedings of International Conference on Information Technology: Research and Education, 2003, pp. 549–553.
[12] C. Wen, S. Chiu, Y. Tseng, C. Lu, The mask detection technology for occluded face analysis in the surveillance system, Journal of Forensic Science 50 (3) (2005) 1–9.
[13] C. Wen, S. Chiu, J. Liaw, C. Lu, The safety helmet detection for ATM's surveillance system via the modified Hough transform, in: Proceedings of Annual IEEE International Carnahan Conference on Security Technology, 2003, pp. 364–369.
[14] R. Min, A. Dangelo, J.-L. Dugelay, Efficient scarf detection prior to face recognition, in: Proceedings of European Signal Processing Conference, 2010, pp. 259–263.
[15] S.M. Yoon, S.C. Kee, Detection of partially occluded face using support vector machines, in: Proceedings of IAPR Conference on Machine Vision Applications, 2002, pp. 546–549.
[16] J. Kim, Y. Sung, S.M. Yoon, B.G. Park, A new video surveillance system employing occluded face detection, Lecture Notes in Computer Science 3533 (2005) 65–68.
[17] I. Choi, D. Kim, Facial fraud discrimination using detection and classification, Lecture Notes in Computer Science 6455 (2010) 199–208.
[18] I. Choi, D. Kim, Facial fraud detection using AdaBoost and neural network, in: Proceedings of the Postech-Kyutech Joint Workshop, 2010.
[19] D. Lin, M. Liu, Face occlusion detection for automated teller machine surveillance, Lecture Notes in Computer Science 4319 (2006) 641–651.
[20] G. Kim, J.K. Suhr, H.G. Jung, J. Kim, Face occlusion detection by using B-spline active contour and skin color information, in: Proceedings of International Conference on Control, Automation, Robotics and Vision, 2010, pp. 627–632.
[21] W. Dong, Y. Soh, Image-based fraud detection in automatic teller machine, International Journal of Computer Science and Network Security 6 (11) (2006) 13–18.
[22] M.-H. Yang, D.J. Kriegman, N. Ahuja, Detecting faces in images: a survey, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (1) (2002) 34–58.
[23] P. Kakumanu, S. Makrogiannis, N. Bourbakis, A survey of skin-color modeling and detection methods, Pattern Recognition 40 (3) (2007) 1106–1122.
[24] B. Kisacanin, S.S. Bhattacharyya, S. Chai, Embedded Computer Vision, Springer, New York, 2009.
[25] A. O'Toole, M. Tistarelli, Face recognition in humans and machines, in: M. Tistarelli, S.Z. Li, R. Chellappa (Eds.), Handbook of Remote Biometrics for Surveillance and Security, Springer, London, 2009.
[26] P. Sinha, B. Balas, Y. Ostrovsky, R. Russell, Face recognition by humans: nineteen results all computer vision researchers should know about, Proceedings of the IEEE 94 (11) (2006) 1948–1962.
[27] N.F. Troje, H.H. Bulthoff, How is bilateral symmetry of human faces used for recognition of novel views? Vision Research 38 (1) (1998) 79–89.
[28] W.Y. Zhao, R. Chellappa, Symmetric shape-from-shading using self-ratio image, International Journal of Computer Vision 45 (1) (2001) 55–75.
[29] B. Heisele, T. Serre, T. Poggio, A component-based framework for face detection and identification, International Journal of Computer Vision 74 (2) (2007) 167–181.
[30] P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, in: Proceedings of International Conference on Computer Vision and Pattern Recognition, 2001, pp. 511–518.
[31] P. Viola, M. Jones, Robust real-time face detection, International Journal of Computer Vision 57 (2) (2004) 137–154.
[32] M. Castrillon, O. Deniz, C. Guerra, M. Hernandez, ENCARA2: real-time detection of multiple faces at different resolutions in video streams, Journal of Visual Communication and Image Representation 18 (2) (2007) 130–140.
[33] M. Castrillon, O. Deniz, D. Hernandez, J. Lorenzo, A comparison of face and facial feature detectors based on the Viola–Jones general object detection framework, Machine Vision and Applications, published online (2010).
[34] A. Broggi, P. Cerri, S. Ghidoni, P. Grisleri, H.G. Jung, A new approach to urban pedestrian detection for automatic braking, IEEE Transactions on Intelligent Transportation Systems 10 (4) (2009) 594–605.
[35] J. Wu, S.C. Brubaker, M.D. Mullin, J.M. Rehg, Fast asymmetric learning for cascade face detection, IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (3) (2008) 369–382.
[36] M. Hussein, F. Porikli, L. Davis, A comprehensive evaluation framework and a comparative study for human detectors, IEEE Transactions on Intelligent Transportation Systems 10 (3) (2009) 417–427.
[37] Intel Open Source Computer Vision Library, http://sourceforge.net/projects/opencvlibrary/.
[38] Haar Cascades, http://www.alereimondo.com.ar/OpenCV/34.
[39] S.J. Lee, K.R. Park, J. Kim, A SfM-based 3D face reconstruction method robust to self-occlusion by using a shape conversion matrix, Pattern Recognition 44 (7) (2011) 1470–1486.


[40] J.C. Gower, G.B. Dijksterhuis, Procrustes Problems, Oxford University Press, New York, 2004.
[41] Comnet-9000DM, http://www.chunghocomnet.com/english/index.asp.
[42] C. Hughes, P. Denny, M. Glavin, E. Jones, Equidistant fish-eye calibration and rectification by vanishing point extraction, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (12) (2010) 2289–2296.
[43] K.L. Priddy, P.E. Keller, Artificial Neural Networks: An Introduction, SPIE Press, Bellingham, 2005.
[44] M. Turk, A. Pentland, Eigenfaces for recognition, Journal of Cognitive Neuroscience 3 (1) (1991) 71–86.
[45] Y. Dong, G.N. DeSouza, Adaptive learning of multi-subspace for foreground detection under illumination changes, Computer Vision and Image Understanding 115 (1) (2011) 31–49.
[46] C. Cortes, V. Vapnik, Support-vector networks, Machine Learning 20 (3) (1995) 273–297.
[47] LIBSVM — A library for support vector machines, http://www.csie.ntu.edu.tw/~cjlin/libsvm/.
[48] P. Viola, M.J. Jones, Fast multi-view face detection, Technical Report TR2003-96, Mitsubishi Electric Research Laboratories, 2003.
[49] A.M. Martinez, R. Benavente, The AR face database, CVC Technical Report #24, 1998.
[50] W. Gao, B. Cao, S. Shan, X. Chen, D. Zhou, X. Zhang, D. Zhao, The CAS-PEAL large-scale Chinese face database and baseline evaluations, IEEE Transactions on Systems, Man, and Cybernetics — Part A 38 (1) (2008) 149–161.
[51] T. Sim, S. Baker, M. Bsat, The CMU pose, illumination, and expression database, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (12) (2003) 1615–1618.
[52] P.J. Phillips, H. Moon, P.J. Rauss, S. Rizvi, The FERET evaluation methodology for face recognition algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (10) (2000) 1090–1104.
[53] Georgia Tech face database, http://www.anefian.com/research/face_reco.htm.
[54] N. Gourier, D. Hall, J.L. Crowley, Estimating face orientation from robust detection of salient facial features, in: Proceedings of Pointing, ICPR, International Workshop on Visual Observation of Deictic Gestures, 2004.
[55] V. Jain, A. Mukherjee, The Indian face database, http://vis-www.cs.umass.edu/~vidit/IndianFaceDatabase.
[56] N.A. Fox, B.A. O'Mullane, R.B. Reilly, VALID: a new practical audio-visual database, and comparative results, Lecture Notes in Computer Science 3546 (2005) 777–786.
[57] CVL face database, http://lrv.fri.uni-lj.si/facedb.html.
[58] N. Seo, Tutorial: OpenCV haartraining, http://note.sonots.com/SciSoftware/haartraining.html.
[59] R.J. Radke, S. Andra, O. Al-Kofahi, B. Roysam, Image change detection algorithms: a systematic survey, IEEE Transactions on Image Processing 14 (3) (2005) 294–307.
[60] Caltech background database, http://www.vision.caltech.edu/html-files/archive.html.

Jae Kyu Suhr received his B.S. degree in Electronic Engineering from Inha University, Incheon, Korea, in 2005, and the M.S. and Ph.D. degrees in Electrical and Electronic Engineering from Yonsei University, Seoul, Korea, in 2007 and 2011, respectively. He is currently working as a post-doctoral researcher in the Research Institute of Automotive Electronics and Control and the Recognition System for Intelligent Vehicle Laboratory at Hanyang University, Seoul, Korea. His current research interests include computer vision, image analysis, and pattern recognition for intelligent vehicles and visual surveillance.

Sungmin Eum received his B.S. and M.S. degrees in Electrical and Electronic Engineering from Yonsei University, Seoul, Korea, in 2009 and 2011, respectively. He is currently working as a post-master researcher in the Recognition System for Intelligent Vehicle Laboratory at Hanyang University, Seoul, Korea. His research interests include computer vision, image analysis, and pattern recognition.

Ho Gi Jung received his B.E., M.E., and Ph.D. degrees in electronic engineering from Yonsei University, Seoul, Korea, in 1995, 1997, and 2008, respectively. He was with Mando Corporation Global R&D H.Q. from 1997 to 2009. From May 2009 to February 2011, he was with Yonsei University as a full-time researcher and research professor. Since March 2011, he has been with Hanyang University as an assistant professor. His interests are recognition systems for intelligent vehicles, next-generation vehicles, computer vision applications, and pattern recognition applications.

Gen Li received his B.S. degree in Computer Science and Technology from Southwest University for Nationalities, Chengdu, China, in 2006, and is currently working toward his Ph.D. degree in Electrical and Electronic Engineering at Yonsei University, Seoul, Korea. His current research interests include computer vision, image analysis, and pattern recognition.

Gahyun Kim received her B.S. degree in Information Electronics Engineering from Ewha Womans University, Seoul, Korea, in 2009, and the M.S. degree in Electrical and Electronic Engineering from Yonsei University, Seoul, Korea, in 2011. She is currently working as a post-master researcher in the Recognition System for Intelligent Vehicle Laboratory at Hanyang University, Seoul, Korea. Her research interests include computer vision, image analysis, and pattern recognition.

Jaihie Kim received his B.S. in electronic engineering from Yonsei University, Republic of Korea, in 1979, and his Ph.D. in electrical engineering from Case Western Reserve University, USA, in 1984. Since 1984, he has been a professor in the School of Electrical and Electronic Engineering, Yonsei University. Currently, he is the director of the Biometrics Engineering Research Center in Korea, the chairman of the Korea Biometric Association, and a member of the National Academy of Engineering of Korea; in 2008 he was the president of the Institute of Electronics Engineers of Korea (IEEK), of which he is now an emeritus president. His general research interests are biometrics, pattern recognition, and computer vision. Some of his recent research topics include touchless fingerprint recognition, fake fingerprint detection, fake face detection, 2D-to-3D face conversion, and face age estimation/synthesis. He has authored many papers in international technical journals.
