
Research Article
Facial Expression Recognition from Video Sequences Based on Spatial-Temporal Motion Local Binary Pattern and Gabor Multiorientation Fusion Histogram

Lei Zhao,1 Zengcai Wang,1,2 and Guoxin Zhang1

1Vehicle Engineering Research Institute, School of Mechanical Engineering, Shandong University, Jinan 250061, China
2Key Laboratory of High Efficiency and Clean Mechanical Manufacture, Ministry of Education, School of Mechanical Engineering, Shandong University, Jinan 250061, China

Correspondence should be addressed to Zengcai Wang; wangzc@sdu.edu.cn

Received 14 November 2016; Revised 3 January 2017; Accepted 26 January 2017; Published 19 February 2017

Academic Editor: Alessandro Gasparetto

Copyright © 2017 Lei Zhao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This paper proposes a novel framework for facial expression analysis using dynamic and static information in video sequences. First, based on an incremental formulation, a discriminative deformable face alignment method is adapted to locate facial points, correct in-plane head rotation, and separate the facial region from the background. Then, a spatial-temporal motion local binary pattern (LBP) feature is extracted and integrated with a Gabor multiorientation fusion histogram to give descriptors that reflect the static and dynamic texture information of facial expressions. Finally, a one-versus-one strategy based multiclass support vector machine (SVM) classifier is applied to classify the facial expressions. Experiments on the Cohn-Kanade (CK+) facial expression dataset illustrate that the integrated framework outperforms methods using single descriptors. Compared with other state-of-the-art methods on the CK+, MMI, and Oulu-CASIA VIS datasets, the proposed framework performs better.

1. Introduction

The content and intent of human communication can be clarified by facial expressions synchronizing dialogues [1, 2]. Ekman and Friesen first proposed that six basic facial statuses reflect different emotion content [3]. Basic emotions are summarized into six facial statuses, including happiness, anger, sadness, disgust, surprise, and fear, and are often analyzed in facial expression recognition research. Attaining automatic recognition and understanding of facial emotion will promote the development of various fields, such as computer technology, security technology, and medical technology, in applications such as human-computer interaction, driver fatigue recognition, pain and depression recognition, deception detection, and stress monitoring. Owing to its practical value and its theoretical significance for medical and cognitive scientists, facial expression recognition has become of great interest in research [4, 5]. Recently, this field has made much progress

because of advancements in related research subareas, such as face detection [6] and tracking and recognition [7], and new developments in machine learning areas, such as supervised learning [8, 9] and feature extraction. However, accurate recognition of facial expressions is still a challenging problem because of dynamic, complex, and subtle facial expression changes and different head poses [10, 11].

Most recent facial expression recognition methods focus on analyzing a frame of the neutral expression or a frame of the peak phase of one of the six basic expressions in a video sequence (i.e., static-based facial expression recognition) [12]. The static-based method analyzes facial expressions while disregarding some important dynamic expression features. Consequently, this method performs poorly in some practical applications. The dynamic-based method extracts time and motion information from an image sequence of facial expressions. The movement of facial landmarks and changes in facial texture are dynamic features containing useful information representing the underlying emotional status. Therefore, dynamic information should be extracted over the entire video sequence during facial expression recognition.

This paper presents a video-based method for facial expression analysis using dynamic and static texture features of the facial region. First, a constrained local model (CLM) based on an incremental formulation is used to align and track face landmarks. Then, dynamic and static features are extracted and fused to enhance recognition accuracy: a spatial-temporal motion LBP, concatenating the LBP histograms on the XT and YT planes of the whole video sequence, estimates the changes of facial texture during the whole process, and a Gabor multiorientation fusion histogram in the peak expression frame represents the status of the facial texture when the facial expression occurs. Finally, classification is accomplished using a multiclass SVM with a one-versus-one strategy.

This paper mainly provides the following contributions: (a) integration of spatial-temporal motion LBP with the Gabor multiorientation fusion histogram, (b) analysis of the contributions of the three LBP histograms on the three orthogonal planes to the accuracy of facial expression recognition, and (c) the spatial-temporal motion LBP facial dynamic feature.

The remainder of this paper is structured as follows. Section 2 presents the background of facial expression recognition methods and discusses the contributions of current approaches. Section 3 gives a detailed description of the proposed approach. Section 4 details the experiments employed for evaluating the performance of the proposed framework. Finally, Section 5 concludes the paper.

2. Background

Facial expression analysis systems are designed to classify images or video sequences into one of the six previously mentioned basic emotions. Facial expressions change face shape and texture (i.e., deformation of eyebrows, mouth, eyes, and skin texture) through the movement of facial muscles. These variations can be extracted and adopted to classify given facial images or video sequences. Though methods for facial expression recognition may include different aspects, recent research can be divided into two main streams: facial action unit based (AU-based) techniques and content-based techniques, which are detailed in Sections 2.1 and 2.2, respectively.

2.1. AU-Based Expression Recognition. In AU-based approaches, the Facial Action Coding System (FACS) is an extensively used tool for describing facial movements [25]. FACS bridges facial expression changes and the motions of the facial muscles producing them. This system defines nine different AUs in the upper part of the face, eighteen AUs in the lower part of the face, and five AUs which belong to neither the upper nor the lower part of the face. Moreover, this system defines some further action units, such as eleven for describing head position, nine for the states of the eyes, and fourteen for other actions. AUs, which are the smallest identifiable facial movements [26], are used by many expression recognition systems [27-30].

In [31], several approaches are compared for classifying AU-based facial expressions; these approaches include principal component analysis (PCA), linear discriminant analysis (LDA), independent component analysis, optical flow, and Gabor filters. The results showed that methods using local spatial features perform excellently in expression recognition. However, PCA and LDA technologies destroy the capacity to discern local features.

The system mentioned in [26] tracks a series of facial feature points and extracts temporal and spatial geometric features from the motions of the feature points. Geometric features are less susceptible to physical changes such as variation of illumination and differences among human faces. However, for some AUs, failure is observed in methods based on geometric features, such as AU15 (mouth pressure) and AU14 (dimple). The appearance of these AUs can change the facial texture but not the facial feature points.

Based on a dynamic Bayesian network (DBN) for spontaneous AU intensity recognition, a unified probabilistic model is proposed in [32]. The model contains two steps, that is, AU intensity observation extraction and DBN inference. A framework based on Gabor features, histogram of oriented gradients (HOG) features, and an SVM classifier is employed to extract the AU intensity observations. Then, the DBN is used to systematically model the dependencies among AUs, multi-AU intensity levels, and temporal relationships. The DBN model combines with the image observations to recognize AU intensity through DBN probabilistic inference. Results show that the proposed method can improve AU intensity recognition accuracy.

In the work of [33], a hidden Markov model (HMM) is used to model AUs in their time evolution. Classification is accomplished by maximizing the probability of the HMM given the extracted facial features. The work in [34] combines SVM with HMM to recognize AUs, and the recognition accuracy on each AU is higher than the result yielded when using SVM alone on the time evolution of the features. Although both methods consider the time characteristics of AUs, an interaction model among AUs is not built.

2.2. Content-Based Expression Recognition. Non-AU methods are content-based and use local or global facial features to analyze expressions. Content-based techniques for facial expression include static-based methods [35, 36] and dynamic-based methods [16, 18, 20, 22, 37].

In [37], a Kanade-Lucas-Tomasi tracker is used to track grid points on the face to obtain point displacements. These points are used to train an SVM classifier. However, this method only extracts geometric features through locating and tracking points on the face. In [16], a system integrating a pyramid histogram of gradients on three orthogonal planes (PHOG-TOP) with optical flow is adapted to classify the emotions. The two kinds of features respectively represent the movement of facial landmarks and the variation of facial texture. Results show that the integrated framework outperforms methods using a single feature operator.

The work in [18] adopts a method using a texture operator called LBP instead of geometric features. Important features are selected by a boosting algorithm before training the SVM classifier. In [22], volume LBP (VLBP) and LBP histograms on three orthogonal planes (LBPTOP) are used to reveal the dynamic texture of facial expressions in video sequences. Then, a classifier based on multiclass SVM using a one-versus-one strategy is applied. Experiments conducted on the Cohn-Kanade (CK) dataset demonstrate that LBPTOP outperforms former methods.

To obtain higher recognition accuracy, combined geometric and texture features are adopted. The work in [20] fuses geometric features and Gabor features of local facial regions to enhance facial expression recognition accuracy. Dynamic information of the video sequence is extracted to construct a parametric feature space with 300 dimensions. Compared with other methods, the system performs excellently in expression recognition when combining geometric and texture features.

2.3. Discussion and Contributions. An AU is a middle-level interpretation of facial muscle motion; such actions associate high-level semantics with low-level features of facial expressions and identify meaningful human facial movements [27]. However, drawbacks of AU-based expression recognition include the propagation of errors into the whole recognition process and the chance of reduced accuracy when a middle-level classification step is added to the system. Static-based expression recognition technology only reflects the expression state at specific time points. Dynamic features extracted from a video sequence reveal facial expression changes and are more accurate and robust for expression recognition [16, 20, 22].

The framework proposed in this paper focuses on a content-based, feature-fusion facial expression recognition technique, and it identifies expressions in video more accurately. Most feature extraction methods of recent work are content-based and are limited to a single form of dynamic or static features, while in this paper we propose a method that recognizes facial expressions using the spatial-temporal motion LBP from the whole video sequence and the Gabor multiorientation fusion histogram from the peak expression frame. Moreover, we first study the contributions of the LBP histograms from the different orthogonal planes to the accuracy of expression recognition and then concatenate the LBP histograms on the XT and YT planes to establish the spatial-temporal motion LBP feature based on their significance for expression recognition. Thus, by considering both the dynamic and the static texture features of facial expressions, the framework in this paper outperforms those from previous studies.

3. Methodology

The proposed approach includes location and tracking of facial points (preprocessing), spatial-temporal motion LBP and Gabor multiorientation fusion histogram extraction (feature extraction), and expression classification. Figure 1 illustrates this method. The framework and its components are detailed in the following sections.

3.1. Preprocessing. When an image sequence is introduced to a facial expression recognition system, the facial regions should be detected and cropped as a preprocessing step. The real-time Viola-Jones face detector [38], commonly used in different areas such as face detection, recognition, and expression analysis, is adopted to crop the face region coarsely in this paper. The Viola-Jones face detector consists of a cascade of classifiers employing Haar features trained by AdaBoost. Haar features are based on integral image filters that are computed simply and fast at any location and scale.

To segment the face regions more accurately, fiducial points should be located and tracked in the video sequences. Based on an incremental formulation [39], a CLM model is employed to detect and track face points in our framework. The original CLM framework is a patch-based method, and faces can be represented by a series of image patches cropped from the locations of the fiducial points. A patch expert for each point is trained by a linear SVM on positive and negative patches and is used as a detector to update the point location. Instead of updating the shape model, the CLM model based on the incremental formulation focuses on updating the function that maps facial texture to facial shape, constructing a robust and discriminatively deformable model that outperforms state-of-the-art CLM alignment frameworks [39].

The locations of the detected points on the contours of the eyes are used to compute, in each frame, the angle between the line connecting the two inner eye corners and the horizontal image edge, in order to crop the facial region from the background. The face can then be rotated so that this line is horizontal, correcting any in-plane head rotation. The points of the two outer eye corners and the nose tip are used to crop and scale the images into a 104 × 128 rectangular region containing all local areas related to facial expressions. After preprocessing, the horizontal center of the cropped image is the x coordinate of the center of the two eyes, while the y coordinate of the nose tip falls in the lower third of the cropped image in the vertical direction. A geometric sketch of this step is given below.
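To make the geometry concrete, the following sketch implements the rotation correction and cropping with OpenCV and NumPy. The landmark coordinates are assumed to come from the CLM tracker; the margin factor and the exact vertical placement of the nose tip are illustrative assumptions, since the paper fixes only the 104 × 128 output size, the horizontal eye-center alignment, and the nose tip falling in the lower third.

```python
import cv2
import numpy as np

def crop_face(gray, inner_eye_l, inner_eye_r, nose_tip, out_w=104, out_h=128):
    """Correct in-plane rotation and crop the 104 x 128 expression region.

    Landmarks are (x, y) tuples assumed to come from the CLM tracker;
    the crop proportions below are illustrative, not from the paper.
    """
    # Angle between the inner-eye line and the horizontal image edge.
    dx = inner_eye_r[0] - inner_eye_l[0]
    dy = inner_eye_r[1] - inner_eye_l[1]
    angle = np.degrees(np.arctan2(dy, dx))
    eye_c = ((inner_eye_l[0] + inner_eye_r[0]) / 2.0,
             (inner_eye_l[1] + inner_eye_r[1]) / 2.0)

    # Rotate the frame about the eye center so the eye line is horizontal.
    M = cv2.getRotationMatrix2D(eye_c, angle, 1.0)
    upright = cv2.warpAffine(gray, M, (gray.shape[1], gray.shape[0]))

    # Map the nose tip through the same affine transform.
    ny = M[1, 0] * nose_tip[0] + M[1, 1] * nose_tip[1] + M[1, 2]

    # Crop centred horizontally on the eyes, with the nose tip placed at
    # two thirds of the window height (an assumed reading of "lower third").
    win_w = int(2.4 * abs(dx))                # assumed margin around the eyes
    win_h = int(win_w * out_h / out_w)
    left = max(int(eye_c[0] - win_w / 2), 0)
    top = max(int(ny - 2 * win_h / 3), 0)
    patch = upright[top:top + win_h, left:left + win_w]
    return cv2.resize(patch, (out_w, out_h))
```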

3.2. Spatial-Temporal Motion LBP Feature. LBP is a well-known operator used to describe the local texture characteristics of an image and was first proposed by Ojala et al. [40]. The operator labels image pixels by thresholding the n × n neighborhood of each pixel with the value of the center pixel and reading the results as binary numbers. Then, the histogram of all labels in the image can be adopted as a texture descriptor. LBP is widely used in many areas, such as face recognition, texture classification, texture segmentation, and facial expression recognition, because of its gray-scale and rotation invariance advantages. A minimal sketch of the basic operator follows.
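As an illustration, the sketch below computes the 8-neighbour LBP code of every interior pixel and the resulting label histogram. The paper's actual configuration (radius 3 with eight sampling points) differs only in where the neighbours are sampled; this minimal version uses the immediate 3 × 3 neighbourhood.

```python
import numpy as np

def lbp_image(gray):
    """Basic 8-neighbour LBP over a 3x3 neighbourhood (a minimal sketch).

    Neighbours greater than or equal to the centre contribute a 1-bit,
    and the eight bits are read off as a code in [0, 255].
    """
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]                          # centre pixels
    # Clockwise neighbour offsets starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.int32) << bit
    return code

def lbp_histogram(gray, bins=256):
    """Histogram of LBP codes, used as a texture descriptor."""
    return np.bincount(lbp_image(gray).ravel(), minlength=bins)
```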

Recently, some extended LBP operators have been introduced to describe dynamic texture and outperform other texture operators. LBPTOP is a spatial-temporal operator extended from LBP and was proposed in the work of [22]. As shown in Figure 2, LBP codes are extracted from the three orthogonal planes (i.e., XY, XT, and YT) of a video sequence. Over all pixels, the statistical histograms of the three planes, denoted LBP-XY, LBP-XT, and LBP-YT, respectively, are obtained and concatenated into one histogram. The histogram is calculated as follows:

$$H_{i,j} = \sum_{x,y,t} I\left\{ f_j(x, y, t) = i \right\}, \qquad i = 0, \ldots, n_j - 1;\; j = 0, 1, 2 \quad (1)$$


Figure 1: Proposed framework for facial expression recognition: a video is preprocessed into an image sequence; the spatial-temporal motion LBP histogram is extracted from the whole sequence and the Gabor multiorientation fusion histogram from the peak frame, and both are passed to expression classification.

Figure 2: Example of LBP codes using a radius of three and eight neighboring points on the three planes.

where $n_j$ is the number of different labels produced by the LBP operator in the $j$th plane ($j = 0$: XY; $j = 1$: XT; $j = 2$: YT), $f_j(x, y, t)$ represents the LBP code of the center pixel $(x, y, t)$ in the $j$th plane, and $I\{\cdot\}$ equals 1 when its argument holds and 0 otherwise.

As shown in formula (1) and Figure 2, the dynamic texture in a video sequence can be extracted by incorporating the LBP histogram on the XY plane (LBP-XY), reflecting spatial-domain information, and the LBP histograms on the XT and YT planes, representing spatial-temporal motion information in the horizontal and vertical directions, respectively.

All LBPTOP components are LBP histograms extracted from the three orthogonal planes and are given equal contributions in facial expression recognition [22]. However, not all LBPTOP components are significant. LBP-XY includes characteristics of the different subjects, resulting in difficulties in accurate expression recognition, while LBP-XT and LBP-YT contain most of the information on the spatial-temporal texture changes of the facial expression. Consequently, as shown in Figure 3, LBP-XY is abandoned, and LBP-XT and LBP-YT are merged to construct the spatial-temporal motion LBP feature, which is applied to facial expression recognition in our study. In this paper, the LBP code uses a radius of three and eight neighboring points on each plane; moreover, the image of each frame is equally divided into 8 × 9 rectangular subblocks with an overlap ratio of 70%. A sketch of this feature computation follows.
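The sketch below assembles the spatial-temporal motion LBP feature for a grayscale sequence, reusing the `lbp_histogram` helper from the previous sketch. For brevity it uses radius-1 codes and non-overlapping blocks, whereas the paper uses radius 3 with eight neighbours and a 70% block overlap, so the block counts and feature dimensions here are illustrative.

```python
import numpy as np

def st_motion_lbp(volume, blocks=(8, 9)):
    """Spatial-temporal motion LBP over a (T, H, W) grayscale sequence.

    For every spatial block the LBP histograms of its XT and YT planes
    are computed and all histograms are concatenated; the XY plane is
    deliberately dropped. Assumes at least three frames.
    """
    t, h, w = volume.shape
    bh, bw = h // blocks[0], w // blocks[1]
    feats = []
    for by in range(blocks[0]):
        for bx in range(blocks[1]):
            blk = volume[:, by * bh:(by + 1) * bh, bx * bw:(bx + 1) * bw]
            # XT planes: fix y, treat each (t, x) slice as a 2-D image.
            h_xt = sum(lbp_histogram(blk[:, y, :]) for y in range(bh))
            # YT planes: fix x, treat each (t, y) slice as a 2-D image.
            h_yt = sum(lbp_histogram(blk[:, :, x]) for x in range(bw))
            feats.extend([h_xt, h_yt])
    return np.concatenate(feats).astype(np.float64)
```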

3.3. Gabor Multiorientation Fusion Histogram. The Gabor wavelet is a well-known descriptor representing the texture information of an image. A Gabor filter can be calculated as a Gaussian kernel multiplied by a sinusoidal plane wave. Gabor features are highly capable of describing facial textures and are used in different research areas such as identity recognition, feature location, and facial expression recognition. However, Gabor features perform weakly in characterizing global features, and the feature data are redundant. Figure 4 shows the feature extraction based on the Gabor multiorientation fusion histogram proposed in [41]; these features are used to reduce the feature dimension and improve recognition accuracy.

First, a Gabor filter bank of five scales and eight orientations is used to transform the images. The following represents the multiscale and multiorientation features of a pixel point $z(x, y)$ in the images:

$$G_{u,v}(z), \qquad u \in (0, 7),\; v \in (0, 4) \quad (2)$$

where $u$ and $v$ denote the orientation and scale, respectively, and $G_{u,v}(z)$ are the Gabor features.

The transformation of each original image produces 40 corresponding images with different scales and orientations. The dimension of the obtained feature is thus 40 times as high as that of the original facial image. Therefore, on each scale, the features of the eight orientations are fused according to the following rule:

$$T_v(z) = \arg\max_{u} \left\| G_{u,v}(z) \right\|, \qquad u \in (0, 7),\; v \in (0, 4) \quad (3)$$


Figure 3: Features in each block sequence: (a) block sequence and (b) LBP histograms from the XT and YT planes.

Figure 4: Feature extraction procedure based on the Gabor multiorientation fusion histogram: Gabor features in eight orientations and five scales, orientation feature fusion, fusion features in five scales, block division, and block histograms.

where $T_v(z) \in [1, 8]$ is the encoding value of pixel point $z(x, y)$ and $G_{u,v}(z)$ are the Gabor features.

Here, the orientation of a local area is evaluated by the orientation code of the maximum value of the Gabor features over the eight orientations. Each encoding value $T_v(z)$ represents one local orientation. Finally, each expression image is transformed into five scales of images, each of which fuses the eight orientations. The fusion images are shown in Figure 5. As can be seen from the figure, the fused images retain the most obvious change information of the Gabor subfilters. Therefore, the Gabor fusion feature shows high discrimination for local texture changes.

A histogram can describe the global features of an image. However, many structural details may be lost when the histogram of the entire image is calculated directly. Thus, each fusion image is divided into 8 × 8 nonoverlapping, equal rectangular subblocks. Accordingly, the histogram distributions of all subblocks are calculated according to formula (4) and combined to construct the Gabor multiorientation fusion histogram of the whole image:

$$h_{v,r,i} = \sum_{z} I\left( R_{v,r}(z) = i \right), \qquad i = 0, \ldots, 8 \quad (4)$$

where $R_{v,r}(z)$ is the encoding value at point $z(x, y)$ in the $r$th subblock.

Obtained by coding the five scales over the multiple orientations of the Gabor features, the Gabor multiorientation fusion histogram not only inherits the Gabor wavelet's advantage of capturing the corresponding spatial frequency (scale), spatial location, and orientation selectivity but also reduces the redundancy of the feature data and the computational complexity. A sketch of the whole extraction follows.
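The sketch below walks through the whole extraction: a 5-scale, 8-orientation Gabor bank, the per-pixel argmax fusion of (3), and the per-block histograms of (4). The kernel size and the wavelength/sigma ladder are assumptions (the paper does not list its filter parameters), and the magnitude of the real-part response stands in for the Gabor feature norm.

```python
import cv2
import numpy as np

def gabor_fusion_histogram(gray, scales=5, orients=8, grid=(8, 8)):
    """Gabor multiorientation fusion histogram (a sketch).

    For each scale the image is filtered at eight orientations, each
    pixel is encoded with the index of the strongest orientation (the
    fusion rule (3)), and per-block code histograms are concatenated
    per (4). Filter parameters below are assumptions, not the paper's.
    """
    img = gray.astype(np.float32)
    h, w = img.shape
    bh, bw = h // grid[0], w // grid[1]
    feats = []
    for v in range(scales):
        lambd = 4.0 * (2.0 ** (0.5 * v))       # assumed wavelength ladder
        resp = []
        for u in range(orients):
            kern = cv2.getGaborKernel((31, 31), 0.56 * lambd,
                                      u * np.pi / orients, lambd, 0.5)
            # |real response| as a proxy for the Gabor feature norm.
            resp.append(np.abs(cv2.filter2D(img, cv2.CV_32F, kern)))
        code = np.argmax(np.stack(resp), axis=0)    # T_v(z), here in [0, 7]
        # 8 x 8 non-overlapping blocks, one orientation histogram each.
        for by in range(grid[0]):
            for bx in range(grid[1]):
                blk = code[by * bh:(by + 1) * bh, bx * bw:(bx + 1) * bw]
                feats.append(np.bincount(blk.ravel(), minlength=orients))
    return np.concatenate(feats).astype(np.float64)
```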

3.4. Multifeature Fusion. The features extracted in Sections 3.2 and 3.3 are scaled to $[-1, 1]$ and integrated into one vector using the following equation:

$$F = \alpha P + (1 - \alpha) Q, \qquad 0 \le \alpha \le 1 \quad (5)$$

where $P$ is the spatial-temporal motion LBP histogram, $Q$ is the Gabor multiorientation fusion histogram, and $F$, the fusion of $P$ and $Q$, is the feature vector used for the final prediction. The parameter $\alpha$ usually depends on the performance of each feature. For all experiments in this paper, we set $\alpha$ to 0.5, considering the experimental results shown in Tables 2 and 3. A minimal fusion sketch is given below.
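Since the two histograms have different lengths, this minimal sketch reads (5) as a weighted concatenation of the scaled descriptors, which is an assumption about how the fusion vector is assembled rather than something the paper states.

```python
import numpy as np

def fuse_features(p, q, alpha=0.5):
    """Fuse per (5) after scaling each descriptor to [-1, 1].

    Read here as a weighted concatenation because P and Q have different
    dimensions; in practice the scaling bounds would be fitted on the
    training set and reused at test time.
    """
    def to_unit(x):
        rng = x.max() - x.min()
        return 2.0 * (x - x.min()) / (rng + 1e-12) - 1.0
    return np.concatenate([alpha * to_unit(p), (1.0 - alpha) * to_unit(q)])
```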

3.5. Classifier. Many classifiers are used for facial expression recognition; these include J48 based on ID3, nearest neighbor based on fuzzy-rough sets, random forest [20], SVM [22], and HMM [29]. In this paper, SVM is selected for our framework because of the following properties: (1) it is based on statistical learning theory, (2) it has high generalization performance, and (3) it can deal with high-dimensional features.

Figure 5: Fusion images on the five scales.

To achieve more effective classification of facial expressions, an SVM classifier with an RBF kernel is adopted. Given a training set of instance-label pairs $(x_i, y_i)$, the SVM requires the solution of the following optimization problem:

$$\min_{w, b, \xi} \quad \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i$$
$$\text{subject to} \quad y_i \left( w^T \phi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0 \quad (6)$$

where $x_i$ are the training samples, $l$ is the size of the training set, $y_i$ are the labels of $x_i$, $C$ is a punishment coefficient that prevents overfitting, $\xi_i$ are relaxation variables, and $w$ and $b$ are the parameters of the classifier. The training vectors $x_i$ are mapped into a higher dimensional space by the function $\phi(x_i)$, and the SVM finds a linear separating hyperplane with maximal margin in this space. $K(x_i, x_j) \equiv \phi(x_i)^T \phi(x_j)$ is the kernel function of the SVM. The RBF kernel function is as follows:

$$K(x_i, x_j) = \exp\left( -\gamma \left\| x_i - x_j \right\|^2 \right), \qquad \gamma > 0 \quad (7)$$

where $\gamma$ is the parameter of the RBF kernel. In this paper, a grid-search strategy and tenfold cross validation [42] are used to select and estimate the punishment coefficient $C$ and the kernel parameter $\gamma$ of the SVM. An SVM is a binary classifier that only separates two sets in feature space, while facial expression recognition is a multiclass problem; thus, a multiclassifier consisting of binary SVM classifiers is constructed to classify facial expressions [42]. Two strategies are commonly used: the one-versus-one strategy and the one-versus-all strategy. The one-versus-one strategy is applied in our method because of its robustness in classification and simplicity of application. In this strategy, an SVM is trained for every pair of facial expression classes; thus, six expressions require training 15 binary SVM classifiers (fear versus anger, happiness versus disgust, and surprise versus sadness, among others). The final result is the expression class with the highest number of votes. However, occasionally the class with the most votes is not unique; in this case, a nearest neighbor classifier is applied to the tied classes to obtain one final result. A sketch of this training setup follows.
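A sketch of this training setup with scikit-learn is shown below; the library choice and the grid-search bounds are assumptions, not choices made in the paper. Note that scikit-learn's SVC already trains the 15 one-versus-one binary classifiers internally for six classes, and its built-in tie handling stands in for the paper's nearest-neighbour tie-break.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def train_expression_svm(X, y):
    """Grid-searched RBF-SVM over fused feature vectors X and labels y.

    SVC uses a one-versus-one scheme internally for multiclass data,
    so six expression classes yield 15 binary classifiers.
    """
    param_grid = {
        "C": [2.0 ** k for k in range(-5, 21, 2)],      # assumed search range
        "gamma": [2.0 ** k for k in range(-31, 1, 2)],  # assumed search range
    }
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)
    search.fit(X, y)
    return search.best_estimator_
```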

Table 1: Number of subjects for each expression class in the three datasets.

Expression    CK+    MMI    Oulu (each condition)
Anger          45     32     80
Disgust        59     28     80
Fear           25     28     80
Happiness      69     42     80
Sadness        28     32     80
Surprise       83     41     80
Total         309    203    480

4. Experiments

4.1. Facial Expression Datasets. The CK+ dataset [14] extends the CK dataset [43], which was built in 2000; it is the most widely used available dataset for evaluating the performance of facial expression recognition systems. This dataset consists of 593 video sequences covering seven basic facial expressions performed by 123 participants. The ages of the subjects range from 18 to 30 years; 35% are male, 13% are Afro-American, eighty-one percent are Euro-American, and six percent are of other races. In the video sequences, each frame is 640 × 490 or 640 × 480 pixels. Most frames are gray images with eight-bit precision for grayscale values. The video sequences contain frames from the neutral phase to the peak phase of the facial expressions, and the video rate is 30 frames per second. In our study, 309 image sequences covering 97 subjects with six expressions (anger, disgust, fear, happiness, sadness, and surprise) are selected for the experiments. Each subject has one to six expressions. The top row of Figure 6 shows peak frames of the six expressions from six subjects in the CK+ dataset. Table 1 shows the number of subjects for each expression class of the CK+ dataset in our experiment.

The MMI dataset [44] is applied to evaluate the performance of expression recognition methods; this dataset contains 203 video sequences including different head poses and subtle expressions. The expression sequences were performed by 19 participants with ages ranging from 19 to 62 years. The subjects are Asian, European, and South American, and male participants account for 40% of the subjects. Different from the CK+ dataset, the MMI dataset contains six basic facial expressions comprising neutral, onset, apex, and offset frames. In the video sequences, each frame is 720 × 576 pixels in full RGB color.


Table 2: Classification results of using spatial-temporal motion LBP on the CK+ dataset.

Expression    Anger   Disgust   Fear   Happiness   Sadness   Surprise   Average (%)
Anger           42       0        0        0          3          0         93.3
Disgust          0      59        0        0          0          0        100
Fear             1       0       20        2          2          0         80.0
Happiness        0       0        0       69          0          0        100
Sadness          4       0        2        0         21          1         75.0
Surprise         0       0        2        1          0         80         96.4
Overall                                                                    94.1

Table 3: Classification results of using the Gabor multiorientation fusion histogram on the CK+ dataset.

Expression    Anger   Disgust   Fear   Happiness   Sadness   Surprise   Average (%)
Anger           40       2        0        0          3          0         88.9
Disgust          2      56        0        1          0          0         94.9
Fear             1       0       17        3          4          0         68.0
Happiness        0       0        0       69          0          0        100
Sadness          4       0        0        0         24          0         85.7
Surprise         0       1        0        0          0         82         98.8
Overall                                                                    93.2

Figure 6: Sample peak-frame images of the six expressions (anger, disgust, fear, happiness, sadness, and surprise) in the CK+ dataset (top row) and the MMI dataset (bottom row).

For our study, the original frames are processed into images with eight-bit precision for grayscale values, and frames from the neutral to the apex phase of the facial expressions are extracted. The bottom row of Figure 6 shows peak frames of the six expressions from six subjects in the MMI dataset. Table 1 shows the number of each expression class of the MMI dataset in our experiment. As seen from Table 1, the MMI dataset includes fewer subjects than the CK+ dataset.

The Oulu-CASIA VIS facial expression database [24] includes six basic expressions from 80 subjects between 23 and 58 years old; 26.2% of the subjects are female. All the image sequences were taken under visible light. Oulu-CASIA VIS contains three subsets, whose expressions were captured under three different illumination conditions: normal, weak, and dark. The images of the normal illumination condition are captured with good normal lighting. Weak illumination means that only the computer display is on and each subject sits in front of the display, and dark illumination means almost complete darkness. The video sequences contain frames from the neutral phase to the peak phase of the facial expressions, and the video rate is 25 frames per second. The face regions of the six expressions in the Oulu-CASIA VIS dataset are shown in Figure 7 (top row: normal; middle row: weak; bottom row: dark). The resolution of the images in the Oulu-CASIA VIS dataset (320 × 240) is lower than that of the images in the CK+ and MMI datasets. The number of video sequences is 480 for each illumination condition.


Figure 7: Sample face regions of the six expressions in the Oulu-CASIA VIS dataset (top row: normal; middle row: weak; bottom row: dark illumination).

Table 1 shows the number of each expression class of the Oulu-CASIA VIS dataset in our experiment.

4.2. Experimental Results on the CK+ Dataset. Three sets of experiments are conducted to evaluate the performance of our method on the CK+ dataset, because the head motions of subjects in the CK+ dataset are smaller than those in the other two datasets. The performance of the proposed framework is then compared with seven state-of-the-art methods. The leave-one-out cross validation strategy is applied in these sets of experiments. The punishment coefficient $C$ and kernel parameter $\gamma$ of the SVM for the CK+ dataset are 1048576 and 3.72 × 10⁻⁹, respectively. A sketch of a subject-independent evaluation loop is given below.
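The paper does not spell out its leave-one-out protocol further; the sketch below shows one common reading for video datasets, a leave-one-subject-out loop with scikit-learn using the reported C and γ values. The inputs X (fused feature vectors), y (expression labels), and groups (subject id per sequence) are assumed.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

def loso_accuracy(X, y, groups):
    """Leave-one-subject-out accuracy with the reported CK+ hyperparameters.

    `groups` maps every sequence to its subject id so that no subject
    appears in both the training and the test split of any fold.
    """
    clf = SVC(kernel="rbf", C=1048576, gamma=3.72e-9)
    scores = cross_val_score(clf, X, y, groups=groups, cv=LeaveOneGroupOut())
    return np.mean(scores)
```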

In the first set of experiments, the contributions of the three LBP histograms on the three planes (the three components of LBPTOP) are investigated separately to determine their effect on the accuracy of expression recognition. Figure 8 presents the recognition rates when classifying the six basic expressions using the LBP histograms from the three individual planes (i.e., LBP-XY, LBP-XT, and LBP-YT) and their combination (i.e., LBPTOP). The figure indicates that LBPTOP yields the highest recognition rate (94.5%), LBP-YT best describes the variation in texture along the vertical direction (93.5%), and the lowest performance is attributed to LBP-XY, which contains the appearance characteristics of the different subjects (82.2%).

Figure 8: Recognition rates of methods using the LBP histograms from the three planes and LBPTOP: LBP-XY 82.2%, LBP-XT 91.9%, LBP-YT 93.5%, LBPTOP 94.5%.

We then investigate the performance of the spatial-temporal motion LBP feature, which joins the LBP histograms on the XT and YT planes together, and compare it with LBPTOP. Based on the results shown in Figure 9, the recognition rates are almost similar. Furthermore, the spatial-temporal motion LBP feature has a higher recognition rate than LBPTOP for anger and disgust. The results shown in Figures 8 and 9 mean the following: (1) the variation of texture on the XT and YT planes is more significant than on the XY plane in an expression sequence; (2) compared with LBPTOP, the dimension of the spatial-temporal motion LBP feature is reduced by one third, while the recognition accuracy is not decreased.


Table 4: Classification results of using the fusion features on the CK+ dataset.

Expression    Anger   Disgust   Fear   Happiness   Sadness   Surprise   Average (%)
Anger           42       0        0        0          3          0         93.3
Disgust          0      59        0        0          0          0        100
Fear             0       0       22        0          2          1         88.0
Happiness        0       0        0       69          0          0        100
Sadness          3       0        1        0         24          0         85.7
Surprise         0       0        2        1          0         80         96.4
Overall                                                                    95.8

Table 5: Comparison results of using the fusion feature and LBPTOP on the CK+ dataset (%; figures in parentheses give the difference from LBPTOP).

Expression    Anger         Disgust     Fear         Happiness    Sadness      Surprise     Average
Anger         93.3 (+2.2)   0           0            0            6.7          0
Disgust       0             100 (0.0)   0            0            0            0
Fear          0             0           88.0 (0.0)   0            8.0          4.0
Happiness     0             0           0            100 (0.0)    0            0
Sadness       10.7          0           3.6          0            86.7 (+8.1)  0
Surprise      0             0           2.4          1.2          0            96.4 (0.0)
Overall                                                                                     95.8 (+1.3)

Figure 9: Recognition rates per expression and in total for the spatial-temporal motion LBP and LBPTOP methods.

The second set of experiments evaluates the effectiveness of combining spatial-temporal motion LBP with the Gabor multiorientation fusion histogram. Tables 2-4 show the results obtained by using spatial-temporal motion LBP, the Gabor multiorientation fusion histogram, and the proposed framework with the multiclass SVM classifier. The results include the recognition rate of each expression and the overall classification accuracy; in each table, the diagonal entries give the number of correctly classified subjects for an expression category, and the off-diagonal entries give the number of misclassified ones. The results show that the framework using the combination of spatial-temporal motion LBP and the Gabor multiorientation fusion histogram performs better than the one using either feature individually.

The third set of experiments compares the performance of the proposed method with the approach using LBPTOP [22]. The results in Table 5 show the recognition rates of the method combining spatial-temporal motion LBP with the Gabor multiorientation fusion histogram (i.e., the proposed framework), where the figures in parentheses represent the difference in accuracy between the features proposed in this paper and LBPTOP; a positive figure indicates better accuracy of the proposed features. Based on Table 5, the proposed framework performs better on two expressions and the same on the others. Furthermore, the overall recognition rate is slightly better when using the proposed method. Therefore, the method fusing spatial-temporal motion LBP with the Gabor multiorientation fusion histogram outperforms LBPTOP.

In this set of experiments, we also compared our method with the systems proposed by Eskil and Benli [13], Lucey et al. [14], Chew et al. [15], Fan and Tjahjadi [16], and Jain et al. [17] on the CK+ dataset. The datasets used in these papers are identical to the dataset adopted in our study; therefore, the results presented in these papers can be directly compared with our method. The methods proposed by Eskil and Benli [13], Lucey et al. [14], Chew et al. [15], and Fan and Tjahjadi [16] used leave-one-subject-out cross validation, and the technology proposed by Jain et al. [17] used tenfold cross validation. Based on the results shown in Table 6, the proposed framework achieves an average accuracy of 95.8% over all six basic facial expressions, performing better than the methods developed by the aforementioned authors.

4.3. Experimental Results on the MMI Dataset. Seven methods in [18-20] are used to perform quantitative comparisons with our framework on the MMI dataset.


Table 6: Comparison of results of the proposed framework and seven state-of-the-art methods on the CK+ dataset.

Research                 Methodology                      Accuracy (%)
Eskil and Benli [13]     High-polygon wireframe model     85.0
Lucey et al. [14]        AAM shape                        68.9
                         AAM appearance                   84.5
                         Combined                         88.7
Chew et al. [15]         Uni-hyperplane classification    89.4
Fan and Tjahjadi [16]    PHOG-TOP and optical flow        90.9
Jain et al. [17]         Temporal modeling of shapes      91.9
Proposed                                                  95.8

Table 7: Comparison of results of the proposed framework and seven systems on the MMI dataset.

Research            Methodology                Accuracy (%)
Shan et al. [18]    LBP                        47.78
Wang et al. [19]    AdaBoost                   47.8
                    HMM                        51.5
                    ITBN                       59.7
Fang et al. [20]    Active shape model         62.38
                    Active appearance model    64.35
                    Gabor + Geometry + SVM     70.67
Proposed                                       71.92

The punishment coefficient $C$ and kernel parameter $\gamma$ of the SVM for the MMI dataset are 2097152 and 9.31 × 10⁻¹⁰, respectively. The datasets used in these papers are identical to the dataset adopted in our study. In this experiment, the proposed framework performs tenfold cross validation; Shan et al. [18] and Fang et al. [20] conducted their experiments using the same strategy we employed, the methods proposed by Wang et al. [19] performed twentyfold cross validation, and the methods using the active shape model [20] and the active appearance model [20] performed twofold cross validation. The results are shown in Table 7, which indicates that the performance of the proposed method on the MMI dataset is inferior to that on the CK+ dataset because of subtler expressions, notable changes in head rotation angle, and fewer training data. However, the proposed framework still performs better than the other seven state-of-the-art methods.

4.4. Experimental Results on the Oulu-CASIA VIS Dataset. In this section, we evaluate our framework on the Oulu-CASIA VIS dataset. In this set of experiments, the proposed framework performs tenfold cross validation. The punishment coefficient $C$ and kernel parameter $\gamma$ of the SVM for this dataset are 1048507 and 9.33 × 10⁻¹⁰, respectively. First, our method is tested on the three subsets of Oulu-CASIA VIS separately. The experimental results are shown in Figure 10. As shown in Figure 10, the recognition rate obtained for the normal illumination condition is the highest, and the recognition rate obtained for the dark illumination condition is the worst. Evidently, facial images captured in weak or dark illumination are often missing much useful texture information for facial expression recognition.

Table 8: Comparison of results of the proposed framework and four methods on the Oulu-CASIA VIS dataset.

Research                     Methodology    Accuracy (%)
Scovanner et al. [21]        3D SIFT        55.83
Zhao and Pietikainen [22]    LBPTOP         68.13
Klaser et al. [23]           HOG3D          70.63
Zhao et al. [24]             AdaLBP         73.54
Proposed                                    74.37

Figure 10: Recognition rates per expression and in total of the proposed method on the three subsets (dark, weak, and normal illumination) of the Oulu-CASIA VIS dataset.

Then, four methods in [21-24] are used to perform quantitative comparisons with our framework on the subset obtained under normal illumination. The experimental results are shown in Table 8, which indicates that the performance of the proposed method on the Oulu-CASIA VIS dataset is inferior to that on the CK+ dataset but superior to that on the MMI dataset. The proposed framework still outperforms the other four state-of-the-art methods.

4.5. Computational Complexity Evaluation. In this section, the computational complexity of our method is analyzed on the CK+, MMI, and Oulu-CASIA VIS datasets. The processing time is approximated as the time needed for preprocessing, extracting the spatial-temporal motion LBP and Gabor multiorientation fusion histogram, and classifying each image frame of a video sequence. The runtime (measured using the computer system clock) is estimated using mixed-language programming based on MATLAB and C++ on an Intel (R) Core (TM) i7-6700 CPU at 3.40 GHz with 8 GB RAM running the Windows 10 operating system. The average processing time per image is under 280, 286, and 249 ms for the CK+, MMI, and Oulu-CASIA VIS datasets, respectively. We then compare the computational time of our method with the system using LBPTOP [22]. The comparison results in Table 9 show that the computational time of our method is lower than that of the system using LBPTOP on all three datasets. The results show that our method is more efficient and effective than the LBPTOP-based technology for facial expression recognition on the CK+, MMI, and Oulu-CASIA VIS datasets.


Table 9: Runtime per image of our method and LBPTOP on the CK+, MMI, and Oulu-CASIA VIS datasets.

Research    CK+       MMI       Oulu
LBPTOP      349 ms    357 ms    317 ms
Proposed    280 ms    286 ms    249 ms


5. Conclusion

This paper presents a novel facial expression recognition framework integrating spatial-temporal motion LBP with a Gabor multiorientation fusion histogram. The framework comprises preprocessing (face detection and point location and tracking), dynamic and static feature extraction, and feature classification, and it outperforms seven state-of-the-art methods on the CK+ dataset, seven other methods on the MMI dataset, and four methods on the Oulu-CASIA VIS dataset. Expressions of disgust and happiness are easier to classify than the other expressions, demonstrating the better performance of the proposed method. However, the recognition rates are lower for fear (88.0%) and sadness (86.7%) on the CK+ dataset compared with the others, because most anger expressions are misclassified as sadness and vice versa. A limitation of the proposed method is that occlusions, head rotations, and illumination can reduce the accuracy of expression recognition; these phenomena will be addressed in our future work. A framework integrating other useful information (pupil, head, and body movements) with the features extracted in this paper could perform better; this issue will also be studied in our future work. Moreover, the proposed method cannot achieve real-time facial expression recognition, and we will attempt to establish an efficient dimensionality reduction method to reduce the computational complexity of our framework. Furthermore, the proposed system will be improved and applied to the study of human drowsiness expression analysis in our further work.

Competing Interests

The authors declare no conflict of interests

Acknowledgments

This work was supported by the Open Foundation of the State Key Laboratory of Automotive Simulation and Control (China, Grant no. 20161105).

References

[1] P. Ekman and E. L. Rosenberg, What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the Facial Action Coding System, Oxford University Press, Oxford, UK, 2005.
[2] J. A. Russell, J. M. Fernandez-Dols, and G. Mandler, The Psychology of Facial Expression, Cambridge University Press, Cambridge, UK, 1997.
[3] P. Ekman and W. V. Friesen, "Constants across cultures in the face and emotion," Journal of Personality and Social Psychology, vol. 17, no. 2, pp. 124–129, 1971.
[4] B. Golomb and T. Sejnowski, "Benefits of machine understanding of facial expressions," in NSF Report: Facial Expression Understanding, pp. 55–71, 1997.
[5] M. Pantic, "Face for ambient interface," in Ambient Intelligence in Everyday Life, vol. 3864 of Lecture Notes on Artificial Intelligence, pp. 32–66, Springer, Berlin, Germany, 2006.
[6] R.-L. Hsu, M. Abdel-Mottaleb, and A. K. Jain, "Face detection in color images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 696–706, 2002.
[7] H. Fang and N. Costen, "From Rank-N to Rank-1 face recognition based on motion similarity," in Proceedings of the 20th British Machine Vision Conference (BMVC '09), pp. 1–11, London, UK, September 2009.
[8] C. Cornelis, M. De Cock, and A. M. Radzikowska, "Vaguely quantified rough sets," in Rough Sets, Fuzzy Sets, Data Mining and Granular Computing: 11th International Conference, RSFDGrC 2007, Toronto, Canada, May 14–16, 2007, Proceedings, vol. 4482 of Lecture Notes in Computer Science, pp. 87–94, Springer, Berlin, Germany, 2007.
[9] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[10] M. Pantic and L. J. M. Rothkrantz, "Automatic analysis of facial expressions: the state of the art," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1424–1445, 2000.
[11] B. Fasel and J. Luettin, "Automatic facial expression analysis: a survey," Pattern Recognition, vol. 36, no. 1, pp. 259–275, 2003.
[12] C. P. Sumathi, T. Santhanam, and M. Mahadevi, "Automatic facial expression analysis: a survey," International Journal of Computer Science & Engineering Survey, vol. 3, no. 6, pp. 47–59, 2012.
[13] M. T. Eskil and K. S. Benli, "Facial expression recognition based on anatomy," Computer Vision and Image Understanding, vol. 119, pp. 1–14, 2014.
[14] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, "The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '10), pp. 94–101, San Francisco, Calif, USA, June 2010.
[15] S. W. Chew, S. Lucey, P. Lucey, S. Sridharan, and J. F. Cohn, "Improved facial expression recognition via uni-hyperplane classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2554–2561, IEEE, June 2012.
[16] X. Fan and T. Tjahjadi, "A spatial-temporal framework based on histogram of gradients and optical flow for facial expression recognition in video sequences," Pattern Recognition, vol. 48, no. 11, pp. 3407–3416, 2015.
[17] S. Jain, C. Hu, and J. K. Aggarwal, "Facial expression recognition with temporal modeling of shapes," in Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops '11), pp. 1642–1649, November 2011.
[18] C. Shan, S. Gong, and P. W. McOwan, "Facial expression recognition based on local binary patterns: a comprehensive study," Image and Vision Computing, vol. 27, no. 6, pp. 803–816, 2009.
[19] Z. Wang, S. Wang, and Q. Ji, "Capturing complex spatio-temporal relations among facial muscles for facial expression recognition," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), pp. 3422–3429, IEEE, Portland, Ore, USA, June 2013.
[20] H. Fang, N. Mac Parthalain, A. J. Aubrey et al., "Facial expression recognition in dynamic sequences: an integrated approach," Pattern Recognition, vol. 47, no. 3, pp. 1271–1281, 2014.
[21] P. Scovanner, S. Ali, and M. Shah, "A 3-dimensional sift descriptor and its application to action recognition," in Proceedings of the 15th ACM International Conference on Multimedia (MM '07), pp. 357–360, ACM, Augsburg, Germany, September 2007.
[22] G. Zhao and M. Pietikainen, "Dynamic texture recognition using local binary patterns with an application to facial expressions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 915–928, 2007.
[23] A. Klaser, M. Marszałek, and C. Schmid, "A spatio-temporal descriptor based on 3D-gradients," in Proceedings of the 19th British Machine Vision Conference (BMVC '08), pp. 275–281, Leeds, UK, September 2008.
[24] G. Zhao, X. Huang, M. Taini, S. Z. Li, and M. Pietikainen, "Facial expression recognition from near-infrared videos," Image and Vision Computing, vol. 29, no. 9, pp. 607–619, 2011.
[25] P. Ekman, W. V. Friesen, and J. C. Hager, Facial Action Coding System, A Human Face, Salt Lake City, Utah, USA, 2002.
[26] M. F. Valstar and M. Pantic, "Fully automatic recognition of the temporal phases of facial actions," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 1, pp. 28–43, 2012.
[27] Y. Zhang and Q. Ji, "Active and dynamic information fusion for facial expression understanding from image sequences," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 699–714, 2005.
[28] M. Pantic and I. Patras, "Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 36, no. 2, pp. 433–449, 2006.
[29] Y. Tong, W. Liao, and Q. Ji, "Facial action unit recognition by exploiting their dynamic and semantic relationships," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 10, pp. 1683–1699, 2007.
[30] Y. Zhang, Q. Ji, Z. Zhu, and B. Yi, "Dynamic facial expression analysis and synthesis with MPEG-4 facial animation parameters," IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 10, pp. 1383–1396, 2008.
[31] G. Donato, M. S. Bartlett, J. C. Hager, P. Ekman, and T. J. Sejnowski, "Classifying facial actions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 10, pp. 974–989, 1999.
[32] Y. Li, S. M. Mavadati, M. H. Mahoor, Y. Zhao, and Q. Ji, "Measuring the intensity of spontaneous facial action units with dynamic Bayesian network," Pattern Recognition, vol. 48, no. 11, pp. 3417–3427, 2015.
[33] J. J.-J. Lien, T. Kanade, J. F. Cohn, and C.-C. Li, "Detection, tracking, and classification of action units in facial expression," Robotics and Autonomous Systems, vol. 31, no. 3, pp. 131–146, 2000.
[34] M. F. Valstar and M. Pantic, "Combined support vector machines and hidden Markov models for modeling facial action temporal dynamics," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 4796, pp. 118–127, 2007.
[35] T. Danisman, I. M. Bilasco, J. Martinet, and C. Djeraba, "Intelligent pixels of interest selection with application to facial expression recognition using multilayer perceptron," Signal Processing, vol. 93, no. 6, pp. 1547–1556, 2013.
[36] W.-L. Chao, J.-J. Ding, and J.-Z. Liu, "Facial expression recognition based on improved local binary pattern and class-regularized locality preserving projection," Signal Processing, vol. 117, pp. 1–10, 2015.
[37] I. Kotsia and I. Pitas, "Facial expression recognition in image sequences using geometric deformation features and support vector machines," IEEE Transactions on Image Processing, vol. 16, no. 1, pp. 172–187, 2007.
[38] P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2004.
[39] A. Asthana, S. Zafeiriou, S. Cheng, and M. Pantic, "Incremental face alignment in the wild," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 1859–1866, IEEE, Columbus, Ohio, USA, June 2014.
[40] T. Ojala, M. Pietikainen, and D. Harwood, "A comparative study of texture measures with classification based on featured distributions," Pattern Recognition, vol. 29, no. 1, pp. 51–59, 1996.
[41] S.-S. Liu, Y.-T. Tian, and C. Wan, "Facial expression recognition method based on Gabor multi-orientation features fusion and block histogram," Zidonghua Xuebao/Acta Automatica Sinica, vol. 37, no. 12, pp. 1455–1463, 2011.
[42] C.-W. Hsu and C.-J. Lin, "A comparison of methods for multiclass support vector machines," IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 415–425, 2002.
[43] T. Kanade, J. F. Cohn, and Y. Tian, "Comprehensive database for facial expression analysis," in Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition (FG '00), pp. 46–53, Grenoble, France, March 2000.
[44] M. Pantic, M. Valstar, R. Rademaker, and L. Maat, "Web-based database for facial expression analysis," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '05), pp. 317–321, Amsterdam, Netherlands, July 2005.


Page 2: Facial Expression Recognition from Video Sequences Based ...downloads.hindawi.com/journals/mpe/2017/7206041.pdf · Facial Expression Recognition from Video Sequences Based on Spatial-Temporal

2 Mathematical Problems in Engineering

representing underlying emotional status Therefore dy-namic information should be extracted from over entirevideo sequence during facial expression recognition

This paper presents video-basedmethod for facial expres-sions analysis using dynamic and static textures features offacial region First constrained local model (CLM) basedon incremental formulation is used to align and track facelandmark Then dynamic and static features are extractedand fused to enhance recognition accuracy spatial-temporalmotion LBP concatenating LBP histograms on XT and YTplanes of whole video sequence to estimate changes of facialtexture during the whole process and Gabormultiorientationfusion histogram in peak expression frame to representstatus of facial texture when facial expression occur Finallyclassification is accomplished using multiclass SVM usingone-versus-one strategy

This paper mainly provides the following contributions: (a) the integration of spatial-temporal motion LBP with the Gabor multiorientation fusion histogram; (b) an analysis of the contributions of the three LBP histograms on the three orthogonal planes to facial expression recognition accuracy; and (c) the spatial-temporal motion LBP facial dynamic feature.

The remainder of this paper is structured as follows. Section 2 presents the background of facial expression recognition methods and discusses the contributions of current approaches. Section 3 gives a detailed description of the proposed approach. Section 4 details the experiments employed to evaluate the performance of the proposed framework. Finally, Section 5 concludes the paper.

2. Background

Facial expression analysis systems are designed to classify images or video sequences into one of the six previously mentioned basic emotions. Facial expressions change face shape and texture (i.e., deformation of eyebrows, mouth, eyes, and skin texture) through the movement of facial muscles. These variations can be extracted and adopted to classify a given facial image or video sequence. Although methods for facial expression recognition may differ in many aspects, recent research can be divided into two main streams: facial action unit based (AU-based) techniques and content-based techniques, detailed in Sections 2.1 and 2.2, respectively.

2.1. AU-Based Expression Recognition. In AU-based approaches, the Facial Action Coding System (FACS) is the most extensively used tool for describing facial movements [25]. FACS bridges facial expression changes and the motions of the facial muscles producing them. This system defines nine different AUs in the upper part of the face, eighteen AUs in the lower part of the face, and five AUs that belong to neither the upper nor the lower part of the face. Moreover, it defines further action units, such as eleven for describing head position, nine for states of the eyes, and fourteen for other actions. AUs, which are the smallest identifiable facial movements [26], are used by many expression recognition systems [27–30].

In [31], several approaches for classifying AU-based facial expressions are compared, including principal component analysis (PCA), linear discriminant analysis (LDA), independent component analysis, optical flow, and Gabor filters. Results showed that methods using local spatial features perform excellently in expression recognition; however, PCA and LDA destroy the capacity to discern local features.

The system described in [26] tracks a series of facial feature points and extracts temporal and spatial geometric features from the motions of those points. Geometric features are less susceptible to physical changes, such as variation of illumination and differences among human faces. However, methods based on geometric features fail on some AUs, such as AU15 and AU14 (dimpler), whose appearance changes the facial texture but not the facial feature points.

A unified probabilistic model based on a dynamic Bayesian network (DBN) for spontaneous AU intensity recognition is proposed in [32]. The model contains two steps: AU intensity observation extraction and DBN inference. A framework based on Gabor features, histogram of oriented gradients (HOG) features, and an SVM classifier is employed to extract the AU intensity observations. The DBN then systematically models the dependencies among AUs, multi-AU intensity levels, and temporal relationships; combined with the image observations, the DBN recognizes AU intensity through probabilistic inference. Results show that the method improves AU intensity recognition accuracy.

In [33], a hidden Markov model (HMM) is used to model the time evolution of AUs; classification is accomplished by maximizing the probability of the HMM given the extracted facial features. The work in [34] combines an SVM with an HMM to recognize AUs, and the recognition accuracy on each AU is higher than that obtained when using an SVM alone on the time evolution of the features. Although both methods consider the temporal characteristics of AUs, neither builds an interaction model among AUs.

2.2. Content-Based Expression Recognition. Non-AU methods are content-based and use local or global facial features to analyze expressions. Content-based techniques for facial expression recognition include static-based methods [35, 36] and dynamic-based methods [16, 18, 20, 22, 37].

In [37], a Kanade-Lucas-Tomasi tracker is used to track grid points on the face and obtain point displacements, which are then used to train an SVM classifier. However, this method only extracts geometric features by locating and tracking points on the face. In [16], a system integrating the pyramid histogram of gradients on three orthogonal planes (PHOG-TOP) with optical flow is adopted to classify the emotions; the two kinds of features respectively represent the movement of facial landmarks and the variation of facial texture. Results show that the integrated framework outperforms methods using a single feature operator.

The work in [18] adopts a texture operator called LBP instead of geometric features; important features are selected by a boosting algorithm before training the SVM classifier. In [22], volume LBP (VLBP) and LBP histograms on three orthogonal planes (LBPTOP) are used to reveal the dynamic texture of facial expressions in video sequences, and a classifier based on a multiclass SVM with a one-versus-one strategy is then applied. Experiments conducted on the Cohn-Kanade (CK) dataset demonstrate that LBPTOP outperforms earlier methods.

To obtain higher recognition accuracy, combined geometric and texture features have been adopted. The work in [20] fuses geometric features and Gabor features of local facial regions to enhance facial expression recognition accuracy; dynamic information of the video sequence is extracted to construct a parametric feature space with 300 dimensions. Compared with other methods, the system performs excellently in expression recognition when combining geometric and texture features.

2.3. Discussion and Contributions. AUs are a middle-level interpretation of facial muscle motion; they associate high-level semantics with low-level features of facial expressions and identify meaningful human facial movements [27]. However, AU-based expression recognition has drawbacks: errors of the middle-level classification step propagate into the whole recognition process, and accuracy may drop when this extra step is added to the system. Static-based expression recognition only reflects the expression state at specific time points, whereas dynamic features extracted from a video sequence reveal facial expression changes and are more accurate and robust for expression recognition [16, 20, 22].

The framework proposed in this paper focuses on a content-based, fusion-based facial expression recognition technique and identifies expressions in video more accurately. Most feature extraction methods in recent work are content-based and limited to a single form of dynamic or static features; here, we propose a method that recognizes facial expressions using spatial-temporal motion LBP from the whole video sequence together with the Gabor multiorientation fusion histogram from the peak expression frame. Moreover, we first study the contributions of the LBP histograms from the different orthogonal planes to expression recognition accuracy and then concatenate the LBP histograms on the XT and YT planes, the most significant planes for expression recognition, to establish the spatial-temporal motion LBP feature. Considering both dynamic and static texture features of facial expressions, the framework in this paper outperforms those from previous studies.

3. Methodology

The proposed approach includes location and tracking of facial points (preprocessing), spatial-temporal motion LBP and Gabor multiorientation fusion histogram extraction (feature extraction), and expression classification. Figure 1 illustrates this method. The framework and its components are detailed in the following sections.

3.1. Preprocessing. When an image sequence is introduced to a facial expression recognition system, facial regions should be detected and cropped as a preprocessing step. The real-time Viola–Jones face detector [38], commonly used in areas such as face detection, recognition, and expression analysis, is adopted to crop the face region coarsely in this paper. The Viola–Jones detector consists of a cascade of classifiers employing Haar features trained by AdaBoost; Haar features are based on integral image filters and are computed simply and quickly at any location and scale.
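For illustration, the following is a minimal sketch of this coarse cropping step using OpenCV's stock Haar cascade; the cascade file and detection parameters are illustrative choices, not the authors' settings.

```python
import cv2

# Coarse face cropping with a Viola-Jones Haar cascade (a sketch;
# parameters are illustrative, not the settings used in the paper).
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None  # no face detected in this frame
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep largest box
    return gray[y:y + h, x:x + w]
```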

To segment face regions more accurately, fiducial points should be located and tracked in the video sequences. A CLM model based on an incremental formulation [39] is employed to detect and track face points in our framework. The original CLM framework is a patch-based method: faces are represented by a series of image patches cropped at the locations of fiducial points. The patch expert of each point is trained by a linear SVM on positive and negative patches and used as a detector to update the point location. Instead of updating the shape model, the CLM model based on the incremental formulation focuses on updating the function that maps facial texture to facial shape, constructing a robust and discriminative deformable model that outperforms state-of-the-art CLM alignment frameworks [39].

The locations of the detected points on the contours of the eyes are used to compute, in each frame, the angle between the line through the two inner eye corners and the horizontal image edge, so that the facial region can be cropped from the background. The image can then be rotated until this line is horizontal, correcting any in-plane head rotation. The points of the two outer eye corners and the nose tip are used to crop and scale the images into a 104 × 128 rectangular region containing all local areas related to facial expressions. After preprocessing, the horizontal center of the cropped image is the x coordinate of the center of the two eyes, while the y coordinate of the nose tip lies at the lower third of the cropped image in the vertical direction.
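The sketch below illustrates the in-plane rotation correction just described, assuming the two inner eye corner coordinates come from the CLM tracker; the subsequent crop to 104 × 128 with the nose tip at the lower third follows this step.

```python
import numpy as np
import cv2

def correct_in_plane_rotation(gray, left_inner_eye, right_inner_eye):
    """Rotate the frame so the line through the two inner eye corners
    becomes horizontal (a sketch; landmarks come from the CLM tracker)."""
    (x1, y1), (x2, y2) = left_inner_eye, right_inner_eye
    angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))   # tilt of the eye line
    center = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
    rot = cv2.getRotationMatrix2D(center, angle, 1.0)  # deskew by that tilt
    h, w = gray.shape
    return cv2.warpAffine(gray, rot, (w, h))
```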

3.2. Spatial-Temporal Motion LBP Feature. LBP is a well-known operator used to describe the local texture characteristics of an image and was first proposed by Ojala et al. [40]. The operator labels image pixels by thresholding the n × n neighborhood of each pixel with the value of the center pixel and reading the results as a binary number; the histogram of all labels in the image can then be adopted as a texture descriptor. LBP is widely used in areas such as face recognition, texture classification, texture segmentation, and facial expression recognition because of its grayscale and rotation invariance.
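As a concrete reference for the operator just described, the following is a minimal NumPy sketch of the basic 3 × 3 LBP labeling (8 neighbors, radius 1); library implementations such as scikit-image generalize this to arbitrary radius and neighbor count.

```python
import numpy as np

def lbp_3x3(img):
    """Basic LBP: threshold the 8 neighbors of each pixel against the
    center and read the results as an 8-bit binary number (a sketch)."""
    img = img.astype(np.int32)
    c = img[1:-1, 1:-1]                        # center pixels
    # 8 neighbor offsets, enumerated clockwise from the top-left corner
    shifts = [(0, 0), (0, 1), (0, 2), (1, 2),
              (2, 2), (2, 1), (2, 0), (1, 0)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(shifts):
        n = img[dy:dy + c.shape[0], dx:dx + c.shape[1]]
        code |= ((n >= c).astype(np.int32) << bit)
    return code

# The histogram of the labels serves as the texture descriptor.
hist = np.bincount(lbp_3x3(np.random.randint(0, 256, (64, 64))).ravel(),
                   minlength=256)
```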

Recently, extended LBP operators have been introduced to describe dynamic texture, outperforming other texture operators. LBPTOP is a spatial-temporal operator extended from LBP and proposed in [22]. As shown in Figure 2, LBP codes are extracted from the three orthogonal planes (i.e., XY, XT, and YT) of a video sequence. Over all pixels, statistical histograms on the three planes, denoted LBP-XY, LBP-XT, and LBP-YT, are obtained and concatenated into one histogram. The histogram is calculated as follows:

H_{i,j} = \sum_{x,y,t} I\{ f_j(x, y, t) = i \}, \quad i = 0, \ldots, n_j - 1, \; j = 0, 1, 2, \quad (1)


Figure 1: The proposed framework for facial expression recognition: video preprocessing, spatial-temporal motion LBP histogram extraction from the image sequence, Gabor multiorientation fusion histogram extraction from the peak frame, and expression classification.

Figure 2: Example of LBP codes with radius three and eight neighboring points on the three orthogonal planes.

where n_j is the number of different labels produced by the LBP operator in the jth plane (j = 0: XY; j = 1: XT; j = 2: YT) and f_j(x, y, t) is the LBP code of the center pixel (x, y, t) in the jth plane.

As shown in formula (1) and Figure 2, the dynamic texture in a video sequence can be extracted by incorporating the LBP histogram on the XY plane (LBP-XY), which reflects spatial-domain information, with the LBP histograms on the XT and YT planes, which represent spatial-temporal motion information in the horizontal and vertical directions, respectively.

All LBPTOP components are LBP histograms extracted from the three orthogonal planes and are given equal weight in facial expression recognition [22]. However, not all LBPTOP components are significant: LBP-XY includes appearance characteristics of different subjects, which hinders accurate expression recognition, whereas LBP-XT and LBP-YT contain most of the information on the spatial-temporal texture changes of the facial expression. Consequently, as shown in Figure 3, LBP-XY is abandoned, and LBP-XT and LBP-YT are merged to construct the spatial-temporal motion LBP feature applied to facial expression recognition in our study. In this paper, the LBP code uses radius three and eight neighboring points on each plane; moreover, the image of each frame is equally divided into 8 × 9 rectangular subblocks with an overlap ratio of 70%.
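The following sketch illustrates the spatial-temporal motion LBP descriptor for a single block sequence, using scikit-image's LBP implementation on the XT and YT slices of a (T, H, W) volume; the 8 × 9 subblock division with 70% overlap is omitted for brevity, and in practice the descriptor is computed per block and the block histograms are concatenated.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def st_motion_lbp(volume, P=8, R=3):
    """Spatial-temporal motion LBP for one block sequence (a sketch).
    `volume` has shape (T, H, W); LBP histograms are accumulated over
    all XT slices and all YT slices, then concatenated (LBP-XY dropped)."""
    T, H, W = volume.shape
    bins = 2 ** P
    h_xt = np.zeros(bins)
    h_yt = np.zeros(bins)
    for y in range(H):                      # XT plane at each row y
        codes = local_binary_pattern(volume[:, y, :], P, R)
        h_xt += np.bincount(codes.astype(int).ravel(), minlength=bins)[:bins]
    for x in range(W):                      # YT plane at each column x
        codes = local_binary_pattern(volume[:, :, x], P, R)
        h_yt += np.bincount(codes.astype(int).ravel(), minlength=bins)[:bins]
    feat = np.concatenate([h_xt, h_yt])
    return feat / feat.sum()                # normalized joint histogram
```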

3.3. Gabor Multiorientation Fusion Histogram. The Gabor wavelet is a well-known descriptor representing the texture information of an image; a Gabor filter is a Gaussian kernel modulated by a sinusoidal plane wave. Gabor features are highly capable of describing facial textures and are used in research areas such as identity recognition, feature location, and facial expression recognition. However, Gabor features are weak at characterizing global structure, and the feature data are redundant. Figure 4 shows feature extraction based on the Gabor multiorientation fusion histogram proposed in [41]; these features reduce the feature dimension and improve recognition accuracy.

First, a Gabor filter bank of five scales and eight orientations is used to transform the images. The following represents the multiscale and multiorientation features of pixel z(x, y):

G_{u,v}(z), \quad u \in \{0, \ldots, 7\}, \; v \in \{0, \ldots, 4\}, \quad (2)

where u and v denote the orientation and scale, respectively, and \| G_{u,v}(z) \| is the norm of the Gabor feature.

The transformation of each original image produces 40 corresponding images with different scales and orientations, so the dimension of the obtained feature is 40 times that of the original facial image. Therefore, at each scale, the features of the eight orientations are fused according to the following rule:

T_v(z) = \arg\max_u \| G_{u,v}(z) \|, \quad u \in \{0, \ldots, 7\}, \; v \in \{0, \ldots, 4\}, \quad (3)


Figure 3: Features in each block sequence: (a) block sequence; (b) LBP histograms from the XT and YT planes.

Figure 4: Feature extraction procedure based on the Gabor multiorientation fusion histogram: Gabor features in eight orientations and five scales are fused by orientation into five fusion images, which are divided into blocks whose histograms are concatenated.

where T_v(z) ∈ [1, 8] is the encoding value of pixel z(x, y) and G_{u,v}(z) are the Gabor features.

Here, the orientation of a local area is encoded by the index of the orientation with the maximum Gabor feature magnitude among the eight orientations; each encoding value T_v(z) represents one local orientation. Finally, each expression image is transformed into five scale images, each fusing eight orientations. The fusion images are shown in Figure 5. As can be seen from the figure, the fused images retain the most salient change information of the Gabor subfilters; therefore, the Gabor fusion feature is highly discriminative for local texture changes.
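A sketch of the orientation fusion of formula (3) with OpenCV Gabor kernels follows; the kernel size, sigma, and wavelength schedule are illustrative assumptions, not the settings of [41].

```python
import numpy as np
import cv2

def gabor_fusion_images(gray, n_orient=8, n_scale=5):
    """Fuse eight Gabor orientations per scale by taking, at each pixel,
    the index of the orientation with the largest absolute filter
    response (formula (3)); returns one encoded image per scale."""
    gray = gray.astype(np.float32)
    fused = []
    for v in range(n_scale):
        lambd = 4.0 * (2 ** (v / 2.0))        # illustrative wavelengths
        mags = []
        for u in range(n_orient):
            theta = u * np.pi / n_orient
            kern = cv2.getGaborKernel((31, 31), 4.0, theta, lambd, 0.5, 0)
            mags.append(np.abs(cv2.filter2D(gray, cv2.CV_32F, kern)))
        fused.append(np.argmax(np.stack(mags), axis=0) + 1)  # codes 1..8
    return fused
```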

A histogram can describe the global features of an image; however, many structural details may be lost when the histogram of the entire image is calculated directly. Thus, each fusion image is divided into 8 × 8 nonoverlapping, equal rectangular subblocks. The histogram distributions of all subblocks are calculated according to formula (4) and combined to construct the Gabor multiorientation fusion histogram of the whole image:

h_{v,r,i} = \sum_z I( R_{v,r}(z) = i ), \quad i = 0, \ldots, 8, \quad (4)

where R_{v,r}(z) is the encoding value at point z(x, y) of the rth subblock.

Obtained by coding the five scales over multiple orientations of the Gabor feature, the Gabor multiorientation fusion histogram not only inherits the Gabor wavelet's ability to capture spatial frequency (scale), spatial location, and orientation selectivity but also reduces the redundancy of the feature data and the computational complexity.
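The blocked histogram of formula (4) can then be sketched as follows, assuming the fused images produced by the gabor_fusion_images sketch above (eight orientation-code bins, codes 1 to 8).

```python
import numpy as np

def fusion_histogram(fused_images, grid=(8, 8), n_codes=8):
    """Gabor multiorientation fusion histogram (a sketch): split each
    fused image into 8 x 8 equal subblocks and concatenate per-block
    histograms of the orientation codes, per formula (4)."""
    feats = []
    for img in fused_images:                  # one fused image per scale
        H, W = img.shape
        bh, bw = H // grid[0], W // grid[1]
        for r in range(grid[0]):
            for c in range(grid[1]):
                block = img[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
                feats.append(np.bincount(block.ravel() - 1,   # codes 1..8 -> bins 0..7
                                         minlength=n_codes)[:n_codes])
    return np.concatenate(feats).astype(float)
```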

3.4. Multifeature Fusion. The features extracted in Sections 3.2 and 3.3 are scaled to [−1, 1] and integrated into one vector using the following equation:

F = \alpha P + (1 - \alpha) Q, \quad 0 \le \alpha \le 1, \quad (5)

where P is the spatial-temporal motion LBP histogram, Q is the Gabor multiorientation fusion histogram, and F, the fusion of P and Q, is the feature vector used for the final prediction. The parameter α usually depends on the performance of each feature; for all experiments in this paper, we set α to 0.5, considering the experimental results shown in Tables 2 and 3.
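Because P and Q generally have different dimensions, formula (5) is read in the sketch below as a weighted concatenation of the two independently scaled histograms; this reading is an assumption about the intended implementation.

```python
import numpy as np

def fuse_features(p, q, alpha=0.5):
    """Multifeature fusion per formula (5), read as a weighted
    concatenation of the two scaled descriptors (an assumption,
    since P and Q generally differ in dimension)."""
    def to_unit_range(x):                  # scale each vector to [-1, 1]
        lo, hi = x.min(), x.max()
        return 2.0 * (x - lo) / (hi - lo + 1e-12) - 1.0
    return np.concatenate([alpha * to_unit_range(p),
                           (1.0 - alpha) * to_unit_range(q)])
```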

Figure 5: Fusion images on the five scales.

3.5. Classifier. Many classifiers have been used for facial expression recognition, including J48 based on ID3, nearest neighbor based on fuzzy-rough sets, random forest [20], SVM [22], and HMM [29]. In this paper, the SVM is selected for our framework because of the following properties: (1) it is a classifier based on statistical learning theory; (2) it offers high generalization performance; and (3) it can handle high-dimensional features.

To achieve more effective classification of facial expressions, an SVM classifier with an RBF kernel is adopted. Given a training set of instance-label pairs (x_i, y_i), the SVM requires the solution of the following optimization problem:

\min_{w, b, \xi} \; \frac{1}{2} w^{T} w + C \sum_{i=1}^{l} \xi_i
\quad \text{subject to} \quad
y_i ( w^{T} \phi(x_i) + b ) \ge 1 - \xi_i, \; \xi_i \ge 0, \quad (6)

where x_i are the training samples, l is the number of training samples, y_i are the labels of x_i, C is the penalty coefficient that prevents overfitting, ξ_i are slack variables, and w and b are the parameters of the classifier. The training vectors x_i are mapped into a higher-dimensional space by the function φ(x_i), and the SVM finds a linear separating hyperplane with the maximal margin in that space. K(x_i, x_j) ≡ φ(x_i)^T φ(x_j) is the kernel function of the SVM. The RBF kernel function is as follows:

K(x_i, x_j) = \exp( -\gamma \| x_i - x_j \|^{2} ), \quad \gamma > 0, \quad (7)

where γ is the parameter of the RBF kernel. In this paper, a grid-search strategy with tenfold cross validation [42] is used to select the penalty coefficient C and kernel parameter γ of the SVM. The SVM is a binary classifier that separates only two classes in feature space, whereas facial expression recognition is a multiclass problem; thus, a multiclassifier consisting of binary SVM classifiers is constructed to classify facial expressions [42]. Two strategies are commonly used: one-versus-one and one-versus-all. The one-versus-one strategy is applied in our method because of its robustness in classification and simplicity of application. In this strategy, an SVM is trained to separate each pair of facial expressions, so six expressions require training 15 binary SVM classifiers (fear versus anger, happiness versus disgust, and surprise versus sadness, among others). The final result is the expression class with the highest number of votes. Occasionally, the class with the most votes is not unique; in this case, a nearest neighbor classifier is applied to the tied classes to obtain one final result.
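A sketch of this classification stage with scikit-learn follows: an RBF-kernel SVC (which realizes the one-versus-one strategy for multiclass data) tuned by grid search with tenfold cross validation; the parameter grid is illustrative, not the authors' search range.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

def train_expression_classifier(X, y):
    """One-versus-one multiclass SVM with RBF kernel; C and gamma are
    selected by grid search with tenfold cross validation (a sketch;
    the grid below is illustrative).
    X: fused feature vectors, one row per video; y: labels 0..5."""
    grid = {"C": 2.0 ** np.arange(-5, 21, 2),
            "gamma": 2.0 ** np.arange(-35, -5, 2)}
    svc = SVC(kernel="rbf", decision_function_shape="ovo")
    search = GridSearchCV(svc, grid, cv=10)
    search.fit(X, y)
    return search.best_estimator_
```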

Table 1: Number of subjects for each expression class in the three datasets.

Expression    CK+    MMI    Oulu (each condition)
Anger          45     32     80
Disgust        59     28     80
Fear           25     28     80
Happiness      69     42     80
Sadness        28     32     80
Surprise       83     41     80
Total         309    203    480

4. Experiments

4.1. Facial Expression Datasets. The CK+ dataset [14] extends the CK dataset [43], which was built in 2000; it is publicly available and the most widely used dataset for evaluating the performance of facial expression recognition systems. The dataset consists of 593 video sequences of seven basic facial expressions performed by 123 participants. Ages of the subjects range from 18 to 30 years; 35% are male, 13% are Afro-American, 81% are Euro-American, and 6% are of other races. Each frame is 640 × 490 or 640 × 480 pixels; most frames are gray images with eight-bit grayscale precision. Video sequences contain frames from the neutral phase to the peak phase of the facial expressions, and the video rate is 30 frames per second. In our study, 309 image sequences from 97 subjects with six expressions (anger, disgust, fear, happiness, sadness, and surprise) are selected for our experiments; each subject has one to six expressions. The top row of Figure 6 shows the peak frames of the six expressions from six subjects in the CK+ dataset. Table 1 shows the number of subjects for each expression class in the CK+ dataset in our experiment.

The MMI dataset [44] is applied to evaluate the performance of expression recognition methods; this dataset contains 203 video sequences, including different head poses and subtle expressions. These expression sequences were performed by 19 participants with ages ranging from 19 to 62 years; the subjects are Asian, European, and South American, and male participants account for 40% of the subjects. Different from the CK+ dataset, the MMI dataset contains the six basic facial expressions with neutral, onset, apex, and offset frames. Each frame is 720 × 576 pixels in full RGB color. For our study, the original frames were processed into images with eight-bit grayscale precision, and the frames from the neutral to the apex phase of the facial expressions were extracted. The bottom row of Figure 6 shows the peak frames of the six expressions from six subjects in the MMI dataset. Table 1 shows the number of each expression class in the MMI dataset in our experiment; as seen from Table 1, the MMI dataset includes fewer subjects than the CK+ dataset.

Table 2: Classification results using spatial-temporal motion LBP on the CK+ dataset.

Expression   Anger   Disgust   Fear   Happiness   Sadness   Surprise   Accuracy (%)
Anger          42       0        0        0          3          0         93.3
Disgust         0      59        0        0          0          0        100
Fear            1       0       20        2          2          0         80.0
Happiness       0       0        0       69          0          0        100
Sadness         4       0        2        0         21          1         75.0
Surprise        0       0        2        1          0         80         96.4
Overall                                                                   94.1

Table 3: Classification results using the Gabor multiorientation fusion histogram on the CK+ dataset.

Expression   Anger   Disgust   Fear   Happiness   Sadness   Surprise   Accuracy (%)
Anger          40       2        0        0          3          0         88.9
Disgust         2      56        0        1          0          0         94.9
Fear            1       0       17        3          4          0         68.0
Happiness       0       0        0       69          0          0        100
Sadness         4       0        0        0         24          0         85.7
Surprise        0       1        0        0          0         82         98.8
Overall                                                                   93.2

Figure 6: Peak frames of the six expressions (anger, disgust, fear, happiness, sadness, and surprise) in the CK+ dataset (top row) and MMI dataset (bottom row).

The Oulu-CASIA VIS facial expression database [24] includes the six basic expressions from 80 subjects between 23 and 58 years old; 26.2% of the subjects are female. All image sequences were taken under visible light. Oulu-CASIA VIS contains three subsets, whose expressions were captured under three different illumination conditions: normal, weak, and dark. Images in the normal condition were captured under good normal lighting; weak illumination means that only the computer display is on while each subject sits in front of it; and dark illumination means almost complete darkness. Video sequences contain frames from the neutral phase to the peak phase of the facial expressions, and the video rate is 25 frames per second. The face regions of the six expressions in the Oulu-CASIA VIS dataset are shown in Figure 7 (top row: normal; middle row: weak; bottom row: dark). The resolution of the images in the Oulu-CASIA VIS dataset (320 × 240) is lower than that of the images in the CK+ and MMI datasets. The number of video sequences is 480 for each illumination condition. Table 1 shows the number of each expression class in the Oulu-CASIA VIS dataset in our experiment.

Figure 7: Sample images of the six expressions in the Oulu-CASIA VIS dataset (top row: normal; middle row: weak; bottom row: dark illumination).

4.2. Experimental Results on the CK+ Dataset. Three sets of experiments are conducted to evaluate the performance of our method on the CK+ dataset, since the head motions of subjects in the CK+ dataset are smaller than those in the other two datasets. The performance of the proposed framework is then compared with seven state-of-the-art methods. A leave-one-subject-out cross validation strategy is applied in these sets of experiments. The penalty coefficient C and kernel parameter γ of the SVM for the CK+ dataset are 1048576 and 3.72 × 10^-9, respectively.
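For reference, the following is a sketch of such a subject-independent evaluation loop using scikit-learn's LeaveOneGroupOut; the variable names and per-fold classifier settings are illustrative.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC

def leave_one_subject_out_accuracy(X, y, subject_ids, C, gamma):
    """Subject-independent evaluation (a sketch): hold out all sequences
    of one subject per fold, train on the rest, and pool the accuracy."""
    logo = LeaveOneGroupOut()
    correct = 0
    for train_idx, test_idx in logo.split(X, y, groups=subject_ids):
        clf = SVC(kernel="rbf", C=C, gamma=gamma)
        clf.fit(X[train_idx], y[train_idx])
        correct += np.sum(clf.predict(X[test_idx]) == y[test_idx])
    return correct / len(y)
```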

In the first set of experiments, the contributions of the three LBP histograms on the three planes (the three components of LBPTOP) are investigated separately to determine their effect on expression recognition accuracy. Figure 8 presents the recognition rates for classifying the six basic expressions using LBP histograms from the three individual planes (i.e., LBP-XY, LBP-XT, and LBP-YT) and their combination (i.e., LBPTOP). The figure indicates that LBPTOP yields the highest recognition rate (94.5%), LBP-YT best describes the variation of texture along the vertical direction (93.5%), and the lowest performance is attributed to LBP-XY (82.2%), which contains the appearance characteristics of different subjects. We then investigate the performance of the spatial-temporal motion LBP feature, which joins the LBP histograms on the XT and YT planes, and compare it with LBPTOP. Based on the results shown in Figure 9, the recognition rates are almost identical; furthermore, the spatial-temporal motion LBP feature has a higher recognition rate than LBPTOP for anger and disgust. The results shown in Figures 8 and 9 mean the following: (1) the variation of texture on the XT and YT planes is more significant than on the XY plane in an expression sequence; (2) compared with LBPTOP, the dimension of the spatial-temporal motion LBP feature is reduced by one-third while recognition accuracy is not decreased.

Figure 8: Recognition rates of methods using LBP histograms from the three individual planes and LBPTOP (LBP-XY: 82.2%; LBP-XT: 91.9%; LBP-YT: 93.5%; LBPTOP: 94.5%).

Table 4: Classification results using the fusion features on the CK+ dataset.

Expression   Anger   Disgust   Fear   Happiness   Sadness   Surprise   Accuracy (%)
Anger          42       0        0        0          3          0         93.3
Disgust         0      59        0        0          0          0        100
Fear            0       0       22        0          2          1         88.0
Happiness       0       0        0       69          0          0        100
Sadness         3       0        1        0         24          0         85.7
Surprise        0       0        2        1          0         80         96.4
Overall                                                                   95.8

Table 5: Recognition rates (%) of the fusion feature on the CK+ dataset; figures in parentheses give the difference relative to LBPTOP (positive values favor the proposed features).

Expression   Anger         Disgust     Fear         Happiness   Sadness       Surprise
Anger        93.3 (+2.2)   0           0            0           6.7           0
Disgust      0             100 (0.0)   0            0           0             0
Fear         0             0           88.0 (0.0)   0           8.0           4.0
Happiness    0             0           0            100 (0.0)   0             0
Sadness      10.7          0           3.6          0           85.7 (+8.1)   0
Surprise     0             0           2.4          1.2         0             96.4 (0.0)
Overall      95.8 (+1.3)

Figure 9: Recognition rates of methods using LBPTOP and spatial-temporal motion LBP for each expression and in total.

The second set of experiments evaluates the effectiveness of combining spatial-temporal motion LBP with the Gabor multiorientation fusion histogram. Tables 2–4 show the results obtained using spatial-temporal motion LBP, the Gabor multiorientation fusion histogram, and the proposed fused framework, each with the multiclass SVM classifier; the results include the recognition rate of each expression and the overall classification accuracy. In each table, the diagonal entries give the number of correctly classified subjects of each expression category, and the off-diagonal entries give the numbers of misclassified subjects. The results show that the framework combining spatial-temporal motion LBP and the Gabor multiorientation fusion histogram performs better than either individual feature.

The third set of experiments compares the performance of the proposed method with the approach using LBPTOP [22]. The results in Table 5 show the recognition rates of the method combining spatial-temporal motion LBP with the Gabor multiorientation fusion histogram (i.e., the proposed framework), where the figures in parentheses give the difference in accuracy relative to LBPTOP; a positive figure indicates better accuracy for the proposed features. Based on Table 5, the proposed framework performs better on two expressions and the same on the others, and the overall recognition rate is slightly higher. Therefore, the method fusing spatial-temporal motion LBP with the Gabor multiorientation fusion histogram outperforms LBPTOP.

In this set of experiments, we also compared our method with the systems proposed by Eskil and Benli [13], Lucey et al. [14], Chew et al. [15], Fan and Tjahjadi [16], and Jain et al. [17] on the CK+ dataset. The datasets used in these papers are identical to the dataset adopted in our study, so the results presented in these papers can be directly compared with our method. The methods of Eskil and Benli [13], Lucey et al. [14], Chew et al. [15], and Fan and Tjahjadi [16] used leave-one-subject-out cross validation, and the method of Jain et al. [17] used tenfold cross validation. Based on the results shown in Table 6, the proposed framework achieves an average accuracy of 95.8% over all six basic facial expressions, performing better than the methods listed above.

Table 6: Comparison of the proposed framework and seven state-of-the-art methods on the CK+ dataset.

Research                 Methodology                     Accuracy (%)
Eskil and Benli [13]     High-polygon wireframe model    85.0
Lucey et al. [14]        AAM shape                       68.9
                         AAM appearance                  84.5
                         Combined                        88.7
Chew et al. [15]         Uni-hyperplane classification   89.4
Fan and Tjahjadi [16]    PHOG-TOP and optical flow       90.9
Jain et al. [17]         Temporal modeling of shapes     91.9
Proposed                                                 95.8

Table 7: Comparison of the proposed framework and seven systems on the MMI dataset.

Research           Methodology               Accuracy (%)
Shan et al. [18]   LBP                       47.78
Wang et al. [19]   AdaBoost                  47.8
                   HMM                       51.5
                   ITBN                      59.7
Fang et al. [20]   Active shape model        62.38
                   Active appearance model   64.35
                   Gabor + Geometry + SVM    70.67
Proposed                                     71.92

4.3. Experimental Results on the MMI Dataset. Seven methods from [18–20] are used for quantitative comparison with our framework on the MMI dataset. The penalty coefficient C and kernel parameter γ of the SVM for the MMI dataset are 2097152 and 9.31 × 10^-10, respectively. The datasets used in these papers are identical to the dataset adopted in our study. In this experiment, the proposed framework performs tenfold cross validation; Shan et al. [18] and Fang et al. [20] conducted their experiments using the same strategy. The methods of Wang et al. [19] performed twentyfold cross validation, and the methods using the active shape model [20] and the active appearance model [20] performed twofold cross validation. The results are shown in Table 7, which indicates that the performance of the proposed method on the MMI dataset is inferior to that on the CK+ dataset because of subtler expressions, notable changes in head rotation angle, and fewer training data. However, the proposed framework still performs better than the other seven state-of-the-art methods.

4.4. Experimental Results on the Oulu-CASIA VIS Dataset. In this section, we evaluate our framework on the Oulu-CASIA VIS dataset. In these experiments, the proposed framework performs tenfold cross validation; the penalty coefficient C and kernel parameter γ of the SVM for this dataset are 1048507 and 9.33 × 10^-10, respectively. First, our method is tested on the three subsets of Oulu-CASIA VIS separately; the experimental results are shown in Figure 10. As shown in the figure, the recognition rate obtained under the normal illumination condition is the highest and that obtained under the dark condition is the worst; evidently, facial images captured under weak or dark illumination often lack much of the texture information useful for facial expression recognition. Then, four methods from [21–24] are used for quantitative comparison with our framework on the subset captured under the normal illumination condition. The experimental results are shown in Table 8, which indicates that the performance of the proposed method on the Oulu-CASIA VIS dataset is inferior to that on the CK+ dataset but superior to that on the MMI dataset. The proposed framework still outperforms the other four state-of-the-art methods.

Table 8: Comparison of the proposed framework and four methods on the Oulu-CASIA VIS dataset.

Research                    Methodology   Accuracy (%)
Scovanner et al. [21]       3D SIFT       55.83
Zhao and Pietikainen [22]   LBPTOP        68.13
Klaser et al. [23]          HOG3D         70.63
Zhao et al. [24]            AdaLBP        73.54
Proposed                                  74.37

Figure 10: Recognition results of the proposed method on the three subsets (dark, weak, and normal illumination) of the Oulu-CASIA VIS dataset.

4.5. Computational Complexity Evaluation. In this section, the computational complexity of our method is analyzed on the CK+, MMI, and Oulu-CASIA VIS datasets. Processing time is approximated as the time needed for preprocessing, extracting the spatial-temporal motion LBP and Gabor multiorientation fusion histogram, and classifying each image frame of a video sequence. Runtime (measured using the computer system clock) is estimated using mixed-language programming based on MATLAB and C++ on an Intel(R) Core(TM) i7-6700 CPU at 3.40 GHz with 8 GB RAM running the Windows 10 operating system. The average processing time per image is under 280, 286, and 249 ms for the CK+, MMI, and Oulu-CASIA VIS datasets, respectively. We then compare the computational time of our method with the system using LBPTOP [22]. The comparison results in Table 9 show that the computational time of our method is lower than that of the LBPTOP system on all three datasets, indicating that our method is more efficient than the LBPTOP-based technology for facial expression recognition on the CK+, MMI, and Oulu-CASIA VIS datasets.

Table 9: Runtime per image of our method and LBPTOP on the CK+, MMI, and Oulu-CASIA VIS datasets.

Method     CK+       MMI       Oulu
LBPTOP     349 ms    357 ms    317 ms
Proposed   280 ms    286 ms    249 ms

5. Conclusion

This paper presents a novel facial expression recognition framework integrating spatial-temporal motion LBP with a Gabor multiorientation fusion histogram. The framework comprises preprocessing (face detection and point location and tracking), dynamic and static feature extraction, and feature classification, and it outperforms seven state-of-the-art methods on the CK+ dataset, seven other methods on the MMI dataset, and four methods on the Oulu-CASIA VIS dataset. Expressions of disgust and happiness are easier to classify than the other expressions, demonstrating the strength of the proposed method; however, recognition rates are lower for fear (88.0%) and sadness (85.7%) on the CK+ dataset, mainly because anger expressions are misclassified as sadness and vice versa. A limitation of the proposed method is that occlusions, head rotations, and illumination changes can reduce expression recognition accuracy; these issues will be addressed in our future work. A framework integrating other useful information (pupil, head, and body movements) with the features extracted in this paper could perform better, and this will also be studied in future work. Moreover, the proposed method cannot yet achieve real-time facial expression recognition; we will attempt to establish an efficient dimensionality reduction method to reduce the computational complexity of our framework. Furthermore, the proposed system will be improved and applied to the study of human drowsiness expression analysis in our further work.

Competing Interests

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Open Foundation of the State Key Laboratory of Automotive Simulation and Control (China, Grant no. 20161105).

References

[1] P. Ekman and E. L. Rosenberg, What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the Facial Action Coding System, Oxford University Press, Oxford, UK, 2005.
[2] J. A. Russell, J. M. Fernandez-Dols, and G. Mandler, The Psychology of Facial Expression, Cambridge University Press, Cambridge, UK, 1997.
[3] P. Ekman and W. V. Friesen, "Constants across cultures in the face and emotion," Journal of Personality and Social Psychology, vol. 17, no. 2, pp. 124–129, 1971.
[4] B. Golomb and T. Sejnowski, "Benefits of machine understanding of facial expressions," in NSF Report: Facial Expression Understanding, pp. 55–71, 1997.
[5] M. Pantic, "Face for ambient interface," in Ambient Intelligence in Everyday Life, vol. 3864 of Lecture Notes on Artificial Intelligence, pp. 32–66, Springer, Berlin, Germany, 2006.
[6] R.-L. Hsu, M. Abdel-Mottaleb, and A. K. Jain, "Face detection in color images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 696–706, 2002.
[7] H. Fang and N. Costen, "From Rank-N to Rank-1 face recognition based on motion similarity," in Proceedings of the 20th British Machine Vision Conference (BMVC '09), pp. 1–11, London, UK, September 2009.
[8] C. Cornelis, M. D. Cock, and A. M. Radzikowska, "Vaguely quantified rough sets," in Rough Sets, Fuzzy Sets, Data Mining and Granular Computing: 11th International Conference, RSFDGrC 2007, Toronto, Canada, May 14–16, 2007, Proceedings, vol. 4482 of Lecture Notes in Computer Science, pp. 87–94, Springer, Berlin, Germany, 2007.
[9] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[10] M. Pantic and L. J. M. Rothkrantz, "Automatic analysis of facial expressions: the state of the art," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1424–1445, 2000.
[11] B. Fasel and J. Luettin, "Automatic facial expression analysis: a survey," Pattern Recognition, vol. 36, no. 1, pp. 259–275, 2003.
[12] C. P. Sumathi, T. Santhanam, and M. Mahadevi, "Automatic facial expression analysis: a survey," International Journal of Computer Science & Engineering Survey, vol. 3, no. 6, pp. 47–59, 2012.
[13] M. T. Eskil and K. S. Benli, "Facial expression recognition based on anatomy," Computer Vision and Image Understanding, vol. 119, pp. 1–14, 2014.
[14] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, "The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '10), pp. 94–101, San Francisco, Calif, USA, June 2010.
[15] S. W. Chew, S. Lucey, P. Lucey, S. Sridharan, and J. F. Cohn, "Improved facial expression recognition via uni-hyperplane classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2554–2561, IEEE, June 2012.
[16] X. Fan and T. Tjahjadi, "A spatial-temporal framework based on histogram of gradients and optical flow for facial expression recognition in video sequences," Pattern Recognition, vol. 48, no. 11, pp. 3407–3416, 2015.
[17] S. Jain, C. Hu, and J. K. Aggarwal, "Facial expression recognition with temporal modeling of shapes," in Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops '11), pp. 1642–1649, November 2011.
[18] C. Shan, S. Gong, and P. W. McOwan, "Facial expression recognition based on local binary patterns: a comprehensive study," Image and Vision Computing, vol. 27, no. 6, pp. 803–816, 2009.
[19] Z. Wang, S. Wang, and Q. Ji, "Capturing complex spatio-temporal relations among facial muscles for facial expression recognition," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), pp. 3422–3429, IEEE, Portland, Ore, USA, June 2013.
[20] H. Fang, N. Mac Parthalain, A. J. Aubrey et al., "Facial expression recognition in dynamic sequences: an integrated approach," Pattern Recognition, vol. 47, no. 3, pp. 1271–1281, 2014.
[21] P. Scovanner, S. Ali, and M. Shah, "A 3-dimensional SIFT descriptor and its application to action recognition," in Proceedings of the 15th ACM International Conference on Multimedia (MM '07), pp. 357–360, ACM, Augsburg, Germany, September 2007.
[22] G. Zhao and M. Pietikainen, "Dynamic texture recognition using local binary patterns with an application to facial expressions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 915–928, 2007.
[23] A. Klaser, M. Marszałek, and C. Schmid, "A spatio-temporal descriptor based on 3D-gradients," in Proceedings of the 19th British Machine Vision Conference (BMVC '08), pp. 275–281, Leeds, UK, September 2008.
[24] G. Zhao, X. Huang, M. Taini, S. Z. Li, and M. Pietikainen, "Facial expression recognition from near-infrared videos," Image and Vision Computing, vol. 29, no. 9, pp. 607–619, 2011.
[25] P. Ekman, W. V. Friesen, and J. C. Hager, Facial Action Coding System, A Human Face, Salt Lake City, Utah, USA, 2002.
[26] M. F. Valstar and M. Pantic, "Fully automatic recognition of the temporal phases of facial actions," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 1, pp. 28–43, 2012.
[27] Y. Zhang and Q. Ji, "Active and dynamic information fusion for facial expression understanding from image sequences," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 699–714, 2005.
[28] M. Pantic and I. Patras, "Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 36, no. 2, pp. 433–449, 2006.
[29] Y. Tong, W. Liao, and Q. Ji, "Facial action unit recognition by exploiting their dynamic and semantic relationships," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 10, pp. 1683–1699, 2007.
[30] Y. Zhang, Q. Ji, Z. Zhu, and B. Yi, "Dynamic facial expression analysis and synthesis with MPEG-4 facial animation parameters," IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 10, pp. 1383–1396, 2008.
[31] G. Donato, M. S. Bartlett, J. C. Hager, P. Ekman, and T. J. Sejnowski, "Classifying facial actions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 10, pp. 974–989, 1999.
[32] Y. Li, S. M. Mavadati, M. H. Mahoor, Y. Zhao, and Q. Ji, "Measuring the intensity of spontaneous facial action units with dynamic Bayesian network," Pattern Recognition, vol. 48, no. 11, pp. 3417–3427, 2015.
[33] J. J.-J. Lien, T. Kanade, J. F. Cohn, and C.-C. Li, "Detection, tracking, and classification of action units in facial expression," Robotics and Autonomous Systems, vol. 31, no. 3, pp. 131–146, 2000.
[34] M. F. Valstar and M. Pantic, "Combined support vector machines and hidden Markov models for modeling facial action temporal dynamics," Lecture Notes in Computer Science, vol. 4796, pp. 118–127, 2007.
[35] T. Danisman, I. M. Bilasco, J. Martinet, and C. Djeraba, "Intelligent pixels of interest selection with application to facial expression recognition using multilayer perceptron," Signal Processing, vol. 93, no. 6, pp. 1547–1556, 2013.
[36] W.-L. Chao, J.-J. Ding, and J.-Z. Liu, "Facial expression recognition based on improved local binary pattern and class-regularized locality preserving projection," Signal Processing, vol. 117, pp. 1–10, 2015.
[37] I. Kotsia and I. Pitas, "Facial expression recognition in image sequences using geometric deformation features and support vector machines," IEEE Transactions on Image Processing, vol. 16, no. 1, pp. 172–187, 2007.
[38] P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2004.
[39] A. Asthana, S. Zafeiriou, S. Cheng, and M. Pantic, "Incremental face alignment in the wild," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 1859–1866, IEEE, Columbus, Ohio, USA, June 2014.
[40] T. Ojala, M. Pietikainen, and D. Harwood, "A comparative study of texture measures with classification based on featured distributions," Pattern Recognition, vol. 29, no. 1, pp. 51–59, 1996.
[41] S.-S. Liu, Y.-T. Tian, and C. Wan, "Facial expression recognition method based on Gabor multi-orientation features fusion and block histogram," Zidonghua Xuebao/Acta Automatica Sinica, vol. 37, no. 12, pp. 1455–1463, 2011.
[42] C.-W. Hsu and C.-J. Lin, "A comparison of methods for multiclass support vector machines," IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 415–425, 2002.
[43] T. Kanade, J. F. Cohn, and Y. Tian, "Comprehensive database for facial expression analysis," in Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition (FG '00), pp. 46–53, Grenoble, France, March 2000.
[44] M. Pantic, M. Valstar, R. Rademaker, and L. Maat, "Web-based database for facial expression analysis," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '05), pp. 317–321, Amsterdam, Netherlands, July 2005.

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 3: Facial Expression Recognition from Video Sequences Based ...downloads.hindawi.com/journals/mpe/2017/7206041.pdf · Facial Expression Recognition from Video Sequences Based on Spatial-Temporal

Mathematical Problems in Engineering 3

classifier In [22] volume LBP (VLBP) and LBP histogramson three orthogonal planes (LBPTOP) are used to revealdynamic texture of facial expressions in video sequenceThen based on multiclass SVM using one-versus-one strat-egy a classifier is applied Experiments conducted on Cohn-Kanade (CK) dataset demonstrate that LBPTOP outperformsformer methods

To obtain higher recognition accuracy combined geo-metric and texture features are adopted The work in [20]fuses geometric feature and Gabor feature of local facialregion to enhance the facial expression recognition accuracyDynamic information of video sequence is extracted forfeatures to construct parametric space with 300 dimensionsCompared with other methods system performs excellentlyin expression recognition when combining geometric andtexture features

23 Discussion and Contributions AU is middle-level inter-pretation of facial muscle motion such actions associatehigh-level semantics with low-level features of facial expres-sions and identify meaningful human facial movements [27]However drawbacks of the AU-based expression recognitioninclude affiliation of errors to whole recognition processand chances of reduction in accuracy when middle-levelclassification step is added in the system Static-based expres-sion recognition technology only reflects expression stateat specific time points Dynamic features extracted fromvideo sequence reveal facial expression changes and are moreaccurate and robust for expression recognition [16 20 22]

In this paper proposed framework focuses on content-based and features of fusion-based facial expression recog-nition technique and it identifies expression in video moreaccurately Most feature extraction methods of recent workare content-based and are limited to single form of dynamicor static features while in this paper we propose a method torecognize facial expression using the spatial-temporalmotionLBP from the whole video sequence and Gabor multiori-entation fusion histogram from the peak expression frameMoreover we first study contributions of LBP histogramsfrom different orthogonal planes to evaluate accuracy ofexpression recognition and to concatenate LBP histogramson XT and YT planes to establish spatial-temporal motionLBP features based on significance of expression recognitionThus in this paper framework outperforms those fromprevious studies while considering dynamic and static texturefeatures of facial expressions

3 Methodology

Proposed approach includes location and tracking of facialpoints (preprocessing) spatial-temporal motion LBP andGabor multiorientation fusion histogram extraction (featureextraction) and expression classification Figure 1 illustratesthis method Framework and its components are detailed inthe following sections

31 Preprocessing When image sequence is introduced tofacial expression recognition systems facial regions should

be detected and cropped as preprocessing step Real-timeViolandashJones face detector [38] commonly used in differentareas such as face detection recognition and expressionanalysis is adopted to crop face region coarsely in this paperViolandashJones face detector consists of cascade of classifiersemploying Haar feature trained by AdaBoost Haar feature isbased on integral image filters computed simply and fast atany location and scale

To segment face regions more accurately fiducial pointsshould be located and tracked in video sequences Basedon incremental formulation [39] CLM model is employedto detect and track face points in our framework OriginalCLM framework is patch-based method and faces can berepresented by series of image patches cropped from locationof fiducial points Patch-expert of each point is trained bylinear SVM and positive and negative patches and used asdetector to update point location Instead of dealing withupdating shape model CLM model based on incrementalformulation focuses on updating function which can mapfacial texture to facial shape and constructing robust anddiscriminatively deformable model that outperforms state-of-the-art CLM alignment frameworks [39]

Locations of detected points on contour of eyes are usedto compute the angle between line of two inner corners andimage level edge in each frame to crop the facial region frombackgroundAngle can be rotated to horizontal axis to correctany in-plane head rotation Points of two outer eyes cornersand nose tip are used to crop and scale images into 104 times128 rectangular region containing all local areas related tofacial expressions After preprocessing center in horizontaldirection of cropped image is x coordinate of center of twoeyes while 119910 coordinate of nose tip is found in lower third invertical direction of cropped image

32 Spatial-TemporalMotion LBPFeature LBP iswell-knownoperator used to describe local texture characteristics ofimage and is first proposed by Ojala et al [40] Operatorlabels image pixels by thresholding 119899 times 119899 neighborhood ofeach pixel with value of center and considering results asbinary numbers Then histogram of all labels in image canbe adopted as texture descriptor LBP is widely used in manyaspects such as face recognition texture classification tex-ture segmentation and facial expression recognition becauseof gray scale and rotation invariance advantages

Recently some extended LBP operators are introducedto describe dynamic texture and outperform other textureoperators LBPTOP is spatial-temporal operator extendedfrom LBP and is proposed in the work of [22] As shownin Figure 2 LBP codes are extracted from three orthogonalplanes (ie XY XT and YT) of video sequences For allpixels statistical histograms of three different planes canbe respectively denoted as LBP-XY LBP-XT and LBP -YT and are obtained and concatenated into one histogramHistogram is calculated as follows

119867119894119895 = sum119909119910119905

119868 119891119895 (119909 119910 119905) = 119894

119894 = 0 119899119895 minus 1 119895 = 0 1 2(1)

4 Mathematical Problems in Engineering

Video Preprocessing

Gabor multiorientation fusion histogram

Spatial-temporal LBP motion histogram

X

Y

T

Expression classification

Peak frame

Image sequences

middot middot middot

middot middot middot

Figure 1 Proposed framework for facial recognition

minus3minus2

minus1 01

23

minus3minus2

minus10

12

3minus3

minus2

minus1

0

1

2

3

XT

Y

Figure 2 Example of LBP code using radii three and eightneighboring points on three planes

where 119899119895 is numbers of different labels produced by LBPoperator in 119895th plane (119895 = 0 XY 1 XT and 2 YT) and119891119895(119909 119910 119905) represents LBP code of center pixel (119909 119910 119905) in 119895thplane

As shown in formula (1) and Figure 2 dynamic texturein video sequence can be extracted by incorporating LBPhistogram on XY plane (LBP-XY) reflecting informationof spatial domain and LBP histograms on XT and YTplanes representing spatial-temporal motion information inhorizontal and vertical directions respectively

All LBPTOP components are LBP histograms extractedfrom three orthogonal planes and have equal contribution infacial expression recognition [22] However not all LBPTOPcomponents are significant LBP-XY includes characteristicsof different subjects resulting in difficulties in accurate ex-pression recognition LBP-XT and LBP-YT contain mostinformation on spatial-temporal texture changes of facial

expression Consequently as shown in Figure 3 LBP-XYis abandoned and LBP-XT and LBP-XT are merged toconstruct spatial-temporal motion LBP feature which isapplied in facial expression recognition in our study In thispaper LBP code is used and includes radii three and eightneighboring points on each plane moreover image of eachframe is equally divided into 8 times 9 rectangular subblocks withoverlap ratio of 70

33 Gabor Multiorientation Fusion Histogram Gabor wa-velet is well-known descriptor representing texture informa-tion of an image Gabor filter can be calculated by Gaus-sian kernel multiplied by sinusoidal plane Gabor feature ishighly capable of describing facial textures used in differ-ent research such as identity recognition feature locationand facial expression recognition However Gabor performsweakly in characterizing global features and feature dataare redundant Figure 4 shows extracted features based onGabor multiorientation fusion histogram proposed in [41]these features are used in reducing feature dimension andimproving recognition accuracy

First Gabor filter of five scales and eight orientations isused to transform images The following represents multi-scale and multiorientation features of pixel point 119911(119909 119910) inimages

119866119906V (119911) 119906 isin (0 7) V isin (0 4) (2)

where 119906 and V denote orientations and scales and 119866119906V(119911) isnorm of Gabor features

Transformation of each original image produces 40 cor-responding images with different scales and orientationsDimension of obtained feature is 40 times as high as that oforiginal facial imageTherefore on the same scale features ofeight orientations are fused according to a certain rule Fusionrules are as follows

119879V (119911) = argmax119906

1003817100381710038171003817119866119906V (119911)1003817100381710038171003817

119906 isin (0 7) V isin (0 4) (3)

Mathematical Problems in Engineering 5

X

Y

T

(a) (b)LBP-YT LBP-XT

Figure 3 Features in each block sequence (a) Block sequence and (b) LBP histograms from XT and YT planes

Gabor features in eight orientations and five scales

Orientation feature fusion

Fusion features infive scales

Block

Histogram

Histogram

middot middot middot

middot middot middot

middot middot middot

Figure 4 Feature extraction procedure based on Gabor multiorientation fusion histogram

where $T_v(z) \in [1, 8]$ is the encoding value of pixel point $z(x, y)$ and $G_{u,v}(z)$ are the Gabor features.

Here, the orientation of a local area is given by the orientation code of the maximum of the Gabor features over the eight orientations; each encoding value $T_v(z)$ represents one local orientation. Finally, each expression image is transformed into five scales of images, and each scale fuses eight orientations. The fusion images are shown in Figure 5. As can be seen from the figure, the fused images retain the most obvious change information of the Gabor subfilters. Therefore, the Gabor fusion feature shows high discrimination for local texture changes.

A histogram can describe the global features of an image. However, many structural details may be lost when the histogram of the entire image is directly calculated. Thus, each fusion image is divided into 8 × 8 nonoverlapping and equal rectangular subblocks. Accordingly, the histogram distributions of all subblocks are calculated according to formula (4) and combined to construct the Gabor multiorientation fusion histogram of the whole image:

$h_{v,r,i} = \sum_z I\left(R_{v,r}(z) = i\right), \quad i = 0, \dots, 8,$  (4)

where $R_{v,r}(z)$ is the encoding value at point $z(x, y)$ of the $r$th subblock.

Obtained by coding the five scales over multiple orientations of the Gabor feature, the Gabor multiorientation fusion histogram not only inherits the Gabor wavelet advantages, capturing the corresponding spatial frequency (scale), spatial location, and orientation selectivity, but also reduces the redundancy of the feature data and the computation complexity.
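As a concrete illustration, the sketch below builds the fusion codes and block histograms of Eqs. (2)-(4) with scikit-image's gabor filter. The filter frequencies used for the five scales are assumptions for illustration only; the paper does not specify them.

    import numpy as np
    from skimage.filters import gabor

    def gabor_fusion_histogram(img, n_orient=8, n_scale=5, grid=(8, 8)):
        """Sketch of the Gabor multiorientation fusion histogram: at each of
        five scales every pixel is encoded by the orientation u maximizing
        ||G_{u,v}(z)|| (Eq. (3)); 8 x 8 nonoverlapping block histograms of
        these codes are then concatenated (Eq. (4))."""
        img = img.astype(float)
        feats = []
        for v in range(n_scale):
            freq = 0.25 / (np.sqrt(2) ** v)  # assumed scale spacing
            mags = []
            for u in range(n_orient):
                real, imag = gabor(img, frequency=freq, theta=u * np.pi / n_orient)
                mags.append(np.hypot(real, imag))  # magnitude ||G_{u,v}(z)||
            codes = np.argmax(np.stack(mags), axis=0)  # T_v(z): winning orientation
            bh, bw = img.shape[0] // grid[0], img.shape[1] // grid[1]
            for by in range(grid[0]):
                for bx in range(grid[1]):
                    blk = codes[by * bh:(by + 1) * bh, bx * bw:(bx + 1) * bw]
                    h, _ = np.histogram(blk, bins=n_orient, range=(0, n_orient))
                    feats.append(h / h.sum())  # per-block orientation histogram
        return np.concatenate(feats)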

3.4. Multifeature Fusion. The features extracted in Sections 3.2 and 3.3 are scaled to [−1, 1] and integrated into one vector using the following equation:

$F = \alpha P + (1 - \alpha) Q, \quad 0 \le \alpha \le 1,$  (5)

where $P$ is the spatial-temporal motion LBP histogram, $Q$ is the Gabor multiorientation fusion histogram, and $F$, the fusion vector of $P$ and $Q$, is the feature vector for final prediction. Parameter $\alpha$ usually depends on the performance of each feature. For all experiments in this paper, we set $\alpha$ to 0.5, considering the experimental results shown in Tables 2 and 3. A minimal sketch of this fusion step follows.
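This sketch assumes the weighted descriptors are concatenated: because the two histograms generally differ in length, Eq. (5) is read here as a weighted concatenation rather than elementwise addition, which is one plausible interpretation rather than the paper's stated procedure.

    import numpy as np

    def scale_to_unit(v):
        """Scale a feature vector to [-1, 1]."""
        lo, hi = v.min(), v.max()
        return np.zeros_like(v) if hi == lo else 2.0 * (v - lo) / (hi - lo) - 1.0

    def fuse(P, Q, alpha=0.5):
        """Weighted fusion of the two descriptors, following Eq. (5) with alpha = 0.5."""
        return np.concatenate([alpha * scale_to_unit(P),
                               (1.0 - alpha) * scale_to_unit(Q)])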

3.5. Classifier. Many classifiers have been used for facial expression recognition, including J48 based on ID3, nearest neighbor based on fuzzy-rough sets, random forest [20], SVM [22], and HMM [29]. In this paper, SVM is selected for our framework because of the following properties: (1) it is a classifier based on statistical learning theory; (2) it has high generalization performance; and (3) it can deal with high-dimensional features.

Figure 5: Fusion images on the five scales.

To achieve more effective classification of facial expressions, an SVM classifier with an RBF kernel is adopted. Given a training set of instance-label pairs $(x_i, y_i)$, the SVM requires the solution of the following optimization problem:

$\min_{w, b, \xi}\ \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i$
$\text{subject to}\ \ y_i \left( w^T \phi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0,$  (6)

where $x_i$ are the training samples, $l$ is the size of the training set, $y_i$ are the labels of $x_i$, $C$ is the punishment coefficient that prevents overfitting, $\xi_i$ are relaxation variables, and $w$ and $b$ are the parameters of the classifier. Here, the training vectors $x_i$ are mapped into a higher-dimensional space by the function $\phi(x_i)$, and the SVM finds a linear separating hyperplane with the maximal margin in that space. $K(x_i, x_j) \equiv \phi(x_i)^T \phi(x_j)$ is the kernel function of the SVM. The RBF kernel function is as follows:

$K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right), \quad \gamma > 0,$  (7)

where $\gamma$ is the parameter of the RBF kernel. In this paper, a grid-search strategy and tenfold cross validation [42] are used to select the punishment coefficient $C$ and kernel parameter $\gamma$ of the SVM. An SVM is a binary classifier that separates only two classes of objects, whereas facial expression recognition is a multiclass problem. Thus, a multiclassifier consisting of binary SVM classifiers is constructed to classify facial expressions [42]. Two strategies are commonly used: one-versus-one and one-versus-all. The one-versus-one strategy is applied in our method because of its robustness in classification and simplicity of application. In this strategy, an SVM is trained to separate each pair of facial expressions; thus, six expressions require training 15 binary SVM classifiers (fear versus anger, happiness versus disgust, and surprise versus sadness, among others). The final result is the expression class with the highest number of votes. However, occasionally the class with the most votes is not unique; in this case, a nearest neighbor classifier is applied to the tied classes to obtain one final result. A sketch of this training step is given below.
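Assuming scikit-learn stands in for the authors' implementation, the following sketch performs the grid search described above; SVC uses one-versus-one voting internally, the grid bounds are illustrative assumptions, and the nearest-neighbor tie-breaking is not included.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV

    def train_expression_svm(X, y):
        """Grid-search C and gamma of an RBF-kernel SVM with tenfold cross
        validation; X holds fused feature vectors, y the expression labels."""
        param_grid = {
            "C": 2.0 ** np.arange(-5, 21, 2),       # illustrative search range
            "gamma": 2.0 ** np.arange(-35, -5, 2),  # illustrative search range
        }
        svm = SVC(kernel="rbf", decision_function_shape="ovo")  # one-versus-one voting
        search = GridSearchCV(svm, param_grid, cv=10)
        search.fit(X, y)
        return search.best_estimator_, search.best_params_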

Table 1: Number of subjects for each expression class in the three datasets.

Expression   CK+   MMI   Oulu (each condition)
Anger         45    32    80
Disgust       59    28    80
Fear          25    28    80
Happiness     69    42    80
Sadness       28    32    80
Surprise      83    41    80
Total        309   203   480

4. Experiments

4.1. Facial Expression Datasets. The CK+ dataset [14] extends the CK dataset [43], which was built in 2000; it is a publicly available and the most widely used dataset for evaluating the performance of facial expression recognition systems. This dataset consists of 593 video sequences covering seven basic facial expressions performed by 123 participants. The ages of the subjects range from 18 to 30 years; 35% are male, 13% are Afro-American, 81% are Euro-American, and 6% are of other races. In the video sequences, the image of each frame is 640 × 490 or 640 × 480 pixels. Most frames are gray images with eight-bit precision for grayscale values. The video sequences contain frames from the neutral phase to the peak phase of the facial expressions, and the video rate is 30 frames per second. In our study, 309 image sequences containing 97 subjects with six expressions (anger, disgust, fear, happiness, sadness, and surprise) are selected for our experiments; each subject has one to six expressions. The top row of Figure 6 shows peak frames of the six expressions from six subjects in the CK+ dataset. Table 1 shows the number of subjects for each expression class in the CK+ dataset in our experiment.

The MMI dataset [44] is applied to evaluate the performance of expression recognition methods; this dataset contains 203 video sequences including different head poses and subtle expressions. The expression sequences were performed by 19 participants with ages ranging from 19 to 62 years. The subjects are Asian, European, and South American, and male participants account for 40% of the subjects. Different from the CK+ dataset, the MMI dataset contains six basic facial expressions comprising neutral, onset, apex, and offset frames. In the video sequences, the image of each frame is 720 × 576 pixels in full RGB color. For our study, the original frames are processed into images with eight-bit precision for grayscale values, and frames from the neutral to apex phases of the facial expressions are extracted.


Table 2: Classification results of using spatial-temporal motion LBP on the CK+ dataset.

Expression   Anger  Disgust  Fear  Happiness  Sadness  Surprise  Average (%)
Anger          42      0      0        0         3        0        93.3
Disgust         0     59      0        0         0        0       100
Fear            1      0     20        2         2        0        80.0
Happiness       0      0      0       69         0        0       100
Sadness         4      0      2        0        21        1        75.0
Surprise        0      0      2        1         0       80        96.4
Overall                                                            94.1

Table 3: Classification results of using the Gabor multiorientation fusion histogram on the CK+ dataset.

Expression   Anger  Disgust  Fear  Happiness  Sadness  Surprise  Average (%)
Anger          40      2      0        0         3        0        88.9
Disgust         2     56      0        1         0        0        94.9
Fear            1      0     17        3         4        0        68.0
Happiness       0      0      0       69         0        0       100
Sadness         4      0      0        0        24        0        85.7
Surprise        0      1      0        0         0       82        98.8
Overall                                                            93.2

Figure 6: Sample images (anger, disgust, fear, happiness, sadness, and surprise) in the CK+ dataset (top row) and MMI dataset (bottom row).

The bottom row of Figure 6 shows peak frames of the six expressions from six subjects in the MMI dataset, and Table 1 shows the number of each expression class in the MMI dataset in our experiment. As seen from Table 1, the MMI dataset includes fewer subjects than the CK+ dataset.

The Oulu-CASIA VIS facial expression database [24] includes six basic expressions from 80 subjects between 23 and 58 years old; 26.2% of the subjects are female. All image sequences were taken under visible light. Oulu-CASIA VIS contains three subsets, whose expressions were captured under three different illumination conditions: normal, weak, and dark. The images for the normal condition are captured under good normal lighting. Weak illumination means that only the computer display is on and each subject sits in front of the display; dark illumination means almost complete darkness. The video sequences contain frames from the neutral phase to the peak phase of the facial expressions, and the video rate is 25 frames per second. The face regions of the six expressions in the Oulu-CASIA VIS dataset are shown in Figure 7 (top row: normal; middle row: weak; bottom row: dark). The resolution of the images in the Oulu-CASIA VIS dataset (320 × 240) is lower than that of the images in the CK+ and MMI datasets. The number of video sequences is 480 for each illumination condition. Table 1 shows the number of each expression class in the Oulu-CASIA VIS dataset in our experiment.


Figure 7: Sample images (anger, disgust, fear, happiness, sadness, and surprise) in the Oulu-CASIA VIS dataset.


4.2. Experimental Results on the CK+ Dataset. Three sets of experiments are conducted to evaluate the performance of our method on the CK+ dataset, because the head motions of subjects in the CK+ dataset are smaller than those in the other two datasets. The performance of the proposed framework is then compared with seven state-of-the-art methods. A leave-one-out cross validation strategy is applied in these sets of experiments; a sketch of this protocol is given below. The punishment coefficient $C$ and kernel parameter $\gamma$ of the SVM for the CK+ dataset are 1048576 and $3.72 \times 10^{-9}$, respectively.
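The following sketch of the leave-one-out evaluation assumes one fused feature vector per sequence and uses the hyperparameters quoted above; a leave-one-subject-out variant would instead group the folds by subject (e.g., with scikit-learn's LeaveOneGroupOut).

    import numpy as np
    from sklearn.model_selection import LeaveOneOut
    from sklearn.svm import SVC

    def leave_one_out_accuracy(X, y, C=1048576, gamma=3.72e-9):
        """Each sequence is tested once, with all other sequences used for training."""
        hits = 0
        for train_idx, test_idx in LeaveOneOut().split(X):
            svm = SVC(kernel="rbf", C=C, gamma=gamma)
            svm.fit(X[train_idx], y[train_idx])
            hits += int(svm.predict(X[test_idx])[0] == y[test_idx][0])
        return hits / len(X)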

In the first set of experiments, the contributions of the LBP histograms on the three planes (the three components of LBPTOP) are investigated separately to determine their effect on recognition accuracy. Figure 8 presents the recognition rates of classifying the six basic expressions using LBP histograms from the three individual planes (i.e., LBP-XY, LBP-XT, and LBP-YT) and their combination (i.e., LBPTOP). The figure indicates that LBPTOP yields the highest recognition rate (94.5%), LBP-YT better describes the variation in texture along the vertical direction (93.5%), and the lowest performance is attributed to LBP-XY, which contains the appearance characteristics of different subjects (82.2%). We then investigate the performance of the spatial-temporal motion LBP feature, which joins the LBP histograms on the XT and YT planes, and compare it with LBPTOP. Based on the results shown in Figure 9, the recognition rates of the two features are almost identical.

Figure 8: Recognition rates of methods using LBP histograms from the three planes and LBPTOP (LBP-XY: 82.2%; LBP-XT: 91.9%; LBP-YT: 93.5%; LBPTOP: 94.5%).

Furthermore, the spatial-temporal motion LBP feature achieves a higher recognition rate than LBPTOP for anger and disgust. The results shown in Figures 8 and 9 indicate the following: (1) the variation of texture on the XT and YT planes is more significant than that on the XY plane in an expression sequence; (2) compared with LBPTOP, the dimension of the spatial-temporal motion LBP feature is reduced by 1/3, but the recognition accuracy is not decreased.


Table 4: Classification results of using the fusion features on the CK+ dataset.

Expression   Anger  Disgust  Fear  Happiness  Sadness  Surprise  Average (%)
Anger          42      0      0        0         3        0        93.3
Disgust         0     59      0        0         0        0       100
Fear            0      0     22        0         2        1        88.0
Happiness       0      0      0       69         0        0       100
Sadness         3      0      1        0        24        0        85.7
Surprise        0      0      2        1         0       80        96.4
Overall                                                            95.8

Table 5: Comparison results of using the fusion feature and LBPTOP on the CK+ dataset (%).

Expression   Anger         Disgust     Fear         Happiness   Sadness       Surprise
Anger        93.3 (+2.2)     0           0            0           6.7           0
Disgust        0           100 (0.0)     0            0           0             0
Fear           0             0          88.0 (0.0)    0           8.0           4.0
Happiness      0             0           0          100 (0.0)     0             0
Sadness       10.7           0           3.6          0          86.7 (+8.1)    0
Surprise       0             0           2.4          1.2         0            96.4 (0.0)
Overall      95.8 (+1.3)

Figure 9: Recognition rates of methods using LBPTOP and spatial-temporal motion LBP for each expression (anger, disgust, fear, happiness, sadness, surprise, and total).

The second set of experiments evaluates the effectiveness of combining spatial-temporal motion LBP with the Gabor multiorientation fusion histogram. Tables 2-4 show the results obtained by using spatial-temporal motion LBP, the Gabor multiorientation fusion histogram, and the proposed framework with the multiclass SVM classifier. The results include the recognition rate of each expression and the overall classification accuracy; diagonal entries give the number of correctly classified subjects of each expression category, and off-diagonal entries give the number of misclassified subjects. The results show that the framework combining spatial-temporal motion LBP and the Gabor multiorientation fusion histogram performs better than either feature used individually.

The third set of experiments compares the performance of the proposed method with the approach using LBPTOP [22]. The results in Table 5 show the recognition rate of the method combining spatial-temporal motion LBP with the Gabor multiorientation fusion histogram (i.e., the proposed framework), where the figures in parentheses represent the difference in accuracy between the proposed features and LBPTOP; a positive figure indicates better accuracy for the proposed features. Based on Table 5, the proposed framework performs better on two expressions and the same on the others. Furthermore, the overall recognition rate is slightly better with the proposed method. Therefore, the method fusing spatial-temporal motion LBP with the Gabor multiorientation fusion histogram outperforms LBPTOP.

In this set of experiments, we also compared our method with the systems proposed by Eskil and Benli [13], Lucey et al. [14], Chew et al. [15], Fan and Tjahjadi [16], and Jain et al. [17] on the CK+ dataset. The datasets used in these papers are identical to the dataset adopted in our study; therefore, the results presented in these papers can be directly compared with our method. The methods proposed by Eskil and Benli [13], Lucey et al. [14], Chew et al. [15], and Fan and Tjahjadi [16] used leave-one-subject-out cross validation, and the technique proposed by Jain et al. [17] used tenfold cross validation. Based on the results shown in Table 6, the proposed framework achieves an average accuracy of 95.8% over all six basic facial expressions, performing better than the methods listed above.

4.3. Experimental Results on the MMI Dataset. Seven methods in [18-20] are used to perform quantitative comparisons with our framework on the MMI dataset.


Table 6: Comparison of results of the proposed framework and seven state-of-the-art methods on the CK+ dataset.

Research                Methodology                      Accuracy (%)
Eskil and Benli [13]    High-polygon wireframe model      85.0
Lucey et al. [14]       AAM shape                         68.9
                        AAM appearance                    84.5
                        Combined                          88.7
Chew et al. [15]        Uni-hyperplane classification     89.4
Fan and Tjahjadi [16]   PHOG-TOP and optical flow         90.9
Jain et al. [17]        Temporal modeling of shapes       91.9
Proposed                                                  95.8

Table 7: Comparison of results of the proposed framework and seven systems on the MMI dataset.

Research           Methodology               Accuracy (%)
Shan et al. [18]   LBP                        47.78
Wang et al. [19]   AdaBoost                   47.8
                   HMM                        51.5
                   ITBN                       59.7
Fang et al. [20]   Active shape model         62.38
                   Active appearance model    64.35
                   Gabor + Geometry + SVM     70.67
Proposed                                      71.92

The punishment coefficient $C$ and kernel parameter $\gamma$ of the SVM for the MMI dataset are 2097152 and $9.31 \times 10^{-10}$, respectively. The datasets used in these papers are identical to the dataset adopted in our study. In this experiment, the proposed framework performs tenfold cross validation; Shan et al. [18] and Fang et al. [20] conducted experiments using the same strategy we employed, the methods proposed by Wang et al. [19] performed twentyfold cross validation, and the methods using the active shape model [20] and active appearance model [20] performed twofold cross validation. The results are shown in Table 7, which indicates that the performance of the proposed method on the MMI dataset is inferior to that on the CK+ dataset because of subtler expressions, notable changes in head rotation angle, and fewer training data. However, the proposed framework still performs better than the other seven state-of-the-art methods.

4.4. Experimental Results on the Oulu-CASIA VIS Dataset. In this section, we evaluate our framework on the Oulu-CASIA VIS dataset. In this set of experiments, the proposed framework performs tenfold cross validation; the punishment coefficient $C$ and kernel parameter $\gamma$ of the SVM for this dataset are 1048507 and $9.33 \times 10^{-10}$, respectively. First, our method is tested on the three subsets of Oulu-CASIA VIS separately. The experimental results are shown in Figure 10. As shown in the figure, the recognition rate obtained for the normal illumination condition is the highest, and the recognition rate obtained for the dark illumination condition is the worst. Evidently, facial images captured in weak or dark illumination often lack much of the texture information that is useful for facial expression recognition.

Table 8: Comparison of results of the proposed framework and four methods on the Oulu-CASIA VIS dataset.

Research                    Methodology   Accuracy (%)
Scovanner et al. [21]       3D SIFT        55.83
Zhao and Pietikainen [22]   LBPTOP         68.13
Klaser et al. [23]          HOG3D          70.63
Zhao et al. [24]            AdaLBP         73.54
Proposed                                   74.37

Figure 10: Recognition results of the proposed method on the three subsets (dark, weak, and normal illumination) of the Oulu-CASIA VIS dataset for each expression.

Then, four methods in [21-24] are used to perform quantitative comparisons with our framework on the subset obtained under the normal illumination condition. The experimental results are shown in Table 8, which indicates that the performance of the proposed method on the Oulu-CASIA VIS dataset is inferior to that on the CK+ dataset but superior to that on the MMI dataset. The proposed framework still outperforms the other four state-of-the-art methods.

4.5. Computational Complexity Evaluation. In this section, the computational complexity of our method is analyzed on the CK+, MMI, and Oulu-CASIA VIS datasets. Processing time is taken to be the time needed for preprocessing, extracting the spatial-temporal motion LBP and Gabor multiorientation fusion histogram, and classifying each image frame of a video sequence. Runtime (measured using the computer system clock) is estimated using mixed-language programming based on MATLAB and C++ on an Intel(R) Core(TM) i7-6700 CPU at 3.40 GHz with 8 GB RAM running the Windows 10 operating system. The average processing time per image is under 280, 286, and 249 ms for the CK+, MMI, and Oulu-CASIA VIS datasets, respectively. We then compare the computational time of our method with the system using LBPTOP [22]. The comparison results in Table 9 show that the computational time of our method is lower than that of the system using LBPTOP on all three datasets. These results show that our method is more efficient and effective than the LBPTOP-based technique for facial expression recognition on the CK+, MMI, and Oulu-CASIA VIS datasets.


Table 9: Runtime of our method and LBPTOP on the CK+, MMI, and Oulu-CASIA VIS datasets.

Research   CK+      MMI      Oulu
LBPTOP     349 ms   357 ms   317 ms
Proposed   280 ms   286 ms   249 ms


5. Conclusion

This paper presents a novel facial expression recognition framework integrating spatial-temporal motion LBP with the Gabor multiorientation fusion histogram. The framework comprises preprocessing (face detection and point location and tracking), dynamic and static feature extraction, and feature classification, and it outperforms seven state-of-the-art methods on the CK+ dataset, seven other methods on the MMI dataset, and four methods on the Oulu-CASIA VIS dataset. Expressions of disgust and happiness are easier to classify than other expressions, demonstrating the better performance of the proposed method. However, the recognition rates for fear (88.0%) and sadness (86.7%) on the CK+ dataset are lower than the others, because most anger expressions are misclassified as sadness and vice versa. A limitation of the proposed method is that occlusions, head rotations, and illumination can reduce the accuracy of expression recognition; these phenomena will be addressed in our future work. A framework integrating other useful information (pupil, head, and body movements) with the features extracted in this paper could perform better; this issue will also be studied in our future work. The proposed method cannot achieve real-time facial expression recognition, so we will attempt to establish an efficient dimensionality reduction method to reduce the computational complexity of our framework. Furthermore, the proposed system will be improved and applied to the study of human drowsiness expression analysis in our further work.

Competing Interests

The authors declare no conflicts of interest.

Acknowledgments

This work was supported by the Open Foundation of the State Key Laboratory of Automotive Simulation and Control (China, Grant no. 20161105).

References

[1] P. Ekman and E. L. Rosenberg, What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the Facial Action Coding System, Oxford University Press, Oxford, UK, 2005.
[2] J. A. Russell, J. M. Fernandez-Dols, and G. Mandler, The Psychology of Facial Expression, Cambridge University Press, Cambridge, UK, 1997.
[3] P. Ekman and W. V. Friesen, "Constants across cultures in the face and emotion," Journal of Personality and Social Psychology, vol. 17, no. 2, pp. 124-129, 1971.
[4] B. Golomb and T. Sejnowski, "Benefits of machine understanding of facial expressions," in NSF Report: Facial Expression Understanding, pp. 55-71, 1997.
[5] M. Pantic, "Face for ambient interface," in Ambient Intelligence in Everyday Life, vol. 3864 of Lecture Notes on Artificial Intelligence, pp. 32-66, Springer, Berlin, Germany, 2006.
[6] R.-L. Hsu, M. Abdel-Mottaleb, and A. K. Jain, "Face detection in color images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 696-706, 2002.
[7] H. Fang and N. Costen, "From Rank-N to Rank-1 face recognition based on motion similarity," in Proceedings of the 20th British Machine Vision Conference (BMVC '09), pp. 1-11, London, UK, September 2009.
[8] C. Cornelis, M. D. Cock, and A. M. Radzikowska, "Vaguely quantified rough sets," in Rough Sets, Fuzzy Sets, Data Mining and Granular Computing: 11th International Conference, RSFDGrC 2007, Toronto, Canada, May 14-16, 2007, Proceedings, vol. 4482 of Lecture Notes in Computer Science, pp. 87-94, Springer, Berlin, Germany, 2007.
[9] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[10] M. Pantic and L. J. M. Rothkrantz, "Automatic analysis of facial expressions: the state of the art," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1424-1445, 2000.
[11] B. Fasel and J. Luettin, "Automatic facial expression analysis: a survey," Pattern Recognition, vol. 36, no. 1, pp. 259-275, 2003.
[12] C. P. Sumathi, T. Santhanam, and M. Mahadevi, "Automatic facial expression analysis: a survey," International Journal of Computer Science & Engineering Survey, vol. 3, no. 6, pp. 47-59, 2012.
[13] M. T. Eskil and K. S. Benli, "Facial expression recognition based on anatomy," Computer Vision and Image Understanding, vol. 119, pp. 1-14, 2014.
[14] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, "The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '10), pp. 94-101, San Francisco, Calif, USA, June 2010.
[15] S. W. Chew, S. Lucey, P. Lucey, S. Sridharan, and J. F. Cohn, "Improved facial expression recognition via uni-hyperplane classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2554-2561, IEEE, June 2012.
[16] X. Fan and T. Tjahjadi, "A spatial-temporal framework based on histogram of gradients and optical flow for facial expression recognition in video sequences," Pattern Recognition, vol. 48, no. 11, pp. 3407-3416, 2015.
[17] S. Jain, C. Hu, and J. K. Aggarwal, "Facial expression recognition with temporal modeling of shapes," in Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops '11), pp. 1642-1649, November 2011.
[18] C. Shan, S. Gong, and P. W. McOwan, "Facial expression recognition based on local binary patterns: a comprehensive study," Image and Vision Computing, vol. 27, no. 6, pp. 803-816, 2009.
[19] Z. Wang, S. Wang, and Q. Ji, "Capturing complex spatio-temporal relations among facial muscles for facial expression recognition," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), pp. 3422-3429, IEEE, Portland, Ore, USA, June 2013.
[20] H. Fang, N. Mac Parthalain, A. J. Aubrey et al., "Facial expression recognition in dynamic sequences: an integrated approach," Pattern Recognition, vol. 47, no. 3, pp. 1271-1281, 2014.
[21] P. Scovanner, S. Ali, and M. Shah, "A 3-dimensional SIFT descriptor and its application to action recognition," in Proceedings of the 15th ACM International Conference on Multimedia (MM '07), pp. 357-360, ACM, Augsburg, Germany, September 2007.
[22] G. Zhao and M. Pietikainen, "Dynamic texture recognition using local binary patterns with an application to facial expressions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 915-928, 2007.
[23] A. Klaser, M. Marszałek, and C. Schmid, "A spatio-temporal descriptor based on 3D-gradients," in Proceedings of the 19th British Machine Vision Conference (BMVC '08), pp. 275-281, Leeds, UK, September 2008.
[24] G. Zhao, X. Huang, M. Taini, S. Z. Li, and M. Pietikainen, "Facial expression recognition from near-infrared videos," Image and Vision Computing, vol. 29, no. 9, pp. 607-619, 2011.
[25] P. Ekman, W. V. Friesen, and J. C. Hager, Facial Action Coding System, A Human Face, Salt Lake City, Utah, USA, 2002.
[26] M. F. Valstar and M. Pantic, "Fully automatic recognition of the temporal phases of facial actions," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 1, pp. 28-43, 2012.
[27] Y. Zhang and Q. Ji, "Active and dynamic information fusion for facial expression understanding from image sequences," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 699-714, 2005.
[28] M. Pantic and I. Patras, "Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 36, no. 2, pp. 433-449, 2006.
[29] Y. Tong, W. Liao, and Q. Ji, "Facial action unit recognition by exploiting their dynamic and semantic relationships," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 10, pp. 1683-1699, 2007.
[30] Y. Zhang, Q. Ji, Z. Zhu, and B. Yi, "Dynamic facial expression analysis and synthesis with MPEG-4 facial animation parameters," IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 10, pp. 1383-1396, 2008.
[31] G. Donato, M. S. Bartlett, J. C. Hager, P. Ekman, and T. J. Sejnowski, "Classifying facial actions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 10, pp. 974-989, 1999.
[32] Y. Li, S. M. Mavadati, M. H. Mahoor, Y. Zhao, and Q. Ji, "Measuring the intensity of spontaneous facial action units with dynamic Bayesian network," Pattern Recognition, vol. 48, no. 11, pp. 3417-3427, 2015.
[33] J. J.-J. Lien, T. Kanade, J. F. Cohn, and C.-C. Li, "Detection, tracking, and classification of action units in facial expression," Robotics and Autonomous Systems, vol. 31, no. 3, pp. 131-146, 2000.
[34] M. F. Valstar and M. Pantic, "Combined support vector machines and hidden Markov models for modeling facial action temporal dynamics," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 4796, pp. 118-127, 2007.
[35] T. Danisman, I. M. Bilasco, J. Martinet, and C. Djeraba, "Intelligent pixels of interest selection with application to facial expression recognition using multilayer perceptron," Signal Processing, vol. 93, no. 6, pp. 1547-1556, 2013.
[36] W.-L. Chao, J.-J. Ding, and J.-Z. Liu, "Facial expression recognition based on improved local binary pattern and class-regularized locality preserving projection," Signal Processing, vol. 117, pp. 1-10, 2015.
[37] I. Kotsia and I. Pitas, "Facial expression recognition in image sequences using geometric deformation features and support vector machines," IEEE Transactions on Image Processing, vol. 16, no. 1, pp. 172-187, 2007.
[38] P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[39] A. Asthana, S. Zafeiriou, S. Cheng, and M. Pantic, "Incremental face alignment in the wild," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 1859-1866, IEEE, Columbus, Ohio, USA, June 2014.
[40] T. Ojala, M. Pietikainen, and D. Harwood, "A comparative study of texture measures with classification based on featured distributions," Pattern Recognition, vol. 29, no. 1, pp. 51-59, 1996.
[41] S.-S. Liu, Y.-T. Tian, and C. Wan, "Facial expression recognition method based on Gabor multi-orientation features fusion and block histogram," Zidonghua Xuebao/Acta Automatica Sinica, vol. 37, no. 12, pp. 1455-1463, 2011.
[42] C.-W. Hsu and C.-J. Lin, "A comparison of methods for multiclass support vector machines," IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 415-425, 2002.
[43] T. Kanade, J. F. Cohn, and Y. Tian, "Comprehensive database for facial expression analysis," in Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition (FG '00), pp. 46-53, Grenoble, France, March 2000.
[44] M. Pantic, M. Valstar, R. Rademaker, and L. Maat, "Web-based database for facial expression analysis," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '05), pp. 317-321, Amsterdam, Netherlands, July 2005.


Page 4: Facial Expression Recognition from Video Sequences Based ...downloads.hindawi.com/journals/mpe/2017/7206041.pdf · Facial Expression Recognition from Video Sequences Based on Spatial-Temporal

4 Mathematical Problems in Engineering

Video Preprocessing

Gabor multiorientation fusion histogram

Spatial-temporal LBP motion histogram

X

Y

T

Expression classification

Peak frame

Image sequences

middot middot middot

middot middot middot

Figure 1 Proposed framework for facial recognition

minus3minus2

minus1 01

23

minus3minus2

minus10

12

3minus3

minus2

minus1

0

1

2

3

XT

Y

Figure 2 Example of LBP code using radii three and eightneighboring points on three planes

where 119899119895 is numbers of different labels produced by LBPoperator in 119895th plane (119895 = 0 XY 1 XT and 2 YT) and119891119895(119909 119910 119905) represents LBP code of center pixel (119909 119910 119905) in 119895thplane

As shown in formula (1) and Figure 2 dynamic texturein video sequence can be extracted by incorporating LBPhistogram on XY plane (LBP-XY) reflecting informationof spatial domain and LBP histograms on XT and YTplanes representing spatial-temporal motion information inhorizontal and vertical directions respectively

All LBPTOP components are LBP histograms extractedfrom three orthogonal planes and have equal contribution infacial expression recognition [22] However not all LBPTOPcomponents are significant LBP-XY includes characteristicsof different subjects resulting in difficulties in accurate ex-pression recognition LBP-XT and LBP-YT contain mostinformation on spatial-temporal texture changes of facial

expression Consequently as shown in Figure 3 LBP-XYis abandoned and LBP-XT and LBP-XT are merged toconstruct spatial-temporal motion LBP feature which isapplied in facial expression recognition in our study In thispaper LBP code is used and includes radii three and eightneighboring points on each plane moreover image of eachframe is equally divided into 8 times 9 rectangular subblocks withoverlap ratio of 70

33 Gabor Multiorientation Fusion Histogram Gabor wa-velet is well-known descriptor representing texture informa-tion of an image Gabor filter can be calculated by Gaus-sian kernel multiplied by sinusoidal plane Gabor feature ishighly capable of describing facial textures used in differ-ent research such as identity recognition feature locationand facial expression recognition However Gabor performsweakly in characterizing global features and feature dataare redundant Figure 4 shows extracted features based onGabor multiorientation fusion histogram proposed in [41]these features are used in reducing feature dimension andimproving recognition accuracy

First Gabor filter of five scales and eight orientations isused to transform images The following represents multi-scale and multiorientation features of pixel point 119911(119909 119910) inimages

119866119906V (119911) 119906 isin (0 7) V isin (0 4) (2)

where 119906 and V denote orientations and scales and 119866119906V(119911) isnorm of Gabor features

Transformation of each original image produces 40 cor-responding images with different scales and orientationsDimension of obtained feature is 40 times as high as that oforiginal facial imageTherefore on the same scale features ofeight orientations are fused according to a certain rule Fusionrules are as follows

119879V (119911) = argmax119906

1003817100381710038171003817119866119906V (119911)1003817100381710038171003817

119906 isin (0 7) V isin (0 4) (3)

Mathematical Problems in Engineering 5

X

Y

T

(a) (b)LBP-YT LBP-XT

Figure 3 Features in each block sequence (a) Block sequence and (b) LBP histograms from XT and YT planes

Gabor features in eight orientations and five scales

Orientation feature fusion

Fusion features infive scales

Block

Histogram

Histogram

middot middot middot

middot middot middot

middot middot middot

Figure 4 Feature extraction procedure based on Gabor multiorientation fusion histogram

where 119879V(119911) isin [1 8] is encoding value of pixel point 119911(119909 119910)and 119866119906V(119911) is Gabor features

Here orientation of local area is evaluated by orien-tation code of maximum value of Gabor features of eightorientations Each encoding value 119879V(119911) represents one localorientation Finally each expression image is transformedinto five scales of images and each scale fuses eight orienta-tions Fusion images are shown in Figure 5 As can be seenfrom figure fused images retain the most obvious changeinformation of Gabor subfiltering Therefore Gabor fusionfeature shows high discrimination for local texture changes

Histogram can describe global features of images How-ever many structural details maybe lost when histogram ofentire image is directly calculated Thus each fusion image isdivided into 8 times 8 nonoverlapping and equal rectangular sub-blocks Accordingly histogram distributions of all subblocksare calculated according to formula (4) and combined toconstruct Gabor multiorientation fusion histogram of wholeimage

ℎV119903119894 = sum119911

119868 (119877V119903 (119911) = 119894) 119894 = 0 8 (4)

where 119877V119903(119911) is encoding value in point 119911(119909 119910) of 119894th sub-block

As obtained by coding five scales onmultiple directions ofGabor feature Gabor multiorientation fusion histogram notonly inherits Gabor wavelet advantage capturing correspond-ing spatial frequency (scale) spatial location and orientationselectivity but also reduces redundancy of feature data andcomputation complexity

34 Multifeature Fusion Features extracted in Sections 32and 33 are scaled to [minus1 1] and integrated into one vectorusing the following equation

119865 = 120572119875 + (1 minus 120572)119876 0 le 120572 le 1 (5)

where 119875 is spatial-temporal motion LBP histogram and 119876 isGabor multiorientation fusion histogram 119865 is fusion vectorof119875 and119876 and is feature vector for final prediction Parameter120572 usually depends on performance of each feature For allexperiments in this paper we set value of 120572 to 05 whileconsidering experiments results shown in Tables 2 and 3

35 Classifier Many classifiers are used for facial expressionrecognition these classifiers include J48 based on ID3nearest neighbor based on fuzzy-rough sets random forest[20] SVM[22] andHMM[29] In this paper SVM is selected

6 Mathematical Problems in Engineering

Figure 5 Fusion images on five scales

in our framework because of the following properties (1)classifier based on statistical learning theory (2) high gen-eralization performance and (3) ability to deal with featurehaving high dimensions

To achieve more effective classification of facial expres-sions SVM classifier with RBF kernel is adapted Given atraining set of instance-label pairs (119909119894 119910119894) the SVM requiresthe solution of the following optimization problem

min119908119887120585

12119908119879119908 + 119862

119897

sum119894=1

120585119894

subject to 119910119894 (119908119879120601 (119909119894) + 119887) ge 1 minus 120585119894

120585119894 ge 0

(6)

where 119909119894 are training samples 119897 is the number of trainingset 119910119894 are the labels of 119909119894 119862 is punishment coefficient thatprevents overfitting 120585119894 is relaxation variable and 119908 and 119887 arethe parameters of the classifier Here training vectors 119909119894 aremapped into a higher dimensional space by using function120601(119909119894) SVM finds a linear separating hyperplane with themaximalmargin in the higher dimensional space119870(119909119894 119909119895) equiv120601(119909119894)119879120601(119909119895) is the kernel function of SVM The RBF kernelfunction is shown as follows

119870(119909119894 119909119895) = exp (minus120574 10038171003817100381710038171003817119909119894 minus 119909119895100381710038171003817100381710038172) 120574 gt 0 (7)

where 120574 is parameter of RBF kernel In this paper grid-searchstrategy and tenfold cross validation [42] are used to selectand estimate punishment coefficient119862 and kernel parameters120574 of SVM SVM is binary classifier used only in two setsof feature spaces of objects Facial expression recognitionis multiclass problem Thus multiclassifier which consistsof binary SVM classifiers is constructed to classify facialexpressions [42] Two strategies are commonly used one-versus-one strategy and one-versus-all strategy One-versus-one strategy is applied in our method because of its robust-ness in classification and simplicity of application In thisstrategy SVM is trained for classifying any two kinds offacial expression Thus six expressions require training 15binary SVM classifiers (fear versus anger happiness versusdisgust and surprise versus sadness among others) Finalresult is expression classification with highest number ofvotes However occasionally class with most votes is not theonly result In this case nearest neighbor classifier is appliedto these classes to obtain one final result

Table 1 Number of subjects for each expression class in threedatasets

Expression CK+ MMI Oulu (each condition)Anger 45 32 80Disgust 59 28 80Fear 25 28 80Happiness 69 42 80Sadness 28 32 80Surprise 83 41 80Total 309 203 480

4 Experiments

41 Facial Expression Datasets CK+ dataset [14] is extendedfrom CK dataset [43] which was built in 2000 it is availableand most widely used dataset for evaluating performance offacial expression recognition systemsThis dataset consists of593 video sequences including seven basic facial expressionsperformed by 123 participants Ages of subjects range from18 to 30 years 35 are male 13 are Afro-American eighty-one percent are Euro-American and six percent are peopleof other race In video sequences image of each frame is640 times 490 or 640 times 480 pixels Most frames are gray imageswith eight-bit precision for grayscale values Video sequencescontain frames from neutral phase to peak phase of facialexpressions and video rate is 30 frames per second Inour study 309 image sequences containing 97 subjects withsix expressions (anger disgust fear happiness sadness andsurprise) are selected for our experiments Each subject hasone to six expressions Top row of Figure 6 shows peak framesof six expressions from six subjects in CK+ dataset Table 1shows number of subjects for each expression class in CK+dataset in our experiment

MMI dataset [44] is applied to evaluate performance ofexpression recognition methods this dataset contains 203video sequences including different head poses and subtleexpressions These expression sequences were performed by19 participants with ages ranging from 19 to 62 years Subjectsare Asian European and South American Male participantsaccount for 40 of subjects Different from CK+ datasetMMI dataset contains six basic facial expressions comprisingneutral onset apex and offset frames In video sequencesimage of each frame is 720 times 576 pixels with RGB full colorOriginal frames are processed into images with eight-bit

Mathematical Problems in Engineering 7

Table 2 Classification results of using spatial-temporal motion LBP on CK+ dataset

Expression Anger Disgust Fear Happiness Sadness Surprise AverageAnger 42 0 0 0 3 0 933Disgust 0 59 0 0 0 0 100Fear 1 0 20 2 2 0 800Happiness 0 0 0 69 0 0 100Sadness 4 0 2 0 21 1 750Surprise 0 0 2 1 0 80 964Overall 941

Table 3 Classification results of using Gabor multiorientation fusion histogram on CK+ dataset

Expression Anger Disgust Fear Happiness Sadness Surprise AverageAnger 40 2 0 0 3 0 889Disgust 2 56 0 1 0 0 949Fear 1 0 17 3 4 0 680Happiness 0 0 0 69 0 0 100Sadness 4 0 0 0 24 0 857Surprise 0 1 0 0 0 82 988Overall 932

Angry Disgust FearHappiness Sadness Surprise

Figure 6 Sample images in CK+ dataset and MMI dataset

precision for grayscale values for our study and extractedframes from neutral to apex phases of facial expressionsBottom row of Figure 6 shows peak frames of six expressionsfrom six subjects in MMI dataset Table 1 shows number ofeach expression class in MMI dataset in our experiment Asseen from Table 1 MMI dataset includes fewer subjects thanCK+ dataset

Oulu-CASIAVIS facial expression database [24] includessix basic expressions from 80 subjects between 23 to 58years old 262 of the subjects are females All the imagesequences were taken under the visible light condition Oulu-CASIA VIS contains three subsets All expressions in thethree subsets were respectively captured in three different

illumination conditions normal weak and darkThe imagesof normal illumination condition are captured by usinggood normal lighting Weak illumination means that onlycomputer display is on and each subject sits in front of thedisplay Dark illumination means almost darkness Videosequences contain frames fromneutral phase to peak phase offacial expressions and video rate is 25 frames per secondTheface regions of six expressions in Oulu-CASIA VIS datasetare shown in Figure 7 (top row normal middle row weakand bottom row dark)The resolution of the images in Oulu-CASIA VIS dataset (320 times 240) is lower than that of theimages in CK+ and MMI datasets The number of videosequences is 480 for each illumination condition Table 1

8 Mathematical Problems in Engineering

Anger Disgust FearHappiness Sadness Surprise

Figure 7 Sample images in Oulu-CASIA VIS dataset

shows number of each expression class in Oulu-CASIA VISdataset in our experiment

42 Experimental Results on CK+ Dataset Three sets of ex-periments are conducted to evaluate performance of ourmethod on CK+ dataset because head motions of subjectsin CK+ dataset are smaller than that in other two datasetsThen in this paper performance of framework proposedis compared with seven state-of-the-art methods Leave-one-out cross validation strategy is applied in these setsof experiments The punishment coefficient 119862 and kernelparameters 120574 of SVM forCK+dataset are 1048576 and 372 times10minus9 respectively

In first set of experiments contributions of three LBPhistograms on three planes (three components of LBPTOP)are investigated separately to determine accuracy of facerecognition Figure 8 presents recognition rates of classifyingsix basic expressions using LBP histograms from three indi-vidual planes (ie LBP-XY LBP-XT and LBP-YT) and theircombination (ie LBPTOP) The figure indicates that usingLBPTOP yields the highest recognition rate (945) LBP-YTdescribes better variation in texture along vertical direction(935) and lowest performance is attributed to LBP-XYcontaining appearance characteristics of different subjects(822) Then we investigate the performance of spatial-temporalmotion LBP feature which joins the LBP histogramson XT and YT planes together and compare with LBPTOPBased on results shown in Figure 9 recognition rates are

LBP-XY LBP-XT LBP-YT LBPTOP60

65

70

75

80

85

90

95

100

Reco

gniti

on ra

te (

)

822

919935 945

Figure 8 Recognition rates of methods using LBP histograms fromthree planes and LBPTOP

almost similar Furthermore spatial-temporal motion LBPfeature has higher recognition rate than LBPTOP in terms ofanger and disgust Results shown in Figures 8 and 9mean thefollowing (1) variation of texture onXT andYTplane ismoresignificant than XY plane in expression sequence (2) com-pared with LBPTOP dimension of spatial-temporal motionLBP feature is reduced by 13 but recognition accuracy is notdecreased

Mathematical Problems in Engineering 9

Table 4 Classification results of using fusion features on CK+ dataset

Expression Anger Disgust Fear Happiness Sadness Surprise AverageAnger 42 0 0 0 3 0 933Disgust 0 59 0 0 0 0 100Fear 0 0 22 0 2 1 880Happiness 0 0 0 69 0 0 100Sadness 3 0 1 0 24 0 857Surprise 0 0 2 1 0 80 964Overall 958

Table 5 Comparison results of using fusion feature and LBPTOP on CK+ dataset

Expression Anger Disgust Fear Happiness Sadness Surprise AverageAnger 933 (+22) 0 0 0 67 0Disgust 0 100 (00) 0 0 0 0Fear 0 0 880 (00) 0 8 4Happiness 0 0 0 100 (00) 0 0Sadness 107 0 36 0 867 (+81) 0Surprise 0 0 24 12 0 964 (00)Overall 958 (+13)

Ang

er

Disg

ust

Fear

Hap

pine

ss

Sadn

ess

Surp

rise

Tota

l0

10

20

30

40

50

60

70

80

90

100

Reco

gniti

on ra

te (

)

Spatial-temporal LBPLBPTOP

Figure 9 Recognition rates of methods using LBPTOP and spatial-temporal motion LBP

Second set of experiments evaluates effectiveness of com-bining spatial-temporal motion LBP with Gabor multiorien-tation fusion histogram Tables 2ndash4 show results obtained byusing spatial-temporal motion LBP Gabor multiorientationfusion histogram and proposed framework with multiclassSVM classifier Results include recognition rate of eachexpression and overall classifications accuracy Each blackdigital represents number of correctly classified subjects ofeach expression category and each dark and bolder digitalimage represents number of misclassified subjects of each

expression category Results show that when using combina-tion of spatial-temporalmotion LBP andGabormultiorienta-tion fusion histogram framework performs better than usingindividual features

Third set of experiments compares performance betweenproposedmethods and approach using LBPTOP [22] Resultsin Table 5 show recognition rate of method using combina-tion of spatial-temporal motion LBP with Gabor multiori-entation fusion histogram (ie proposed framework) wherefigures in parentheses represent difference in accuracy inusing LBPTOP and features proposed in this paper positivefigure shows better accuracy of proposed features Based onTable 5 proposed framework performs better in two expres-sions and same in others Furthermore overall recognitionrate is slightly better when using proposed method There-foremethod fusing spatial-temporalmotion LBPwithGabormultiorientation fusion histogram outperforms LBPTOP

In this set experiment we compared our method withsystems proposed by Eskil and Benli [13] Lucey et al [14]Chew et al [15] Fan and Tjahjadi [16] and Jain et al [17]on CK+ dataset The datasets used in these papers are thesame and identical to the dataset adopted in our studyTherefore results presented in these papers can be directlycompared with our method Methods proposed by Eskiland Benli [13] Lucey et al [14] Chew et al [15] and Fanand Tjahjadi [16] used leave-one-subject-out cross validationmethod and technology proposed by Jain et al [17] usedtenfold cross validation Based on results shown in Table 6proposed framework shows average accuracy of 958 for allsix basic facial expressions performing better than methodsdeveloped by mentioned authors in this paragraph

43 Experimental Results onMMIDataset Sevenmethods in[18ndash20] are used to perform quantitative comparisons with

10 Mathematical Problems in Engineering

Table 6 Comparison of results in proposed framework and sevenstate-of-the-art methods on CK+ dataset

Research Methodology AccuracyEskil and Benli [13] High-polygon wireframe mode 850Lucey et al [14] AAM shape 689

AAM appearance 845Combined 887

Chew et al [15] Uni-hyperplane classification 894Fan and Tjahjadi [16] PHOGTOP and optical flow 909Jain et al [17] Temporal modeling of shapes 919Proposed 958

Table 7 Comparison result of proposed framework and sevensystems on MMI dataset

Research Methodology Accuracy ()Shan et al [18] LBP 4778Wang et al [19] AdaBoost 478

HMM 515ITBN 597

Fang et al [20] Active shape model 6238Active appearance model 6435Gabor + Geometry + SVM 7067

Proposed 7192

our framework on MMI dataset The punishment coefficient119862 and kernel parameters 120574 of SVM for MMI dataset are2097152 and 931 times 10minus10 respectively Datasets used inthese papers are identical to the dataset adapted in ourstudy In this experiment proposed framework performstenfold cross validation Shan et al [18] and Fang et al [20]developed methods by conducting experiments using thesame strategy we employed Methods proposed by Wang etal [19] performed twentyfold cross validation and methodsusing active shape model [20] and active appearance model[20] performed twofold cross validation results are shownin Table 7 Table 7 indicates that performance of proposedmethod onMMI dataset is inferior to CK+ dataset because ofsubtler expressions notable changes in head rotation angleand fewer training data However proposed frameworkstill performs better than the other seven state-of-the-artmethods

44 Experimental Results on Oulu-CASIA VIS Dataset Inthis section we evaluated our framework on Oulu-CASIAVIS dataset In these set experiments proposed frameworkperforms tenfold cross validation The punishment coeffi-cient 119862 and kernel parameters 120574 of SVM for this dataset are1048507 and 933 times 10minus10 respectively First our methodis tested on three subsets of Oulu-CASIA VIS respectivelyThe experimental results are shown in Figure 10 As shownin Figure 10 the recognition rate obtained for the normalillumination condition is highest and the recognition rateobtained for the dark illumination condition is worst Evi-dently the facial images captured in the weak or dark illu-mination are often missing much useful texture information

Table 8 Comparison result of proposed framework and fourmethods on Oulu-CASIA VIS dataset

Research Methodology Accuracy ()Scovanner et al [21] 3D SIFT 5583Zhao and Pietikainen [22] LBPTOP 6813Klaser et al [23] HOG3D 7063Zhao et al [24] AdaLBP 7354Proposed 7437

Ang

er

Disg

ust

Fear

Hap

pine

ss

Sadn

ess

Surp

rise

Tota

l0

10

20

30

40

50

60

70

80

90

Reco

gniti

on ra

te (

)

DarkWeakNormal

Figure 10 Recognition results of proposed methods on threesubsets of Oulu-CASIA VIS dataset

for facial expression recognition Then four methods in [21ndash24] are used to perform quantitative comparisons with ourframework on the subset obtained in normal illuminationconditions The experimental results are shown in Table 8Table 8 indicates that performance of proposed method onOulu-CASIA VIS dataset is inferior to CK+ dataset but supe-rior to MMI dataset Proposed framework still outperformsthe other four state-of-the-art methods

45 Computational Complexity Evaluation In this sectioncomputational complexity of our method is analyzed inCK+ MMI and Oulu-CASIA VIS dataset Processing time isapproximated to be time needed for preprocessing extractingspatial-temporal motion LBP and Gabor multiorientationfusion histogram and classifying each image frame of videosequence Runtime (measured using computer system clock)is estimated by usingmixed-language programming based onMATLAB andC++ on an Intel (R) Core (TM) i7-6700CPU at340GHz with 8GB RAM running onWindows 10 operatingsystem Average processing time per image is under 280 286and 249ms for CK+ MMI and Oulu-CASIA VIS datasetsrespectively Then we compare computational time of ourmethodwith system using LBPTOP [22] Comparison resultsin Table 9 show that computational time of our method islower than system using LBPTOP in three datasets Resultsshow that our method is more efficient and effective than

Mathematical Problems in Engineering 11

Table 9 Runtime of our method and LBPTOP in CK+ MMI andOulu-CASIA VIS datasets

Research CK+ MMI OuluLBPTOP 349ms 357ms 317msproposed 280ms 286ms 249ms

LBPTOP-based technology for facial expression recognitionin CK+ MMI and Oulu-CASIA VIS datasets

5 Conclusion

This paper presents novel facial expression recognitionframework integrating spatial-temporal motion LBP withGabor multiorientation fusion histogram Framework com-prises preprocessing (face recognition and points locationand tracking) dynamic and static feature extraction andfeature classification outperforming seven state-of-the-artmethods on CK+ dataset seven other methods on MMIdataset and four methods on Oulu-CASIA VIS datasetExpressions of disgust and happiness are easier to classifythan other expressions demonstrating better performance ofproposed method However recognition rates are lower forfear (880) and sadness (867) on CK+ dataset comparedwith others because most anger expressions are misclassifiedas sadness and vice-versa A lamination of proposed methodis that the occlusions head rotations and illumination canreduce accuracy of expression recognition these phenomenawill be solved in our future work A framework integratingother useful information (pupil head and body movements)with the features extracted in this paper could perform betterThis issue will also be studied in our future work Theproposed method cannot achieve real-time facial expressionrecognition We will attempt to establish an efficient dimen-sionality reduction method to reduce the computationalcomplexity of our framework in the future Furthermoreproposed system will be improved and applied in study ofhuman drowsiness expression analysis in our further work

Competing Interests

The authors declare no conflict of interests

Acknowledgments

This work was supported by the Open Foundation of the State Key Laboratory of Automotive Simulation and Control, China (Grant no. 20161105).

References

[1] P. Ekman and E. L. Rosenberg, What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the Facial Action Coding System, Oxford University Press, Oxford, UK, 2005.
[2] J. A. Russell, J. M. Fernandez-Dols, and G. Mandler, The Psychology of Facial Expression, Cambridge University Press, Cambridge, UK, 1997.
[3] P. Ekman and W. V. Friesen, "Constants across cultures in the face and emotion," Journal of Personality and Social Psychology, vol. 17, no. 2, pp. 124–129, 1971.
[4] B. Golomb and T. Sejnowski, "Benefits of machine understanding of facial expressions," in NSF Report: Facial Expression Understanding, pp. 55–71, 1997.
[5] M. Pantic, "Face for ambient interface," in Ambient Intelligence in Everyday Life, vol. 3864 of Lecture Notes on Artificial Intelligence, pp. 32–66, Springer, Berlin, Germany, 2006.
[6] R.-L. Hsu, M. Abdel-Mottaleb, and A. K. Jain, "Face detection in color images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 696–706, 2002.
[7] H. Fang and N. Costen, "From Rank-N to Rank-1 face recognition based on motion similarity," in Proceedings of the 20th British Machine Vision Conference (BMVC '09), pp. 1–11, London, UK, September 2009.
[8] C. Cornelis, M. D. Cock, and A. M. Radzikowska, "Vaguely quantified rough sets," in Rough Sets, Fuzzy Sets, Data Mining and Granular Computing: 11th International Conference, RSFDGrC 2007, Toronto, Canada, May 14–16, 2007, Proceedings, vol. 4482 of Lecture Notes in Computer Science, pp. 87–94, Springer, Berlin, Germany, 2007.
[9] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[10] M. Pantic and L. J. M. Rothkrantz, "Automatic analysis of facial expressions: the state of the art," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1424–1445, 2000.
[11] B. Fasel and J. Luettin, "Automatic facial expression analysis: a survey," Pattern Recognition, vol. 36, no. 1, pp. 259–275, 2003.
[12] C. P. Sumathi, T. Santhanam, and M. Mahadevi, "Automatic facial expression analysis: a survey," International Journal of Computer Science & Engineering Survey, vol. 3, no. 6, pp. 47–59, 2012.
[13] M. T. Eskil and K. S. Benli, "Facial expression recognition based on anatomy," Computer Vision and Image Understanding, vol. 119, pp. 1–14, 2014.
[14] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, "The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '10), pp. 94–101, San Francisco, Calif, USA, June 2010.
[15] S. W. Chew, S. Lucey, P. Lucey, S. Sridharan, and J. F. Cohn, "Improved facial expression recognition via uni-hyperplane classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2554–2561, IEEE, June 2012.
[16] X. Fan and T. Tjahjadi, "A spatial-temporal framework based on histogram of gradients and optical flow for facial expression recognition in video sequences," Pattern Recognition, vol. 48, no. 11, pp. 3407–3416, 2015.
[17] S. Jain, C. Hu, and J. K. Aggarwal, "Facial expression recognition with temporal modeling of shapes," in Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops '11), pp. 1642–1649, November 2011.
[18] C. Shan, S. Gong, and P. W. McOwan, "Facial expression recognition based on local binary patterns: a comprehensive study," Image and Vision Computing, vol. 27, no. 6, pp. 803–816, 2009.
[19] Z. Wang, S. Wang, and Q. Ji, "Capturing complex spatio-temporal relations among facial muscles for facial expression recognition," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), pp. 3422–3429, IEEE, Portland, Ore, USA, June 2013.
[20] H. Fang, N. Mac Parthalain, A. J. Aubrey et al., "Facial expression recognition in dynamic sequences: an integrated approach," Pattern Recognition, vol. 47, no. 3, pp. 1271–1281, 2014.
[21] P. Scovanner, S. Ali, and M. Shah, "A 3-dimensional SIFT descriptor and its application to action recognition," in Proceedings of the 15th ACM International Conference on Multimedia (MM '07), pp. 357–360, ACM, Augsburg, Germany, September 2007.
[22] G. Zhao and M. Pietikainen, "Dynamic texture recognition using local binary patterns with an application to facial expressions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 915–928, 2007.
[23] A. Klaser, M. Marszałek, and C. Schmid, "A spatio-temporal descriptor based on 3D-gradients," in Proceedings of the 19th British Machine Vision Conference (BMVC '08), pp. 275–281, Leeds, UK, September 2008.
[24] G. Zhao, X. Huang, M. Taini, S. Z. Li, and M. Pietikainen, "Facial expression recognition from near-infrared videos," Image and Vision Computing, vol. 29, no. 9, pp. 607–619, 2011.
[25] P. Ekman, W. V. Friesen, and J. C. Hager, Facial Action Coding System, A Human Face, Salt Lake City, Utah, USA, 2002.
[26] M. F. Valstar and M. Pantic, "Fully automatic recognition of the temporal phases of facial actions," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 1, pp. 28–43, 2012.
[27] Y. Zhang and Q. Ji, "Active and dynamic information fusion for facial expression understanding from image sequences," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 699–714, 2005.
[28] M. Pantic and I. Patras, "Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 36, no. 2, pp. 433–449, 2006.
[29] Y. Tong, W. Liao, and Q. Ji, "Facial action unit recognition by exploiting their dynamic and semantic relationships," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 10, pp. 1683–1699, 2007.
[30] Y. Zhang, Q. Ji, Z. Zhu, and B. Yi, "Dynamic facial expression analysis and synthesis with MPEG-4 facial animation parameters," IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 10, pp. 1383–1396, 2008.
[31] G. Donato, M. S. Bartlett, J. C. Hager, P. Ekman, and T. J. Sejnowski, "Classifying facial actions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 10, pp. 974–989, 1999.
[32] Y. Li, S. M. Mavadati, M. H. Mahoor, Y. Zhao, and Q. Ji, "Measuring the intensity of spontaneous facial action units with dynamic Bayesian network," Pattern Recognition, vol. 48, no. 11, pp. 3417–3427, 2015.
[33] J. J.-J. Lien, T. Kanade, J. F. Cohn, and C.-C. Li, "Detection, tracking, and classification of action units in facial expression," Robotics and Autonomous Systems, vol. 31, no. 3, pp. 131–146, 2000.
[34] M. F. Valstar and M. Pantic, "Combined support vector machines and hidden Markov models for modeling facial action temporal dynamics," Lecture Notes in Computer Science, vol. 4796, pp. 118–127, 2007.
[35] T. Danisman, I. M. Bilasco, J. Martinet, and C. Djeraba, "Intelligent pixels of interest selection with application to facial expression recognition using multilayer perceptron," Signal Processing, vol. 93, no. 6, pp. 1547–1556, 2013.
[36] W.-L. Chao, J.-J. Ding, and J.-Z. Liu, "Facial expression recognition based on improved local binary pattern and class-regularized locality preserving projection," Signal Processing, vol. 117, pp. 1–10, 2015.
[37] I. Kotsia and I. Pitas, "Facial expression recognition in image sequences using geometric deformation features and support vector machines," IEEE Transactions on Image Processing, vol. 16, no. 1, pp. 172–187, 2007.
[38] P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2004.
[39] A. Asthana, S. Zafeiriou, S. Cheng, and M. Pantic, "Incremental face alignment in the wild," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 1859–1866, IEEE, Columbus, Ohio, USA, June 2014.
[40] T. Ojala, M. Pietikainen, and D. Harwood, "A comparative study of texture measures with classification based on featured distributions," Pattern Recognition, vol. 29, no. 1, pp. 51–59, 1996.
[41] S.-S. Liu, Y.-T. Tian, and C. Wan, "Facial expression recognition method based on Gabor multi-orientation features fusion and block histogram," Zidonghua Xuebao/Acta Automatica Sinica, vol. 37, no. 12, pp. 1455–1463, 2011.
[42] C.-W. Hsu and C.-J. Lin, "A comparison of methods for multiclass support vector machines," IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 415–425, 2002.
[43] T. Kanade, J. F. Cohn, and Y. Tian, "Comprehensive database for facial expression analysis," in Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition (FG '00), pp. 46–53, Grenoble, France, March 2000.
[44] M. Pantic, M. Valstar, R. Rademaker, and L. Maat, "Web-based database for facial expression analysis," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '05), pp. 317–321, Amsterdam, Netherlands, July 2005.




Mathematical Problems in Engineering 11

Table 9 Runtime of our method and LBPTOP in CK+ MMI andOulu-CASIA VIS datasets

Research CK+ MMI OuluLBPTOP 349ms 357ms 317msproposed 280ms 286ms 249ms

LBPTOP-based technology for facial expression recognitionin CK+ MMI and Oulu-CASIA VIS datasets

5 Conclusion

This paper presents novel facial expression recognitionframework integrating spatial-temporal motion LBP withGabor multiorientation fusion histogram Framework com-prises preprocessing (face recognition and points locationand tracking) dynamic and static feature extraction andfeature classification outperforming seven state-of-the-artmethods on CK+ dataset seven other methods on MMIdataset and four methods on Oulu-CASIA VIS datasetExpressions of disgust and happiness are easier to classifythan other expressions demonstrating better performance ofproposed method However recognition rates are lower forfear (880) and sadness (867) on CK+ dataset comparedwith others because most anger expressions are misclassifiedas sadness and vice-versa A lamination of proposed methodis that the occlusions head rotations and illumination canreduce accuracy of expression recognition these phenomenawill be solved in our future work A framework integratingother useful information (pupil head and body movements)with the features extracted in this paper could perform betterThis issue will also be studied in our future work Theproposed method cannot achieve real-time facial expressionrecognition We will attempt to establish an efficient dimen-sionality reduction method to reduce the computationalcomplexity of our framework in the future Furthermoreproposed system will be improved and applied in study ofhuman drowsiness expression analysis in our further work

Competing Interests

The authors declare no conflict of interests

Acknowledgments

This work was supported by the Open Foundation of StateKey Laboratory of Automotive Simulation and Control(China Grant no 20161105)

References

[1] P Ekman and E L Rosenberg What the Face Reveals Basicand Applied Studies of Spontaneous Expression Using the FacialAction Coding System Oxford University Press Oxford UK2005

[2] J A Russell J M Fernandez-Dols and G Mandler The Psy-chology of Facial Expression Cambridge University Press Cam-bridge UK 1997

[3] P Ekman and W V Friesen ldquoConstants across cultures in theface and emotionrdquo Journal of Personality and Social Psychologyvol 17 no 2 pp 124ndash129 1971

[4] B Golomb and T Sejnowski ldquoBenefits of machine understand-ing of facial expressionsrdquo in NSF ReportmdashFacial ExpressionUnderstanding pp 55ndash71 1997

[5] M Pantic ldquoFace for ambient interfacerdquo in Ambient Intelligencein Everyday Life vol 3864 of Lecture Notes on Artificial Intelli-gence pp 32ndash66 Springer Berlin Germany 2006

[6] R-L Hsu M Abdel-Mottaleb and A K Jain ldquoFace detectionin color imagesrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 24 no 5 pp 696ndash706 2002

[7] H Fang and N Costen ldquoFrom Rank-N to Rank-1 face recog-nition based on motion similarityrdquo in Proceedings of the 20thBritish Machine Vision Conference (BMVC rsquo09) pp 1ndash11 Lon-don UK September 2009

[8] C Cornelis M D Cock and A M Radzikowska ldquoVaguelyquantified rough setsrdquo in Rough Sets Fuzzy Sets Data Miningand Granular Computing 11th International Conference RSFD-GrC 2007 Toronto Canada May 14ndash16 2007 Proceedings vol4482 of Lecture Notes in Computer Science pp 87ndash94 SpringerBerlin Germany 2007

[9] L Breiman ldquoRandom forestsrdquoMachine Learning vol 45 no 1pp 5ndash32 2001

[10] M Pantic and L UM Rothkrantz ldquoAutomatic analysis of facialexpressions the state of the artrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 22 no 12 pp 1424ndash14452000

[11] B Fasel and J Luettin ldquoAutomatic facial expression analysis asurveyrdquo Pattern Recognition vol 36 no 1 pp 259ndash275 2003

[12] C P Sumathi T Santhanam and M Mahadevi ldquoAutomaticfacial expression analysis a surveyrdquo International Journal ofComputer Science amp Engineering Survey vol 3 no 6 pp 47ndash592012

[13] M T Eskil and K S Benli ldquoFacial expression recognition basedon anatomyrdquo Computer Vision and Image Understanding vol119 pp 1ndash14 2014

[14] P Lucey J F Cohn T Kanade J Saragih Z Ambadar andI Matthews ldquoThe extended Cohn-Kanade dataset (CK+) acomplete dataset for action unit and emotion-specified expres-sionrdquo in Proceedings of the IEEE Computer Society Conferenceon Computer Vision and Pattern Recognition (CVPRW rsquo10) pp94ndash101 San Francisco Calif USA June 2010

[15] S W Chew S Lucey P Lucey S Sridharan and J F ConnldquoImproved facial expression recognition via uni-hyperplaneclassificationrdquo in Proceedings of the IEEE Conference on Com-puter Vision and Pattern Recognition (CVPR rsquo12) pp 2554ndash2561IEEE June 2012

[16] X Fan and T Tjahjadi ldquoA spatial-temporal framework basedon histogram of gradients and optical flow for facial expressionrecognition in video sequencesrdquoPatternRecognition vol 48 no11 pp 3407ndash3416 2015

[17] S Jain CHu and J K Aggarwal ldquoFacial expression recognitionwith temporal modeling of shapesrdquo in Proceedings of the IEEEInternational Conference on Computer VisionWorkshops (ICCVWorkshops rsquo11) pp 1642ndash1649 November 2011

[18] C Shan S Gong and P W McOwan ldquoFacial expression rec-ognition based on Local Binary Patterns A ComprehensiveStudyrdquo Image and Vision Computing vol 27 no 6 pp 803ndash8162009

12 Mathematical Problems in Engineering

[19] Z Wang S Wang and Q Ji ldquoCapturing complex spatio-temporal relations among facial muscles for facial expressionrecognitionrdquo in Proceedings of the 26th IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo13) pp 3422ndash3429 IEEE Portland Ore USA June 2013

[20] H Fang N Mac Parthalain A J Aubrey et al ldquoFacial ex-pression recognition in dynamic sequences an integrated ap-proachrdquo Pattern Recognition vol 47 no 3 pp 1271ndash1281 2014

[21] P Scovanner S Ali andM Shah ldquoA 3-dimensional sift descrip-tor and its application to action recognitionrdquo in Proceedingsof the 15th ACM International Conference on Multimedia (MMrsquo07) pp 357ndash360 ACM Augsburg Germany September 2007

[22] G Zhao and M Pietikainen ldquoDynamic texture recognitionusing local binary patterns with an application to facial expres-sionsrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 29 no 6 pp 915ndash928 2007

[23] A Klaser M Marszałek and C Schmid ldquoA spatio-temporaldescriptor based on 3D-gradientsrdquo in Proceedings of the 19thBritish Machine Vision Conference (BMVC rsquo08) pp 275ndash281Leeds UK September 2008

[24] G Zhao XHuangMTaini S Z Li andM Pietikainen ldquoFacialexpression recognition from near-infrared videosrdquo Image andVision Computing vol 29 no 9 pp 607ndash619 2011

[25] P Ekman W V Friesen and J C Hager Facial Action CodingSystem A Human Face Salt Lake City Utah USA 2002

[26] M F Valstar and M Pantic ldquoFully automatic recognition ofthe temporal phases of facial actionsrdquo IEEE Transactions onSystems Man and Cybernetics Part B Cybernetics vol 42 no1 pp 28ndash43 2012

[27] Y Zhang and Q Ji ldquoActive and dynamic information fusion forfacial expression understanding from image sequencesrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol27 no 5 pp 699ndash714 2005

[28] M Pantic and I Patras ldquoDynamics of facial expression recog-nition of facial actions and their temporal segments from faceprofile image sequencesrdquo IEEE Transactions on Systems Manand Cybernetics Part B Cybernetics vol 36 no 2 pp 433ndash4492006

[29] Y Tong W Liao and Q Ji ldquoFacial action unit recognitionby exploiting their dynamic and semantic relationshipsrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol29 no 10 pp 1683ndash1699 2007

[30] Y Zhang Q Ji Z Zhu and B Yi ldquoDynamic facial expressionanalysis and synthesis with MPEG-4 facial animation param-etersrdquo IEEE Transactions on Circuits and Systems for VideoTechnology vol 18 no 10 pp 1383ndash1396 2008

[31] G Donate M S Bartlett J C Hager P Ekman and T JSejnowski ldquoClassifying facial actionsrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 21 no 10 pp974ndash989 1999

[32] Y Li S M Mavadati M H Mahoor Y Zhao and Q JildquoMeasuring the intensity of spontaneous facial action units withdynamic Bayesian networkrdquo Pattern Recognition vol 48 no 11pp 3417ndash3427 2015

[33] J J-J Lien T Kanade J F Cohn and C-C Li ldquoDetectiontracking and classification of action units in facial expressionrdquoRobotics and Autonomous Systems vol 31 no 3 pp 131ndash1462000

[34] M F Valstar and M Pantic ldquoCombined support vector ma-chines and hidden Markov models for modeling facial action

temporal dynamicsrdquo Lecture Notes in Computer Science (includ-ing subseries Lecture Notes in Artificial Intelligence and LectureNotes in Bioinformatics) vol 4796 pp 118ndash127 2007

[35] T Danisman I M Bilasco J Martinet and C DjerabaldquoIntelligent pixels of interest selection with application to facialexpression recognition using multilayer perceptronrdquo SignalProcessing vol 93 no 6 pp 1547ndash1556 2013

[36] W-L Chao J-J Ding and J-Z Liu ldquoFacial expression recog-nition based on improved local binary pattern and class-regularized locality preserving projectionrdquo Signal Processingvol 117 pp 1ndash10 2015

[37] I Kotsia and I Pitas ldquoFacial expression recognition in imagesequences using geometric deformation features and supportvector machinesrdquo IEEE Transactions on Image Processing vol16 no 1 pp 172ndash187 2007

[38] P Viola and M J Jones ldquoRobust Real-Time Face DetectionrdquoInternational Journal of Computer Vision vol 57 no 2 pp 137ndash154 2004

[39] A Asthana S Zafeiriou S Cheng andM Pantic ldquoIncrementalface alignment in the wildrdquo in Proceedings of the 27th IEEEConference on Computer Vision and Pattern Recognition (CVPRrsquo14) pp 1859ndash1866 IEEE Columbus Ohio USA June 2014

[40] T Ojala M Pietikainen and D Harwood ldquoA comparativestudy of texture measures with classification based on featureddistributionsrdquo Pattern Recognition vol 29 no 1 pp 51ndash59 1996

[41] S-S Liu Y-T Tian and CWan ldquoFacial expression recognitionmethod based on Gabor multi-orientation features fusion andblock histogramrdquo Zidonghua XuebaoActa Automatica Sinicavol 37 no 12 pp 1455ndash1463 2011

[42] C-W Hsu and C-J Lin ldquoA comparison of methods for mul-ticlass support vector machinesrdquo IEEE Transactions on NeuralNetworks vol 13 no 2 pp 415ndash425 2002

[43] T Kanade J F Cohn and Y Tian ldquoComprehensive database forfacial expression analysisrdquo in Proceedings of the 4th IEEE Inter-national Conference on Automatic Face and Gesture Recognition(FG rsquo00) pp 46ndash53 Grenoble France March 2000

[44] M Pantic M Valstar R Rademaker and L Maat ldquoWeb-baseddatabase for facial expression analysisrdquo in Proceedings of theIEEE International Conference on Multimedia and Expo (ICMErsquo05) pp 317ndash321 Amsterdam Netherlands July 2005

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: Facial Expression Recognition from Video Sequences Based ...downloads.hindawi.com/journals/mpe/2017/7206041.pdf · Facial Expression Recognition from Video Sequences Based on Spatial-Temporal

Mathematical Problems in Engineering 7

Table 2: Classification results of using spatial-temporal motion LBP on the CK+ dataset.

Expression   Anger  Disgust  Fear  Happiness  Sadness  Surprise  Average (%)
Anger          42      0       0       0         3        0        93.3
Disgust         0     59       0       0         0        0       100
Fear            1      0      20       2         2        0        80.0
Happiness       0      0       0      69         0        0       100
Sadness         4      0       2       0        21        1        75.0
Surprise        0      0       2       1         0       80        96.4
Overall                                                            94.1

Table 3: Classification results of using the Gabor multiorientation fusion histogram on the CK+ dataset.

Expression   Anger  Disgust  Fear  Happiness  Sadness  Surprise  Average (%)
Anger          40      2       0       0         3        0        88.9
Disgust         2     56       0       1         0        0        94.9
Fear            1      0      17       3         4        0        68.0
Happiness       0      0       0      69         0        0       100
Sadness         4      0       0       0        24        0        85.7
Surprise        0      1       0       0         0       82        98.8
Overall                                                            93.2

Figure 6: Sample images of the six expressions (anger, disgust, fear, happiness, sadness, and surprise) in the CK+ dataset (top row) and the MMI dataset (bottom row).

precision for grayscale values for our study, and we extracted frames from the neutral to apex phases of the facial expressions. The bottom row of Figure 6 shows peak frames of the six expressions from six subjects in the MMI dataset. Table 1 shows the number of samples of each expression class in the MMI dataset used in our experiment. As seen from Table 1, the MMI dataset includes fewer subjects than the CK+ dataset.

The Oulu-CASIA VIS facial expression database [24] includes the six basic expressions from 80 subjects between 23 and 58 years old; 26.2% of the subjects are female. All the image sequences were taken under visible light. Oulu-CASIA VIS contains three subsets, in which all expressions were captured under three different illumination conditions: normal, weak, and dark. The images under the normal illumination condition were captured with good normal lighting. Weak illumination means that only the computer display is on and each subject sits in front of the display; dark illumination means almost complete darkness. Video sequences contain frames from the neutral phase to the peak phase of the facial expressions, and the video rate is 25 frames per second. The face regions of the six expressions in the Oulu-CASIA VIS dataset are shown in Figure 7 (top row: normal; middle row: weak; bottom row: dark). The resolution of the images in the Oulu-CASIA VIS dataset (320 × 240) is lower than that of the images in the CK+ and MMI datasets. The number of video sequences is 480 for each illumination condition.


Figure 7: Sample images of the six expressions in the Oulu-CASIA VIS dataset.

Table 1 shows the number of samples of each expression class in the Oulu-CASIA VIS dataset used in our experiment.

4.2. Experimental Results on the CK+ Dataset. Three sets of experiments are conducted to evaluate the performance of our method on the CK+ dataset, because the head motions of subjects in the CK+ dataset are smaller than those in the other two datasets. The performance of the proposed framework is then compared with seven state-of-the-art methods. A leave-one-out cross-validation strategy is applied in these sets of experiments. The penalty coefficient C and kernel parameter γ of the SVM for the CK+ dataset are 1048576 and 3.72 × 10⁻⁹, respectively.
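As a concrete illustration, the following is a minimal sketch (in Python with scikit-learn, which we assume here; the original implementation used MATLAB and C++) of an RBF-kernel SVM configured with the reported hyperparameters and the one-versus-one multiclass strategy. The feature matrix X and labels y are hypothetical placeholders for the fused descriptors and expression classes.

```python
# Sketch only: assumes scikit-learn; X and y are hypothetical placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.random((60, 512))        # stand-in fused feature vectors
y = np.tile(np.arange(6), 10)    # six basic expression labels

# RBF-kernel SVM with the C and gamma values reported for CK+;
# SVC trains one-versus-one binary classifiers for multiclass problems.
clf = SVC(kernel='rbf', C=1048576, gamma=3.72e-9,
          decision_function_shape='ovo')
acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
print('leave-one-out accuracy: %.3f' % acc)
```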

In the first set of experiments, the contributions of the LBP histograms on the three planes (the three components of LBP-TOP) are investigated separately to determine their effect on recognition accuracy. Figure 8 presents the recognition rates of classifying the six basic expressions using LBP histograms from the three individual planes (i.e., LBP-XY, LBP-XT, and LBP-YT) and their combination (i.e., LBP-TOP). The figure indicates that using LBP-TOP yields the highest recognition rate (94.5%), LBP-YT better describes the variation in texture along the vertical direction (93.5%), and the lowest performance is attributed to LBP-XY, which mainly contains the appearance characteristics of the different subjects (82.2%). We then investigate the performance of the spatial-temporal motion LBP feature, which joins the LBP histograms on the XT and YT planes, and compare it with LBP-TOP.

Figure 8: Recognition rates of methods using LBP histograms from the three individual planes and LBP-TOP (LBP-XY: 82.2%; LBP-XT: 91.9%; LBP-YT: 93.5%; LBP-TOP: 94.5%).

Based on the results shown in Figure 9, the recognition rates of the two descriptors are almost identical; furthermore, the spatial-temporal motion LBP feature achieves a higher recognition rate than LBP-TOP for anger and disgust. The results shown in Figures 8 and 9 indicate the following: (1) the variation of texture on the XT and YT planes is more significant than that on the XY plane in an expression sequence; (2) compared with LBP-TOP, the dimension of the spatial-temporal motion LBP feature is reduced by one-third, but the recognition accuracy is not decreased.
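To make the construction concrete, here is a hedged sketch (Python with scikit-image and NumPy assumed; the block-wise partitioning and exact sampling used in the paper are simplified to plain averaging) of computing LBP histograms on the XT and YT slices of a video volume and concatenating them, which is the essence of the spatial-temporal motion LBP and explains the one-third reduction in dimension relative to LBP-TOP:

```python
# Sketch only: a simplified spatial-temporal motion LBP under stated assumptions.
import numpy as np
from skimage.feature import local_binary_pattern

def plane_histogram(plane, P=8, R=1):
    """Uniform LBP histogram of one 2-D slice (P+2 bins for 'uniform')."""
    codes = local_binary_pattern(plane, P, R, method='uniform')
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def st_motion_lbp(volume):
    """volume: (T, H, W) grayscale sequence -> concatenated XT/YT histograms.
    Histograms are averaged over rows/columns; block pooling is omitted."""
    T, H, W = volume.shape
    xt = np.mean([plane_histogram(volume[:, y, :]) for y in range(H)], axis=0)
    yt = np.mean([plane_histogram(volume[:, :, x]) for x in range(W)], axis=0)
    return np.concatenate([xt, yt])  # 2/3 the length of an LBP-TOP descriptor

feature = st_motion_lbp(np.random.rand(16, 32, 32))
```

Dropping the XY histogram removes the subject-specific appearance component noted above while keeping both temporal planes.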


Table 4: Classification results of using the fusion features on the CK+ dataset.

Expression   Anger  Disgust  Fear  Happiness  Sadness  Surprise  Average (%)
Anger          42      0       0       0         3        0        93.3
Disgust         0     59       0       0         0        0       100
Fear            0      0      22       0         2        1        88.0
Happiness       0      0       0      69         0        0       100
Sadness         3      0       1       0        24        0        85.7
Surprise        0      0       2       1         0       80        96.4
Overall                                                            95.8

Table 5: Comparison of recognition rates (%) using the fusion feature and LBP-TOP on the CK+ dataset (figures in parentheses give the difference relative to LBP-TOP).

Expression   Anger         Disgust      Fear         Happiness    Sadness      Surprise
Anger        93.3 (+2.2)     0            0            0            6.7          0
Disgust       0            100 (0.0)      0            0            0            0
Fear          0              0           88.0 (0.0)    0            8.0          4.0
Happiness     0              0            0          100 (0.0)      0            0
Sadness      10.7            0            3.6          0           86.7 (+8.1)   0
Surprise      0              0            2.4          1.2          0           96.4 (0.0)
Overall                                                                         95.8 (+1.3)

Figure 9: Recognition rates of methods using LBP-TOP and spatial-temporal motion LBP for each expression and in total.

The second set of experiments evaluates the effectiveness of combining spatial-temporal motion LBP with the Gabor multiorientation fusion histogram. Tables 2–4 show the results obtained by using spatial-temporal motion LBP, the Gabor multiorientation fusion histogram, and the proposed framework, each with the multiclass SVM classifier. The results include the recognition rate of each expression and the overall classification accuracy. In each table, the diagonal entries give the number of correctly classified samples of each expression category, and the off-diagonal entries give the numbers of misclassified samples. The results show that the framework using the combination of spatial-temporal motion LBP and the Gabor multiorientation fusion histogram performs better than using the individual features; a sketch of this feature-level fusion follows.
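The following is a minimal sketch (Python with scikit-image and SciPy assumed; the summation of magnitudes over orientations is our assumption about the fusion rule, and block partitioning is omitted) of building a Gabor multiorientation fusion histogram and concatenating it with a dynamic descriptor:

```python
# Sketch only: assumed APIs; the fusion-by-summation rule and all
# parameter values here are illustrative assumptions.
import numpy as np
from scipy.ndimage import convolve
from skimage.filters import gabor_kernel

def gabor_fusion_histogram(image, n_orient=8, frequency=0.25, bins=32):
    """Fuse Gabor magnitude responses over orientations, then histogram."""
    fused = np.zeros_like(image, dtype=float)
    for k in range(n_orient):
        kern = gabor_kernel(frequency, theta=k * np.pi / n_orient)
        real = convolve(image, np.real(kern))
        imag = convolve(image, np.imag(kern))
        fused += np.hypot(real, imag)  # magnitude, summed over orientations
    hist, _ = np.histogram(fused, bins=bins, density=True)
    return hist

face = np.random.rand(64, 64)        # stand-in normalized face region
static = gabor_fusion_histogram(face)
dynamic = np.random.rand(20)         # stand-in spatial-temporal motion LBP
descriptor = np.concatenate([dynamic, static])  # fused vector fed to the SVM
```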

The third set of experiments compares the performance of the proposed method with the approach using LBP-TOP [22]. The results in Table 5 show the recognition rates of the method combining spatial-temporal motion LBP with the Gabor multiorientation fusion histogram (i.e., the proposed framework), where the figures in parentheses represent the difference in accuracy between the proposed features and LBP-TOP; a positive figure indicates better accuracy for the proposed features. Based on Table 5, the proposed framework performs better on two expressions and the same on the others. Furthermore, the overall recognition rate is slightly higher with the proposed method. Therefore, the method fusing spatial-temporal motion LBP with the Gabor multiorientation fusion histogram outperforms LBP-TOP.

In the final set of experiments, we compared our method with the systems proposed by Eskil and Benli [13], Lucey et al. [14], Chew et al. [15], Fan and Tjahjadi [16], and Jain et al. [17] on the CK+ dataset. The datasets used in these papers are identical to the dataset adopted in our study, so the results presented in these papers can be directly compared with our method. The methods proposed by Eskil and Benli [13], Lucey et al. [14], Chew et al. [15], and Fan and Tjahjadi [16] used leave-one-subject-out cross validation, and the method proposed by Jain et al. [17] used tenfold cross validation (both protocols are sketched below). Based on the results shown in Table 6, the proposed framework achieves an average accuracy of 95.8% over all six basic facial expressions, performing better than the methods of the aforementioned authors.
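For clarity, this is a small sketch (scikit-learn assumed; subjects, features, and labels are hypothetical placeholders) contrasting the two evaluation protocols mentioned above, leave-one-subject-out versus tenfold cross validation:

```python
# Sketch only: hypothetical data; illustrates the two protocols, not our exact runs.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, StratifiedKFold, cross_val_score

X = np.random.rand(120, 64)             # stand-in descriptors
y = np.tile(np.arange(6), 20)           # six expressions, 20 samples each
subjects = np.repeat(np.arange(20), 6)  # each hypothetical subject acts all six

clf = SVC(kernel='rbf', C=1048576, gamma=3.72e-9)
# Leave-one-subject-out: every fold holds out all sequences of one subject.
loso = cross_val_score(clf, X, y, groups=subjects, cv=LeaveOneGroupOut())
# Tenfold: stratified random partition; a subject's sequences may straddle folds.
tenfold = cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=10))
```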

4.3. Experimental Results on the MMI Dataset. Seven methods in [18–20] are used to perform quantitative comparisons with our framework on the MMI dataset.


Table 6: Comparison of the proposed framework with seven state-of-the-art methods on the CK+ dataset.

Research                 Methodology                      Accuracy (%)
Eskil and Benli [13]     High-polygon wireframe model         85.0
Lucey et al. [14]        AAM shape                            68.9
                         AAM appearance                       84.5
                         Combined                             88.7
Chew et al. [15]         Uni-hyperplane classification        89.4
Fan and Tjahjadi [16]    PHOG-TOP and optical flow            90.9
Jain et al. [17]         Temporal modeling of shapes          91.9
Proposed                                                      95.8

Table 7: Comparison of the proposed framework with seven systems on the MMI dataset.

Research              Methodology                  Accuracy (%)
Shan et al. [18]      LBP                              47.78
Wang et al. [19]      AdaBoost                         47.8
                      HMM                              51.5
                      ITBN                             59.7
Fang et al. [20]      Active shape model               62.38
                      Active appearance model          64.35
                      Gabor + geometry + SVM           70.67
Proposed                                               71.92

The penalty coefficient C and kernel parameter γ of the SVM for the MMI dataset are 2097152 and 9.31 × 10⁻¹⁰, respectively. The datasets used in these papers are identical to the dataset adopted in our study. In this experiment, the proposed framework performs tenfold cross validation. Shan et al. [18] and Fang et al. [20] evaluated their methods using the same strategy we employed; the methods of Wang et al. [19] used twentyfold cross validation, and the methods using the active shape model [20] and the active appearance model [20] used twofold cross validation. The results are shown in Table 7, which indicates that the performance of the proposed method on the MMI dataset is inferior to that on the CK+ dataset because of subtler expressions, notable changes in head rotation angle, and fewer training data. However, the proposed framework still performs better than the other seven state-of-the-art methods.

4.4. Experimental Results on the Oulu-CASIA VIS Dataset. In this section, we evaluated our framework on the Oulu-CASIA VIS dataset. In this set of experiments, the proposed framework performs tenfold cross validation; the penalty coefficient C and kernel parameter γ of the SVM for this dataset are 1048507 and 9.33 × 10⁻¹⁰, respectively. First, our method is tested on each of the three subsets of Oulu-CASIA VIS. The experimental results are shown in Figure 10: the recognition rate obtained under the normal illumination condition is the highest, and the recognition rate obtained under the dark illumination condition is the worst. Evidently, facial images captured under weak or dark illumination are often missing much of the useful texture information for facial expression recognition.

Table 8: Comparison of the proposed framework with four methods on the Oulu-CASIA VIS dataset (normal illumination).

Research                     Methodology   Accuracy (%)
Scovanner et al. [21]        3D SIFT           55.83
Zhao and Pietikainen [22]    LBP-TOP           68.13
Klaser et al. [23]           HOG3D             70.63
Zhao et al. [24]             AdaLBP            73.54
Proposed                                       74.37

Figure 10: Recognition results of the proposed method on the three subsets (dark, weak, and normal illumination) of the Oulu-CASIA VIS dataset, per expression and in total.

Four methods in [21–24] are then used to perform quantitative comparisons with our framework on the subset obtained under normal illumination. The experimental results are shown in Table 8, which indicates that the performance of the proposed method on the Oulu-CASIA VIS dataset is inferior to that on the CK+ dataset but superior to that on the MMI dataset. The proposed framework still outperforms the other four state-of-the-art methods.

4.5. Computational Complexity Evaluation. In this section, the computational complexity of our method is analyzed on the CK+, MMI, and Oulu-CASIA VIS datasets. Processing time is approximated as the time needed for preprocessing, extracting the spatial-temporal motion LBP and Gabor multiorientation fusion histogram, and classifying each image frame of a video sequence. Runtime (measured using the computer system clock) is estimated using mixed-language programming based on MATLAB and C++ on an Intel Core i7-6700 CPU at 3.40 GHz with 8 GB RAM running the Windows 10 operating system; a sketch of this per-frame timing methodology follows. The average processing time per image is under 280, 286, and 249 ms for the CK+, MMI, and Oulu-CASIA VIS datasets, respectively. We then compare the computational time of our method with the system using LBP-TOP [22]. The comparison results in Table 9 show that the computational time of our method is lower than that of the LBP-TOP system on all three datasets.
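As an illustration of the measurement, here is a minimal sketch (Python assumed for brevity, although the original implementation mixed MATLAB and C++; process_frame is a hypothetical stand-in for the full pipeline):

```python
# Sketch only: process_frame is a placeholder for preprocessing,
# feature extraction, and classification of one frame.
import time
import numpy as np

def process_frame(frame):
    return float(frame.mean())   # stand-in for the real pipeline

frames = [np.random.rand(240, 320) for _ in range(100)]
start = time.perf_counter()
for frame in frames:
    process_frame(frame)
elapsed_ms = (time.perf_counter() - start) * 1000.0
print('average processing time per frame: %.2f ms' % (elapsed_ms / len(frames)))
```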


Table 9: Runtime per frame of our method and LBP-TOP on the CK+, MMI, and Oulu-CASIA VIS datasets.

Method     CK+      MMI      Oulu
LBP-TOP    349 ms   357 ms   317 ms
Proposed   280 ms   286 ms   249 ms

These results indicate that our method is more efficient and effective than the LBP-TOP-based approach for facial expression recognition on the CK+, MMI, and Oulu-CASIA VIS datasets.

5. Conclusion

This paper presents a novel facial expression recognition framework integrating spatial-temporal motion LBP with the Gabor multiorientation fusion histogram. The framework comprises preprocessing (face detection, facial point location, and tracking), dynamic and static feature extraction, and feature classification, and it outperforms seven state-of-the-art methods on the CK+ dataset, seven other methods on the MMI dataset, and four methods on the Oulu-CASIA VIS dataset. Expressions of disgust and happiness are easier to classify than the other expressions, demonstrating the strong performance of the proposed method. However, the recognition rates for fear (88.0%) and sadness (86.7%) on the CK+ dataset are lower than the others, because most misclassified anger expressions are labeled as sadness and vice versa. A limitation of the proposed method is that occlusions, head rotations, and illumination changes can reduce the accuracy of expression recognition; these problems will be addressed in our future work. A framework integrating other useful information (pupil, head, and body movements) with the features extracted in this paper could perform better; this issue will also be studied in our future work. The proposed method cannot yet achieve real-time facial expression recognition, so we will attempt to establish an efficient dimensionality reduction method to reduce the computational complexity of our framework. Furthermore, the proposed system will be improved and applied to the study of human drowsiness expression analysis in our further work.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.

Acknowledgments

This work was supported by the Open Foundation of the State Key Laboratory of Automotive Simulation and Control (China, Grant no. 20161105).

References

[1] P. Ekman and E. L. Rosenberg, What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the Facial Action Coding System, Oxford University Press, Oxford, UK, 2005.

[2] J. A. Russell, J. M. Fernandez-Dols, and G. Mandler, The Psychology of Facial Expression, Cambridge University Press, Cambridge, UK, 1997.

[3] P. Ekman and W. V. Friesen, "Constants across cultures in the face and emotion," Journal of Personality and Social Psychology, vol. 17, no. 2, pp. 124–129, 1971.

[4] B. Golomb and T. Sejnowski, "Benefits of machine understanding of facial expressions," in NSF Report—Facial Expression Understanding, pp. 55–71, 1997.

[5] M. Pantic, "Face for ambient interface," in Ambient Intelligence in Everyday Life, vol. 3864 of Lecture Notes on Artificial Intelligence, pp. 32–66, Springer, Berlin, Germany, 2006.

[6] R.-L. Hsu, M. Abdel-Mottaleb, and A. K. Jain, "Face detection in color images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 696–706, 2002.

[7] H. Fang and N. Costen, "From Rank-N to Rank-1 face recognition based on motion similarity," in Proceedings of the 20th British Machine Vision Conference (BMVC '09), pp. 1–11, London, UK, September 2009.

[8] C. Cornelis, M. D. Cock, and A. M. Radzikowska, "Vaguely quantified rough sets," in Rough Sets, Fuzzy Sets, Data Mining and Granular Computing: 11th International Conference, RSFDGrC 2007, Toronto, Canada, May 14–16, 2007, Proceedings, vol. 4482 of Lecture Notes in Computer Science, pp. 87–94, Springer, Berlin, Germany, 2007.

[9] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[10] M. Pantic and L. J. M. Rothkrantz, "Automatic analysis of facial expressions: the state of the art," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1424–1445, 2000.

[11] B. Fasel and J. Luettin, "Automatic facial expression analysis: a survey," Pattern Recognition, vol. 36, no. 1, pp. 259–275, 2003.

[12] C. P. Sumathi, T. Santhanam, and M. Mahadevi, "Automatic facial expression analysis: a survey," International Journal of Computer Science & Engineering Survey, vol. 3, no. 6, pp. 47–59, 2012.

[13] M. T. Eskil and K. S. Benli, "Facial expression recognition based on anatomy," Computer Vision and Image Understanding, vol. 119, pp. 1–14, 2014.

[14] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, "The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '10), pp. 94–101, San Francisco, Calif, USA, June 2010.

[15] S. W. Chew, S. Lucey, P. Lucey, S. Sridharan, and J. F. Cohn, "Improved facial expression recognition via uni-hyperplane classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2554–2561, IEEE, June 2012.

[16] X. Fan and T. Tjahjadi, "A spatial-temporal framework based on histogram of gradients and optical flow for facial expression recognition in video sequences," Pattern Recognition, vol. 48, no. 11, pp. 3407–3416, 2015.

[17] S. Jain, C. Hu, and J. K. Aggarwal, "Facial expression recognition with temporal modeling of shapes," in Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops '11), pp. 1642–1649, November 2011.

[18] C. Shan, S. Gong, and P. W. McOwan, "Facial expression recognition based on local binary patterns: a comprehensive study," Image and Vision Computing, vol. 27, no. 6, pp. 803–816, 2009.

[19] Z. Wang, S. Wang, and Q. Ji, "Capturing complex spatio-temporal relations among facial muscles for facial expression recognition," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), pp. 3422–3429, IEEE, Portland, Ore, USA, June 2013.

[20] H. Fang, N. Mac Parthalain, A. J. Aubrey et al., "Facial expression recognition in dynamic sequences: an integrated approach," Pattern Recognition, vol. 47, no. 3, pp. 1271–1281, 2014.

[21] P. Scovanner, S. Ali, and M. Shah, "A 3-dimensional SIFT descriptor and its application to action recognition," in Proceedings of the 15th ACM International Conference on Multimedia (MM '07), pp. 357–360, ACM, Augsburg, Germany, September 2007.

[22] G. Zhao and M. Pietikainen, "Dynamic texture recognition using local binary patterns with an application to facial expressions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 915–928, 2007.

[23] A. Klaser, M. Marszałek, and C. Schmid, "A spatio-temporal descriptor based on 3D-gradients," in Proceedings of the 19th British Machine Vision Conference (BMVC '08), pp. 275–281, Leeds, UK, September 2008.

[24] G. Zhao, X. Huang, M. Taini, S. Z. Li, and M. Pietikainen, "Facial expression recognition from near-infrared videos," Image and Vision Computing, vol. 29, no. 9, pp. 607–619, 2011.

[25] P. Ekman, W. V. Friesen, and J. C. Hager, Facial Action Coding System, A Human Face, Salt Lake City, Utah, USA, 2002.

[26] M. F. Valstar and M. Pantic, "Fully automatic recognition of the temporal phases of facial actions," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 1, pp. 28–43, 2012.

[27] Y. Zhang and Q. Ji, "Active and dynamic information fusion for facial expression understanding from image sequences," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 699–714, 2005.

[28] M. Pantic and I. Patras, "Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 36, no. 2, pp. 433–449, 2006.

[29] Y. Tong, W. Liao, and Q. Ji, "Facial action unit recognition by exploiting their dynamic and semantic relationships," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 10, pp. 1683–1699, 2007.

[30] Y. Zhang, Q. Ji, Z. Zhu, and B. Yi, "Dynamic facial expression analysis and synthesis with MPEG-4 facial animation parameters," IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 10, pp. 1383–1396, 2008.

[31] G. Donato, M. S. Bartlett, J. C. Hager, P. Ekman, and T. J. Sejnowski, "Classifying facial actions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 10, pp. 974–989, 1999.

[32] Y. Li, S. M. Mavadati, M. H. Mahoor, Y. Zhao, and Q. Ji, "Measuring the intensity of spontaneous facial action units with dynamic Bayesian network," Pattern Recognition, vol. 48, no. 11, pp. 3417–3427, 2015.

[33] J. J.-J. Lien, T. Kanade, J. F. Cohn, and C.-C. Li, "Detection, tracking, and classification of action units in facial expression," Robotics and Autonomous Systems, vol. 31, no. 3, pp. 131–146, 2000.

[34] M. F. Valstar and M. Pantic, "Combined support vector machines and hidden Markov models for modeling facial action temporal dynamics," Lecture Notes in Computer Science, vol. 4796, pp. 118–127, 2007.

[35] T. Danisman, I. M. Bilasco, J. Martinet, and C. Djeraba, "Intelligent pixels of interest selection with application to facial expression recognition using multilayer perceptron," Signal Processing, vol. 93, no. 6, pp. 1547–1556, 2013.

[36] W.-L. Chao, J.-J. Ding, and J.-Z. Liu, "Facial expression recognition based on improved local binary pattern and class-regularized locality preserving projection," Signal Processing, vol. 117, pp. 1–10, 2015.

[37] I. Kotsia and I. Pitas, "Facial expression recognition in image sequences using geometric deformation features and support vector machines," IEEE Transactions on Image Processing, vol. 16, no. 1, pp. 172–187, 2007.

[38] P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2004.

[39] A. Asthana, S. Zafeiriou, S. Cheng, and M. Pantic, "Incremental face alignment in the wild," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 1859–1866, IEEE, Columbus, Ohio, USA, June 2014.

[40] T. Ojala, M. Pietikainen, and D. Harwood, "A comparative study of texture measures with classification based on featured distributions," Pattern Recognition, vol. 29, no. 1, pp. 51–59, 1996.

[41] S.-S. Liu, Y.-T. Tian, and C. Wan, "Facial expression recognition method based on Gabor multi-orientation features fusion and block histogram," Zidonghua Xuebao/Acta Automatica Sinica, vol. 37, no. 12, pp. 1455–1463, 2011.

[42] C.-W. Hsu and C.-J. Lin, "A comparison of methods for multiclass support vector machines," IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 415–425, 2002.

[43] T. Kanade, J. F. Cohn, and Y. Tian, "Comprehensive database for facial expression analysis," in Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition (FG '00), pp. 46–53, Grenoble, France, March 2000.

[44] M. Pantic, M. Valstar, R. Rademaker, and L. Maat, "Web-based database for facial expression analysis," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '05), pp. 317–321, Amsterdam, Netherlands, July 2005.


[33] J J-J Lien T Kanade J F Cohn and C-C Li ldquoDetectiontracking and classification of action units in facial expressionrdquoRobotics and Autonomous Systems vol 31 no 3 pp 131ndash1462000

[34] M F Valstar and M Pantic ldquoCombined support vector ma-chines and hidden Markov models for modeling facial action

temporal dynamicsrdquo Lecture Notes in Computer Science (includ-ing subseries Lecture Notes in Artificial Intelligence and LectureNotes in Bioinformatics) vol 4796 pp 118ndash127 2007

[35] T Danisman I M Bilasco J Martinet and C DjerabaldquoIntelligent pixels of interest selection with application to facialexpression recognition using multilayer perceptronrdquo SignalProcessing vol 93 no 6 pp 1547ndash1556 2013

[36] W-L Chao J-J Ding and J-Z Liu ldquoFacial expression recog-nition based on improved local binary pattern and class-regularized locality preserving projectionrdquo Signal Processingvol 117 pp 1ndash10 2015

[37] I Kotsia and I Pitas ldquoFacial expression recognition in imagesequences using geometric deformation features and supportvector machinesrdquo IEEE Transactions on Image Processing vol16 no 1 pp 172ndash187 2007

[38] P Viola and M J Jones ldquoRobust Real-Time Face DetectionrdquoInternational Journal of Computer Vision vol 57 no 2 pp 137ndash154 2004

[39] A Asthana S Zafeiriou S Cheng andM Pantic ldquoIncrementalface alignment in the wildrdquo in Proceedings of the 27th IEEEConference on Computer Vision and Pattern Recognition (CVPRrsquo14) pp 1859ndash1866 IEEE Columbus Ohio USA June 2014

[40] T Ojala M Pietikainen and D Harwood ldquoA comparativestudy of texture measures with classification based on featureddistributionsrdquo Pattern Recognition vol 29 no 1 pp 51ndash59 1996

[41] S-S Liu Y-T Tian and CWan ldquoFacial expression recognitionmethod based on Gabor multi-orientation features fusion andblock histogramrdquo Zidonghua XuebaoActa Automatica Sinicavol 37 no 12 pp 1455ndash1463 2011

[42] C-W Hsu and C-J Lin ldquoA comparison of methods for mul-ticlass support vector machinesrdquo IEEE Transactions on NeuralNetworks vol 13 no 2 pp 415ndash425 2002

[43] T Kanade J F Cohn and Y Tian ldquoComprehensive database forfacial expression analysisrdquo in Proceedings of the 4th IEEE Inter-national Conference on Automatic Face and Gesture Recognition(FG rsquo00) pp 46ndash53 Grenoble France March 2000

[44] M Pantic M Valstar R Rademaker and L Maat ldquoWeb-baseddatabase for facial expression analysisrdquo in Proceedings of theIEEE International Conference on Multimedia and Expo (ICMErsquo05) pp 317ndash321 Amsterdam Netherlands July 2005

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 9: Facial Expression Recognition from Video Sequences Based ...downloads.hindawi.com/journals/mpe/2017/7206041.pdf · Facial Expression Recognition from Video Sequences Based on Spatial-Temporal

Mathematical Problems in Engineering 9

Table 4: Classification results using fusion features on the CK+ dataset (rows: true expression; columns: predicted expression; rightmost column: per-class recognition rate).

Expression   Anger  Disgust  Fear  Happiness  Sadness  Surprise  Average (%)
Anger          42      0       0       0         3        0        93.3
Disgust         0     59       0       0         0        0       100
Fear            0      0      22       0         2        1        88.0
Happiness       0      0       0      69         0        0       100
Sadness         3      0       1       0        24        0        85.7
Surprise        0      0       2       1         0       80        96.4
Overall                                                            95.8

Table 5: Comparison of the fusion feature and LBP-TOP on the CK+ dataset (entries in %; figures in parentheses give the difference relative to LBP-TOP).

Expression   Anger         Disgust     Fear         Happiness   Sadness      Surprise
Anger        93.3 (+2.2)    0           0            0           6.7          0
Disgust       0           100 (0.0)     0            0           0            0
Fear          0             0          88.0 (0.0)    0           8.0          4.0
Happiness     0             0           0          100 (0.0)     0            0
Sadness      10.7           0           3.6          0          86.7 (+8.1)   0
Surprise      0             0           2.4          1.2         0           96.4 (0.0)
Overall      95.8 (+1.3)

Figure 9: Recognition rates of the methods using LBP-TOP and spatial-temporal motion LBP (bar chart; horizontal axis: Anger, Disgust, Fear, Happiness, Sadness, Surprise, and Total; vertical axis: recognition rate, 0–100%).

The second set of experiments evaluates the effectiveness of combining spatial-temporal motion LBP with the Gabor multiorientation fusion histogram. Tables 2–4 show the results obtained using spatial-temporal motion LBP, the Gabor multiorientation fusion histogram, and the proposed framework, each with the multiclass SVM classifier. The results include the recognition rate of each expression and the overall classification accuracy. Each digit on the main diagonal is the number of correctly classified subjects for an expression category, and each boldface off-diagonal digit is the number of misclassified subjects. The results show that the framework combining spatial-temporal motion LBP and the Gabor multiorientation fusion histogram performs better than either feature used individually.
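The per-expression rates and the overall accuracy in Table 4 follow directly from the confusion matrix: each class rate is the diagonal count divided by its row sum, and the overall accuracy is the trace divided by the total count. The following minimal sketch (in Python with NumPy, rather than the MATLAB/C++ implementation used in this paper) reproduces those figures from the counts in Table 4:

```python
import numpy as np

# Confusion matrix from Table 4 (rows: true class, columns: predicted class).
labels = ["Anger", "Disgust", "Fear", "Happiness", "Sadness", "Surprise"]
cm = np.array([
    [42,  0,  0,  0,  3,  0],
    [ 0, 59,  0,  0,  0,  0],
    [ 0,  0, 22,  0,  2,  1],
    [ 0,  0,  0, 69,  0,  0],
    [ 3,  0,  1,  0, 24,  0],
    [ 0,  0,  2,  1,  0, 80],
])

per_class = 100.0 * cm.diagonal() / cm.sum(axis=1)  # recognition rate per expression
overall = 100.0 * cm.diagonal().sum() / cm.sum()    # overall classification accuracy

for name, rate in zip(labels, per_class):
    print(f"{name:<9} {rate:5.1f}%")                # e.g., Anger 93.3%, Fear 88.0%
print(f"Overall   {overall:5.1f}%")                 # 95.8%
```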

The third set of experiments compares the proposed method with the approach using LBP-TOP [22]. Table 5 reports the recognition rates of the method combining spatial-temporal motion LBP with the Gabor multiorientation fusion histogram (i.e., the proposed framework), where the figures in parentheses give the difference in accuracy relative to LBP-TOP; a positive figure indicates better accuracy for the proposed features. Based on Table 5, the proposed framework performs better on two expressions and the same on the others, and the overall recognition rate is slightly higher. Therefore, the method fusing spatial-temporal motion LBP with the Gabor multiorientation fusion histogram outperforms LBP-TOP.

The fourth set of experiments compares our method with the systems proposed by Eskil and Benli [13], Lucey et al. [14], Chew et al. [15], Fan and Tjahjadi [16], and Jain et al. [17] on the CK+ dataset. The datasets used in those papers are identical to the dataset adopted in our study, so their reported results can be compared directly with ours. The methods of Eskil and Benli [13], Lucey et al. [14], Chew et al. [15], and Fan and Tjahjadi [16] used leave-one-subject-out cross-validation, and the approach of Jain et al. [17] used tenfold cross-validation. As shown in Table 6, the proposed framework achieves an average accuracy of 95.8% over all six basic facial expressions, performing better than the methods listed above.
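The two validation protocols differ in what they hold out: leave-one-subject-out cross-validation groups sequences by subject identity so that no person appears in both training and test folds, whereas tenfold cross-validation splits sequences without regard to identity. The sketch below illustrates the distinction (Python/scikit-learn on synthetic data; it is not the code used in this paper):

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((80, 40))                  # toy fused feature vectors
y = np.tile(np.arange(6), 14)[:80]        # six expression labels, roughly balanced
subjects = np.repeat(np.arange(10), 8)    # subject identity of each sequence

clf = SVC(kernel="rbf", decision_function_shape="ovo")  # one-versus-one multiclass SVM

# Leave-one-subject-out: every fold withholds all sequences of one subject.
loso = cross_val_score(clf, X, y, cv=LeaveOneGroupOut(), groups=subjects)

# Tenfold: sequences are split into ten folds irrespective of subject identity.
tenfold = cross_val_score(clf, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0))

print(loso.mean(), tenfold.mean())
```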

Table 6: Comparison of the proposed framework with seven state-of-the-art methods on the CK+ dataset.

Research                Methodology                      Accuracy (%)
Eskil and Benli [13]    High-polygon wireframe model       85.0
Lucey et al. [14]       AAM shape                          68.9
                        AAM appearance                     84.5
                        Combined                           88.7
Chew et al. [15]        Uni-hyperplane classification      89.4
Fan and Tjahjadi [16]   PHOG-TOP and optical flow          90.9
Jain et al. [17]        Temporal modeling of shapes        91.9
Proposed                                                   95.8

4.3. Experimental Results on the MMI Dataset. Seven methods from [18–20] are used for quantitative comparison with our framework on the MMI dataset. The penalty coefficient C and kernel parameter γ of the SVM for the MMI dataset are 2097152 and 9.31 × 10^-10, respectively. The datasets used in these papers are identical to the dataset adopted in our study. In this experiment, the proposed framework performs tenfold cross-validation; Shan et al. [18] and Fang et al. [20] conducted their experiments with the same strategy, whereas the methods of Wang et al. [19] used twentyfold cross-validation and the methods based on the active shape model [20] and the active appearance model [20] used twofold cross-validation. The results are shown in Table 7, which indicates that the performance of the proposed method on the MMI dataset is inferior to that on the CK+ dataset because of subtler expressions, notable changes in head rotation angle, and fewer training data. However, the proposed framework still performs better than the other seven state-of-the-art methods.

Table 7: Comparison of the proposed framework with seven systems on the MMI dataset.

Research          Methodology                 Accuracy (%)
Shan et al. [18]  LBP                           47.78
Wang et al. [19]  AdaBoost                      47.8
                  HMM                           51.5
                  ITBN                          59.7
Fang et al. [20]  Active shape model            62.38
                  Active appearance model       64.35
                  Gabor + geometry + SVM        70.67
Proposed                                        71.92
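For reference, the reported SVM settings amount to an RBF-kernel classifier with a very large penalty term and a very small kernel width, evaluated with tenfold cross-validation. A minimal sketch of that configuration (Python/scikit-learn on synthetic data; the values of C and gamma are simply those quoted above, which would in practice come from a grid search):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.random((120, 40))        # toy fused descriptors
y = np.repeat(np.arange(6), 20)  # six expression classes, 20 samples each

# RBF-kernel SVM with the penalty coefficient C and kernel parameter gamma
# reported for the MMI experiments (one-versus-one multiclass strategy).
clf = SVC(kernel="rbf", C=2097152, gamma=9.31e-10, decision_function_shape="ovo")

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)
print(scores.mean())
```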

4.4. Experimental Results on the Oulu-CASIA VIS Dataset. In this section, we evaluate our framework on the Oulu-CASIA VIS dataset. In this set of experiments, the proposed framework performs tenfold cross-validation. The penalty coefficient C and kernel parameter γ of the SVM for this dataset are 1048507 and 9.33 × 10^-10, respectively. First, our method is tested separately on the three subsets of Oulu-CASIA VIS. The experimental results are shown in Figure 10: the recognition rate obtained under the normal illumination condition is the highest, and the rate obtained under the dark illumination condition is the worst. Evidently, facial images captured in weak or dark illumination often lack much of the texture information useful for facial expression recognition. Four methods from [21–24] are then used for quantitative comparison with our framework on the subset captured under normal illumination. The experimental results are shown in Table 8, which indicates that the performance of the proposed method on the Oulu-CASIA VIS dataset is inferior to that on the CK+ dataset but superior to that on the MMI dataset. The proposed framework still outperforms the other four state-of-the-art methods.

Figure 10: Recognition results of the proposed method on the three subsets (dark, weak, and normal illumination) of the Oulu-CASIA VIS dataset (bar chart; horizontal axis: Anger, Disgust, Fear, Happiness, Sadness, Surprise, and Total; vertical axis: recognition rate in percent).

Table 8: Comparison of the proposed framework with four methods on the Oulu-CASIA VIS dataset.

Research                    Methodology   Accuracy (%)
Scovanner et al. [21]       3D SIFT         55.83
Zhao and Pietikainen [22]   LBP-TOP         68.13
Klaser et al. [23]          HOG3D           70.63
Zhao et al. [24]            AdaLBP          73.54
Proposed                                    74.37
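Since Figure 10 reports recognition rates computed independently on the dark, weak, and normal illumination subsets, the bookkeeping behind that breakdown is simply a per-subset average of hits. A small sketch (Python, with made-up hit/miss results rather than the paper's predictions):

```python
import numpy as np

rng = np.random.default_rng(2)
conditions = rng.choice(["dark", "weak", "normal"], size=300)  # subset of each sequence
correct = rng.random(300) < 0.7                                # stand-in for per-sequence hit/miss

# Recognition rate per illumination subset, mirroring Figure 10's grouping.
for cond in ("dark", "weak", "normal"):
    mask = conditions == cond
    print(f"{cond:>6}: {100.0 * correct[mask].mean():.1f}% recognition rate")
```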

4.5. Computational Complexity Evaluation. In this section, the computational complexity of our method is analyzed on the CK+, MMI, and Oulu-CASIA VIS datasets. Processing time is taken to be the time needed for preprocessing, extracting spatial-temporal motion LBP and the Gabor multiorientation fusion histogram, and classifying each image frame of a video sequence. Runtime (measured using the computer system clock) is estimated with mixed-language programming based on MATLAB and C++ on an Intel Core i7-6700 CPU at 3.40 GHz with 8 GB RAM running the Windows 10 operating system. The average processing time per image is under 280, 286, and 249 ms for the CK+, MMI, and Oulu-CASIA VIS datasets, respectively. We then compare the computational time of our method with that of the system using LBP-TOP [22]. The comparison results in Table 9 show that the computational time of our method is lower than that of the LBP-TOP system on all three datasets, indicating that our method is more efficient and effective than LBP-TOP-based technology for facial expression recognition on the CK+, MMI, and Oulu-CASIA VIS datasets.

Table 9: Runtime of our method and LBP-TOP on the CK+, MMI, and Oulu-CASIA VIS datasets.

Research   CK+      MMI      Oulu
LBP-TOP    349 ms   357 ms   317 ms
Proposed   280 ms   286 ms   249 ms
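The per-frame timing protocol described above is straightforward to reproduce: run every frame through the full pipeline and average the wall-clock time. A hedged sketch follows (Python; process_frame is a hypothetical stand-in for the paper's preprocessing, feature extraction, and classification stages):

```python
import time
import numpy as np

def process_frame(frame):
    # Hypothetical stand-in for preprocessing, spatial-temporal motion LBP and
    # Gabor multiorientation fusion histogram extraction, and SVM classification.
    return float(frame.mean())

frames = [np.random.rand(96, 96) for _ in range(200)]  # toy grayscale face crops

start = time.perf_counter()
for frame in frames:
    process_frame(frame)
elapsed_ms = 1000.0 * (time.perf_counter() - start) / len(frames)
print(f"average processing time per frame: {elapsed_ms:.3f} ms")
```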

5. Conclusion

This paper presents a novel facial expression recognition framework integrating spatial-temporal motion LBP with the Gabor multiorientation fusion histogram. The framework comprises preprocessing (face detection and facial point location and tracking), dynamic and static feature extraction, and feature classification, and it outperforms seven state-of-the-art methods on the CK+ dataset, seven other methods on the MMI dataset, and four methods on the Oulu-CASIA VIS dataset. Expressions of disgust and happiness are easier to classify than the other expressions, demonstrating the strong performance of the proposed method. However, the recognition rates for fear (88.0%) and sadness (86.7%) on the CK+ dataset are lower than the others, because anger expressions are often misclassified as sadness and vice versa. A limitation of the proposed method is that occlusions, head rotations, and illumination changes can reduce the accuracy of expression recognition; these problems will be addressed in our future work. A framework integrating other useful information (pupil, head, and body movements) with the features extracted in this paper could perform better, and this issue will also be studied in our future work. In addition, the proposed method cannot yet achieve real-time facial expression recognition; we will attempt to establish an efficient dimensionality reduction method to reduce the computational complexity of our framework. Furthermore, the proposed system will be improved and applied to the study of human drowsiness expression analysis in our further work.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work was supported by the Open Foundation of the State Key Laboratory of Automotive Simulation and Control, China (Grant no. 20161105).

References

[1] P. Ekman and E. L. Rosenberg, What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the Facial Action Coding System, Oxford University Press, Oxford, UK, 2005.

[2] J. A. Russell, J. M. Fernandez-Dols, and G. Mandler, The Psychology of Facial Expression, Cambridge University Press, Cambridge, UK, 1997.

[3] P. Ekman and W. V. Friesen, "Constants across cultures in the face and emotion," Journal of Personality and Social Psychology, vol. 17, no. 2, pp. 124–129, 1971.

[4] B. Golomb and T. Sejnowski, "Benefits of machine understanding of facial expressions," in NSF Report – Facial Expression Understanding, pp. 55–71, 1997.

[5] M. Pantic, "Face for ambient interface," in Ambient Intelligence in Everyday Life, vol. 3864 of Lecture Notes on Artificial Intelligence, pp. 32–66, Springer, Berlin, Germany, 2006.

[6] R.-L. Hsu, M. Abdel-Mottaleb, and A. K. Jain, "Face detection in color images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 696–706, 2002.

[7] H. Fang and N. Costen, "From Rank-N to Rank-1 face recognition based on motion similarity," in Proceedings of the 20th British Machine Vision Conference (BMVC '09), pp. 1–11, London, UK, September 2009.

[8] C. Cornelis, M. D. Cock, and A. M. Radzikowska, "Vaguely quantified rough sets," in Rough Sets, Fuzzy Sets, Data Mining and Granular Computing: 11th International Conference, RSFDGrC 2007, Toronto, Canada, May 14–16, 2007, Proceedings, vol. 4482 of Lecture Notes in Computer Science, pp. 87–94, Springer, Berlin, Germany, 2007.

[9] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[10] M. Pantic and L. J. M. Rothkrantz, "Automatic analysis of facial expressions: the state of the art," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1424–1445, 2000.

[11] B. Fasel and J. Luettin, "Automatic facial expression analysis: a survey," Pattern Recognition, vol. 36, no. 1, pp. 259–275, 2003.

[12] C. P. Sumathi, T. Santhanam, and M. Mahadevi, "Automatic facial expression analysis: a survey," International Journal of Computer Science & Engineering Survey, vol. 3, no. 6, pp. 47–59, 2012.

[13] M. T. Eskil and K. S. Benli, "Facial expression recognition based on anatomy," Computer Vision and Image Understanding, vol. 119, pp. 1–14, 2014.

[14] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, "The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '10), pp. 94–101, San Francisco, Calif, USA, June 2010.

[15] S. W. Chew, S. Lucey, P. Lucey, S. Sridharan, and J. F. Cohn, "Improved facial expression recognition via uni-hyperplane classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2554–2561, IEEE, June 2012.

[16] X. Fan and T. Tjahjadi, "A spatial-temporal framework based on histogram of gradients and optical flow for facial expression recognition in video sequences," Pattern Recognition, vol. 48, no. 11, pp. 3407–3416, 2015.

[17] S. Jain, C. Hu, and J. K. Aggarwal, "Facial expression recognition with temporal modeling of shapes," in Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops '11), pp. 1642–1649, November 2011.

[18] C. Shan, S. Gong, and P. W. McOwan, "Facial expression recognition based on local binary patterns: a comprehensive study," Image and Vision Computing, vol. 27, no. 6, pp. 803–816, 2009.

[19] Z. Wang, S. Wang, and Q. Ji, "Capturing complex spatio-temporal relations among facial muscles for facial expression recognition," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), pp. 3422–3429, IEEE, Portland, Ore, USA, June 2013.

[20] H. Fang, N. Mac Parthalain, A. J. Aubrey, et al., "Facial expression recognition in dynamic sequences: an integrated approach," Pattern Recognition, vol. 47, no. 3, pp. 1271–1281, 2014.

[21] P. Scovanner, S. Ali, and M. Shah, "A 3-dimensional SIFT descriptor and its application to action recognition," in Proceedings of the 15th ACM International Conference on Multimedia (MM '07), pp. 357–360, ACM, Augsburg, Germany, September 2007.

[22] G. Zhao and M. Pietikainen, "Dynamic texture recognition using local binary patterns with an application to facial expressions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 915–928, 2007.

[23] A. Klaser, M. Marszałek, and C. Schmid, "A spatio-temporal descriptor based on 3D-gradients," in Proceedings of the 19th British Machine Vision Conference (BMVC '08), pp. 275–281, Leeds, UK, September 2008.

[24] G. Zhao, X. Huang, M. Taini, S. Z. Li, and M. Pietikainen, "Facial expression recognition from near-infrared videos," Image and Vision Computing, vol. 29, no. 9, pp. 607–619, 2011.

[25] P. Ekman, W. V. Friesen, and J. C. Hager, Facial Action Coding System, A Human Face, Salt Lake City, Utah, USA, 2002.

[26] M. F. Valstar and M. Pantic, "Fully automatic recognition of the temporal phases of facial actions," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 1, pp. 28–43, 2012.

[27] Y. Zhang and Q. Ji, "Active and dynamic information fusion for facial expression understanding from image sequences," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 699–714, 2005.

[28] M. Pantic and I. Patras, "Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 36, no. 2, pp. 433–449, 2006.

[29] Y. Tong, W. Liao, and Q. Ji, "Facial action unit recognition by exploiting their dynamic and semantic relationships," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 10, pp. 1683–1699, 2007.

[30] Y. Zhang, Q. Ji, Z. Zhu, and B. Yi, "Dynamic facial expression analysis and synthesis with MPEG-4 facial animation parameters," IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 10, pp. 1383–1396, 2008.

[31] G. Donato, M. S. Bartlett, J. C. Hager, P. Ekman, and T. J. Sejnowski, "Classifying facial actions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 10, pp. 974–989, 1999.

[32] Y. Li, S. M. Mavadati, M. H. Mahoor, Y. Zhao, and Q. Ji, "Measuring the intensity of spontaneous facial action units with dynamic Bayesian network," Pattern Recognition, vol. 48, no. 11, pp. 3417–3427, 2015.

[33] J. J.-J. Lien, T. Kanade, J. F. Cohn, and C.-C. Li, "Detection, tracking, and classification of action units in facial expression," Robotics and Autonomous Systems, vol. 31, no. 3, pp. 131–146, 2000.

[34] M. F. Valstar and M. Pantic, "Combined support vector machines and hidden Markov models for modeling facial action temporal dynamics," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 4796, pp. 118–127, 2007.

[35] T. Danisman, I. M. Bilasco, J. Martinet, and C. Djeraba, "Intelligent pixels of interest selection with application to facial expression recognition using multilayer perceptron," Signal Processing, vol. 93, no. 6, pp. 1547–1556, 2013.

[36] W.-L. Chao, J.-J. Ding, and J.-Z. Liu, "Facial expression recognition based on improved local binary pattern and class-regularized locality preserving projection," Signal Processing, vol. 117, pp. 1–10, 2015.

[37] I. Kotsia and I. Pitas, "Facial expression recognition in image sequences using geometric deformation features and support vector machines," IEEE Transactions on Image Processing, vol. 16, no. 1, pp. 172–187, 2007.

[38] P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2004.

[39] A. Asthana, S. Zafeiriou, S. Cheng, and M. Pantic, "Incremental face alignment in the wild," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 1859–1866, IEEE, Columbus, Ohio, USA, June 2014.

[40] T. Ojala, M. Pietikainen, and D. Harwood, "A comparative study of texture measures with classification based on featured distributions," Pattern Recognition, vol. 29, no. 1, pp. 51–59, 1996.

[41] S.-S. Liu, Y.-T. Tian, and C. Wan, "Facial expression recognition method based on Gabor multi-orientation features fusion and block histogram," Zidonghua Xuebao/Acta Automatica Sinica, vol. 37, no. 12, pp. 1455–1463, 2011.

[42] C.-W. Hsu and C.-J. Lin, "A comparison of methods for multiclass support vector machines," IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 415–425, 2002.

[43] T. Kanade, J. F. Cohn, and Y. Tian, "Comprehensive database for facial expression analysis," in Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition (FG '00), pp. 46–53, Grenoble, France, March 2000.

[44] M. Pantic, M. Valstar, R. Rademaker, and L. Maat, "Web-based database for facial expression analysis," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '05), pp. 317–321, Amsterdam, Netherlands, July 2005.

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 10: Facial Expression Recognition from Video Sequences Based ...downloads.hindawi.com/journals/mpe/2017/7206041.pdf · Facial Expression Recognition from Video Sequences Based on Spatial-Temporal

10 Mathematical Problems in Engineering

Table 6 Comparison of results in proposed framework and sevenstate-of-the-art methods on CK+ dataset

Research Methodology AccuracyEskil and Benli [13] High-polygon wireframe mode 850Lucey et al [14] AAM shape 689

AAM appearance 845Combined 887

Chew et al [15] Uni-hyperplane classification 894Fan and Tjahjadi [16] PHOGTOP and optical flow 909Jain et al [17] Temporal modeling of shapes 919Proposed 958

Table 7 Comparison result of proposed framework and sevensystems on MMI dataset

Research Methodology Accuracy ()Shan et al [18] LBP 4778Wang et al [19] AdaBoost 478

HMM 515ITBN 597

Fang et al [20] Active shape model 6238Active appearance model 6435Gabor + Geometry + SVM 7067

Proposed 7192

our framework on MMI dataset The punishment coefficient119862 and kernel parameters 120574 of SVM for MMI dataset are2097152 and 931 times 10minus10 respectively Datasets used inthese papers are identical to the dataset adapted in ourstudy In this experiment proposed framework performstenfold cross validation Shan et al [18] and Fang et al [20]developed methods by conducting experiments using thesame strategy we employed Methods proposed by Wang etal [19] performed twentyfold cross validation and methodsusing active shape model [20] and active appearance model[20] performed twofold cross validation results are shownin Table 7 Table 7 indicates that performance of proposedmethod onMMI dataset is inferior to CK+ dataset because ofsubtler expressions notable changes in head rotation angleand fewer training data However proposed frameworkstill performs better than the other seven state-of-the-artmethods

44 Experimental Results on Oulu-CASIA VIS Dataset Inthis section we evaluated our framework on Oulu-CASIAVIS dataset In these set experiments proposed frameworkperforms tenfold cross validation The punishment coeffi-cient 119862 and kernel parameters 120574 of SVM for this dataset are1048507 and 933 times 10minus10 respectively First our methodis tested on three subsets of Oulu-CASIA VIS respectivelyThe experimental results are shown in Figure 10 As shownin Figure 10 the recognition rate obtained for the normalillumination condition is highest and the recognition rateobtained for the dark illumination condition is worst Evi-dently the facial images captured in the weak or dark illu-mination are often missing much useful texture information

Table 8 Comparison result of proposed framework and fourmethods on Oulu-CASIA VIS dataset

Research Methodology Accuracy ()Scovanner et al [21] 3D SIFT 5583Zhao and Pietikainen [22] LBPTOP 6813Klaser et al [23] HOG3D 7063Zhao et al [24] AdaLBP 7354Proposed 7437

Ang

er

Disg

ust

Fear

Hap

pine

ss

Sadn

ess

Surp

rise

Tota

l0

10

20

30

40

50

60

70

80

90

Reco

gniti

on ra

te (

)

DarkWeakNormal

Figure 10 Recognition results of proposed methods on threesubsets of Oulu-CASIA VIS dataset

for facial expression recognition Then four methods in [21ndash24] are used to perform quantitative comparisons with ourframework on the subset obtained in normal illuminationconditions The experimental results are shown in Table 8Table 8 indicates that performance of proposed method onOulu-CASIA VIS dataset is inferior to CK+ dataset but supe-rior to MMI dataset Proposed framework still outperformsthe other four state-of-the-art methods

45 Computational Complexity Evaluation In this sectioncomputational complexity of our method is analyzed inCK+ MMI and Oulu-CASIA VIS dataset Processing time isapproximated to be time needed for preprocessing extractingspatial-temporal motion LBP and Gabor multiorientationfusion histogram and classifying each image frame of videosequence Runtime (measured using computer system clock)is estimated by usingmixed-language programming based onMATLAB andC++ on an Intel (R) Core (TM) i7-6700CPU at340GHz with 8GB RAM running onWindows 10 operatingsystem Average processing time per image is under 280 286and 249ms for CK+ MMI and Oulu-CASIA VIS datasetsrespectively Then we compare computational time of ourmethodwith system using LBPTOP [22] Comparison resultsin Table 9 show that computational time of our method islower than system using LBPTOP in three datasets Resultsshow that our method is more efficient and effective than

Mathematical Problems in Engineering 11

Table 9 Runtime of our method and LBPTOP in CK+ MMI andOulu-CASIA VIS datasets

Research CK+ MMI OuluLBPTOP 349ms 357ms 317msproposed 280ms 286ms 249ms

LBPTOP-based technology for facial expression recognitionin CK+ MMI and Oulu-CASIA VIS datasets

5 Conclusion

This paper presents novel facial expression recognitionframework integrating spatial-temporal motion LBP withGabor multiorientation fusion histogram Framework com-prises preprocessing (face recognition and points locationand tracking) dynamic and static feature extraction andfeature classification outperforming seven state-of-the-artmethods on CK+ dataset seven other methods on MMIdataset and four methods on Oulu-CASIA VIS datasetExpressions of disgust and happiness are easier to classifythan other expressions demonstrating better performance ofproposed method However recognition rates are lower forfear (880) and sadness (867) on CK+ dataset comparedwith others because most anger expressions are misclassifiedas sadness and vice-versa A lamination of proposed methodis that the occlusions head rotations and illumination canreduce accuracy of expression recognition these phenomenawill be solved in our future work A framework integratingother useful information (pupil head and body movements)with the features extracted in this paper could perform betterThis issue will also be studied in our future work Theproposed method cannot achieve real-time facial expressionrecognition We will attempt to establish an efficient dimen-sionality reduction method to reduce the computationalcomplexity of our framework in the future Furthermoreproposed system will be improved and applied in study ofhuman drowsiness expression analysis in our further work

Competing Interests

The authors declare no conflict of interests

Acknowledgments

This work was supported by the Open Foundation of StateKey Laboratory of Automotive Simulation and Control(China Grant no 20161105)

References

[1] P Ekman and E L Rosenberg What the Face Reveals Basicand Applied Studies of Spontaneous Expression Using the FacialAction Coding System Oxford University Press Oxford UK2005

[2] J A Russell J M Fernandez-Dols and G Mandler The Psy-chology of Facial Expression Cambridge University Press Cam-bridge UK 1997

[3] P Ekman and W V Friesen ldquoConstants across cultures in theface and emotionrdquo Journal of Personality and Social Psychologyvol 17 no 2 pp 124ndash129 1971

[4] B Golomb and T Sejnowski ldquoBenefits of machine understand-ing of facial expressionsrdquo in NSF ReportmdashFacial ExpressionUnderstanding pp 55ndash71 1997

[5] M Pantic ldquoFace for ambient interfacerdquo in Ambient Intelligencein Everyday Life vol 3864 of Lecture Notes on Artificial Intelli-gence pp 32ndash66 Springer Berlin Germany 2006

[6] R-L Hsu M Abdel-Mottaleb and A K Jain ldquoFace detectionin color imagesrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 24 no 5 pp 696ndash706 2002

[7] H Fang and N Costen ldquoFrom Rank-N to Rank-1 face recog-nition based on motion similarityrdquo in Proceedings of the 20thBritish Machine Vision Conference (BMVC rsquo09) pp 1ndash11 Lon-don UK September 2009

[8] C Cornelis M D Cock and A M Radzikowska ldquoVaguelyquantified rough setsrdquo in Rough Sets Fuzzy Sets Data Miningand Granular Computing 11th International Conference RSFD-GrC 2007 Toronto Canada May 14ndash16 2007 Proceedings vol4482 of Lecture Notes in Computer Science pp 87ndash94 SpringerBerlin Germany 2007

[9] L Breiman ldquoRandom forestsrdquoMachine Learning vol 45 no 1pp 5ndash32 2001

[10] M Pantic and L UM Rothkrantz ldquoAutomatic analysis of facialexpressions the state of the artrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 22 no 12 pp 1424ndash14452000

[11] B Fasel and J Luettin ldquoAutomatic facial expression analysis asurveyrdquo Pattern Recognition vol 36 no 1 pp 259ndash275 2003

[12] C P Sumathi T Santhanam and M Mahadevi ldquoAutomaticfacial expression analysis a surveyrdquo International Journal ofComputer Science amp Engineering Survey vol 3 no 6 pp 47ndash592012

[13] M T Eskil and K S Benli ldquoFacial expression recognition basedon anatomyrdquo Computer Vision and Image Understanding vol119 pp 1ndash14 2014

[14] P Lucey J F Cohn T Kanade J Saragih Z Ambadar andI Matthews ldquoThe extended Cohn-Kanade dataset (CK+) acomplete dataset for action unit and emotion-specified expres-sionrdquo in Proceedings of the IEEE Computer Society Conferenceon Computer Vision and Pattern Recognition (CVPRW rsquo10) pp94ndash101 San Francisco Calif USA June 2010

[15] S W Chew S Lucey P Lucey S Sridharan and J F ConnldquoImproved facial expression recognition via uni-hyperplaneclassificationrdquo in Proceedings of the IEEE Conference on Com-puter Vision and Pattern Recognition (CVPR rsquo12) pp 2554ndash2561IEEE June 2012

[16] X Fan and T Tjahjadi ldquoA spatial-temporal framework basedon histogram of gradients and optical flow for facial expressionrecognition in video sequencesrdquoPatternRecognition vol 48 no11 pp 3407ndash3416 2015

[17] S Jain CHu and J K Aggarwal ldquoFacial expression recognitionwith temporal modeling of shapesrdquo in Proceedings of the IEEEInternational Conference on Computer VisionWorkshops (ICCVWorkshops rsquo11) pp 1642ndash1649 November 2011

[18] C Shan S Gong and P W McOwan ldquoFacial expression rec-ognition based on Local Binary Patterns A ComprehensiveStudyrdquo Image and Vision Computing vol 27 no 6 pp 803ndash8162009

12 Mathematical Problems in Engineering

[19] Z Wang S Wang and Q Ji ldquoCapturing complex spatio-temporal relations among facial muscles for facial expressionrecognitionrdquo in Proceedings of the 26th IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo13) pp 3422ndash3429 IEEE Portland Ore USA June 2013

[20] H Fang N Mac Parthalain A J Aubrey et al ldquoFacial ex-pression recognition in dynamic sequences an integrated ap-proachrdquo Pattern Recognition vol 47 no 3 pp 1271ndash1281 2014

[21] P Scovanner S Ali andM Shah ldquoA 3-dimensional sift descrip-tor and its application to action recognitionrdquo in Proceedingsof the 15th ACM International Conference on Multimedia (MMrsquo07) pp 357ndash360 ACM Augsburg Germany September 2007

[22] G Zhao and M Pietikainen ldquoDynamic texture recognitionusing local binary patterns with an application to facial expres-sionsrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 29 no 6 pp 915ndash928 2007

[23] A Klaser M Marszałek and C Schmid ldquoA spatio-temporaldescriptor based on 3D-gradientsrdquo in Proceedings of the 19thBritish Machine Vision Conference (BMVC rsquo08) pp 275ndash281Leeds UK September 2008

[24] G Zhao XHuangMTaini S Z Li andM Pietikainen ldquoFacialexpression recognition from near-infrared videosrdquo Image andVision Computing vol 29 no 9 pp 607ndash619 2011

[25] P Ekman W V Friesen and J C Hager Facial Action CodingSystem A Human Face Salt Lake City Utah USA 2002

[26] M F Valstar and M Pantic ldquoFully automatic recognition ofthe temporal phases of facial actionsrdquo IEEE Transactions onSystems Man and Cybernetics Part B Cybernetics vol 42 no1 pp 28ndash43 2012

[27] Y Zhang and Q Ji ldquoActive and dynamic information fusion forfacial expression understanding from image sequencesrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol27 no 5 pp 699ndash714 2005

[28] M Pantic and I Patras ldquoDynamics of facial expression recog-nition of facial actions and their temporal segments from faceprofile image sequencesrdquo IEEE Transactions on Systems Manand Cybernetics Part B Cybernetics vol 36 no 2 pp 433ndash4492006

[29] Y Tong W Liao and Q Ji ldquoFacial action unit recognitionby exploiting their dynamic and semantic relationshipsrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol29 no 10 pp 1683ndash1699 2007

[30] Y Zhang Q Ji Z Zhu and B Yi ldquoDynamic facial expressionanalysis and synthesis with MPEG-4 facial animation param-etersrdquo IEEE Transactions on Circuits and Systems for VideoTechnology vol 18 no 10 pp 1383ndash1396 2008

[31] G Donate M S Bartlett J C Hager P Ekman and T JSejnowski ldquoClassifying facial actionsrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 21 no 10 pp974ndash989 1999

[32] Y Li S M Mavadati M H Mahoor Y Zhao and Q JildquoMeasuring the intensity of spontaneous facial action units withdynamic Bayesian networkrdquo Pattern Recognition vol 48 no 11pp 3417ndash3427 2015

[33] J J-J Lien T Kanade J F Cohn and C-C Li ldquoDetectiontracking and classification of action units in facial expressionrdquoRobotics and Autonomous Systems vol 31 no 3 pp 131ndash1462000

[34] M F Valstar and M Pantic ldquoCombined support vector ma-chines and hidden Markov models for modeling facial action

temporal dynamicsrdquo Lecture Notes in Computer Science (includ-ing subseries Lecture Notes in Artificial Intelligence and LectureNotes in Bioinformatics) vol 4796 pp 118ndash127 2007

[35] T Danisman I M Bilasco J Martinet and C DjerabaldquoIntelligent pixels of interest selection with application to facialexpression recognition using multilayer perceptronrdquo SignalProcessing vol 93 no 6 pp 1547ndash1556 2013

[36] W-L Chao J-J Ding and J-Z Liu ldquoFacial expression recog-nition based on improved local binary pattern and class-regularized locality preserving projectionrdquo Signal Processingvol 117 pp 1ndash10 2015

[37] I Kotsia and I Pitas ldquoFacial expression recognition in imagesequences using geometric deformation features and supportvector machinesrdquo IEEE Transactions on Image Processing vol16 no 1 pp 172ndash187 2007

[38] P Viola and M J Jones ldquoRobust Real-Time Face DetectionrdquoInternational Journal of Computer Vision vol 57 no 2 pp 137ndash154 2004

[39] A Asthana S Zafeiriou S Cheng andM Pantic ldquoIncrementalface alignment in the wildrdquo in Proceedings of the 27th IEEEConference on Computer Vision and Pattern Recognition (CVPRrsquo14) pp 1859ndash1866 IEEE Columbus Ohio USA June 2014

[40] T Ojala M Pietikainen and D Harwood ldquoA comparativestudy of texture measures with classification based on featureddistributionsrdquo Pattern Recognition vol 29 no 1 pp 51ndash59 1996

[41] S-S Liu Y-T Tian and CWan ldquoFacial expression recognitionmethod based on Gabor multi-orientation features fusion andblock histogramrdquo Zidonghua XuebaoActa Automatica Sinicavol 37 no 12 pp 1455ndash1463 2011

[42] C-W Hsu and C-J Lin ldquoA comparison of methods for mul-ticlass support vector machinesrdquo IEEE Transactions on NeuralNetworks vol 13 no 2 pp 415ndash425 2002

[43] T Kanade J F Cohn and Y Tian ldquoComprehensive database forfacial expression analysisrdquo in Proceedings of the 4th IEEE Inter-national Conference on Automatic Face and Gesture Recognition(FG rsquo00) pp 46ndash53 Grenoble France March 2000

[44] M Pantic M Valstar R Rademaker and L Maat ldquoWeb-baseddatabase for facial expression analysisrdquo in Proceedings of theIEEE International Conference on Multimedia and Expo (ICMErsquo05) pp 317ndash321 Amsterdam Netherlands July 2005

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 11: Facial Expression Recognition from Video Sequences Based ...downloads.hindawi.com/journals/mpe/2017/7206041.pdf · Facial Expression Recognition from Video Sequences Based on Spatial-Temporal

Mathematical Problems in Engineering 11

Table 9 Runtime of our method and LBPTOP in CK+ MMI andOulu-CASIA VIS datasets

Research CK+ MMI OuluLBPTOP 349ms 357ms 317msproposed 280ms 286ms 249ms

LBPTOP-based technology for facial expression recognitionin CK+ MMI and Oulu-CASIA VIS datasets

5 Conclusion

This paper presents novel facial expression recognitionframework integrating spatial-temporal motion LBP withGabor multiorientation fusion histogram Framework com-prises preprocessing (face recognition and points locationand tracking) dynamic and static feature extraction andfeature classification outperforming seven state-of-the-artmethods on CK+ dataset seven other methods on MMIdataset and four methods on Oulu-CASIA VIS datasetExpressions of disgust and happiness are easier to classifythan other expressions demonstrating better performance ofproposed method However recognition rates are lower forfear (880) and sadness (867) on CK+ dataset comparedwith others because most anger expressions are misclassifiedas sadness and vice-versa A lamination of proposed methodis that the occlusions head rotations and illumination canreduce accuracy of expression recognition these phenomenawill be solved in our future work A framework integratingother useful information (pupil head and body movements)with the features extracted in this paper could perform betterThis issue will also be studied in our future work Theproposed method cannot achieve real-time facial expressionrecognition We will attempt to establish an efficient dimen-sionality reduction method to reduce the computationalcomplexity of our framework in the future Furthermoreproposed system will be improved and applied in study ofhuman drowsiness expression analysis in our further work

Competing Interests

The authors declare no conflict of interests

Acknowledgments

This work was supported by the Open Foundation of StateKey Laboratory of Automotive Simulation and Control(China Grant no 20161105)

References

[1] P Ekman and E L Rosenberg What the Face Reveals Basicand Applied Studies of Spontaneous Expression Using the FacialAction Coding System Oxford University Press Oxford UK2005

[2] J A Russell J M Fernandez-Dols and G Mandler The Psy-chology of Facial Expression Cambridge University Press Cam-bridge UK 1997

[3] P Ekman and W V Friesen ldquoConstants across cultures in theface and emotionrdquo Journal of Personality and Social Psychologyvol 17 no 2 pp 124ndash129 1971

[4] B Golomb and T Sejnowski ldquoBenefits of machine understand-ing of facial expressionsrdquo in NSF ReportmdashFacial ExpressionUnderstanding pp 55ndash71 1997

[5] M Pantic ldquoFace for ambient interfacerdquo in Ambient Intelligencein Everyday Life vol 3864 of Lecture Notes on Artificial Intelli-gence pp 32ndash66 Springer Berlin Germany 2006

[6] R-L Hsu M Abdel-Mottaleb and A K Jain ldquoFace detectionin color imagesrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 24 no 5 pp 696ndash706 2002

[7] H Fang and N Costen ldquoFrom Rank-N to Rank-1 face recog-nition based on motion similarityrdquo in Proceedings of the 20thBritish Machine Vision Conference (BMVC rsquo09) pp 1ndash11 Lon-don UK September 2009

[8] C Cornelis M D Cock and A M Radzikowska ldquoVaguelyquantified rough setsrdquo in Rough Sets Fuzzy Sets Data Miningand Granular Computing 11th International Conference RSFD-GrC 2007 Toronto Canada May 14ndash16 2007 Proceedings vol4482 of Lecture Notes in Computer Science pp 87ndash94 SpringerBerlin Germany 2007

[9] L Breiman ldquoRandom forestsrdquoMachine Learning vol 45 no 1pp 5ndash32 2001

[10] M Pantic and L UM Rothkrantz ldquoAutomatic analysis of facialexpressions the state of the artrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 22 no 12 pp 1424ndash14452000

[11] B Fasel and J Luettin ldquoAutomatic facial expression analysis asurveyrdquo Pattern Recognition vol 36 no 1 pp 259ndash275 2003

[12] C P Sumathi T Santhanam and M Mahadevi ldquoAutomaticfacial expression analysis a surveyrdquo International Journal ofComputer Science amp Engineering Survey vol 3 no 6 pp 47ndash592012

[13] M T Eskil and K S Benli ldquoFacial expression recognition basedon anatomyrdquo Computer Vision and Image Understanding vol119 pp 1ndash14 2014

[14] P Lucey J F Cohn T Kanade J Saragih Z Ambadar andI Matthews ldquoThe extended Cohn-Kanade dataset (CK+) acomplete dataset for action unit and emotion-specified expres-sionrdquo in Proceedings of the IEEE Computer Society Conferenceon Computer Vision and Pattern Recognition (CVPRW rsquo10) pp94ndash101 San Francisco Calif USA June 2010

[15] S W Chew S Lucey P Lucey S Sridharan and J F ConnldquoImproved facial expression recognition via uni-hyperplaneclassificationrdquo in Proceedings of the IEEE Conference on Com-puter Vision and Pattern Recognition (CVPR rsquo12) pp 2554ndash2561IEEE June 2012

[16] X Fan and T Tjahjadi ldquoA spatial-temporal framework basedon histogram of gradients and optical flow for facial expressionrecognition in video sequencesrdquoPatternRecognition vol 48 no11 pp 3407ndash3416 2015

[17] S Jain CHu and J K Aggarwal ldquoFacial expression recognitionwith temporal modeling of shapesrdquo in Proceedings of the IEEEInternational Conference on Computer VisionWorkshops (ICCVWorkshops rsquo11) pp 1642ndash1649 November 2011

[18] C Shan S Gong and P W McOwan ldquoFacial expression rec-ognition based on Local Binary Patterns A ComprehensiveStudyrdquo Image and Vision Computing vol 27 no 6 pp 803ndash8162009

12 Mathematical Problems in Engineering

[19] Z Wang S Wang and Q Ji ldquoCapturing complex spatio-temporal relations among facial muscles for facial expressionrecognitionrdquo in Proceedings of the 26th IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo13) pp 3422ndash3429 IEEE Portland Ore USA June 2013

[20] H Fang N Mac Parthalain A J Aubrey et al ldquoFacial ex-pression recognition in dynamic sequences an integrated ap-proachrdquo Pattern Recognition vol 47 no 3 pp 1271ndash1281 2014

[21] P Scovanner S Ali andM Shah ldquoA 3-dimensional sift descrip-tor and its application to action recognitionrdquo in Proceedingsof the 15th ACM International Conference on Multimedia (MMrsquo07) pp 357ndash360 ACM Augsburg Germany September 2007

[22] G Zhao and M Pietikainen ldquoDynamic texture recognitionusing local binary patterns with an application to facial expres-sionsrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 29 no 6 pp 915ndash928 2007

[23] A Klaser M Marszałek and C Schmid ldquoA spatio-temporaldescriptor based on 3D-gradientsrdquo in Proceedings of the 19thBritish Machine Vision Conference (BMVC rsquo08) pp 275ndash281Leeds UK September 2008

[24] G Zhao XHuangMTaini S Z Li andM Pietikainen ldquoFacialexpression recognition from near-infrared videosrdquo Image andVision Computing vol 29 no 9 pp 607ndash619 2011

[25] P Ekman W V Friesen and J C Hager Facial Action CodingSystem A Human Face Salt Lake City Utah USA 2002

[26] M F Valstar and M Pantic ldquoFully automatic recognition ofthe temporal phases of facial actionsrdquo IEEE Transactions onSystems Man and Cybernetics Part B Cybernetics vol 42 no1 pp 28ndash43 2012

[27] Y Zhang and Q Ji ldquoActive and dynamic information fusion forfacial expression understanding from image sequencesrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol27 no 5 pp 699ndash714 2005

[28] M Pantic and I Patras ldquoDynamics of facial expression recog-nition of facial actions and their temporal segments from faceprofile image sequencesrdquo IEEE Transactions on Systems Manand Cybernetics Part B Cybernetics vol 36 no 2 pp 433ndash4492006

[29] Y Tong W Liao and Q Ji ldquoFacial action unit recognitionby exploiting their dynamic and semantic relationshipsrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol29 no 10 pp 1683ndash1699 2007

[30] Y Zhang Q Ji Z Zhu and B Yi ldquoDynamic facial expressionanalysis and synthesis with MPEG-4 facial animation param-etersrdquo IEEE Transactions on Circuits and Systems for VideoTechnology vol 18 no 10 pp 1383ndash1396 2008

[31] G Donate M S Bartlett J C Hager P Ekman and T JSejnowski ldquoClassifying facial actionsrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 21 no 10 pp974ndash989 1999

[32] Y Li S M Mavadati M H Mahoor Y Zhao and Q JildquoMeasuring the intensity of spontaneous facial action units withdynamic Bayesian networkrdquo Pattern Recognition vol 48 no 11pp 3417ndash3427 2015

[33] J J-J Lien T Kanade J F Cohn and C-C Li ldquoDetectiontracking and classification of action units in facial expressionrdquoRobotics and Autonomous Systems vol 31 no 3 pp 131ndash1462000

[34] M F Valstar and M Pantic ldquoCombined support vector ma-chines and hidden Markov models for modeling facial action

temporal dynamicsrdquo Lecture Notes in Computer Science (includ-ing subseries Lecture Notes in Artificial Intelligence and LectureNotes in Bioinformatics) vol 4796 pp 118ndash127 2007

[35] T Danisman I M Bilasco J Martinet and C DjerabaldquoIntelligent pixels of interest selection with application to facialexpression recognition using multilayer perceptronrdquo SignalProcessing vol 93 no 6 pp 1547ndash1556 2013

[36] W-L Chao J-J Ding and J-Z Liu ldquoFacial expression recog-nition based on improved local binary pattern and class-regularized locality preserving projectionrdquo Signal Processingvol 117 pp 1ndash10 2015

[37] I Kotsia and I Pitas ldquoFacial expression recognition in imagesequences using geometric deformation features and supportvector machinesrdquo IEEE Transactions on Image Processing vol16 no 1 pp 172ndash187 2007

[38] P Viola and M J Jones ldquoRobust Real-Time Face DetectionrdquoInternational Journal of Computer Vision vol 57 no 2 pp 137ndash154 2004

[39] A Asthana S Zafeiriou S Cheng andM Pantic ldquoIncrementalface alignment in the wildrdquo in Proceedings of the 27th IEEEConference on Computer Vision and Pattern Recognition (CVPRrsquo14) pp 1859ndash1866 IEEE Columbus Ohio USA June 2014

[40] T Ojala M Pietikainen and D Harwood ldquoA comparativestudy of texture measures with classification based on featureddistributionsrdquo Pattern Recognition vol 29 no 1 pp 51ndash59 1996

[41] S.-S. Liu, Y.-T. Tian, and C. Wan, "Facial expression recognition method based on Gabor multi-orientation features fusion and block histogram," Zidonghua Xuebao/Acta Automatica Sinica, vol. 37, no. 12, pp. 1455–1463, 2011.

[42] C.-W. Hsu and C.-J. Lin, "A comparison of methods for multiclass support vector machines," IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 415–425, 2002.

[43] T. Kanade, J. F. Cohn, and Y. Tian, "Comprehensive database for facial expression analysis," in Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition (FG '00), pp. 46–53, Grenoble, France, March 2000.

[44] M. Pantic, M. Valstar, R. Rademaker, and L. Maat, "Web-based database for facial expression analysis," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '05), pp. 317–321, Amsterdam, The Netherlands, July 2005.


