Challenges in facial expression tracking
Gábor Szirtes, Realeyes OÜ
www.realeyesit.com
Roadmap
• Why faces?
• Natural inspirations
• Processing pipeline
• Detection
• Shape alignment
• Analysis/synthesis: expression analysis
• Where to go
• References
Why faces?
• Unique ID: biometrics for verification/identification
• Non-verbal communication channel: HCI, monitoring, control
• Detection, alignment, modeling, relighting, head pose tracking, gender/age recognition
• Affective science: Realeyes!
Faces
vs
Medical imaging
http://www.made4ll.com/medical-equipment/normal-brain-ct-scan/
http://cmp.felk.cvut.cz/eccv2004/tutorials/eccv2004-t4.htm
Face recognition by humans
Sinha et al 2006
...we are not there yet
…but on the right track:
Manifold hypothesis
"Natural data lives in a low-dimensional (non-linear) manifold"
Deep Learning Tutorial at ICML 2013 by Yann LeCun and Marc'Aurelio Ranzato
Back to the engineers: the usual pipeline
State of the art
Face detection techniques
Industrial standard: Viola-Jones detector
• Integral image
• Haar-like features
• AdaBoost
• Appearance based
• Slow to train, fast to evaluate
• Needs to search over scale and space
• Still one of the best methods!
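The constant-time evaluation these bullets hinge on can be sketched as follows (a minimal illustration, not the detector's actual implementation; function names are ours):

```python
import numpy as np

def integral_image(img):
    """ii[y, x] holds the sum of img[:y, :x]; a zero top row and left
    column make the rectangle lookups below index cleanly."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    """Sum of img[y:y+h, x:x+w] via four lookups, regardless of size."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect_vertical(ii, y, x, h, w):
    """Two-rectangle Haar-like feature: top half minus bottom half."""
    half = h // 2
    return rect_sum(ii, y, x, half, w) - rect_sum(ii, y + half, x, half, w)
```

Any Haar-like feature thus costs a handful of additions at any scale, which is why the sliding-window search over scale and space stays tractable.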
State of the art: detection / learning / feature selection
Zhang, Cha, and Zhengyou Zhang. A survey of recent advances in face detection. Tech. rep., Microsoft Research, 2010.
State of the art: detection around 2011
X. Zhu, 2012
A simple, yet efficient method: ASEF
(Bolme et al, 2009)
Liao & Jain - unconstrained face detection
Feature: NPD (normalized pixel difference)
Learning method & feature selection: boosted regression tree cascades
Liao, Shengcai, Anil K. Jain, and Stan Z. Li. "Unconstrained Face Detection", Tech. report, 2012
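The NPD feature itself is a one-liner; a sketch (function name is ours, not the paper's):

```python
def npd(a, b):
    """Normalized Pixel Difference between two pixel intensities.
    Bounded in [-1, 1] and invariant to multiplicative illumination
    change; defined as 0 when both pixels are 0."""
    if a == 0 and b == 0:
        return 0.0
    return (a - b) / float(a + b)
```

The boosted trees then do the work of selecting which pixel pairs, out of all possible ones, are discriminative.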
Liao & Jain - unconstrained face detection
pittpatt: http://www.ri.cmu.edu/person.html?type=publications&person_id=270
Measurements were made on four subsets of 100 images from FDDB.
Zhu & Ramanan - joint detection, localization, pose estimation
• Model topological changes with view-specific mixtures of tree-structured elastic models (a simple way to capture 3D!)
• Branches appear and disappear
• Capture self-occlusion due to viewpoint changes
X. Zhu, 2012
Zhu & Ramanan - joint detection, localization, pose estimation
Model
I: image, L: locations of parts, m: mixture; with a prior over mixtures
Inference
X. Zhu, 2012
Zhu & Ramanan
X. Zhu, 2012
Explaining deformations by statistical models (registration)
• Helps parameterize the deformation in appearance space (e.g. gives the correspondence between parts of two images)
• The goal is a compact set of parameters characterizing pose, expression, ...
• The model should be robust, generalize well and be efficient
Cootes et al 1995
Appearance Models
• Shape models represent shape variation
• Eigen-models can represent texture variation
• Combined appearance models represent both
• Generative model:
  o general
  o specific
  o compact
Building Appearance Models
• For each example, extract shape vector
Shape, x = (x1, y1, ..., xn, yn)^T
• Build statistical shape model: x = x̄ + P_s b_s
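The model x = x̄ + P_s b_s can be sketched with a plain PCA (assuming the shapes are already aligned; names are illustrative):

```python
import numpy as np

def build_shape_model(shapes, n_modes=2):
    """shapes: (n_examples, 2n) array of aligned landmark vectors
    (x1, y1, ..., xn, yn). Returns the mean shape and the top modes P_s."""
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # Principal modes via SVD of the centered data matrix
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    P = vt[:n_modes].T            # columns are modes of variation
    return mean, P

def synthesize(mean, P, b):
    """Generate a shape from parameters: x = x_bar + P b."""
    return mean + P @ b

def project(mean, P, x):
    """Recover parameters for a shape: b = P^T (x - x_bar)."""
    return P.T @ (x - mean)
```

Keeping only the leading modes is what makes the parameterization compact and specific to plausible shapes.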
Building Appearance Models
• For each example, extract texture vector: warp to the mean shape, sample the texture g
• Normalise vectors
• Build eigen-model: g = ḡ + P_g b_g
Combined Models
• Shape and texture are often correlated
  o When smiling, shadows change (texture) and shape changes
• Learning this correlation leads to a more compact (and specific) model
Combined Appearance Models
From the shape and texture models x = x̄ + P_s b_s and g = ḡ + P_g b_g, concatenate the parameters:
b = (W_s b_s ; b_g) = Q c, with Q = (Q_s ; Q_g)
so that a single parameter vector c drives both:
x = x̄ + P_s W_s^-1 Q_s c
g = ḡ + P_g Q_g c
Varying c changes both shape and texture.
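A hedged sketch of the combined step, assuming per-example shape and texture parameters are already available (names and the scalar weighting are illustrative):

```python
import numpy as np

def build_combined_model(bs, bg, w):
    """bs: (n, ks) shape parameters and bg: (n, kg) texture parameters
    for n training examples; w scales shape units so they are
    commensurate with texture units. Returns the combined modes Q."""
    b = np.hstack([w * bs, bg])          # b = (W_s b_s ; b_g)
    _, _, vt = np.linalg.svd(b - b.mean(axis=0), full_matrices=False)
    return vt.T                          # columns are the combined modes

# Varying a single combined parameter vector c then moves both halves:
# b = Q c, with b_s = Q[:ks] c / w and b_g = Q[ks:] c.
```

The second PCA is what exploits the shape-texture correlation: each column of Q couples a shape change with its matching texture change.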
Image interpretation
• A new image is explained by a synthetic image generated from the learnt model
• It works by adjusting the model parameters, and assumes good initialization
• Recent flavors differ in (1) coupling of shape and texture, (2) texture representation and (3) parameter optimization
Cootes et al 1998
Constrained Local Model (Saragih 2011)
• Independent response maps over local patches
• Regularized landmark mean-shift for parameter updates
CLM continued
• Fast convergence; more robust and flexible than AAM
• Still assumes close-enough initialization
• Strong impact of the learned mean shape
• Cannot handle occlusion
Facial expressions
• Physiology
  o Contraction of facial muscles
  o Results in deformed facial features (eyebrows, lips, skin texture, wrinkles, etc)
• Measurement
  o Location, intensity, dynamics
  o Temporal evolution: onset, apex, offset
• Inter-personal variations
  o Difficult to define absolute intensities
  o Compare to the person's 'neutral' face
Challenges
• View or pose variation
  o Translation, scale, in-plane rotation
  o Out-of-plane rotation (difficult)
• Uncontrolled environment
  o Background clutter
  o Partial occlusions
  o Lighting conditions, illumination changes, shadow
  o Quality: image noise, compression artifacts, etc
• Face variability
  o Age, gender, race
  o Facial hair, make-up, glasses
• Posed vs. spontaneous expressions
Source: Erik Murphy-Chutorian and Mohan Manubhai Trivedi, "Head Pose Estimation in Computer Vision: A Survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(4), April 2009, pp. 607-626
Affect Modelling
Evolution of affect modelling
• Categorical: complex state = single label
• Dimensional [Russell, 1980]: 2D, 3D ... 5D space
• Appraisal-based approach: componential model
  o Emotion = evaluation of internal state and the state of the outside world
Source: www.psychology.nottingham.ac.uk/staff/rwn
Source: Eduardo M. Eisman, Víctor López, Juan Luis Castro, "Controlling the emotional state of an embodied conversational agent with a dynamic probabilistic fuzzy rules based system", Expert Systems with Applications, 36(6), August 2009, pp. 9698-9708
Geometric Features
Source: YingLi Tian, Shizhi Chen, "Understanding Effects of Image Resolution for Facial Expression Analysis", Journal of Computer Vision and Image Processing
Texture Descriptors
• Popular descriptors
  o Histograms of Oriented Gradients (HOG)
  o Local Binary Patterns (LBP) + variants
  o Gabor wavelets, Gabor jets, etc
• Local Phase Quantization (LPQ) [Ojansivu, 2008]
  o Robust to image blurring
  o Motivation: image blur is a convolution with a kernel h (e.g. Gaussian); in the DFT domain the observation is a pointwise product with H, and for a centrally symmetric h the low-frequency Fourier phase of the original image is preserved
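As a sketch of the simplest descriptor listed above, the basic 3x3 LBP code thresholds each pixel's eight neighbours against the centre and packs the bits into a byte (a minimal version; practical variants add uniform patterns, larger radii, etc):

```python
import numpy as np

def lbp_3x3(img):
    """Basic 8-neighbour LBP; returns a code image for the interior
    pixels. Neighbours are visited clockwise from the top-left."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:-1, 1:-1]
    code = np.zeros_like(center, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view of the same size as `center`
        neigh = img[1 + dy:img.shape[0] - 1 + dy,
                    1 + dx:img.shape[1] - 1 + dx]
        code |= ((neigh >= center).astype(np.uint8) << bit)
    return code
```

Histogramming these codes over local blocks gives the texture descriptor used by the face analysis pipelines above.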
Spatial Deformation Features
Source: Beat Fasel, Juergen Luettin, "Automatic Facial Expression Analysis: A Survey", 2003
Optical Flow Motion Features
Source: Beat Fasel, Juergen Luettin, "Automatic Facial Expression Analysis: A Survey", 2003
Action Units
Source: B. Jiang, M. Valstar, B. Martinez, and M. Pantic, "A Dynamic Appearance Descriptor Approach to Facial Actions Temporal Modelling", IEEE Transactions on Systems, Man and Cybernetics, Part B, 2013
Source: Vinay Bettadapura, "Face Expression Recognition and Analysis: The State of the Art", Technical Report
Action Unit - FACS coding
AU 27 - Mouth Stretch
AU 9 - Nose Wrinkler
Sources: http://www.scholarpedia.org/article/Facial_expression_analysis, http://www.cs.cmu.edu/~face/facs.htm
Image Resolution vs Performance
Performance of human observers at decreasing resolutions: 240x160, 120x80, 60x40, 30x20, 15x10
Du & Martinez 2011
Source: Shichuan Du, Aleix M. Martinez, "The resolution of facial expressions of emotion", Journal of Vision (2011) 11(13):24, 1-13
Recent Advances
• Pyramid Feature Representationo Encodes both the local texture and the global shapeo More robust to shifts
Khan et al 2013
Source: Rizwan Ahmed Khan, Alexandre Meyer, Hubert Konik, Saïda Bouakaz, "Framework for reliable, real-time facial expression recognition for low resolution images", Pattern Recognition Letters, 34(10), July 2013, pp. 1159-1168
Recent Advances
• Three Orthogonal Planes (TOP) descriptor
  o Extend descriptors to the temporal domain
  o Extract features at the orthogonal XY, XT and YT planes independently
Jiang et al, 2013
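A hedged sketch of the TOP idea, with a plain intensity histogram standing in for the per-plane descriptor (the cited work applies appearance descriptors such as LBP or LPQ to each plane):

```python
import numpy as np

def top_descriptor(volume, bins=8):
    """volume: (T, H, W) grayscale video block.
    Extracts a histogram from the central XY, XT and YT planes
    independently and concatenates them into one feature vector."""
    T, H, W = volume.shape
    planes = [
        volume[T // 2, :, :],   # XY: appearance at the middle frame
        volume[:, H // 2, :],   # XT: how a row evolves over time
        volume[:, :, W // 2],   # YT: how a column evolves over time
    ]
    feats = []
    for p in planes:
        hist, _ = np.histogram(p, bins=bins, range=(0, 256), density=True)
        feats.append(hist)
    return np.concatenate(feats)
```

The XT and YT slices are what inject temporal dynamics into an otherwise static appearance descriptor.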
Recent Advances
• Free-form Deformation (FFD)
  o Advantage: sensitive to subtle motion
• Apply FFD between t_start and t_end
• Note the squinting of the eyes (AU6): FFD captures it, while the traditional Motion History Image (MHI) cannot.
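For comparison, a minimal Motion History Image sketch (parameter values are assumed for illustration, not the cited paper's settings): recent motion sets a pixel high, older motion decays, so subtle low-amplitude motion such as a squint washes out.

```python
import numpy as np

def motion_history_image(frames, threshold=15, duration=255, decay=32):
    """frames: sequence of (H, W) grayscale frames.
    Pixels with recent inter-frame change hold high values; motion older
    than a few frames fades, so only gross movement survives."""
    mhi = np.zeros_like(frames[0], dtype=np.float64)
    prev = frames[0].astype(np.float64)
    for frame in frames[1:]:
        cur = frame.astype(np.float64)
        moving = np.abs(cur - prev) > threshold
        mhi = np.where(moving, duration, np.maximum(mhi - decay, 0))
        prev = cur
    return mhi
```

The hard threshold on frame differences is exactly why MHI misses sub-threshold deformations that a dense FFD field can represent.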
Where to go
• Learned features
• Variable feature length and type
• Learned metrics for matching
• Dynamic model switching
• Extension with other body parts
• Discovery of additional expressions or states
References
• S. M. Crouzet and S. J. Thorpe, "Low-level cues and ultra-fast face detection", 2011
• P. Sinha et al, "Face Recognition by Humans: 19 Results all Computer Vision Researchers Should Know About", 2006
• P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features", 2001
• D. Bolme et al, "Average of Synthetic Exact Filters", 2009
• X. Zhu and D. Ramanan, "Face Detection, Pose Estimation, and Landmark Localization in the Wild", 2012
• T. F. Cootes, C. J. Taylor, D. H. Cooper and J. Graham, "Active shape models - their training and application", 1995
• G. J. Edwards, C. J. Taylor and T. F. Cootes, "Interpreting face images using active appearance models", 1998
• J. M. Saragih, S. Lucey and J. F. Cohn, "Deformable Model Fitting by Regularized Landmark Mean-Shift", 2011
• X. Xiong and F. De la Torre, "Supervised Descent Method and its Applications to Face Alignment", 2013

References cont'd
• C. C. Chibelushi and F. Bourel, "Facial Expression Recognition: A Brief Tutorial Overview", 2002
• B. Jiang, M. Valstar, B. Martinez and M. Pantic, "A Dynamic Appearance Descriptor Approach to Facial Actions Temporal Modelling", 2013
• S. Du and A. M. Martinez, "The resolution of facial expressions of emotion", 2011
• R. A. Khan, A. Meyer, H. Konik and S. Bouakaz, "Framework for reliable, real-time facial expression recognition for low resolution images", 2013
• V. Ojansivu and J. Heikkilä, "Blur Insensitive Texture Classification Using Local Phase Quantization", 2008
Interesting places
• http://www.face-rec.org/databases/
• http://www.scholarpedia.org/article/Facial_expression_analysis
• http://www.cs.cmu.edu/~face/facs.htm

Thank you for your attention!
You can reach me at [email protected]
Face detection
• Challenges
  o Multiview, occlusion, large variation of faces, illumination, context, etc.
• State of the art
  o Industry standard: Viola-Jones and variants
• Features
  o Haar wavelets
  o SURF, HOG, LBP, NPD
• Learning methods
  o boosting: AdaBoost, FloatBoost, ...
  o logistic regression, one-class SVM, multi-class SVM
  o MLP
• Deep architectures
Appearance based face detection
Uses statistical analysis and machine learning techniques to learn the "characteristics" of a face from a large set of images:
• PCA, LDA
• Support Vector Machines
• Neural Networks
• Hidden Markov Models
• AdaBoost
The most successful approach, fast and robust
• Detection rates of 80-90% at a false positive rate of 10%
Needs to search over scale and space, and requires a large set of training examples
Industry Standard: Viola-Jones
Features: cascade structure variants
C. Huang, H. Ai, Y. Li, and S. Lao, "High-Performance Rotation Invariant Multiview Face Detection", IEEE Trans. on PAMI, 29(4), 2007.
Learning methods: RealBoost (multiclass)
Learning methods: FloatBoost (multiclass)
Features: Haar features
• Rectangular features with flexible sizes and distances introduced
• The rotated integral image/summed area table
Features: LBP
At each iteration, the LBP representation LBP_x is computed from the subwindow and fed to the SVM classifier to determine whether it is a face or not. The classifier decides on the "faceness" of the subwindow according to the sign of the following function:
F(LBP_x) = sgn( Σ_{i=1..l} α_i y_i K(LBP_x, LBP_{t_i}) + b )
where LBP_{t_i} is the LBP representation of training sample t_i, y_i is 1 or -1 depending on whether t_i is a positive or negative sample (face or non-face), l is the number of samples, b is a scalar (bias), and K(., .) is a second-degree polynomial kernel function.
"A Discriminative Feature Space for Detecting and Recognizing Faces", CVPR 2004, Abdenour Hadid, Matti Pietikäinen
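A hedged sketch of that decision function with a second-degree polynomial kernel (the support samples, alphas and bias below are toy values, not trained ones):

```python
import numpy as np

def poly_kernel(u, v, degree=2):
    """Polynomial kernel K(u, v) = (u.v + 1)^degree."""
    return (np.dot(u, v) + 1.0) ** degree

def svm_decide(x, support, alphas, labels, bias):
    """Sign of sum_i alpha_i * y_i * K(x, t_i) + b:
    +1 for face, -1 for non-face."""
    score = sum(a * y * poly_kernel(x, t)
                for a, y, t in zip(alphas, labels, support))
    return 1 if score + bias >= 0 else -1
```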
Surf cascade
Features: SURF
• 2x2 cells per patch, each cell an 8-dim vector:
  o Sum of dx, |dx| when dy >= 0
  o Sum of dx, |dx| when dy < 0
  o Sum of dy, |dy| when dx >= 0
  o Sum of dy, |dy| when dx < 0
• Total is 2x2x8 = 32-dim feature vector, 8-channel integral images
Feature pool
• In a 40x40 face detection template, slide the patch (x, y, w, h) with a fixed step of 4 pixels. Each cell is at least 8x8 pixels, w or h at least 16 pixels, with 1:1, 1:2, 2:3 ... aspect ratios (w/h): 396 local SURF patches in total
Weak classifier: logistic regression on the 32-dim SURF vector
• h(x) = P(y | x, w) = 1 / (1 + exp(-y w^T x))
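A hedged sketch of that weak learner: a logistic model on the 32-dim SURF patch vector, with a made-up weight vector and stage threshold rather than ones learned by the cascade:

```python
import numpy as np

def weak_classifier_prob(x, w):
    """h(x) = P(y = +1 | x, w) = 1 / (1 + exp(-w.x))
    on a 32-dim SURF patch vector."""
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

def cascade_stage(x, w, reject_threshold=0.01):
    """One cascade stage: pass the window on only if the face
    probability clears the (assumed) per-stage rejection threshold."""
    return weak_classifier_prob(x, w) >= reject_threshold
```

Stacking such stages lets the detector reject most non-face windows after evaluating only a few cheap features.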
SURF cascade on FDDB

Huang et al, Multiview Face Detection: vector boosting
Huang et al, Multiview Face Detection: sparse granular features + WFS tree
C. Huang, H. Ai, Y. Li, and S. Lao, "High-Performance Rotation Invariant Multiview Face Detection", IEEE Trans. on PAMI, 29(4), 2007.
The first four sparse granular features in the root node learned by the Vector Boosting algorithm and heuristic search method. They are illustrated as linear features: white blocks indicate positive granules while black ones denote negative granules. As each granule is, in fact, the average of pixels in the corresponding patch, granules of different sizes are drawn in different gray levels.
ROC curves on CMU rotate testing set (223 frontal faces with different RIP angles in 50 images).
Deep learning
Deep Learning Tutorial at ICML 2013 by Yann LeCun and Marc'Aurelio Ranzato
Deep learning: mainstream object recognition pipeline 2006-2012
Deep Learning Tutorial at ICML 2013 by Yann LeCun and Marc'Aurelio Ranzato
Deep learning Convolutional Neural Nets
Deep Learning Tutorial at ICML 2013 by Yann LeCun and Marc'Aurelio Ranzato
Classical Pattern Recognition Pipeline
Source: Claude C. Chibelushi, Fabrice Bourel, "Facial Expression Recognition: A Brief Tutorial Overview", 2002
Classical Pattern Recognition Pipeline
• Image acquisition
  o Single image or sequence (temporal characteristics)
• Pre-processing
  o Noise removal, contrast normalization
  o Face/parts detection, segmentation, tracking
  o Geometrical normalization: rotation, scale
• Feature extraction
  o Image pixels -> high-level representation (lower dimension)
  o Geometric, Appearance/Texture, or Motion-based
  o Whole face (holistic) or part-based (local)
  o Image based or Model based (3DMM/AAM/CLM)
Three tasks in face analysis
X. Zhu, 2012
Classical Pattern Recognition Pipeline
• Classification
  o Estimate the expression category/dimension
  o Wide range of classifiers (SVM, Boosting, etc)
  o Two types:
    - Action unit (AU) detection
    - Prototype expressions (happy, surprised, etc)
• Spatial vs spatio-temporal classification
  o Spatio-temporal: model the dynamics
• Hierarchical classification
  o AU (low-level) -> Expression (high-level)