Challenges in facial expression tracking
Gábor Szirtes, Realeyes OÜ
www.realeyesit.com
Roadmap
• Why faces?
• Natural inspirations
• Processing pipeline
• Detection
• Shape alignment
• Analysis/synthesis: expression analysis
• Where to go
• References
Why faces?
• Unique ID: biometrics for verification/identification
• Non-verbal communication channel: HCI, monitoring, control
• Detection, alignment, modeling, relighting, head pose tracking, gender/age recognition
• Affective science: Realeyes!
Faces
vs
Medical imaging
http://www.made4ll.com/medical-equipment/normal-brain-ct-scan/
http://cmp.felk.cvut.cz/eccv2004/tutorials/eccv2004-t4.htm
Face recognition by humans
Sinha et al 2006
...we are not there yet
…but on the right track:
Manifold hypothesis
"Natural data lives in a low-dimensional (non-linear) manifold"
Deep Learning Tutorial at ICML 2013 by Yann LeCun and Marc'Aurelio Ranzato
Back to the engineers: the usual pipeline
State of the art
Face detection techniques
Industrial standard: Viola-Jones detector
• Integral image
• Haar-like features
• AdaBoost
• Appearance based
• Slow to train, fast to evaluate
• Needs to search over scale and space
• Still one of the best methods!
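The constant-time evaluation these bullets hinge on can be sketched as follows (a minimal illustration, not the detector's actual implementation; function names are ours):

```python
import numpy as np

def integral_image(img):
    """ii[y, x] holds the sum of img[:y, :x]; a zero top row and left
    column make the rectangle lookups below index cleanly."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    """Sum of img[y:y+h, x:x+w] via four lookups, regardless of size."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect_vertical(ii, y, x, h, w):
    """Two-rectangle Haar-like feature: top half minus bottom half."""
    half = h // 2
    return rect_sum(ii, y, x, half, w) - rect_sum(ii, y + half, x, half, w)
```

Any Haar-like feature thus costs a handful of additions at any scale, which is why the sliding-window search over scale and space stays tractable.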
State of the art: detection / learning / feature selection
Zhang, Cha, and Zhengyou Zhang. A survey of recent advances in face detection. Tech. rep., Microsoft Research, 2010.
State of the art: detection around 2011
X. Zhu, 2012
A simple, yet efficient method: ASEF
(Bolme et al, 2009)
Liao & Jain - unconstrained face detection
Feature: NPD (normalized pixel difference)
Learning method & feature selection: boosted regression tree cascades
Liao, Shengcai, Anil K. Jain, and Stan Z. Li. "Unconstrained Face Detection", Tech. report, 2012
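The NPD feature itself is a one-liner; a sketch (function name is ours, not the paper's):

```python
def npd(a, b):
    """Normalized Pixel Difference between two pixel intensities.
    Bounded in [-1, 1] and invariant to multiplicative illumination
    change; defined as 0 when both pixels are 0."""
    if a == 0 and b == 0:
        return 0.0
    return (a - b) / float(a + b)
```

The boosted trees then do the work of selecting which pixel pairs, out of all possible ones, are discriminative.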
Liao & Jain - unconstrained face detection
pittpatt: http://www.ri.cmu.edu/person.html?type=publications&person_id=270
Measurements were made on four subsets of 100 images from FDDB.
Zhu & Ramanan - joint detection, localization, pose estimation
• Model topological changes with view-specific mixtures of tree-structured elastic models (a simple way to capture 3D!)
• Branches appear and disappear
• Capture self-occlusion due to viewpoint changes
X. Zhu, 2012
Zhu & Ramanan - joint detection, localization, pose estimation
Model
I: image, L: locations of parts, m: mixture; with a prior over mixtures
Inference
X. Zhu, 2012
Zhu & Ramanan
X. Zhu, 2012
Explaining deformations by statistical models (registration)
• Helps parameterize the deformation in appearance space (e.g. gives the correspondence between parts of two images)
• The goal is a compact set of parameters characterizing pose, expression, ...
• The model should be robust, generalize well and be efficient
Cootes et al 1995
Appearance Models
• Shape models represent shape variation
• Eigen-models can represent texture variation
• Combined appearance models represent both
• Generative model:
  o general
  o specific
  o compact
Building Appearance Models
• For each example, extract shape vector
Shape, x = (x1, y1, ..., xn, yn)^T
• Build statistical shape model: x = x̄ + P_s b_s
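The model x = x̄ + P_s b_s can be sketched with a plain PCA (assuming the shapes are already aligned; names are illustrative):

```python
import numpy as np

def build_shape_model(shapes, n_modes=2):
    """shapes: (n_examples, 2n) array of aligned landmark vectors
    (x1, y1, ..., xn, yn). Returns the mean shape and the top modes P_s."""
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # Principal modes via SVD of the centered data matrix
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    P = vt[:n_modes].T            # columns are modes of variation
    return mean, P

def synthesize(mean, P, b):
    """Generate a shape from parameters: x = x_bar + P b."""
    return mean + P @ b

def project(mean, P, x):
    """Recover parameters for a shape: b = P^T (x - x_bar)."""
    return P.T @ (x - mean)
```

Keeping only the leading modes is what makes the parameterization compact and specific to plausible shapes.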
Building Appearance Models
• For each example, extract texture vector: warp to the mean shape, sample the texture g
• Normalise vectors
• Build eigen-model: g = ḡ + P_g b_g
Combined Models
• Shape and texture are often correlated
  o When smiling, shadows change (texture) and shape changes
• Learning this correlation leads to a more compact (and specific) model
Combined Appearance Models
From the shape and texture models x = x̄ + P_s b_s and g = ḡ + P_g b_g, concatenate the parameters:
b = (W_s b_s ; b_g) = Q c, with Q = (Q_s ; Q_g)
so that a single parameter vector c drives both:
x = x̄ + P_s W_s^-1 Q_s c
g = ḡ + P_g Q_g c
Varying c changes both shape and texture.
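A hedged sketch of the combined step, assuming per-example shape and texture parameters are already available (names and the scalar weighting are illustrative):

```python
import numpy as np

def build_combined_model(bs, bg, w):
    """bs: (n, ks) shape parameters and bg: (n, kg) texture parameters
    for n training examples; w scales shape units so they are
    commensurate with texture units. Returns the combined modes Q."""
    b = np.hstack([w * bs, bg])          # b = (W_s b_s ; b_g)
    _, _, vt = np.linalg.svd(b - b.mean(axis=0), full_matrices=False)
    return vt.T                          # columns are the combined modes

# Varying a single combined parameter vector c then moves both halves:
# b = Q c, with b_s = Q[:ks] c / w and b_g = Q[ks:] c.
```

The second PCA is what exploits the shape-texture correlation: each column of Q couples a shape change with its matching texture change.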
Image interpretation
• A new image is explained by a synthetic image generated from the learnt model
• It works by adjusting the model parameters, and assumes good initialization
• Recent flavors differ in (1) coupling of shape and texture, (2) texture representation and (3) parameter optimization
Cootes et al 1998
Constrained Local Model (Saragih 2011)
• Independent response maps over local patches
• Regularized landmark mean-shift for parameter updates
CLM continued
• Fast convergence; more robust and flexible than AAM
• Still assumes close-enough initialization
• Strong impact of the learned mean shape
• Cannot handle occlusion
Facial expressions
• Physiology
  o Contraction of facial muscles
  o Results in deformed facial features (eyebrows, lips, skin texture, wrinkles, etc)
• Measurement
  o Location, intensity, dynamics
  o Temporal evolution: onset, apex, offset
• Inter-personal variations
  o Difficult to define absolute intensities
  o Compare to the person's 'neutral' face
Challenges
• View or pose variation
  o Translation, scale, in-plane rotation
  o Out-of-plane rotation (difficult)
• Uncontrolled environment
  o Background clutter
  o Partial occlusions
  o Lighting conditions, illumination changes, shadow
  o Quality: image noise, compression artifacts, etc
• Face variability
  o Age, gender, race
  o Facial hair, make-up, glasses
• Posed vs. spontaneous expressions
Source: Erik Murphy-Chutorian and Mohan Manubhai Trivedi, "Head Pose Estimation in Computer Vision: A Survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(4), April 2009, pp. 607-626
Affect Modelling
Evolution of affect modelling
• Categorical: complex state = single label
• Dimensional [Russell, 1980]: 2D, 3D ... 5D space
• Appraisal-based approach: componential model
  o Emotion = evaluation of internal state and the state of the outside world
Source: www.psychology.nottingham.ac.uk/staff/rwn
Source: Eduardo M. Eisman, Víctor López, Juan Luis Castro, "Controlling the emotional state of an embodied conversational agent with a dynamic probabilistic fuzzy rules based system", Expert Systems with Applications, 36(6), August 2009, pp. 9698-9708
Geometric Features
Source: YingLi Tian, Shizhi Chen, "Understanding Effects of Image Resolution for Facial Expression Analysis", Journal of Computer Vision and Image Processing
Texture Descriptors
• Popular descriptors
  o Histograms of Oriented Gradients (HOG)
  o Local Binary Patterns (LBP) + variants
  o Gabor wavelets, Gabor jets, etc
• Local Phase Quantization (LPQ) [Ojansivu, 2008]
  o Robust to image blurring
  o Motivation: image blur is a convolution with a kernel h (e.g. Gaussian); in the DFT domain the observation is a pointwise product with H, and for a centrally symmetric h the low-frequency Fourier phase of the original image is preserved
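As a sketch of the simplest descriptor listed above, the basic 3x3 LBP code thresholds each pixel's eight neighbours against the centre and packs the bits into a byte (a minimal version; practical variants add uniform patterns, larger radii, etc):

```python
import numpy as np

def lbp_3x3(img):
    """Basic 8-neighbour LBP; returns a code image for the interior
    pixels. Neighbours are visited clockwise from the top-left."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:-1, 1:-1]
    code = np.zeros_like(center, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view of the same size as `center`
        neigh = img[1 + dy:img.shape[0] - 1 + dy,
                    1 + dx:img.shape[1] - 1 + dx]
        code |= ((neigh >= center).astype(np.uint8) << bit)
    return code
```

Histogramming these codes over local blocks gives the texture descriptor used by the face analysis pipelines above.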
Spatial Deformation Features
Source: Beat Fasel, Juergen Luettin, "Automatic Facial Expression Analysis: A Survey", 2003
Optical Flow Motion Features
Source: Beat Fasel, Juergen Luettin, "Automatic Facial Expression Analysis: A Survey", 2003
Action Units
Source: B. Jiang, M. Valstar, B. Martinez, and M. Pantic, "A Dynamic Appearance Descriptor Approach to Facial Actions Temporal Modelling", IEEE Transactions on Systems, Man and Cybernetics, Part B, 2013
Source: Vinay Bettadapura, "Face Expression Recognition and Analysis: The State of the Art", Technical Report
Action Unit - FACS coding
AU 27 - Mouth Stretch
AU 9 - Nose Wrinkler
Sources: http://www.scholarpedia.org/article/Facial_expression_analysis, http://www.cs.cmu.edu/~face/facs.htm
Image Resolution vs Performance
Performance of human observers at decreasing resolutions: 240x160, 120x80, 60x40, 30x20, 15x10
Du & Martinez 2011
Source: Shichuan Du, Aleix M. Martinez, "The resolution of facial expressions of emotion", Journal of Vision (2011) 11(13):24, 1-13
Recent Advances
• Pyramid Feature Representationo Encodes both the local texture and the global shapeo More robust to shifts
Khan et al 2013
Source: Rizwan Ahmed Khan, Alexandre Meyer, Hubert Konik, Saïda Bouakaz, "Framework for reliable, real-time facial expression recognition for low resolution images", Pattern Recognition Letters, 34(10), July 2013, pp. 1159-1168
Recent Advances
• Three Orthogonal Planes (TOP) descriptor
  o Extend descriptors to the temporal domain
  o Extract features at the orthogonal XY, XT and YT planes independently
Jiang et al, 2013
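A hedged sketch of the TOP idea, with a plain intensity histogram standing in for the per-plane descriptor (the cited work applies appearance descriptors such as LBP or LPQ to each plane):

```python
import numpy as np

def top_descriptor(volume, bins=8):
    """volume: (T, H, W) grayscale video block.
    Extracts a histogram from the central XY, XT and YT planes
    independently and concatenates them into one feature vector."""
    T, H, W = volume.shape
    planes = [
        volume[T // 2, :, :],   # XY: appearance at the middle frame
        volume[:, H // 2, :],   # XT: how a row evolves over time
        volume[:, :, W // 2],   # YT: how a column evolves over time
    ]
    feats = []
    for p in planes:
        hist, _ = np.histogram(p, bins=bins, range=(0, 256), density=True)
        feats.append(hist)
    return np.concatenate(feats)
```

The XT and YT slices are what inject temporal dynamics into an otherwise static appearance descriptor.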
Recent Advances
• Free-form Deformation (FFD)
  o Advantage: sensitive to subtle motion
• Apply FFD between t_start and t_end
• Note the squinting of the eyes (AU6): FFD captures it, while the traditional Motion History Image (MHI) cannot.
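For comparison, a minimal Motion History Image sketch (parameter values are assumed for illustration, not the cited paper's settings): recent motion sets a pixel high, older motion decays, so subtle low-amplitude motion such as a squint washes out.

```python
import numpy as np

def motion_history_image(frames, threshold=15, duration=255, decay=32):
    """frames: sequence of (H, W) grayscale frames.
    Pixels with recent inter-frame change hold high values; motion older
    than a few frames fades, so only gross movement survives."""
    mhi = np.zeros_like(frames[0], dtype=np.float64)
    prev = frames[0].astype(np.float64)
    for frame in frames[1:]:
        cur = frame.astype(np.float64)
        moving = np.abs(cur - prev) > threshold
        mhi = np.where(moving, duration, np.maximum(mhi - decay, 0))
        prev = cur
    return mhi
```

The hard threshold on frame differences is exactly why MHI misses sub-threshold deformations that a dense FFD field can represent.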
Where to go
• Learned features
• Variable feature length and type
• Learned metrics for matching
• Dynamic model switching
• Extension with other body parts
• Discovery of additional expressions or states
References
• S. M. Crouzet and S. J. Thorpe, "Low-level cues and ultra-fast face detection", 2011
• P. Sinha et al, "Face Recognition by Humans: 19 Results all Computer Vision Researchers Should Know About", 2006
• P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features", 2001
• D. Bolme et al, "Average of Synthetic Exact Filters", 2009
• X. Zhu and D. Ramanan, "Face Detection, Pose Estimation, and Landmark Localization in the Wild", 2012
• T. F. Cootes, C. J. Taylor, D. H. Cooper and J. Graham, "Active shape models - their training and application", 1995
• G. J. Edwards, C. J. Taylor and T. F. Cootes, "Interpreting face images using active appearance models", 1998
• J. M. Saragih, S. Lucey and J. F. Cohn, "Deformable Model Fitting by Regularized Landmark Mean-Shift", 2011
• X. Xiong and F. De la Torre, "Supervised Descent Method and its Applications to Face Alignment", 2013

References cont'd
• C. C. Chibelushi and F. Bourel, "Facial Expression Recognition: A Brief Tutorial Overview", 2002
• B. Jiang, M. Valstar, B. Martinez and M. Pantic, "A Dynamic Appearance Descriptor Approach to Facial Actions Temporal Modelling", 2013
• S. Du and A. M. Martinez, "The resolution of facial expressions of emotion", 2011
• R. A. Khan, A. Meyer, H. Konik and S. Bouakaz, "Framework for reliable, real-time facial expression recognition for low resolution images", 2013
• V. Ojansivu and J. Heikkilä, "Blur Insensitive Texture Classification Using Local Phase Quantization", 2008
Interesting places
• http://www.face-rec.org/databases/
• http://www.scholarpedia.org/article/Facial_expression_analysis
• http://www.cs.cmu.edu/~face/facs.htm

Thank you for your attention!
You can reach me at [email protected]
Face detection
• Challenges
  o Multiview, occlusion, large variation of faces, illumination, context, etc.
• State of the art
  o Industry standard: Viola-Jones and variants
• Features
  o Haar wavelets
  o SURF, HOG, LBP, NPD
• Learning methods
  o boosting: AdaBoost, FloatBoost, ...
  o logistic regression, one-class SVM, multi-class SVM
  o MLP
• Deep architectures
Appearance based face detection
Uses statistical analysis and machine learning techniques to learn the "characteristics" of a face from a large set of images:
• PCA, LDA
• Support Vector Machines
• Neural Networks
• Hidden Markov Models
• AdaBoost
The most successful approach, fast and robust
• Detection rates of 80-90% at a false positive rate of 10%
Needs to search over scale and space, and requires a large set of training examples
Industry Standard: Viola-Jones
Features: cascade structure variants
C. Huang, H. Ai, Y. Li, and S. Lao, "High-Performance Rotation Invariant Multiview Face Detection", IEEE Trans. on PAMI, 29(4), 2007.
Learning methods: RealBoost (multiclass)
Learning methods: FloatBoost (multiclass)
Features: Haar features
• Rectangular features with flexible sizes and distances introduced
• The rotated integral image/summed area table
Features: LBP
At each iteration, the LBP representation LBP_x is computed from the subwindow and fed to the SVM classifier to determine whether it is a face or not. The classifier decides on the "faceness" of the subwindow according to the sign of the following function:
F(LBP_x) = sgn( Σ_{i=1..l} α_i y_i K(LBP_x, LBP_{t_i}) + b )
where LBP_{t_i} is the LBP representation of training sample t_i, y_i is 1 or -1 depending on whether t_i is a positive or negative sample (face or non-face), l is the number of samples, b is a scalar (bias), and K(., .) is a second-degree polynomial kernel function.
"A Discriminative Feature Space for Detecting and Recognizing Faces", CVPR 2004, Abdenour Hadid, Matti Pietikäinen
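A hedged sketch of that decision function with a second-degree polynomial kernel (the support samples, alphas and bias below are toy values, not trained ones):

```python
import numpy as np

def poly_kernel(u, v, degree=2):
    """Polynomial kernel K(u, v) = (u.v + 1)^degree."""
    return (np.dot(u, v) + 1.0) ** degree

def svm_decide(x, support, alphas, labels, bias):
    """Sign of sum_i alpha_i * y_i * K(x, t_i) + b:
    +1 for face, -1 for non-face."""
    score = sum(a * y * poly_kernel(x, t)
                for a, y, t in zip(alphas, labels, support))
    return 1 if score + bias >= 0 else -1
```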
Surf cascade
Features: SURF
• 2x2 cells per patch, each cell an 8-dim vector:
  o Sum of dx, |dx| when dy >= 0
  o Sum of dx, |dx| when dy < 0
  o Sum of dy, |dy| when dx >= 0
  o Sum of dy, |dy| when dx < 0
• Total is 2x2x8 = 32-dim feature vector, 8-channel integral images
Feature pool
• In a 40x40 face detection template, slide the patch (x, y, w, h) with a fixed step of 4 pixels. Each cell is at least 8x8 pixels, w or h at least 16 pixels, with 1:1, 1:2, 2:3 ... aspect ratios (w/h): 396 local SURF patches in total
Weak classifier: logistic regression on the 32-dim SURF vector
• h(x) = P(y | x, w) = 1 / (1 + exp(-y w^T x))
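A hedged sketch of that weak learner: a logistic model on the 32-dim SURF patch vector, with a made-up weight vector and stage threshold rather than ones learned by the cascade:

```python
import numpy as np

def weak_classifier_prob(x, w):
    """h(x) = P(y = +1 | x, w) = 1 / (1 + exp(-w.x))
    on a 32-dim SURF patch vector."""
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

def cascade_stage(x, w, reject_threshold=0.01):
    """One cascade stage: pass the window on only if the face
    probability clears the (assumed) per-stage rejection threshold."""
    return weak_classifier_prob(x, w) >= reject_threshold
```

Stacking such stages lets the detector reject most non-face windows after evaluating only a few cheap features.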
SURF cascade on FDDB

Huang et al, Multiview Face Detection: vector boosting
Huang et al, Multiview Face Detection: sparse granular features + WFS tree
C. Huang, H. Ai, Y. Li, and S. Lao, "High-Performance Rotation Invariant Multiview Face Detection", IEEE Trans. on PAMI, 29(4), 2007.
The first four sparse granular features in the root node learned by the Vector Boosting algorithm and heuristic search method. They are illustrated as linear features: white blocks indicate positive granules while black ones denote negative granules. As each granule is, in fact, the average of pixels in the corresponding patch, granules of different sizes are drawn in different gray levels.
ROC curves on CMU rotate testing set (223 frontal faces with different RIP angles in 50 images).
Deep learning
Deep Learning Tutorial at ICML 2013 by Yann LeCun and Marc'Aurelio Ranzato
Deep learning: mainstream object recognition pipeline 2006-2012
Deep Learning Tutorial at ICML 2013 by Yann LeCun and Marc'Aurelio Ranzato
Deep learning Convolutional Neural Nets
Deep Learning Tutorial at ICML 2013 by Yann LeCun and Marc'Aurelio Ranzato
Classical Pattern Recognition Pipeline
Source: Claude C. Chibelushi, Fabrice Bourel, "Facial Expression Recognition: A Brief Tutorial Overview", 2002
Classical Pattern Recognition Pipeline
• Image acquisition
  o Single image or sequence (temporal characteristics)
• Pre-processing
  o Noise removal, contrast normalization
  o Face/parts detection, segmentation, tracking
  o Geometrical normalization: rotation, scale
• Feature extraction
  o Image pixels -> high-level representation (lower dimension)
  o Geometric, Appearance/Texture, or Motion-based
  o Whole face (holistic) or part-based (local)
  o Image based or Model based (3DMM/AAM/CLM)
Three tasks in face analysis
X. Zhu, 2012
Classical Pattern Recognition Pipeline
• Classification
  o Estimate the expression category/dimension
  o Wide range of classifiers (SVM, Boosting, etc)
  o Two types:
    - Action unit (AU) detection
    - Prototype expressions (happy, surprised, etc)
• Spatial vs spatio-temporal classification
  o Spatio-temporal: model the dynamics
• Hierarchical classification
  o AU (low-level) -> Expression (high-level)