multimodal emotion recognition


Transcript of multimodal emotion recognition

Page 1: multimodal emotion recognition

multimodal emotion recognition

• recognition models (application dependency) – discrete / dimensional / appraisal theory models

• theoretical models of multimodal integration – direct / separate / dominant / motor integration

• modality synchronization – visemes / EMGs & FAPs / SC-RSP & speech

• temporal evolution and modality sequentiality – multimodal recognition techniques

• classifiers + context + goals + cognition/attention + modality significance in interaction

Page 2: multimodal emotion recognition

Two approaches have been developed and used for audiovisual emotion recognition:

A. Separated Recognition

• An attention-feedback recurrent neural network applied to emotion recognition from speech.

• A neurofuzzy system, including a-priori knowledge, used for emotion recognition from facial expressions.


Page 3: multimodal emotion recognition

Separated Recognition

The goal was to evaluate the recognition performance obtained from each modality separately. Visual feeltracing was required.

Pause detection and tune-based analysis, with speech playing the main role, were the means of synchronising the two modalities.
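The slides do not detail the synchronisation step, so the sketch below only illustrates the general idea of pause-based segmentation of speech and mapping each segment to the corresponding FAP (video) frames; the energy threshold, frame rates and function names are illustrative assumptions, not the project's actual pipeline.

```python
import numpy as np

def pause_segments(energy, threshold=0.02, min_pause_frames=10):
    """Return (start, end) speech-frame indices of segments separated by pauses.

    A frame is treated as a pause when its energy falls below `threshold`
    (an assumed value); a segment is closed once `min_pause_frames`
    consecutive pause frames have been seen.
    """
    speech = energy >= threshold
    segments, start, pause_run = [], None, 0
    for i, is_speech in enumerate(speech):
        if is_speech:
            if start is None:
                start = i
            pause_run = 0
        elif start is not None:
            pause_run += 1
            if pause_run >= min_pause_frames:
                segments.append((start, i - pause_run + 1))
                start, pause_run = None, 0
    if start is not None:
        segments.append((start, len(speech)))
    return segments

def align_faps_to_segment(seg, speech_fps=100.0, video_fps=25.0):
    """Map a speech-frame segment onto the corresponding FAP (video) frame range."""
    start, end = seg
    scale = video_fps / speech_fps
    return int(start * scale), int(end * scale)

# toy usage: a random "energy" contour standing in for a real speech signal
energy = np.abs(np.random.randn(1000)) * 0.05
for seg in pause_segments(energy):
    print(seg, "->", align_faps_to_segment(seg))
```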

Page 4: multimodal emotion recognition

Emotion analysis & facial expressions

• A rule-based system for emotion recognition was created, characterising a user’s emotional state in terms of the six universal, or archetypal, expressions (joy, surprise, fear, anger, disgust, sadness).

• Rules have been created in terms of the MPEG-4 FAPs for each of these expressions.

Page 5: multimodal emotion recognition

Sample Profiles of Anger

A1: F4[22, 124], F31[-131, -25], F32[-136, -34], F33[-189, -109], F34[-183, -105], F35[-101, -31], F36[-108, -32], F37[29, 85], F38[27, 89]

A2: F19[-330, -200], F20[-335, -205], F21[200, 330], F22[205, 335], F31[-200, -80], F32[-194, -74], F33[-190, -70], F34[-190, -70]

A3: F19[-330, -200], F20[-335, -205], F21[200, 330], F22[205, 335], F31[-200, -80], F32[-194, -74], F33[70, 190], F34[70, 190]
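As a reading aid, the sketch below shows how a profile such as A2 could be checked against an observed FAP vector; the dictionary encoding and the helper function are illustrative assumptions, while the numeric ranges are copied from the slide.

```python
# Anger profile A2, copied from the slide: each listed FAP must fall in its range.
PROFILE_A2 = {
    "F19": (-330, -200), "F20": (-335, -205), "F21": (200, 330),
    "F22": (205, 335),   "F31": (-200, -80),  "F32": (-194, -74),
    "F33": (-190, -70),  "F34": (-190, -70),
}

def matches_profile(faps, profile):
    """True if every FAP mentioned in the profile lies inside its range."""
    return all(lo <= faps.get(name, float("nan")) <= hi
               for name, (lo, hi) in profile.items())

# toy observation: mid-range values for each FAP in the profile
observation = {name: (lo + hi) / 2 for name, (lo, hi) in PROFILE_A2.items()}
print(matches_profile(observation, PROFILE_A2))  # True
```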

Page 6: multimodal emotion recognition

Emotion analysis & facial expressions

f : Values derived from the calculated distances

G : the value of a corresponding FAP

Page 7: multimodal emotion recognition
Page 8: multimodal emotion recognition

Expressions: rule most often activated (% of examined cases)

• Anger [open_jaw_low, lower_top_midlip_medium, raise_bottom_midlip_high, raise_left_inner_eyebrow_low, raise_right_inner_eyebrow_low, raise_left_medium_eyebrow_low, raise_right_medium_eyebrow_low, squeeze_left_eyebrow_high, squeeze_right_eyebrow_high, wrinkles_between_eyebrows_high, raise_left_outer_cornerlip_medium, raise_right_outer_cornerlip_medium] (47%)

• Joy [open_jaw_high, lower_top_midlip_low, raise_bottom_midlip_verylow, widening_mouth_high, close_left_eye_high, close_right_eye_high] (39%)

• Disgust [open_jaw_low, lower_top_midlip_low, raise_bottom_midlip_high, widening_mouth_low, close_left_eye_high, close_right_eye_high, raise_left_inner_eyebrow_medium, raise_right_inner_eyebrow_medium, raise_left_medium_eyebrow_medium, raise_right_medium_eyebrow_medium, wrinkles_between_eyebrows_medium] (33%)

Page 9: multimodal emotion recognition

Expressions: rule most often activated (% of examined cases)

• Surprise [open_jaw_high, raise_bottom_midlip_verylow, widening_mouth_low, close_left_eye_low, close_right_eye_low, raise_left_inner_eyebrow_high, raise_right_inner_eyebrow_high, raise_left_medium_eyebrow_high, raise_right_medium_eyebrow_high, raise_left_outer_eyebrow_high, raise_right_outer_eyebrow_high, squeeze_left_eyebrow_low, squeeze_right_eyebrow_low, wrinkles_between_eyebrows_low] (71%)

• Neutral [open_jaw_low, lower_top_midlip_medium, raise_left_inner_eyebrow_medium, raise_right_inner_eyebrow_medium, raise_left_medium_eyebrow_medium, raise_right_medium_eyebrow_medium, raise_left_outer_eyebrow_medium, raise_right_outer_eyebrow_medium, squeeze_left_eyebrow_medium, squeeze_right_eyebrow_medium, wrinkles_between_eyebrows_medium, raise_left_outer_cornerlip_medium, raise_right_outer_cornerlip_medium] (70%)

Page 10: multimodal emotion recognition

Expression-based Emotion Analysis Results

• These rules were extended to deal with the 2-D continuous (activation-evaluation) four-quadrant emotional space (see the sketch below).

• They were applied to data generated with the QUB SALAS tool to test their performance on real-life emotionally expressive data sets.
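For reference, here is a tiny illustration of the four-quadrant reading of the activation-evaluation space mentioned above; the quadrant labels are conventional descriptions and an assumption here, not terminology from the slides.

```python
def quadrant(activation, evaluation):
    """Map a point of the 2-D activation-evaluation space to its quadrant.

    The labels are illustrative only (e.g. high activation / negative
    evaluation roughly covers anger-like states).
    """
    if activation >= 0:
        return "active-positive" if evaluation >= 0 else "active-negative"
    return "passive-positive" if evaluation >= 0 else "passive-negative"

print(quadrant(0.7, -0.4))  # active-negative
```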

Page 11: multimodal emotion recognition
Page 12: multimodal emotion recognition
Page 13: multimodal emotion recognition

Clustering/Neurofuzzy Analysis of Facial Features

• The rule-based expression/emotion analysis system was extended to handle specific characteristics of each user in continuous 2-D emotional analysis.

• Novel clustering and fuzzy reasoning techniques were developed and used to produce user-specific FAP ranges (around 10 clusters per user) and to provide rules that handle them (a rough clustering sketch follows this list).

• Results on the continuous 2-D emotional framework with the SALAS data indicate that good performance (reaching 80%) was obtained by applying the adapted system to each specific user.
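The clustering and fuzzy reasoning techniques themselves are not spelled out in the slides; the sketch below only illustrates the general idea of deriving per-user FAP ranges by clustering, assuming scikit-learn's KMeans and the roughly 10 clusters mentioned above.

```python
import numpy as np
from sklearn.cluster import KMeans

def per_user_fap_ranges(fap_values, n_clusters=10):
    """Cluster one user's observed values of a single FAP and return the
    [min, max] range of each cluster, as a crude per-user quantisation."""
    x = np.asarray(fap_values, dtype=float).reshape(-1, 1)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(x)
    return [(x[labels == k].min(), x[labels == k].max())
            for k in range(n_clusters)]

# toy usage: synthetic observations of one FAP for one user
rng = np.random.default_rng(0)
values = rng.normal(loc=-120, scale=40, size=300)
for lo, hi in sorted(per_user_fap_ranges(values)):
    print(f"[{lo:7.1f}, {hi:7.1f}]")
```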

Page 14: multimodal emotion recognition

Direct Multimodal Recognition

The attention-feedback recurrent neural network architecture (ANNA) was applied to emotion recognition based on all input modalities.

Features extracted from all input modalities (linguistic, paralinguistic speech, FAPs) were obtained by processing and analysing the common SALAS emotionally expressive data.

Page 15: multimodal emotion recognition

Emotion Recognition based on ANNA

• ANNA hidden layer = emotion state, plus feedback control for attention (= IMC)

• Learning laws for ANNA were developed

• ANNA fuses all modalities or only one

Page 16: multimodal emotion recognition

BASIC EMOTION RECOGNITION ARCHITECTURE

[Architecture diagram: feature vector inputs → emotion state as hidden layer → output as recognised emotional state, with an attention control system providing feedback.]
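ANNA's exact equations and learning laws are not given in the slides, so the toy forward pass below only mirrors the layout sketched above (gated feature inputs, an emotion-state hidden layer, an attention/IMC feedback layer, an output giving the recognised state); the 5+5 neuron sizes are borrowed from slide 20 and everything else is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyAttentionFeedbackNet:
    """Toy forward pass loosely inspired by the ANNA layout in the slides:
    an emotion-state (EMOT) hidden layer, a feedback/attention (IMC) layer
    that gates the inputs, and an output giving the recognised state."""

    def __init__(self, n_in, n_emot=5, n_imc=5, n_out=4):
        self.W_in = rng.normal(scale=0.1, size=(n_emot, n_in))
        self.W_rec = rng.normal(scale=0.1, size=(n_emot, n_emot))
        self.W_imc = rng.normal(scale=0.1, size=(n_imc, n_emot))
        self.W_gate = rng.normal(scale=0.1, size=(n_in, n_imc))
        self.W_out = rng.normal(scale=0.1, size=(n_out, n_emot))

    def forward(self, features):
        """features: array of shape (T, n_in); returns per-step outputs."""
        emot = np.zeros(self.W_rec.shape[0])
        outputs = []
        for x in features:
            imc = np.tanh(self.W_imc @ emot)                    # attention control signal
            gate = 1.0 / (1.0 + np.exp(-(self.W_gate @ imc)))   # per-feature input gate
            emot = np.tanh(self.W_in @ (gate * x) + self.W_rec @ emot)
            outputs.append(self.W_out @ emot)                   # recognised emotional state
        return np.array(outputs)

# toy usage on random multimodal feature vectors (e.g. ASSESS + FAP + DAL)
net = ToyAttentionFeedbackNet(n_in=12)
print(net.forward(rng.normal(size=(6, 12))).shape)  # (6, 4)
```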

Page 17: multimodal emotion recognition

Text Post-Processing Module

• Prof. Whissell compiled the ‘Dictionary of Affect in Language’ (DAL)

• Mapping of ~9000 words → (activation, evaluation) values, based on students’ assessments

• Words from the meaningful segments obtained by pause detection are mapped into (activation, evaluation) space (see the sketch below)

• But humans use context to assign emotional content to words
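A minimal sketch of the DAL lookup described above; the three-word dictionary and its numbers are placeholders standing in for the ~9000-word DAL, not actual Whissell scores.

```python
# A hypothetical, three-word stand-in for the ~9000-word DAL; the numbers
# below are placeholders, not actual Dictionary of Affect in Language scores.
TOY_DAL = {
    "happy": (0.6,  0.8),   # (activation, evaluation)
    "angry": (0.7, -0.6),
    "tired": (-0.5, -0.2),
}

def segment_to_activation_evaluation(words, dal=TOY_DAL):
    """Average the (activation, evaluation) scores of the words in one
    pause-delimited segment; words missing from the dictionary are skipped."""
    scored = [dal[w] for w in words if w in dal]
    if not scored:
        return None
    act = sum(a for a, _ in scored) / len(scored)
    ev = sum(e for _, e in scored) / len(scored)
    return act, ev

print(segment_to_activation_evaluation("i am so angry and tired".split()))
```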

Page 18: multimodal emotion recognition

ANNA on top correlated ASSESS features

• Quadrant match using the top 10 activation features + top 10 evaluation features, with an activation-evaluation output space:

Feeltracer        jd     cc     dr     em
Avg Quad Match    0.42   0.39   0.37   0.45
Std Dev           0.03   0.02   0.02   0.04

Page 19: multimodal emotion recognition

ANNA on top correlated ASSESS features

• Half-plane match using the top 10 activation features, with an activation-only output space:

Feeltracer             jd     cc     dr     em
Avg Half-Plane Match   0.75   0.66   0.64   0.74
Std Dev                0.02   0.02   0.02   0.03
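The slides report average quadrant and half-plane matches without defining them, so the sketch below assumes the natural definitions: the fraction of samples whose predicted and FeelTracer-supervised values fall in the same quadrant, or in the same activation half-plane.

```python
import numpy as np

def quadrant_match(pred, target):
    """Fraction of samples whose predicted and target (A, E) pairs fall in
    the same quadrant of activation-evaluation space (assumed definition)."""
    pred, target = np.asarray(pred), np.asarray(target)
    same = np.all(np.sign(pred) == np.sign(target), axis=1)
    return same.mean()

def half_plane_match(pred_a, target_a):
    """Fraction of samples whose predicted and target activation have the
    same sign, i.e. land in the same half-plane (assumed definition)."""
    return (np.sign(pred_a) == np.sign(target_a)).mean()

# toy usage with random (A, E) predictions against a FeelTracer trace
rng = np.random.default_rng(1)
pred, target = rng.normal(size=(100, 2)), rng.normal(size=(100, 2))
print(quadrant_match(pred, target), half_plane_match(pred[:, 0], target[:, 0]))
```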

Page 20: multimodal emotion recognition

Multi-modal Results

• 500 training epochs, 3 runs per dataset, with the final results averaged (and the associated standard deviation reported).

• 5 hidden-layer (EMOT) neurons and 5 feedback-layer (IMC) neurons; learning rate fixed at 0.001.

• Of each dataset, 4 parts were used for training and 1 part for testing the net on ‘unseen’ inputs.

• A = Activation, E = Evaluation; FT stands for the FeelTracer used as supervisor; AVG denotes the average quadrant match (for the 2-D space) or the average half-plane match (for the 1-D space) over the 3 runs.

• PCA was applied to the ASSESS features to reduce them from about 500 to around 7-10 components describing most of the variability (see the sketch below).
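A sketch of the PCA step in the last bullet, using scikit-learn; keeping enough components for ~95% of the variability, capped at 10, is an assumption standing in for the 7-10 components reported.

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_assess_features(X, max_components=10, variance=0.95):
    """Project ~500 ASSESS features onto the leading principal components.

    Keeps enough components to explain `variance` of the total variance,
    capped at `max_components` (the slide reports ending up with 7-10).
    """
    pca = PCA(n_components=max_components).fit(X)
    cum = np.cumsum(pca.explained_variance_ratio_)
    k = min(int(np.searchsorted(cum, variance)) + 1, max_components)
    return pca.transform(X)[:, :k], k

# toy usage on random data standing in for ~500 ASSESS features
X = np.random.default_rng(2).normal(size=(200, 500))
Z, k = reduce_assess_features(X)
print(Z.shape, k)
```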

Page 21: multimodal emotion recognition

Multi-modal Results: Using A Output only

• Classification using the A output only was relatively high (in three cases up to 98%, and with two more at 90% or above).

• Effectiveness of data from FeelTracer EM: average success rates of 86%, 88%, 98%, 89%, 95%, 98% and 98% for the 7 choices of input combinations (ass / fap / dal / a+f / d+f / a+d / a+d+f).

• Also high success with FeelTracer JD.

• Consistently lower values for FeelTracer DR (all in the 60-66% band), and also for CC (64%, 54%, 75%, 63%, 73%, 73%, 73%).

Page 22: multimodal emotion recognition

Ontology Representation & Facial Expression/Emotion Analysis

Use ontologies for real-life usage of facial expression/emotion analysis results:

• Extensibility: ontologies form an excellent basis for considering issues like constrained reasoning, personalisation and adaptation, which have been shown to be crucial for applying our results to real-life applications.

• Standardisation: OWL ontologies form a standard Web framework for knowledge representation and reasoning.

Page 23: multimodal emotion recognition

Facial Emotion Analysis Ontology Development

• An ontology has been created to represent the geometry and the different variations of facial expressions, based on the MPEG-4 Face Definition Parameters (FDPs) and Face Animation Parameters (FAPs).

• The ontology was built using the ontology language OWL DL and the Protégé OWL Plugin for ontology development.

• The ontology will be the tool for extending the obtained results to real life applications dealing with specific users’ profiles & constraints.

Page 24: multimodal emotion recognition

Concept and Relation Examples

Concepts
• Face
• Face_Animation_Parameter
• Face_Definition_Parameter
• Facial_Expression

Relations
• is_Defined_By
• is_Animated_By
• has_Facial_Expression
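A small rdflib sketch of how the listed concepts and relations could be written down as OWL classes and object properties; the namespace URI and the domain/range choices are assumptions, and this is not the project's actual Protégé ontology.

```python
from rdflib import Graph, Namespace, RDF, RDFS, OWL

FACE = Namespace("http://example.org/face-ontology#")  # made-up namespace
g = Graph()
g.bind("face", FACE)

# Concepts from the slide as OWL classes
for concept in ("Face", "Face_Animation_Parameter",
                "Face_Definition_Parameter", "Facial_Expression"):
    g.add((FACE[concept], RDF.type, OWL.Class))

# Relations from the slide as object properties, with assumed domain/range
relations = [
    ("is_Defined_By",         "Face", "Face_Definition_Parameter"),
    ("is_Animated_By",        "Face", "Face_Animation_Parameter"),
    ("has_Facial_Expression", "Face", "Facial_Expression"),
]
for name, domain, rng in relations:
    g.add((FACE[name], RDF.type, OWL.ObjectProperty))
    g.add((FACE[name], RDFS.domain, FACE[domain]))
    g.add((FACE[name], RDFS.range, FACE[rng]))

print(g.serialize(format="turtle"))
```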