
Page 1: Multimodal emotion recognition and expressivity analysis

ICME 2005 Special Session, July 7, 2005

Stefanos Kollias, Kostas Karpouzis
Image, Video and Multimedia Systems Lab, National Technical University of Athens

Page 2: expressivity and emotion recognition

• affective computing: the capability of machines to recognize, express, model, communicate and respond to emotional information
• computers need the ability to recognize human emotion
  – everyday HCI is emotional: three-quarters of computer users admit to swearing at computers
  – user input and system reaction are important to pinpoint problems or provide natural interfaces

Page 3: the targeted interaction framework

• Generating intelligent interfaces with affective, learning, reasoning and adaptive capabilities.
• Multidisciplinary expertise is the basic means for novel interfaces, including perception and emotion recognition, semantic analysis, cognition, modelling, expression generation and the production of multimodal avatars capable of adapting to the goals and context of interaction.
• Humans function through four primary modes of being, i.e., affect, motivation, cognition and behavior; these are related to feeling, wanting, thinking and acting.
• Affect is particularly difficult, requiring us to understand and model the causes and consequences of emotions; the latter, especially as realized in behavior, is a daunting task.

Page 4: (figure-only slide)

Page 5: everyday emotional states

• dramatic extremes (terror, misery, elation) are fascinating, but marginal for HCI
• the target of an affect-aware system:
  – register everyday states with an emotional component: excitement, boredom, irritation, enthusiasm, stress, satisfaction, amusement
  – achieve sensitivity to everyday emotional states

(example system prompt on the slide: "I think you might be getting just a *wee* bit bored – maybe a coffee?")

Page 6: affective computing applications

• detect specific incidents/situations that need human intervention
  – e.g. anger detection in a call center
• naturalistic interfaces
  – the keyboard/mouse/pointer paradigm can be difficult for the elderly, handicapped people or children
  – speech and gesture interfaces can be useful

Page 7: the EU perspective

• Until 2002, related research was dominated by mid-scale projects:
  – ERMIS: multimodal emotion recognition (facial expressions, linguistic and prosody analysis)
  – NECA: networked affective ECAs
  – SAFIRA: affective input interfaces
  – NICE: Natural Interactive Communication for Edutainment
  – MEGA: Multisensory Expressive Gesture Applications
  – INTERFACE: Multimodal Analysis/Synthesis System for Human Interaction to Virtual and Augmented Environments

Page 8: the EU perspective

• FP6 (2002-2006) issued two calls for multimodal interfaces
  – Call 1 (April 2003) and Call 5 (September 2005), covering multimodal and multilingual areas
  – Integrated Projects: AMI (Augmented Multi-party Interaction) and CHIL (Computers In the Human Interaction Loop)
  – Networks of Excellence: HUMAINE and SIMILAR
  – other calls covered "Leisure and entertainment", "e-Inclusion", "Cognitive systems" and "Presence and Interaction"

Page 9: the HUMAINE Network of Excellence

• FP6 Call 1 Network of Excellence: research on emotions and human-machine interaction
• start: 1 January 2004; duration: 48 months
• IST thematic priority: Multimodal Interfaces
  – emotions in human-machine interaction
  – creation of a new, interdisciplinary research community
  – advancing the state of the art in a principled way

Page 10: the HUMAINE Network of Excellence

• 33 partner groups from 14 countries
• coordinated by Queen's University of Belfast
• goals of HUMAINE:
  – integrate existing expertise in psychology, computer engineering, cognition, interaction and usability
  – promote shared insight
• http://emotion-research.net

Page 11: moving forward

• future EU orientations include (extracted from the Call 1 evaluation, 2004):
  – adaptability and re-configurable interfaces
  – collaborative technologies and interfaces in the arts
  – less explored modalities, e.g. haptics, bio-sensing
  – affective computing, including character and facial expression recognition and animation
  – more product innovation and industrial impact
• FP7 direction: Simulation, Visualization, Interaction, Mixed Reality
  – blending semantic/knowledge and interface technologies

Page 12: the special session

• a segment-based approach to the recognition of emotions in speech
  – M. Shami, M. Kamel, University of Waterloo
• comparing feature sets for acted and spontaneous speech in view of automatic emotion recognition
  – T. Vogt, E. André, University of Augsburg
• annotation and detection of blended emotions in real human-human dialogs recorded in a call center
  – L. Vidrascu, L. Devillers, LIMSI-CNRS, France
• a real-time lip sync system using a genetic algorithm for automatic neural network configuration
  – G. Zoric, I. Pandzic, University of Zagreb
• visual/acoustic emotion recognition
  – Cheng-Yao Chen, Yue-Kai Huang, Perry Cook, Princeton University
• an intelligent system for facial emotion recognition
  – R. Cowie, E. Douglas-Cowie, Queen's University of Belfast; J. Taylor, King's College; S. Ioannou, M. Wallace, IVML/NTUA

Page 13: the big picture

• feature extraction from multiple modalities
  – prosody, words, face, gestures, biosignals…
• unimodal recognition
• multimodal recognition
• using detected features to cater for affective interaction

Page 14: audiovisual emotion recognition

(block diagram: speech and video from the user interface feed the Linguistic Analysis, Paralinguistic Analysis and Facial Analysis subsystems; the resulting linguistic, phonetic and facial parameters are combined by the Emotion Recognition Subsystem into an emotional state)

• the core system combines modules dealing with:
  – visual signs
  – linguistic content of speech (what you say)
  – paralinguistic content (how you say it)
• recognition is based on all of these signs

Page 15: facial analysis module

• face detection, i.e. finding a face without prior information about its location
• using prior knowledge about where to look:
  – face tracking
  – extraction of key regions and points in the face
  – monitoring of movements over time (as features for the user's expressions/emotions)
• provide a confidence level for the validity of each detected feature

Page 16: facial analysis module

• face detection, obtained through SVM classification
• facial feature extraction, by robust estimation of the primary facial features, i.e., eyes, mouth, eyebrows and nose
• fusion of different extraction techniques, with confidence level estimation (see the sketch below)
• MPEG-4 FP and FAP feature extraction to feed the expression and emotion recognition task
• 3-D modeling for improved accuracy in FP and FAP estimation, at an increased computational load, when the facial model of the user is known
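As an illustration of the fusion step, the following Python sketch combines per-technique feature-point estimates using their confidence levels. The weighted-mean rule and the max combination of confidences are assumptions made for illustration; the slides do not specify the actual operators.

```python
import numpy as np

def fuse_estimates(points, confidences):
    """Fuse feature-point estimates from several extraction techniques
    into one point, weighting each technique by its confidence level.
    The weighted mean and max-confidence combination are illustrative
    assumptions, not the system's documented scheme."""
    w = np.asarray(confidences, dtype=float)
    w = w / w.sum()                              # normalize the weights
    fused = w @ np.asarray(points, dtype=float)  # confidence-weighted mean
    return fused, float(max(confidences))

# e.g. two techniques locating the same mouth corner
points = [(120.0, 84.0), (123.0, 86.0)]
confidences = [0.9, 0.6]
print(fuse_estimates(points, confidences))
# -> (array([121.2,  84.8]), 0.9)
```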

Page 17: facial analysis module

(figures: the extracted mask for the eyes; detected feature points in the masks)

Page 18: FAP estimation

• there is no clear quantitative definition of FAPs
• it is nevertheless possible to model FAPs through the movement of FDP feature points, using distances s(x, y)
• e.g. for close_t_r_eyelid (F20) and close_b_r_eyelid (F22):
  D13 = s(3.2, 3.4),  f13 = D13 − D13,NEUTRAL
  (a minimal numeric sketch follows below)
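A minimal numeric sketch of the slide's formula. The distance function s and the rule f = D − D_neutral come from the slide; the point coordinates below are made up purely for illustration.

```python
import math

def s(p, q):
    """Euclidean distance s(x, y) between two FDP feature points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def fap_value(fdp, fdp_neutral, a, b):
    """FAP modeled as the change of an inter-point distance relative
    to the neutral face: f = D - D_neutral, as on the slide."""
    return s(fdp[a], fdp[b]) - s(fdp_neutral[a], fdp_neutral[b])

# right-eyelid opening via FDP points 3.2 and 3.4; illustrative coordinates
neutral = {"3.2": (100.0, 80.0), "3.4": (100.0, 92.0)}
current = {"3.2": (100.0, 83.0), "3.4": (100.0, 90.0)}
f13 = fap_value(current, neutral, "3.2", "3.4")
print(f13)   # -5.0 < 0: the eyelids are closing
```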

Page 19: face detection

(pipeline diagram: candidate window → quick rejection based on variance and skin color → preprocessing → subspace projection → SVM classifier → face / no face; a minimal sketch follows below)
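A minimal scikit-learn sketch of that pipeline, assuming 19x19 grayscale windows and random stand-in training data. The window size, variance threshold and kernel are illustrative choices, and the skin-color rejection is only indicated in a comment.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Stand-in training data: flattened 19x19 grayscale windows labelled
# face / non-face.  Random arrays replace a real training set here.
rng = np.random.default_rng(0)
X_train = rng.random((200, 19 * 19))
y_train = rng.integers(0, 2, size=200)

# subspace projection followed by an SVM, matching the slide's pipeline
classifier = make_pipeline(PCA(n_components=20), SVC(kernel="rbf"))
classifier.fit(X_train, y_train)

def detect(window):
    """Quick rejection on pixel variance (a skin-color test would sit
    alongside it), then preprocessing + subspace projection + SVM."""
    if window.var() < 1e-3:          # near-uniform region: reject early
        return False
    window = (window - window.mean()) / (window.std() + 1e-8)  # preprocess
    return bool(classifier.predict(window.reshape(1, -1))[0])

print(detect(rng.random(19 * 19)))
```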

Page 20: face detection

(figures: the detected face; estimation of the active contour of the face; extraction of the facial area)

Page 21: facial feature extraction

• extraction of eyes and mouth, the key regions within the face
• extraction of MPEG-4 Facial Points (FPs), the key points in the eye and mouth regions

Page 22: other visual features

• visemes, eye gaze, head pose
  – movement patterns, temporal correlations
• hand gestures, body movements
  – deictic/conversational gestures
  – "body language"
• measurable parameters to render expressivity on affective ECAs (see the sketch below)
  – spatial extent, repetitiveness, volume, etc.
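For concreteness, here is a sketch of how such parameters might be measured from a tracked hand trajectory. The parameter names follow the slide, but these specific formulas are illustrative stand-ins rather than the published definitions.

```python
import numpy as np

def expressivity(trajectory, fps=25.0):
    """Crude expressivity measures from a (T, 2) hand trajectory in
    pixels; the formulas are illustrative approximations."""
    t = np.asarray(trajectory, dtype=float)
    # spatial extent: diagonal of the bounding box swept by the hand
    spatial_extent = float(np.linalg.norm(t.max(axis=0) - t.min(axis=0)))
    # overall activation ("volume"): mean speed of the stroke
    speed = np.linalg.norm(np.diff(t, axis=0), axis=1) * fps
    activation = float(speed.mean())
    # repetitiveness: direction reversals along the horizontal axis
    vx = np.diff(t[:, 0])
    repetitions = int(np.sum(np.sign(vx[:-1]) * np.sign(vx[1:]) < 0))
    return {"spatial_extent": spatial_extent,
            "activation": activation,
            "repetitions": repetitions}

# a synthetic waving gesture: sinusoidal x, fixed y
x = 100 + 40 * np.sin(np.linspace(0, 6 * np.pi, 150))
gesture = np.column_stack([x, np.full(150, 200.0)])
print(expressivity(gesture))
```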

Page 23: video analysis using 3D

• Step 1: scan or approximate a 3D model (in this case estimated from video data only, using a face-space approach)

Page 24: video analysis using 3D

• Step 2: represent the 3D model using a predefined template geometry; the same template is used for expressions. The template has higher density around the eyes and mouth and lower density around flatter areas such as the cheeks and forehead.

Page 25: video analysis using 3D

• Step 3: construct a database of facial expressions by recording various actors; the statistics derived from these performances are stored as a "Dynamic Face Space"
• Step 4: apply the expressions to the actor in the video data

Page 26: video analysis using 3D

• Step 5: matching: rotate the head, apply various expressions, and match the current model state against the 2D video frame
  – a global minimization process (sketched below)
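A toy version of matching-by-minimization, with a Gaussian blob standing in for the rendered head model (the real system rasterizes the template geometry under pose rotation and expression coefficients). It uses SciPy's Nelder-Mead simplex method and recovers the synthetic frame's parameters; everything here is an illustrative assumption.

```python
import numpy as np
from scipy.optimize import minimize

def render(params, size=32):
    """Stand-in renderer: a Gaussian blob whose position and width are
    driven by the parameter vector (in place of pose + expression)."""
    cx, cy, sigma = params
    y, x = np.mgrid[0:size, 0:size]
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))

def matching_cost(params, frame):
    """Pixel-wise squared difference between the rendered model state
    and the 2D video frame: the quantity being globally minimized."""
    return float(np.sum((render(params) - frame) ** 2))

frame = render(np.array([20.0, 14.0, 4.0]))       # synthetic "video frame"
result = minimize(matching_cost, x0=np.array([16.0, 16.0, 5.0]),
                  args=(frame,), method="Nelder-Mead")
print(result.x)   # converges near (20, 14, 4)
```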

Page 27: video analysis using 3D

• the global matching/minimization process is complex
• it is sensitive to:
  – illumination, which may vary across the sequence
  – shading and shadowing effects on the face
  – color changes or color differences
  – variability in expressions: some expressions cannot be generated from the statistics of the a priori recorded sequences
• it is time consuming (several minutes per frame)

Page 28: video analysis using 3D

(figures: local template matching; pose estimation)

Page 29: video analysis using 3D (figure-only slide)

Page 30: video analysis using 3D

(figure: 3D models)

Page 31: video analysis using 3D

(figure: adding expressions)

Page 32: auditory module

• linguistic analysis aims to extract the words that the speaker produces
• paralinguistic analysis aims to extract significant variations in the way the words are produced, mainly in pitch, loudness, timing and 'voice quality'
• both are designed to cope with the less-than-perfect signals that are likely to occur in real use

Page 33: linguistic analysis

(block diagrams: (a) the Linguistic Analysis Subsystem passes the speech signal through a Signal Enhancement/Adaptation Module (short-term spectral domain, singular value decomposition), a Speech Recognition Module and a Text Post-Processing Module to produce the linguistic parameters; (b) the Speech Recognition Module comprises a Parameter Extraction Module and a Search Engine driven by a dictionary, acoustic modeling and language modeling, turning the enhanced speech signal into text)

Page 34: paralinguistic analysis

• ASSESS, developed by QUB, describes speech at multiple levels (a simplified sketch follows below):
  – intensity and spectrum; edits, pauses, frication
  – raw pitch estimates and a smooth fitted curve
  – rises and falls in intensity and pitch
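The sketch below computes simplified versions of two of these low-level descriptions: per-frame RMS intensity and a raw autocorrelation pitch estimate. The frame length, sampling rate and pitch search range are illustrative choices, and ASSESS itself computes much more (spectra, pauses, frication, fitted pitch curves).

```python
import numpy as np

def frame_features(signal, sr=16000, frame_len=0.032):
    """Per-frame RMS intensity and a raw autocorrelation pitch estimate,
    as simplified stand-ins for two ASSESS-style descriptions."""
    n = int(sr * frame_len)
    features = []
    for start in range(0, len(signal) - n, n):
        frame = signal[start:start + n]
        intensity = float(np.sqrt(np.mean(frame ** 2)))    # RMS energy
        ac = np.correlate(frame, frame, mode="full")[n - 1:]
        lo, hi = sr // 400, sr // 60       # search a 60-400 Hz pitch range
        lag = lo + int(np.argmax(ac[lo:hi]))
        features.append({"intensity": intensity, "pitch_hz": sr / lag})
    return features

t = np.linspace(0, 0.5, 8000, endpoint=False)
tone = 0.5 * np.sin(2 * np.pi * 150 * t)    # 150 Hz test signal
print(frame_features(tone)[0])              # pitch_hz close to 150
```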

Page 35: integrating the evidence

(diagram: facial, phonetic and linguistic emotion-state detection feed an overall emotional-state detection stage)

• level 1:
  – facial emotion
  – phonetic emotion
  – linguistic emotion
• level 2:
  – "total" emotional state (the result of the level-1 emotions)
• modeling technique: fuzzy set theory (research by Massaro suggests this models the way humans integrate signs); a sketch follows below
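A sketch of the two-level fuzzy integration, assuming a small emotion vocabulary and made-up level-1 memberships. The algebraic product is one standard fuzzy intersection, chosen here for illustration; the slides do not name the exact operator used.

```python
import numpy as np

EMOTIONS = ["neutral", "happy", "angry", "sad"]

# level-1 outputs: fuzzy membership of each emotion, per modality
facial     = {"neutral": 0.2, "happy": 0.7, "angry": 0.0, "sad": 0.1}
phonetic   = {"neutral": 0.3, "happy": 0.5, "angry": 0.1, "sad": 0.1}
linguistic = {"neutral": 0.4, "happy": 0.4, "angry": 0.1, "sad": 0.1}

def total_state(*modalities):
    """Level-2 'total' emotional state by fuzzy aggregation of the
    level-1 memberships (algebraic product, then normalization)."""
    fused = {e: float(np.prod([m[e] for m in modalities])) for e in EMOTIONS}
    norm = sum(fused.values()) or 1.0
    return {e: v / norm for e, v in fused.items()}

print(total_state(facial, phonetic, linguistic))
# "happy" dominates: 0.7 * 0.5 * 0.4 is the largest product
```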

Page 36: integrating the evidence

• mechanisms linking attention and emotion in the brain form a useful model

(diagram of the brain model: visual input reaches the thalamus / superior colliculus and the IMC (heteromodal cortex), modulated by salience (NBM, the ACh source), valence (amygdala) and goals (SFG, with inhibition via the ACG))

Page 37: biosignal analysis

• different emotional expressions produce different changes in autonomic activity:
  – anger: increased heart rate and skin temperature
  – fear: increased heart rate, decreased skin temperature
  – happiness: decreased heart rate, no change in skin temperature
  (a toy rule based on these patterns is sketched below)
• easily integrated with external channels (face and speech)

(presentation by J. Kim in the HUMAINE WP4 workshop, September 2004)
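The toy rule below classifies directly from the autonomic patterns listed above; the threshold, units and baseline convention are illustrative assumptions, not a published classifier.

```python
def autonomic_rule(delta_hr, delta_temp, eps=0.5):
    """Toy classifier from the slide's autonomic patterns: changes in
    heart rate (bpm) and skin temperature (deg C) against a resting
    baseline.  The threshold eps is an illustrative choice."""
    if delta_hr > eps and delta_temp > eps:
        return "anger"       # heart rate up, temperature up
    if delta_hr > eps and delta_temp < -eps:
        return "fear"        # heart rate up, temperature down
    if delta_hr < -eps and abs(delta_temp) <= eps:
        return "happiness"   # heart rate down, temperature unchanged
    return "unknown"

print(autonomic_rule(+8.0, +1.2))    # -> anger
print(autonomic_rule(+10.0, -0.9))   # -> fear
print(autonomic_rule(-5.0, +0.1))    # -> happiness
```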

Page 38: biosignal analysis

• BVP: blood volume pulse
• EMG: muscle tension
• EKG: heart rate
• respiration: breathing rate
• temperature
• GSR: skin conductivity
• acoustics and noise
• EEG: brain waves

Page 39: biosignal analysis

• skin sensing requires physical contact
• accuracy and robustness to motion artifacts need to improve; the signals are vulnerable to distortion
• most research measures artificially elicited emotions, in a lab environment and from a single subject
• different individuals show emotion with different responses in the autonomic channels (hard to generalize across subjects)
• physiological emotion recognition is rarely studied; the literature offers ideas rather than well-defined solutions

Page 40: multimodal emotion recognition

• recognition models and application dependency
  – discrete / dimensional / appraisal-theory models
• theoretical models of multimodal integration
  – direct / separate / dominant / motor integration
• modality synchronization
  – visemes / EMGs and FAPs / SC-RSP and speech
• temporal evolution and modality sequentiality
  – multimodal recognition techniques
• classifiers + context + goals + cognition/attention + modality significance in interaction