Emotion Recognition and Cognitive

72
From imagination to impact Julien Epps, Fang Chen and Bo Yin Using Information to Drive Decisions Emotion Recognition and Cognitive Load Measurement from Speech NICTA Copyright 2010 1 [email protected] The material is mainly coming from the tutorial of APSIP A A nnual Summit and Conference December 14 th , 2010

Transcript of Emotion Recognition and Cognitive

Page 1: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 172

From imagination to impact

Julien Epps Fang Chen and Bo Yin

Using Information to Drive Decisions

Emotion Recognition and CognitiveLoad Measurement from Speech

NICTA Copyright 20101

fangchennictacomau

The material is mainly coming from the tutorial ofAPSIPA Annual Summit and Conference

December 14th 2010

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 272

Overview

bull Emotion and Mental State Recognition Systems

bull Emotions and Cognitive Loadndash What are they and how can they be measured

bull Feature Extraction

ndash

bull Normalisation Modelling and Classification

bull Speech Databases and their Design

bull Applications of Emotion Recognition and Cognitive Load

Measurement

bull Summary

2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 372

Emotion and Mental State

3

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 472

Emotion Recognition Some History

bull The Measurement of Emotion

ndash Whately-Smith 1922

bull Emotion in speech (science)ndash Scherer 1980s-present

bull Emotion in synthetic speech

ndash 1990sbull Emotion recognition

ndash Dellaert et al 1996 ndash prosodic features

bull Applications of emotion recognition

ndash Petrushin 1999 Lee et al 2001 ndash call centres

bull Affective computing

ndash Picard 2000

bull 2000 ~10 papers per year4

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 572

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 672

Cognitive Load Some History

bull Working memory

ndash Miller et al 1960

ndash Wickens 1980s Baddeley 1990s onwards

bull Cognitive load in learning

ndash Sweller (mid 80s)

bull Working memory and languagendash Gathercole 1993

bull Cognitive load in speech (science)

ndash Berthold and Jameson 1999

bull Cognitive load classification from speech

ndash Yin et al 2008

6

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 772

Emotion Recognition Key Components

bull Typical acoustic based system

speech featureextraction

featurenormalisation

voicingdetection

7

emotion

modelling

classification

regression

recognizedemotion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 872

Emotion Recognition Different Approaches

bull Acoustic vs prosodic vs linguistic

bull Detailed spectral vs broad spectral features

bull Direct vs adapted models

bull Static (utterance) vs dynamic (framemulti-frame)

bull Categorical vs ordinal vs regression-basedclassification

bull Recognition vs detection

8

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 972

Emotion Recognition Key Current Challenges

bull Basis for emotion mental states in physiology and

hence speech productionndash Shared with expressive speech synthesis

ndash Difficult to reverse-engineer spoken emotion

ndash Use of wide range of physiological sensors may be helpful

bull EGG glottal cameras vocal tract x-ray movies etcbull EEG GSR HRV respiration

ndash Complex emotion dependency varies between different contexts

(eg some words more emotive than others)

9

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1072

Emotion Recognition Key Current Challenges

bull Characterizing emotion mental states using robust

featuresndash Lots of features in literature

ndash Why should a given feature be not be useful

ndash Are some features more sensitive to some emotionsmental states

an w y

ndash How should temporal information best be used

10

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172

Emotion Recognition Key Current Challenges

bull Recognition of emotion in naturally occurring

speechndash Much initial work on emotion recognition done on acted speech

ndash Initial work on cognitive load classification done on carefully

controlled experimental data

ndash ea emot ons meta states ne acte or exper menta ata

ndash Taking emotion recognition from the lab to the field

11

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272

Emotion Recognition Key Current Challenges

bull Understanding application areasndash Lots of vision statements about affective computing

bull Detail still coming

ndash Understanding likely limits of automatic approaches

bull Match approaches with suitable applications

ndash

range of different situations

ndash New applications still emerging

12

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372

13

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472

Emotions and CL Psychology

bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution

bull Also a signalling system between people

bull Important in learning behaviours

ndash Results of studies are complex to interpret as an engineer

lsquo rsquondash

ndash Cultural variation exists

ndash Emotions can often be discretely described only as lsquofull-blownrsquo

emotions

bull Cowie et al 2001

bull Pragmaticsndash Emotions of interest are dictated by application and availability of

labelled data usually single-culture often fully blown

14

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572

Emotions and CL Psychology

bull Emotion representations and structuresndash Affective circumplex

arousal

valence

excited

happy

sad

nervous

tense

ndash Arousal alertnessresponsiveness to stimuli

ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional

bull Reducing to two dimensions can lead to different structures

ndash Cowie et al 2001

15

depressed calm

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672

Emotions and CL Psychology

bull One choice FeelTracendash For real time

ndash Cowie et al 2000

16

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772

Emotions and CL Measuring Emotion

bull Measuring emotionsndash Self-reports of subjective experience

bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list

ndash Real time methods

-

bull eg arousal-affect emotion space cursor

ndash Cued review

bull Subjects watch video of themselves performing the task rate emotions

experienced post-hoc

ndash Observer ratings

ndash Behavioural measures (this presentation)

ndash Physiological measures (more later)

17

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash NASA Task Load Index (Hart and Staveland 1988)

18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash Subjective workload assessment technique(Reid and Nygren 1988)

ndash Mental Effort Load

raquo a Very little conscious mental effort or

concentration re uired Activit is almost

19

automatic requiring little or no attention

raquo b Moderate conscious mental effort or

concentration required Complexity of

activity is moderately high due to

uncertainty unpredictability or

unfamiliarity Considerable attention

requiredraquo c Extensive mental effort and

concentration are necessary Very

complex activity requiring total attention

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072

Emotions and CL Measuring CL

bull Measuring other mental states (speech literature)ndash Stress

ndash Uncertainty

ndash Level of interest

ndash Deceptive speech

ndash enta sor ers

bull Depression anxiety autism

ndash Behaviour (eg drunkenness)

ndash Disfluency (indicative of some mental states)

ndash Note differences in temporal occurrence of mental state

20

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172

21

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 2: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 272

Overview

bull Emotion and Mental State Recognition Systems

bull Emotions and Cognitive Loadndash What are they and how can they be measured

bull Feature Extraction

ndash

bull Normalisation Modelling and Classification

bull Speech Databases and their Design

bull Applications of Emotion Recognition and Cognitive Load

Measurement

bull Summary

2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 372

Emotion and Mental State

3

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 472

Emotion Recognition Some History

bull The Measurement of Emotion

ndash Whately-Smith 1922

bull Emotion in speech (science)ndash Scherer 1980s-present

bull Emotion in synthetic speech

ndash 1990sbull Emotion recognition

ndash Dellaert et al 1996 ndash prosodic features

bull Applications of emotion recognition

ndash Petrushin 1999 Lee et al 2001 ndash call centres

bull Affective computing

ndash Picard 2000

bull 2000 ~10 papers per year4

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 572

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 672

Cognitive Load Some History

bull Working memory

ndash Miller et al 1960

ndash Wickens 1980s Baddeley 1990s onwards

bull Cognitive load in learning

ndash Sweller (mid 80s)

bull Working memory and languagendash Gathercole 1993

bull Cognitive load in speech (science)

ndash Berthold and Jameson 1999

bull Cognitive load classification from speech

ndash Yin et al 2008

6

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 772

Emotion Recognition Key Components

bull Typical acoustic based system

speech featureextraction

featurenormalisation

voicingdetection

7

emotion

modelling

classification

regression

recognizedemotion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 872

Emotion Recognition Different Approaches

bull Acoustic vs prosodic vs linguistic

bull Detailed spectral vs broad spectral features

bull Direct vs adapted models

bull Static (utterance) vs dynamic (framemulti-frame)

bull Categorical vs ordinal vs regression-basedclassification

bull Recognition vs detection

8

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 972

Emotion Recognition Key Current Challenges

bull Basis for emotion mental states in physiology and

hence speech productionndash Shared with expressive speech synthesis

ndash Difficult to reverse-engineer spoken emotion

ndash Use of wide range of physiological sensors may be helpful

bull EGG glottal cameras vocal tract x-ray movies etcbull EEG GSR HRV respiration

ndash Complex emotion dependency varies between different contexts

(eg some words more emotive than others)

9

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1072

Emotion Recognition Key Current Challenges

bull Characterizing emotion mental states using robust

featuresndash Lots of features in literature

ndash Why should a given feature be not be useful

ndash Are some features more sensitive to some emotionsmental states

an w y

ndash How should temporal information best be used

10

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172

Emotion Recognition Key Current Challenges

bull Recognition of emotion in naturally occurring

speechndash Much initial work on emotion recognition done on acted speech

ndash Initial work on cognitive load classification done on carefully

controlled experimental data

ndash ea emot ons meta states ne acte or exper menta ata

ndash Taking emotion recognition from the lab to the field

11

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272

Emotion Recognition Key Current Challenges

bull Understanding application areasndash Lots of vision statements about affective computing

bull Detail still coming

ndash Understanding likely limits of automatic approaches

bull Match approaches with suitable applications

ndash

range of different situations

ndash New applications still emerging

12

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372

13

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472

Emotions and CL Psychology

bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution

bull Also a signalling system between people

bull Important in learning behaviours

ndash Results of studies are complex to interpret as an engineer

lsquo rsquondash

ndash Cultural variation exists

ndash Emotions can often be discretely described only as lsquofull-blownrsquo

emotions

bull Cowie et al 2001

bull Pragmaticsndash Emotions of interest are dictated by application and availability of

labelled data usually single-culture often fully blown

14

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572

Emotions and CL Psychology

bull Emotion representations and structuresndash Affective circumplex

arousal

valence

excited

happy

sad

nervous

tense

ndash Arousal alertnessresponsiveness to stimuli

ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional

bull Reducing to two dimensions can lead to different structures

ndash Cowie et al 2001

15

depressed calm

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672

Emotions and CL Psychology

bull One choice FeelTracendash For real time

ndash Cowie et al 2000

16

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772

Emotions and CL Measuring Emotion

bull Measuring emotionsndash Self-reports of subjective experience

bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list

ndash Real time methods

-

bull eg arousal-affect emotion space cursor

ndash Cued review

bull Subjects watch video of themselves performing the task rate emotions

experienced post-hoc

ndash Observer ratings

ndash Behavioural measures (this presentation)

ndash Physiological measures (more later)

17

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash NASA Task Load Index (Hart and Staveland 1988)

18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash Subjective workload assessment technique(Reid and Nygren 1988)

ndash Mental Effort Load

raquo a Very little conscious mental effort or

concentration re uired Activit is almost

19

automatic requiring little or no attention

raquo b Moderate conscious mental effort or

concentration required Complexity of

activity is moderately high due to

uncertainty unpredictability or

unfamiliarity Considerable attention

requiredraquo c Extensive mental effort and

concentration are necessary Very

complex activity requiring total attention

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072

Emotions and CL Measuring CL

bull Measuring other mental states (speech literature)ndash Stress

ndash Uncertainty

ndash Level of interest

ndash Deceptive speech

ndash enta sor ers

bull Depression anxiety autism

ndash Behaviour (eg drunkenness)

ndash Disfluency (indicative of some mental states)

ndash Note differences in temporal occurrence of mental state

20

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172

21

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 3: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 372

Emotion and Mental State

3

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 472

Emotion Recognition Some History

bull The Measurement of Emotion

ndash Whately-Smith 1922

bull Emotion in speech (science)ndash Scherer 1980s-present

bull Emotion in synthetic speech

ndash 1990sbull Emotion recognition

ndash Dellaert et al 1996 ndash prosodic features

bull Applications of emotion recognition

ndash Petrushin 1999 Lee et al 2001 ndash call centres

bull Affective computing

ndash Picard 2000

bull 2000 ~10 papers per year4

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 572

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 672

Cognitive Load Some History

bull Working memory

ndash Miller et al 1960

ndash Wickens 1980s Baddeley 1990s onwards

bull Cognitive load in learning

ndash Sweller (mid 80s)

bull Working memory and languagendash Gathercole 1993

bull Cognitive load in speech (science)

ndash Berthold and Jameson 1999

bull Cognitive load classification from speech

ndash Yin et al 2008

6

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 772

Emotion Recognition Key Components

bull Typical acoustic based system

speech featureextraction

featurenormalisation

voicingdetection

7

emotion

modelling

classification

regression

recognizedemotion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 872

Emotion Recognition Different Approaches

bull Acoustic vs prosodic vs linguistic

bull Detailed spectral vs broad spectral features

bull Direct vs adapted models

bull Static (utterance) vs dynamic (framemulti-frame)

bull Categorical vs ordinal vs regression-basedclassification

bull Recognition vs detection

8

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 972

Emotion Recognition Key Current Challenges

bull Basis for emotion mental states in physiology and

hence speech productionndash Shared with expressive speech synthesis

ndash Difficult to reverse-engineer spoken emotion

ndash Use of wide range of physiological sensors may be helpful

bull EGG glottal cameras vocal tract x-ray movies etcbull EEG GSR HRV respiration

ndash Complex emotion dependency varies between different contexts

(eg some words more emotive than others)

9

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1072

Emotion Recognition Key Current Challenges

bull Characterizing emotion mental states using robust

featuresndash Lots of features in literature

ndash Why should a given feature be not be useful

ndash Are some features more sensitive to some emotionsmental states

an w y

ndash How should temporal information best be used

10

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172

Emotion Recognition Key Current Challenges

bull Recognition of emotion in naturally occurring

speechndash Much initial work on emotion recognition done on acted speech

ndash Initial work on cognitive load classification done on carefully

controlled experimental data

ndash ea emot ons meta states ne acte or exper menta ata

ndash Taking emotion recognition from the lab to the field

11

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272

Emotion Recognition Key Current Challenges

bull Understanding application areasndash Lots of vision statements about affective computing

bull Detail still coming

ndash Understanding likely limits of automatic approaches

bull Match approaches with suitable applications

ndash

range of different situations

ndash New applications still emerging

12

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372

13

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472

Emotions and CL Psychology

bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution

bull Also a signalling system between people

bull Important in learning behaviours

ndash Results of studies are complex to interpret as an engineer

lsquo rsquondash

ndash Cultural variation exists

ndash Emotions can often be discretely described only as lsquofull-blownrsquo

emotions

bull Cowie et al 2001

bull Pragmaticsndash Emotions of interest are dictated by application and availability of

labelled data usually single-culture often fully blown

14

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572

Emotions and CL Psychology

bull Emotion representations and structuresndash Affective circumplex

arousal

valence

excited

happy

sad

nervous

tense

ndash Arousal alertnessresponsiveness to stimuli

ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional

bull Reducing to two dimensions can lead to different structures

ndash Cowie et al 2001

15

depressed calm

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672

Emotions and CL Psychology

bull One choice FeelTracendash For real time

ndash Cowie et al 2000

16

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772

Emotions and CL Measuring Emotion

bull Measuring emotionsndash Self-reports of subjective experience

bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list

ndash Real time methods

-

bull eg arousal-affect emotion space cursor

ndash Cued review

bull Subjects watch video of themselves performing the task rate emotions

experienced post-hoc

ndash Observer ratings

ndash Behavioural measures (this presentation)

ndash Physiological measures (more later)

17

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash NASA Task Load Index (Hart and Staveland 1988)

18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash Subjective workload assessment technique(Reid and Nygren 1988)

ndash Mental Effort Load

raquo a Very little conscious mental effort or

concentration re uired Activit is almost

19

automatic requiring little or no attention

raquo b Moderate conscious mental effort or

concentration required Complexity of

activity is moderately high due to

uncertainty unpredictability or

unfamiliarity Considerable attention

requiredraquo c Extensive mental effort and

concentration are necessary Very

complex activity requiring total attention

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072

Emotions and CL Measuring CL

bull Measuring other mental states (speech literature)ndash Stress

ndash Uncertainty

ndash Level of interest

ndash Deceptive speech

ndash enta sor ers

bull Depression anxiety autism

ndash Behaviour (eg drunkenness)

ndash Disfluency (indicative of some mental states)

ndash Note differences in temporal occurrence of mental state

20

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172

21

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 4: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 472

Emotion Recognition Some History

bull The Measurement of Emotion

ndash Whately-Smith 1922

bull Emotion in speech (science)ndash Scherer 1980s-present

bull Emotion in synthetic speech

ndash 1990sbull Emotion recognition

ndash Dellaert et al 1996 ndash prosodic features

bull Applications of emotion recognition

ndash Petrushin 1999 Lee et al 2001 ndash call centres

bull Affective computing

ndash Picard 2000

bull 2000 ~10 papers per year4

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 572

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 672

Cognitive Load Some History

bull Working memory

ndash Miller et al 1960

ndash Wickens 1980s Baddeley 1990s onwards

bull Cognitive load in learning

ndash Sweller (mid 80s)

bull Working memory and languagendash Gathercole 1993

bull Cognitive load in speech (science)

ndash Berthold and Jameson 1999

bull Cognitive load classification from speech

ndash Yin et al 2008

6

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 772

Emotion Recognition Key Components

bull Typical acoustic based system

speech featureextraction

featurenormalisation

voicingdetection

7

emotion

modelling

classification

regression

recognizedemotion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 872

Emotion Recognition Different Approaches

bull Acoustic vs prosodic vs linguistic

bull Detailed spectral vs broad spectral features

bull Direct vs adapted models

bull Static (utterance) vs dynamic (framemulti-frame)

bull Categorical vs ordinal vs regression-basedclassification

bull Recognition vs detection

8

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 972

Emotion Recognition Key Current Challenges

bull Basis for emotion mental states in physiology and

hence speech productionndash Shared with expressive speech synthesis

ndash Difficult to reverse-engineer spoken emotion

ndash Use of wide range of physiological sensors may be helpful

bull EGG glottal cameras vocal tract x-ray movies etcbull EEG GSR HRV respiration

ndash Complex emotion dependency varies between different contexts

(eg some words more emotive than others)

9

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1072

Emotion Recognition Key Current Challenges

bull Characterizing emotion mental states using robust

featuresndash Lots of features in literature

ndash Why should a given feature be not be useful

ndash Are some features more sensitive to some emotionsmental states

an w y

ndash How should temporal information best be used

10

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172

Emotion Recognition Key Current Challenges

bull Recognition of emotion in naturally occurring

speechndash Much initial work on emotion recognition done on acted speech

ndash Initial work on cognitive load classification done on carefully

controlled experimental data

ndash ea emot ons meta states ne acte or exper menta ata

ndash Taking emotion recognition from the lab to the field

11

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272

Emotion Recognition Key Current Challenges

bull Understanding application areasndash Lots of vision statements about affective computing

bull Detail still coming

ndash Understanding likely limits of automatic approaches

bull Match approaches with suitable applications

ndash

range of different situations

ndash New applications still emerging

12

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372

13

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472

Emotions and CL Psychology

bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution

bull Also a signalling system between people

bull Important in learning behaviours

ndash Results of studies are complex to interpret as an engineer

lsquo rsquondash

ndash Cultural variation exists

ndash Emotions can often be discretely described only as lsquofull-blownrsquo

emotions

bull Cowie et al 2001

bull Pragmaticsndash Emotions of interest are dictated by application and availability of

labelled data usually single-culture often fully blown

14

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572

Emotions and CL Psychology

bull Emotion representations and structuresndash Affective circumplex

arousal

valence

excited

happy

sad

nervous

tense

ndash Arousal alertnessresponsiveness to stimuli

ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional

bull Reducing to two dimensions can lead to different structures

ndash Cowie et al 2001

15

depressed calm

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672

Emotions and CL Psychology

bull One choice FeelTracendash For real time

ndash Cowie et al 2000

16

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772

Emotions and CL Measuring Emotion

bull Measuring emotionsndash Self-reports of subjective experience

bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list

ndash Real time methods

-

bull eg arousal-affect emotion space cursor

ndash Cued review

bull Subjects watch video of themselves performing the task rate emotions

experienced post-hoc

ndash Observer ratings

ndash Behavioural measures (this presentation)

ndash Physiological measures (more later)

17

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash NASA Task Load Index (Hart and Staveland 1988)

18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash Subjective workload assessment technique(Reid and Nygren 1988)

ndash Mental Effort Load

raquo a Very little conscious mental effort or

concentration re uired Activit is almost

19

automatic requiring little or no attention

raquo b Moderate conscious mental effort or

concentration required Complexity of

activity is moderately high due to

uncertainty unpredictability or

unfamiliarity Considerable attention

requiredraquo c Extensive mental effort and

concentration are necessary Very

complex activity requiring total attention

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072

Emotions and CL Measuring CL

bull Measuring other mental states (speech literature)ndash Stress

ndash Uncertainty

ndash Level of interest

ndash Deceptive speech

ndash enta sor ers

bull Depression anxiety autism

ndash Behaviour (eg drunkenness)

ndash Disfluency (indicative of some mental states)

ndash Note differences in temporal occurrence of mental state

20

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172

21

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 5: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 572

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 672

Cognitive Load Some History

bull Working memory

ndash Miller et al 1960

ndash Wickens 1980s Baddeley 1990s onwards

bull Cognitive load in learning

ndash Sweller (mid 80s)

bull Working memory and languagendash Gathercole 1993

bull Cognitive load in speech (science)

ndash Berthold and Jameson 1999

bull Cognitive load classification from speech

ndash Yin et al 2008

6

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 772

Emotion Recognition Key Components

bull Typical acoustic based system

speech featureextraction

featurenormalisation

voicingdetection

7

emotion

modelling

classification

regression

recognizedemotion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 872

Emotion Recognition Different Approaches

bull Acoustic vs prosodic vs linguistic

bull Detailed spectral vs broad spectral features

bull Direct vs adapted models

bull Static (utterance) vs dynamic (framemulti-frame)

bull Categorical vs ordinal vs regression-basedclassification

bull Recognition vs detection

8

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 972

Emotion Recognition Key Current Challenges

bull Basis for emotion mental states in physiology and

hence speech productionndash Shared with expressive speech synthesis

ndash Difficult to reverse-engineer spoken emotion

ndash Use of wide range of physiological sensors may be helpful

bull EGG glottal cameras vocal tract x-ray movies etcbull EEG GSR HRV respiration

ndash Complex emotion dependency varies between different contexts

(eg some words more emotive than others)

9

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1072

Emotion Recognition Key Current Challenges

bull Characterizing emotion mental states using robust

featuresndash Lots of features in literature

ndash Why should a given feature be not be useful

ndash Are some features more sensitive to some emotionsmental states

an w y

ndash How should temporal information best be used

10

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172

Emotion Recognition Key Current Challenges

bull Recognition of emotion in naturally occurring

speechndash Much initial work on emotion recognition done on acted speech

ndash Initial work on cognitive load classification done on carefully

controlled experimental data

ndash ea emot ons meta states ne acte or exper menta ata

ndash Taking emotion recognition from the lab to the field

11

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272

Emotion Recognition Key Current Challenges

bull Understanding application areasndash Lots of vision statements about affective computing

bull Detail still coming

ndash Understanding likely limits of automatic approaches

bull Match approaches with suitable applications

ndash

range of different situations

ndash New applications still emerging

12

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372

13

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472

Emotions and CL Psychology

bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution

bull Also a signalling system between people

bull Important in learning behaviours

ndash Results of studies are complex to interpret as an engineer

lsquo rsquondash

ndash Cultural variation exists

ndash Emotions can often be discretely described only as lsquofull-blownrsquo

emotions

bull Cowie et al 2001

bull Pragmaticsndash Emotions of interest are dictated by application and availability of

labelled data usually single-culture often fully blown

14

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572

Emotions and CL Psychology

bull Emotion representations and structuresndash Affective circumplex

arousal

valence

excited

happy

sad

nervous

tense

ndash Arousal alertnessresponsiveness to stimuli

ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional

bull Reducing to two dimensions can lead to different structures

ndash Cowie et al 2001

15

depressed calm

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672

Emotions and CL Psychology

bull One choice FeelTracendash For real time

ndash Cowie et al 2000

16

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772

Emotions and CL Measuring Emotion

bull Measuring emotionsndash Self-reports of subjective experience

bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list

ndash Real time methods

-

bull eg arousal-affect emotion space cursor

ndash Cued review

bull Subjects watch video of themselves performing the task rate emotions

experienced post-hoc

ndash Observer ratings

ndash Behavioural measures (this presentation)

ndash Physiological measures (more later)

17

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash NASA Task Load Index (Hart and Staveland 1988)

18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash Subjective workload assessment technique(Reid and Nygren 1988)

ndash Mental Effort Load

raquo a Very little conscious mental effort or

concentration re uired Activit is almost

19

automatic requiring little or no attention

raquo b Moderate conscious mental effort or

concentration required Complexity of

activity is moderately high due to

uncertainty unpredictability or

unfamiliarity Considerable attention

requiredraquo c Extensive mental effort and

concentration are necessary Very

complex activity requiring total attention

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072

Emotions and CL Measuring CL

bull Measuring other mental states (speech literature)ndash Stress

ndash Uncertainty

ndash Level of interest

ndash Deceptive speech

ndash enta sor ers

bull Depression anxiety autism

ndash Behaviour (eg drunkenness)

ndash Disfluency (indicative of some mental states)

ndash Note differences in temporal occurrence of mental state

20

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172

21

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 6: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 672

Cognitive Load Some History

bull Working memory

ndash Miller et al 1960

ndash Wickens 1980s Baddeley 1990s onwards

bull Cognitive load in learning

ndash Sweller (mid 80s)

bull Working memory and languagendash Gathercole 1993

bull Cognitive load in speech (science)

ndash Berthold and Jameson 1999

bull Cognitive load classification from speech

ndash Yin et al 2008

6

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 772

Emotion Recognition Key Components

bull Typical acoustic based system

speech featureextraction

featurenormalisation

voicingdetection

7

emotion

modelling

classification

regression

recognizedemotion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 872

Emotion Recognition Different Approaches

bull Acoustic vs prosodic vs linguistic

bull Detailed spectral vs broad spectral features

bull Direct vs adapted models

bull Static (utterance) vs dynamic (framemulti-frame)

bull Categorical vs ordinal vs regression-basedclassification

bull Recognition vs detection

8

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 972

Emotion Recognition Key Current Challenges

bull Basis for emotion mental states in physiology and

hence speech productionndash Shared with expressive speech synthesis

ndash Difficult to reverse-engineer spoken emotion

ndash Use of wide range of physiological sensors may be helpful

bull EGG glottal cameras vocal tract x-ray movies etcbull EEG GSR HRV respiration

ndash Complex emotion dependency varies between different contexts

(eg some words more emotive than others)

9

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1072

Emotion Recognition Key Current Challenges

bull Characterizing emotion mental states using robust

featuresndash Lots of features in literature

ndash Why should a given feature be not be useful

ndash Are some features more sensitive to some emotionsmental states

an w y

ndash How should temporal information best be used

10

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172

Emotion Recognition Key Current Challenges

bull Recognition of emotion in naturally occurring

speechndash Much initial work on emotion recognition done on acted speech

ndash Initial work on cognitive load classification done on carefully

controlled experimental data

ndash ea emot ons meta states ne acte or exper menta ata

ndash Taking emotion recognition from the lab to the field

11

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272

Emotion Recognition Key Current Challenges

bull Understanding application areasndash Lots of vision statements about affective computing

bull Detail still coming

ndash Understanding likely limits of automatic approaches

bull Match approaches with suitable applications

ndash

range of different situations

ndash New applications still emerging

12

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372

13

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472

Emotions and CL Psychology

bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution

bull Also a signalling system between people

bull Important in learning behaviours

ndash Results of studies are complex to interpret as an engineer

lsquo rsquondash

ndash Cultural variation exists

ndash Emotions can often be discretely described only as lsquofull-blownrsquo

emotions

bull Cowie et al 2001

bull Pragmaticsndash Emotions of interest are dictated by application and availability of

labelled data usually single-culture often fully blown

14

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572

Emotions and CL Psychology

bull Emotion representations and structuresndash Affective circumplex

arousal

valence

excited

happy

sad

nervous

tense

ndash Arousal alertnessresponsiveness to stimuli

ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional

bull Reducing to two dimensions can lead to different structures

ndash Cowie et al 2001

15

depressed calm

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672

Emotions and CL Psychology

bull One choice FeelTracendash For real time

ndash Cowie et al 2000

16

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772

Emotions and CL Measuring Emotion

bull Measuring emotionsndash Self-reports of subjective experience

bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list

ndash Real time methods

-

bull eg arousal-affect emotion space cursor

ndash Cued review

bull Subjects watch video of themselves performing the task rate emotions

experienced post-hoc

ndash Observer ratings

ndash Behavioural measures (this presentation)

ndash Physiological measures (more later)

17

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash NASA Task Load Index (Hart and Staveland 1988)

18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash Subjective workload assessment technique(Reid and Nygren 1988)

ndash Mental Effort Load

raquo a Very little conscious mental effort or

concentration re uired Activit is almost

19

automatic requiring little or no attention

raquo b Moderate conscious mental effort or

concentration required Complexity of

activity is moderately high due to

uncertainty unpredictability or

unfamiliarity Considerable attention

requiredraquo c Extensive mental effort and

concentration are necessary Very

complex activity requiring total attention

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072

Emotions and CL Measuring CL

bull Measuring other mental states (speech literature)ndash Stress

ndash Uncertainty

ndash Level of interest

ndash Deceptive speech

ndash enta sor ers

bull Depression anxiety autism

ndash Behaviour (eg drunkenness)

ndash Disfluency (indicative of some mental states)

ndash Note differences in temporal occurrence of mental state

20

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172

21

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 7: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 772

Emotion Recognition Key Components

bull Typical acoustic based system

speech featureextraction

featurenormalisation

voicingdetection

7

emotion

modelling

classification

regression

recognizedemotion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 872

Emotion Recognition Different Approaches

bull Acoustic vs prosodic vs linguistic

bull Detailed spectral vs broad spectral features

bull Direct vs adapted models

bull Static (utterance) vs dynamic (framemulti-frame)

bull Categorical vs ordinal vs regression-basedclassification

bull Recognition vs detection

8

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 972

Emotion Recognition Key Current Challenges

bull Basis for emotion mental states in physiology and

hence speech productionndash Shared with expressive speech synthesis

ndash Difficult to reverse-engineer spoken emotion

ndash Use of wide range of physiological sensors may be helpful

bull EGG glottal cameras vocal tract x-ray movies etcbull EEG GSR HRV respiration

ndash Complex emotion dependency varies between different contexts

(eg some words more emotive than others)

9

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1072

Emotion Recognition Key Current Challenges

bull Characterizing emotion mental states using robust

featuresndash Lots of features in literature

ndash Why should a given feature be not be useful

ndash Are some features more sensitive to some emotionsmental states

an w y

ndash How should temporal information best be used

10

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172

Emotion Recognition Key Current Challenges

bull Recognition of emotion in naturally occurring

speechndash Much initial work on emotion recognition done on acted speech

ndash Initial work on cognitive load classification done on carefully

controlled experimental data

ndash ea emot ons meta states ne acte or exper menta ata

ndash Taking emotion recognition from the lab to the field

11

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272

Emotion Recognition Key Current Challenges

bull Understanding application areasndash Lots of vision statements about affective computing

bull Detail still coming

ndash Understanding likely limits of automatic approaches

bull Match approaches with suitable applications

ndash

range of different situations

ndash New applications still emerging

12

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372

13

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472

Emotions and CL Psychology

bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution

bull Also a signalling system between people

bull Important in learning behaviours

ndash Results of studies are complex to interpret as an engineer

lsquo rsquondash

ndash Cultural variation exists

ndash Emotions can often be discretely described only as lsquofull-blownrsquo

emotions

bull Cowie et al 2001

bull Pragmaticsndash Emotions of interest are dictated by application and availability of

labelled data usually single-culture often fully blown

14

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572

Emotions and CL Psychology

bull Emotion representations and structuresndash Affective circumplex

arousal

valence

excited

happy

sad

nervous

tense

ndash Arousal alertnessresponsiveness to stimuli

ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional

bull Reducing to two dimensions can lead to different structures

ndash Cowie et al 2001

15

depressed calm

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672

Emotions and CL Psychology

bull One choice FeelTracendash For real time

ndash Cowie et al 2000

16

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772

Emotions and CL Measuring Emotion

bull Measuring emotionsndash Self-reports of subjective experience

bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list

ndash Real time methods

-

bull eg arousal-affect emotion space cursor

ndash Cued review

bull Subjects watch video of themselves performing the task rate emotions

experienced post-hoc

ndash Observer ratings

ndash Behavioural measures (this presentation)

ndash Physiological measures (more later)

17

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash NASA Task Load Index (Hart and Staveland 1988)

18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash Subjective workload assessment technique(Reid and Nygren 1988)

ndash Mental Effort Load

raquo a Very little conscious mental effort or

concentration re uired Activit is almost

19

automatic requiring little or no attention

raquo b Moderate conscious mental effort or

concentration required Complexity of

activity is moderately high due to

uncertainty unpredictability or

unfamiliarity Considerable attention

requiredraquo c Extensive mental effort and

concentration are necessary Very

complex activity requiring total attention

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072

Emotions and CL Measuring CL

bull Measuring other mental states (speech literature)ndash Stress

ndash Uncertainty

ndash Level of interest

ndash Deceptive speech

ndash enta sor ers

bull Depression anxiety autism

ndash Behaviour (eg drunkenness)

ndash Disfluency (indicative of some mental states)

ndash Note differences in temporal occurrence of mental state

20

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172

21

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 8: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 872

Emotion Recognition Different Approaches

bull Acoustic vs prosodic vs linguistic

bull Detailed spectral vs broad spectral features

bull Direct vs adapted models

bull Static (utterance) vs dynamic (framemulti-frame)

bull Categorical vs ordinal vs regression-basedclassification

bull Recognition vs detection

8

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 972

Emotion Recognition Key Current Challenges

bull Basis for emotion mental states in physiology and

hence speech productionndash Shared with expressive speech synthesis

ndash Difficult to reverse-engineer spoken emotion

ndash Use of wide range of physiological sensors may be helpful

bull EGG glottal cameras vocal tract x-ray movies etcbull EEG GSR HRV respiration

ndash Complex emotion dependency varies between different contexts

(eg some words more emotive than others)

9

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1072

Emotion Recognition Key Current Challenges

bull Characterizing emotion mental states using robust

featuresndash Lots of features in literature

ndash Why should a given feature be not be useful

ndash Are some features more sensitive to some emotionsmental states

an w y

ndash How should temporal information best be used

10

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172

Emotion Recognition Key Current Challenges

bull Recognition of emotion in naturally occurring

speechndash Much initial work on emotion recognition done on acted speech

ndash Initial work on cognitive load classification done on carefully

controlled experimental data

ndash ea emot ons meta states ne acte or exper menta ata

ndash Taking emotion recognition from the lab to the field

11

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272

Emotion Recognition Key Current Challenges

bull Understanding application areasndash Lots of vision statements about affective computing

bull Detail still coming

ndash Understanding likely limits of automatic approaches

bull Match approaches with suitable applications

ndash

range of different situations

ndash New applications still emerging

12

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372

13

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472

Emotions and CL Psychology

bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution

bull Also a signalling system between people

bull Important in learning behaviours

ndash Results of studies are complex to interpret as an engineer

lsquo rsquondash

ndash Cultural variation exists

ndash Emotions can often be discretely described only as lsquofull-blownrsquo

emotions

bull Cowie et al 2001

bull Pragmaticsndash Emotions of interest are dictated by application and availability of

labelled data usually single-culture often fully blown

14

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572

Emotions and CL Psychology

bull Emotion representations and structuresndash Affective circumplex

arousal

valence

excited

happy

sad

nervous

tense

ndash Arousal alertnessresponsiveness to stimuli

ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional

bull Reducing to two dimensions can lead to different structures

ndash Cowie et al 2001

15

depressed calm

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672

Emotions and CL Psychology

bull One choice FeelTracendash For real time

ndash Cowie et al 2000

16

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772

Emotions and CL Measuring Emotion

bull Measuring emotionsndash Self-reports of subjective experience

bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list

ndash Real time methods

-

bull eg arousal-affect emotion space cursor

ndash Cued review

bull Subjects watch video of themselves performing the task rate emotions

experienced post-hoc

ndash Observer ratings

ndash Behavioural measures (this presentation)

ndash Physiological measures (more later)

17

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash NASA Task Load Index (Hart and Staveland 1988)

18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash Subjective workload assessment technique(Reid and Nygren 1988)

ndash Mental Effort Load

raquo a Very little conscious mental effort or

concentration re uired Activit is almost

19

automatic requiring little or no attention

raquo b Moderate conscious mental effort or

concentration required Complexity of

activity is moderately high due to

uncertainty unpredictability or

unfamiliarity Considerable attention

requiredraquo c Extensive mental effort and

concentration are necessary Very

complex activity requiring total attention

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072

Emotions and CL Measuring CL

bull Measuring other mental states (speech literature)ndash Stress

ndash Uncertainty

ndash Level of interest

ndash Deceptive speech

ndash enta sor ers

bull Depression anxiety autism

ndash Behaviour (eg drunkenness)

ndash Disfluency (indicative of some mental states)

ndash Note differences in temporal occurrence of mental state

20

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172

21

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 9: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 972

Emotion Recognition Key Current Challenges

bull Basis for emotion mental states in physiology and

hence speech productionndash Shared with expressive speech synthesis

ndash Difficult to reverse-engineer spoken emotion

ndash Use of wide range of physiological sensors may be helpful

bull EGG glottal cameras vocal tract x-ray movies etcbull EEG GSR HRV respiration

ndash Complex emotion dependency varies between different contexts

(eg some words more emotive than others)

9

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1072

Emotion Recognition Key Current Challenges

bull Characterizing emotion mental states using robust

featuresndash Lots of features in literature

ndash Why should a given feature be not be useful

ndash Are some features more sensitive to some emotionsmental states

an w y

ndash How should temporal information best be used

10

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172

Emotion Recognition Key Current Challenges

bull Recognition of emotion in naturally occurring

speechndash Much initial work on emotion recognition done on acted speech

ndash Initial work on cognitive load classification done on carefully

controlled experimental data

ndash ea emot ons meta states ne acte or exper menta ata

ndash Taking emotion recognition from the lab to the field

11

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272

Emotion Recognition Key Current Challenges

bull Understanding application areasndash Lots of vision statements about affective computing

bull Detail still coming

ndash Understanding likely limits of automatic approaches

bull Match approaches with suitable applications

ndash

range of different situations

ndash New applications still emerging

12

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372

13

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472

Emotions and CL Psychology

bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution

bull Also a signalling system between people

bull Important in learning behaviours

ndash Results of studies are complex to interpret as an engineer

lsquo rsquondash

ndash Cultural variation exists

ndash Emotions can often be discretely described only as lsquofull-blownrsquo

emotions

bull Cowie et al 2001

bull Pragmaticsndash Emotions of interest are dictated by application and availability of

labelled data usually single-culture often fully blown

14

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572

Emotions and CL Psychology

bull Emotion representations and structuresndash Affective circumplex

arousal

valence

excited

happy

sad

nervous

tense

ndash Arousal alertnessresponsiveness to stimuli

ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional

bull Reducing to two dimensions can lead to different structures

ndash Cowie et al 2001

15

depressed calm

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672

Emotions and CL Psychology

bull One choice FeelTracendash For real time

ndash Cowie et al 2000

16

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772

Emotions and CL Measuring Emotion

bull Measuring emotionsndash Self-reports of subjective experience

bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list

ndash Real time methods

-

bull eg arousal-affect emotion space cursor

ndash Cued review

bull Subjects watch video of themselves performing the task rate emotions

experienced post-hoc

ndash Observer ratings

ndash Behavioural measures (this presentation)

ndash Physiological measures (more later)

17

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash NASA Task Load Index (Hart and Staveland 1988)

18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash Subjective workload assessment technique(Reid and Nygren 1988)

ndash Mental Effort Load

raquo a Very little conscious mental effort or

concentration re uired Activit is almost

19

automatic requiring little or no attention

raquo b Moderate conscious mental effort or

concentration required Complexity of

activity is moderately high due to

uncertainty unpredictability or

unfamiliarity Considerable attention

requiredraquo c Extensive mental effort and

concentration are necessary Very

complex activity requiring total attention

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072

Emotions and CL Measuring CL

bull Measuring other mental states (speech literature)ndash Stress

ndash Uncertainty

ndash Level of interest

ndash Deceptive speech

ndash enta sor ers

bull Depression anxiety autism

ndash Behaviour (eg drunkenness)

ndash Disfluency (indicative of some mental states)

ndash Note differences in temporal occurrence of mental state

20

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172

21

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 10: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1072

Emotion Recognition Key Current Challenges

bull Characterizing emotion mental states using robust

featuresndash Lots of features in literature

ndash Why should a given feature be not be useful

ndash Are some features more sensitive to some emotionsmental states

an w y

ndash How should temporal information best be used

10

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172

Emotion Recognition Key Current Challenges

bull Recognition of emotion in naturally occurring

speechndash Much initial work on emotion recognition done on acted speech

ndash Initial work on cognitive load classification done on carefully

controlled experimental data

ndash ea emot ons meta states ne acte or exper menta ata

ndash Taking emotion recognition from the lab to the field

11

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272

Emotion Recognition Key Current Challenges

bull Understanding application areasndash Lots of vision statements about affective computing

bull Detail still coming

ndash Understanding likely limits of automatic approaches

bull Match approaches with suitable applications

ndash

range of different situations

ndash New applications still emerging

12

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372

13

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472

Emotions and CL Psychology

bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution

bull Also a signalling system between people

bull Important in learning behaviours

ndash Results of studies are complex to interpret as an engineer

lsquo rsquondash

ndash Cultural variation exists

ndash Emotions can often be discretely described only as lsquofull-blownrsquo

emotions

bull Cowie et al 2001

bull Pragmaticsndash Emotions of interest are dictated by application and availability of

labelled data usually single-culture often fully blown

14

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572

Emotions and CL Psychology

bull Emotion representations and structuresndash Affective circumplex

arousal

valence

excited

happy

sad

nervous

tense

ndash Arousal alertnessresponsiveness to stimuli

ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional

bull Reducing to two dimensions can lead to different structures

ndash Cowie et al 2001

15

depressed calm

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672

Emotions and CL Psychology

bull One choice FeelTracendash For real time

ndash Cowie et al 2000

16

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772

Emotions and CL Measuring Emotion

bull Measuring emotionsndash Self-reports of subjective experience

bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list

ndash Real time methods

-

bull eg arousal-affect emotion space cursor

ndash Cued review

bull Subjects watch video of themselves performing the task rate emotions

experienced post-hoc

ndash Observer ratings

ndash Behavioural measures (this presentation)

ndash Physiological measures (more later)

17

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash NASA Task Load Index (Hart and Staveland 1988)

18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash Subjective workload assessment technique(Reid and Nygren 1988)

ndash Mental Effort Load

raquo a Very little conscious mental effort or

concentration re uired Activit is almost

19

automatic requiring little or no attention

raquo b Moderate conscious mental effort or

concentration required Complexity of

activity is moderately high due to

uncertainty unpredictability or

unfamiliarity Considerable attention

requiredraquo c Extensive mental effort and

concentration are necessary Very

complex activity requiring total attention

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072

Emotions and CL Measuring CL

bull Measuring other mental states (speech literature)ndash Stress

ndash Uncertainty

ndash Level of interest

ndash Deceptive speech

ndash enta sor ers

bull Depression anxiety autism

ndash Behaviour (eg drunkenness)

ndash Disfluency (indicative of some mental states)

ndash Note differences in temporal occurrence of mental state

20

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172

21

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 11: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172

Emotion Recognition Key Current Challenges

bull Recognition of emotion in naturally occurring

speechndash Much initial work on emotion recognition done on acted speech

ndash Initial work on cognitive load classification done on carefully

controlled experimental data

ndash ea emot ons meta states ne acte or exper menta ata

ndash Taking emotion recognition from the lab to the field

11

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272

Emotion Recognition Key Current Challenges

bull Understanding application areasndash Lots of vision statements about affective computing

bull Detail still coming

ndash Understanding likely limits of automatic approaches

bull Match approaches with suitable applications

ndash

range of different situations

ndash New applications still emerging

12

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372

13

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472

Emotions and CL Psychology

bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution

bull Also a signalling system between people

bull Important in learning behaviours

ndash Results of studies are complex to interpret as an engineer

lsquo rsquondash

ndash Cultural variation exists

ndash Emotions can often be discretely described only as lsquofull-blownrsquo

emotions

bull Cowie et al 2001

bull Pragmaticsndash Emotions of interest are dictated by application and availability of

labelled data usually single-culture often fully blown

14

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572

Emotions and CL Psychology

bull Emotion representations and structuresndash Affective circumplex

arousal

valence

excited

happy

sad

nervous

tense

ndash Arousal alertnessresponsiveness to stimuli

ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional

bull Reducing to two dimensions can lead to different structures

ndash Cowie et al 2001

15

depressed calm

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672

Emotions and CL Psychology

bull One choice FeelTracendash For real time

ndash Cowie et al 2000

16

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772

Emotions and CL Measuring Emotion

bull Measuring emotionsndash Self-reports of subjective experience

bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list

ndash Real time methods

-

bull eg arousal-affect emotion space cursor

ndash Cued review

bull Subjects watch video of themselves performing the task rate emotions

experienced post-hoc

ndash Observer ratings

ndash Behavioural measures (this presentation)

ndash Physiological measures (more later)

17

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash NASA Task Load Index (Hart and Staveland 1988)

18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash Subjective workload assessment technique(Reid and Nygren 1988)

ndash Mental Effort Load

raquo a Very little conscious mental effort or

concentration re uired Activit is almost

19

automatic requiring little or no attention

raquo b Moderate conscious mental effort or

concentration required Complexity of

activity is moderately high due to

uncertainty unpredictability or

unfamiliarity Considerable attention

requiredraquo c Extensive mental effort and

concentration are necessary Very

complex activity requiring total attention

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072

Emotions and CL Measuring CL

bull Measuring other mental states (speech literature)ndash Stress

ndash Uncertainty

ndash Level of interest

ndash Deceptive speech

ndash enta sor ers

bull Depression anxiety autism

ndash Behaviour (eg drunkenness)

ndash Disfluency (indicative of some mental states)

ndash Note differences in temporal occurrence of mental state

20

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172

21

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 12: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272

Emotion Recognition Key Current Challenges

bull Understanding application areasndash Lots of vision statements about affective computing

bull Detail still coming

ndash Understanding likely limits of automatic approaches

bull Match approaches with suitable applications

ndash

range of different situations

ndash New applications still emerging

12

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372

13

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472

Emotions and CL Psychology

bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution

bull Also a signalling system between people

bull Important in learning behaviours

ndash Results of studies are complex to interpret as an engineer

lsquo rsquondash

ndash Cultural variation exists

ndash Emotions can often be discretely described only as lsquofull-blownrsquo

emotions

bull Cowie et al 2001

bull Pragmaticsndash Emotions of interest are dictated by application and availability of

labelled data usually single-culture often fully blown

14

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572

Emotions and CL Psychology

bull Emotion representations and structuresndash Affective circumplex

arousal

valence

excited

happy

sad

nervous

tense

ndash Arousal alertnessresponsiveness to stimuli

ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional

bull Reducing to two dimensions can lead to different structures

ndash Cowie et al 2001

15

depressed calm

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672

Emotions and CL Psychology

bull One choice FeelTracendash For real time

ndash Cowie et al 2000

16

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772

Emotions and CL Measuring Emotion

bull Measuring emotionsndash Self-reports of subjective experience

bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list

ndash Real time methods

-

bull eg arousal-affect emotion space cursor

ndash Cued review

bull Subjects watch video of themselves performing the task rate emotions

experienced post-hoc

ndash Observer ratings

ndash Behavioural measures (this presentation)

ndash Physiological measures (more later)

17

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash NASA Task Load Index (Hart and Staveland 1988)

18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash Subjective workload assessment technique(Reid and Nygren 1988)

ndash Mental Effort Load

raquo a Very little conscious mental effort or

concentration re uired Activit is almost

19

automatic requiring little or no attention

raquo b Moderate conscious mental effort or

concentration required Complexity of

activity is moderately high due to

uncertainty unpredictability or

unfamiliarity Considerable attention

requiredraquo c Extensive mental effort and

concentration are necessary Very

complex activity requiring total attention

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072

Emotions and CL Measuring CL

bull Measuring other mental states (speech literature)ndash Stress

ndash Uncertainty

ndash Level of interest

ndash Deceptive speech

ndash enta sor ers

bull Depression anxiety autism

ndash Behaviour (eg drunkenness)

ndash Disfluency (indicative of some mental states)

ndash Note differences in temporal occurrence of mental state

20

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172

21

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 13: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372

13

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472

Emotions and CL Psychology

bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution

bull Also a signalling system between people

bull Important in learning behaviours

ndash Results of studies are complex to interpret as an engineer

lsquo rsquondash

ndash Cultural variation exists

ndash Emotions can often be discretely described only as lsquofull-blownrsquo

emotions

bull Cowie et al 2001

bull Pragmaticsndash Emotions of interest are dictated by application and availability of

labelled data usually single-culture often fully blown

14

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572

Emotions and CL Psychology

bull Emotion representations and structuresndash Affective circumplex

arousal

valence

excited

happy

sad

nervous

tense

ndash Arousal alertnessresponsiveness to stimuli

ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional

bull Reducing to two dimensions can lead to different structures

ndash Cowie et al 2001

15

depressed calm

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672

Emotions and CL Psychology

bull One choice FeelTracendash For real time

ndash Cowie et al 2000

16

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772

Emotions and CL Measuring Emotion

bull Measuring emotionsndash Self-reports of subjective experience

bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list

ndash Real time methods

-

bull eg arousal-affect emotion space cursor

ndash Cued review

bull Subjects watch video of themselves performing the task rate emotions

experienced post-hoc

ndash Observer ratings

ndash Behavioural measures (this presentation)

ndash Physiological measures (more later)

17

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash NASA Task Load Index (Hart and Staveland 1988)

18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash Subjective workload assessment technique(Reid and Nygren 1988)

ndash Mental Effort Load

raquo a Very little conscious mental effort or

concentration re uired Activit is almost

19

automatic requiring little or no attention

raquo b Moderate conscious mental effort or

concentration required Complexity of

activity is moderately high due to

uncertainty unpredictability or

unfamiliarity Considerable attention

requiredraquo c Extensive mental effort and

concentration are necessary Very

complex activity requiring total attention

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072

Emotions and CL Measuring CL

bull Measuring other mental states (speech literature)ndash Stress

ndash Uncertainty

ndash Level of interest

ndash Deceptive speech

ndash enta sor ers

bull Depression anxiety autism

ndash Behaviour (eg drunkenness)

ndash Disfluency (indicative of some mental states)

ndash Note differences in temporal occurrence of mental state

20

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172

21

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 14: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472

Emotions and CL Psychology

bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution

bull Also a signalling system between people

bull Important in learning behaviours

ndash Results of studies are complex to interpret as an engineer

lsquo rsquondash

ndash Cultural variation exists

ndash Emotions can often be discretely described only as lsquofull-blownrsquo

emotions

bull Cowie et al 2001

bull Pragmaticsndash Emotions of interest are dictated by application and availability of

labelled data usually single-culture often fully blown

14

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572

Emotions and CL Psychology

bull Emotion representations and structuresndash Affective circumplex

arousal

valence

excited

happy

sad

nervous

tense

ndash Arousal alertnessresponsiveness to stimuli

ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional

bull Reducing to two dimensions can lead to different structures

ndash Cowie et al 2001

15

depressed calm

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672

Emotions and CL Psychology

bull One choice FeelTracendash For real time

ndash Cowie et al 2000

16

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772

Emotions and CL Measuring Emotion

bull Measuring emotionsndash Self-reports of subjective experience

bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list

ndash Real time methods

-

bull eg arousal-affect emotion space cursor

ndash Cued review

bull Subjects watch video of themselves performing the task rate emotions

experienced post-hoc

ndash Observer ratings

ndash Behavioural measures (this presentation)

ndash Physiological measures (more later)

17

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash NASA Task Load Index (Hart and Staveland 1988)

18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash Subjective workload assessment technique(Reid and Nygren 1988)

ndash Mental Effort Load

raquo a Very little conscious mental effort or

concentration re uired Activit is almost

19

automatic requiring little or no attention

raquo b Moderate conscious mental effort or

concentration required Complexity of

activity is moderately high due to

uncertainty unpredictability or

unfamiliarity Considerable attention

requiredraquo c Extensive mental effort and

concentration are necessary Very

complex activity requiring total attention

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072

Emotions and CL Measuring CL

bull Measuring other mental states (speech literature)ndash Stress

ndash Uncertainty

ndash Level of interest

ndash Deceptive speech

ndash enta sor ers

bull Depression anxiety autism

ndash Behaviour (eg drunkenness)

ndash Disfluency (indicative of some mental states)

ndash Note differences in temporal occurrence of mental state

20

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172

21

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 15: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572

Emotions and CL Psychology

bull Emotion representations and structuresndash Affective circumplex

arousal

valence

excited

happy

sad

nervous

tense

ndash Arousal alertnessresponsiveness to stimuli

ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional

bull Reducing to two dimensions can lead to different structures

ndash Cowie et al 2001

15

depressed calm

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672

Emotions and CL Psychology

bull One choice FeelTracendash For real time

ndash Cowie et al 2000

16

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772

Emotions and CL Measuring Emotion

bull Measuring emotionsndash Self-reports of subjective experience

bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list

ndash Real time methods

-

bull eg arousal-affect emotion space cursor

ndash Cued review

bull Subjects watch video of themselves performing the task rate emotions

experienced post-hoc

ndash Observer ratings

ndash Behavioural measures (this presentation)

ndash Physiological measures (more later)

17

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash NASA Task Load Index (Hart and Staveland 1988)

18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash Subjective workload assessment technique(Reid and Nygren 1988)

ndash Mental Effort Load

raquo a Very little conscious mental effort or

concentration re uired Activit is almost

19

automatic requiring little or no attention

raquo b Moderate conscious mental effort or

concentration required Complexity of

activity is moderately high due to

uncertainty unpredictability or

unfamiliarity Considerable attention

requiredraquo c Extensive mental effort and

concentration are necessary Very

complex activity requiring total attention

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072

Emotions and CL Measuring CL

bull Measuring other mental states (speech literature)ndash Stress

ndash Uncertainty

ndash Level of interest

ndash Deceptive speech

ndash enta sor ers

bull Depression anxiety autism

ndash Behaviour (eg drunkenness)

ndash Disfluency (indicative of some mental states)

ndash Note differences in temporal occurrence of mental state

20

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172

21

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 16: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672

Emotions and CL Psychology

bull One choice FeelTracendash For real time

ndash Cowie et al 2000

16

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772

Emotions and CL Measuring Emotion

bull Measuring emotionsndash Self-reports of subjective experience

bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list

ndash Real time methods

-

bull eg arousal-affect emotion space cursor

ndash Cued review

bull Subjects watch video of themselves performing the task rate emotions

experienced post-hoc

ndash Observer ratings

ndash Behavioural measures (this presentation)

ndash Physiological measures (more later)

17

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash NASA Task Load Index (Hart and Staveland 1988)

18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash Subjective workload assessment technique(Reid and Nygren 1988)

ndash Mental Effort Load

raquo a Very little conscious mental effort or

concentration re uired Activit is almost

19

automatic requiring little or no attention

raquo b Moderate conscious mental effort or

concentration required Complexity of

activity is moderately high due to

uncertainty unpredictability or

unfamiliarity Considerable attention

requiredraquo c Extensive mental effort and

concentration are necessary Very

complex activity requiring total attention

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072

Emotions and CL Measuring CL

bull Measuring other mental states (speech literature)ndash Stress

ndash Uncertainty

ndash Level of interest

ndash Deceptive speech

ndash enta sor ers

bull Depression anxiety autism

ndash Behaviour (eg drunkenness)

ndash Disfluency (indicative of some mental states)

ndash Note differences in temporal occurrence of mental state

20

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172

21

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 17: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772

Emotions and CL Measuring Emotion

bull Measuring emotionsndash Self-reports of subjective experience

bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list

ndash Real time methods

-

bull eg arousal-affect emotion space cursor

ndash Cued review

bull Subjects watch video of themselves performing the task rate emotions

experienced post-hoc

ndash Observer ratings

ndash Behavioural measures (this presentation)

ndash Physiological measures (more later)

17

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash NASA Task Load Index (Hart and Staveland 1988)

18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash Subjective workload assessment technique(Reid and Nygren 1988)

ndash Mental Effort Load

raquo a Very little conscious mental effort or

concentration re uired Activit is almost

19

automatic requiring little or no attention

raquo b Moderate conscious mental effort or

concentration required Complexity of

activity is moderately high due to

uncertainty unpredictability or

unfamiliarity Considerable attention

requiredraquo c Extensive mental effort and

concentration are necessary Very

complex activity requiring total attention

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072

Emotions and CL Measuring CL

bull Measuring other mental states (speech literature)ndash Stress

ndash Uncertainty

ndash Level of interest

ndash Deceptive speech

ndash enta sor ers

bull Depression anxiety autism

ndash Behaviour (eg drunkenness)

ndash Disfluency (indicative of some mental states)

ndash Note differences in temporal occurrence of mental state

20

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172

21

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 18: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash NASA Task Load Index (Hart and Staveland 1988)

18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash Subjective workload assessment technique(Reid and Nygren 1988)

ndash Mental Effort Load

raquo a Very little conscious mental effort or

concentration re uired Activit is almost

19

automatic requiring little or no attention

raquo b Moderate conscious mental effort or

concentration required Complexity of

activity is moderately high due to

uncertainty unpredictability or

unfamiliarity Considerable attention

requiredraquo c Extensive mental effort and

concentration are necessary Very

complex activity requiring total attention

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072

Emotions and CL Measuring CL

bull Measuring other mental states (speech literature)ndash Stress

ndash Uncertainty

ndash Level of interest

ndash Deceptive speech

ndash enta sor ers

bull Depression anxiety autism

ndash Behaviour (eg drunkenness)

ndash Disfluency (indicative of some mental states)

ndash Note differences in temporal occurrence of mental state

20

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172

21

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 19: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972

Emotions and CL Measuring CL

bull Measuring CLndash Subjective rating scales

ndash Subjective workload assessment technique(Reid and Nygren 1988)

ndash Mental Effort Load

raquo a Very little conscious mental effort or

concentration re uired Activit is almost

19

automatic requiring little or no attention

raquo b Moderate conscious mental effort or

concentration required Complexity of

activity is moderately high due to

uncertainty unpredictability or

unfamiliarity Considerable attention

requiredraquo c Extensive mental effort and

concentration are necessary Very

complex activity requiring total attention

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072

Emotions and CL Measuring CL

bull Measuring other mental states (speech literature)ndash Stress

ndash Uncertainty

ndash Level of interest

ndash Deceptive speech

ndash enta sor ers

bull Depression anxiety autism

ndash Behaviour (eg drunkenness)

ndash Disfluency (indicative of some mental states)

ndash Note differences in temporal occurrence of mental state

20

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172

21

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 20: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072

Emotions and CL Measuring CL

bull Measuring other mental states (speech literature)ndash Stress

ndash Uncertainty

ndash Level of interest

ndash Deceptive speech

ndash enta sor ers

bull Depression anxiety autism

ndash Behaviour (eg drunkenness)

ndash Disfluency (indicative of some mental states)

ndash Note differences in temporal occurrence of mental state

20

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172

21

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 21: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172

21

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 22: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272

Feature Extraction Origins

bull Effect of emotion on speech productionndash Scherer 1989

22

Sympathetic NS

(lsquofight or flightrsquo)

Autonomic NS

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 23: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Larsen and Fredrickson 2003

ndash Arousal component of emotion better conveyed using vocal cues

than the affective component

ndash steners can cons stent y nom nate voca cues or st ngu s ng

emotions

ndash Specific acoustic cues not generally linked to any particular emotions

bull Scherer 1986

23

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 24: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472

Feature Extraction Origins

bull Effect of emotionmental state on speech

productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)

bull Fundamental frequency F 0

bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)

bull ner y

bull Speech rate

ndash Stress related to (Streeter et al 1983)

bull Increased energy

bull Increased pitch

24

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 25: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572

Feature Extraction Methods

bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced

portion of speechndash Remove unvoicedsilence

bull Energy based methods

bull -

bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention

bull Effect of cognitive load on speech productionndash Speech rate

ndash Energy contour

ndash F 0

ndash Spectral parameters

bull Scherer et al 200225

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 26: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672

Feature Extraction Methods

bull Energy

02 04 06 08 1 12 14 16 18 2

x 104

-1

0

1

2

x 104

neutral speech

26

02 04 06 08 1 12 14 16 18 2

x 104

-2

0

2

x 104

10 20 30 40 50 60 70 80 90

10

15

10 20 30 40 50 60 70 80 90

10

15

20

angry speech

log energy

log energy

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 27: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772

Feature Extraction Methods

bull Spectral featuresndash Tilt

bull Many systemsbull lsquoBroadrsquo spectral measure

ndash Weighted frequency

lsquo rsquo

ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure

ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)

bull Detailed spectral measure

ndash Group delay from all pole spectrum

bull Captures bandwidth information (Sethu et al 2007)

ndash Formant frequencies

ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures

27

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 28: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872

Feature Extraction Methods

bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for

estimating glottal flow from speech (Alku 1991)

ndash Glottal formants F g F c from glottal flow spectrum

bull Sethu 2009

ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)

pr mary spee quo en

bull Airas and Alku 2006 Yap et al 2010

ndash Voice quality parameters

bull Lugger and Yang 2007

ndash Glottal statistics

bull Iliev and Scordilis 2008

ndash Vocal fold opening

bull Murphy and Laukkanen 2010

ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 29: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972

Feature Extraction Methods

bull Glottal features (example)ndash Yap et al 2010

ndash High load harr creaky voice quality

29

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 30: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)

Sethu

2009

30

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 31: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172

Feature Extraction Empirical Comparison

bull Top individual features (12 actors 14 emotions)ndash Scherer 1996

ndash Pitch

bull Mean standard deviation 25th and 75th percentile

ndash Energy ndash mean

ndash Speech rate

ndash Long-term voiced average spectrum

bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz

ndash Hammarberg index

bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz

ndash Voicing energy up to 1 kHz

ndash Long-term unvoiced average spectrum

bull 125-200 5000-8000 Hz31

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 32: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272

Feature Extraction Empirical Comparison

bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009

32

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 33: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372

Feature Extraction Temporal Information

bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t

ndash k how many delta features

ndash spac ng etween eatures

ndash d controls span of each delta

ndash Important in cognitive load (Yin et al 2008)

bull MFCC 522

bull MFCC+Prosodic (Concatenated) 593

bull MFCC+Prosodic Acceleration 644

bull MFCC+Prosodic SDC 657

bull Much larger performance difference in other CL research ~20-60 impr

33

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 34: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472

Feature Extraction Temporal Information

bull Parameter Contoursndash Pitch Sethu 2009

bull 5-class LDC Emotional Prosody

34

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 35: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572

Feature Extraction Empirical Comparison

bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database

FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy

MFCC 522

35

MFCC + SDC 602

MFCC + Pitch + Energy + SDC 792

MFCC + Pitch + Energy + 3 Glottal features + SDC 844

F1 + F2 + F3 + SDC 677

SCF + SCA + SDC 872

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 36: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672

Feature Extraction Prosodic

bull Emotionndash Duration rate pause pitch energy features

bull Based on ASR alignmentbull Ang et al 2002

ndash Pitch level and range speech tempo loudness

bull Emotional speech synthesisbull Schroumlder 2001

bull Cognitive loadndash Duration

bull Speech is slower under higher cognitive load

bull Pitch formant trajectories less variable (more monotone)

bull Yap et al 2009

36

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 37: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772

Feature Extraction Linguistic

bull Emotionndash Fused linguistic explored

in a few papersndash eg 10-30 impr

Lee et al 2002

ndash g t c er et a

37

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 38: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872

Feature Extraction Linguistic

bull Higher cognitive loadndash Sentence fragments false starts and errors increase

ndash Onset latency pauses increasendash Articulationspeech rate decrease

bull Berthold and Jameson 1999

ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e

measuresndash More use of plural pronouns less of singular

bull Khawaja et al 2009

ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date

38

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 39: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972

Speech Cues Related to Cognitive Load (CL)

bull Disfluenciesndash Interruption rate

ndash Proportion of the effective speech in the whole speech period

ndash Keywords for correction or repeating bull Inter-sentential pausing

ndash Length and frequency of big pauses

39

ndash Length and frequency of small pausesndash Length of intra-sentence segments

bull Slower speech ratendash Syllable rate

bull Response Latencyndash Delay of generating speech

ndash Particular hybrid prosodic pattern

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 40: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072

Classification

40

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 41: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172

Sources of Variability in Emotion Features

bull Two main sources of variabilityndash Phonetic

ndash Speaker identity

bull Both stronger sources of variability

bull

ndash Model itbull eg UBM-GMM with many mixtures

ndash Remove it

bull eg normalisation

bull Other variabilityndash Phoneticlinguistic dependence of emotion information

ndash Session

ndash Age 41

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 42: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272

Normalisation

bull Typesndash Per utterance

ndash Per speaker Sethu et al 2007

42No warping Warping

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 43: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372

Modelling

bull Usual considerationsndash Structure of problem andor database

ndash Amount of training data available rarr number of parameters to use

bull Rate of classification decisionndash Often one per utterance

bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip

ndash Continuous eg [-11]

ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo

43

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 44: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472

Modelling ndash Common Approaches

bull Gaussian mixture model (GMM)

ndash w m weight of the mrsquoth mixture

rsquo

sum=

minus

minusminusminus=

M

m

mm

T

m

mK

mw p1

1

2 12

)()(

2

1exp

)2(

1)( microxCmicrox

C

x

π

ndash m

ndash CCCCm covariance matrix of the mrsquoth mixture

44-6 -4 -2 0 2 4 6 8

0

01

02

03

04

05

06

07

08

09

1

overall pdf

individual

mixtures

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 45: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572

Modelling ndash Common Approaches

bull Maximum a priori GMMndash Previously Train each GMM separately for each target class

ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)

ndash Acoustic space of all speakers

ndash Trained on ver lar e database

bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages

bull Mixtures in each emotion class with same index correspond to same

acoustic class

bull Fast scoring

45

UBM-GMM

Adapted modelhappiness

Adapted model

sadness

x1

x2

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 46: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Can capture temporal variation

bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours

ndash Nwe et al 2003 LFPC features F0 speaking rate

1 32

ndash e u e a p c ra ec or es n u erance

bull Assumes emotions have same trajectory structure in all utterances

46

Nwe et al 2003Larger utterance

variability than Schuumlller

et al 2003

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 47: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772

Modelling ndash Common Approaches

bull Hidden Markov model (HMM)ndash Lee et al 2004

ndash Phoneme class based statesbull Vowelglidenasalstopfricative

bull Clear what a state represents no assumptions about utterance structure

ndash

47

f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 48: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872

SVM Classifier with GMM Super Vectors

NICTA Copyright 201048

991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086

991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150

983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150

983140983145983155983156983154983145983138983157983156983145983151983150983155

Cl ifi ti E i i l R lt

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 49: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972

Classification Empirical Results

ndash LDC Emotional Prosody

ndash Static classifiers

ndash Sethu et al 2009

ndash

ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC

GMM 431 371

PNN 441 356

SVM 405 399

Database (EMO-DB)ndash Schuumlller et al 2005

49

Cl ifi ti H P f

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 50: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072

Classification Human Performance

bull Emotionndash Dellaert et al 1996

ndash ~ 17 for 7-classspeaker dependent

(independent 35)

c er et a

~60 acc for 8-class

bull Cognitive loadndash Le 2009

ndash 62 acc overall

50

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 51: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172

and their Design

51

Databases Desirable Attributes

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 52: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272

Databases Desirable Attributes

bull Emotionndash Emotions sufficiently fully blown to be distinguishable

ndash Naturalnessndash Listener ratings

ndash Matched to context of use

bull Cognitive loadndash Multiple discrete load levels

ndash Levels distinguishable by subjective ratings

ndash Possibly additional sensor data (physiological)

ndash No stress involved subjects motivated

bull Allndash Large multiple speaker

ndash Phonetically linguistically diverse 52

Databases Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 53: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372

Databases Design

bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem

bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content

bull

ndash Allow use of emotive words

bull Descriptorsndash Elicited emotionsrarr well defined descriptors

ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings

ndash Douglas-Cowie et al 2003

53

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 54: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472

Databases Resources (English)

bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories

ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro

bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +

valenceactivationdominance labelling

ndash httpsailusceduiemocap

bull LDC Emotional Prosody

ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC

2002S28 US$2500

54

Databases Resources (English)

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 55: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572

Databases Resources (English)

bull Reading-Leedsndash 4frac12 hours unscripted TV interviews

bull Cognitive Loadndash DRIVAWORK

bull 14 hours in-car speech under varying workload several physiological

reference signals

ndash SUSAS SUSC-1

bull contains some psychological stress conditions

bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S

78 US$500

ndash Other databases proprietary

bull More complete listing

ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55

Databases Experimental Design

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 56: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672

Databases Experimental Design

bull Cognitive load task design Span testsndash Word span digit span

bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task

ndash Conway et al 2005

ndash Reading span

bull Ability to co ordinate processing and storage resources

bull Subjects must read fluently and answer comprehension questions

ndash Daneman and Carpenter 1980

ndash Speaking span

bull Randomly selected words

bull Speaking span = maximum number of words for which a subject can

successfully construct a grammatically correct sentence

ndash Daneman 1991

ndash Others counting operation reading digit

56

Databases CL Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 57: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772

Databases CL Examples

bull Stroop testndash Subjects presented colour words in

different coloursbull Low congruent Medium incongruent

ndash Time pressure added for high load

bull Story reading ndash Story reading followed by QampA

ndash 3 different levels of text difficulty (Lexile Framework for Reading

wwwlexilecom)

ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session

ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)

ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp

ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)

57

Databases Trends

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 58: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872

Databases Trends

bull lsquoNaturallyrsquo elicited emotion

bull Larger databases

bull Telephone speech

bull Realistic data (ie without controlled lab conditions)

bull Later noise

58

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 59: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972

Applications of Emotion

Load Measurement

59

Applications Principles

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 60: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072

Applications Principles

bull Like other speech processing appsndash Need microphone

ndash Best in clean environmentsbull Trend towards processing power in the loop

ndash VoIP

ndash Mobile phone iPhone SmartPhone

bull Trend towards user-aware interfacesndash Affective computing

bull The work overload problemndash Mental processes play an increasing role in determining the

workload in most jobs (Gaillard et al 1993)

60

Applications Task Performance

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 61: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172

pp

bull Adjusting the human-computer balance for

optimised task performance

ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks

61

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 62: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272

pp p

bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity

ndash Spare or residual capacity turns out to be an important practicalmeasure

bull Parasuraman et al 2008

ndash Correlation between workload and performance is poorer inunderload and overload tasks

bull Vidulich and Wickens 1986 Yeh and Wickens 1988

ndash No studies for physiological or behavioural measures

62

Applications Examples

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 63: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372

pp p

bull From the literature ndash Air traffic control vehicle handling shipping military rail

bull Embrey et al 2006

ndash Learning and vigilance tasks

bull Sweller 1988 Berka et al 2007

ndash

bull Lamble et al 1999ndash Team decision efficiency

bull Johnston et al 2006

ndash Collaboration load

bull Dillenbourg and Betrancourt 2006ndash Augmented cognition

bull Schmorrow 2005

63

Applications Biomedical

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 64: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472

bull Brain training for ADHDndash Improve working memory

ndash wwwcogmedcomndash Supported by substantial research base

bull Diagnosistreatment of mental disorders ndash Objective measure

ndash eg Workman et al 2007

ndash Still to be investigated

bull Management of psychological stressndash Has occupational health and safety implications

ndash Still to be investigated

64

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 65: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572

Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion

Emotion Recognition ndash Speech Pitch Contour

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 66: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672

PitchEstimation

EmotionalSpeech

VAD

Segmented PitchContour

Segmental Linear

Approximation

f s e t b

NICTA Copyright 201066

16 17 18 19 2

I n i t i a l O f

Parameters ndash b s x

PITCH ndash LINEAR APPROXIMATION

05 1 15 2 25 3 35

HMM BASED

EMOTION MODELS

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 67: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772

bull Ideally want to compare emotion recognition and

CL systems on same data but

ndash No ground truthbull Classification becomes clustering

ndash

NICTA Copyright 201067

bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise

ndash Which data

bull Almost by definition there is no database that contains

both emotion and cognitive load that is also usefully

annotated

Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 68: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872

bull What we did

ndash Took two near-state-of-the-art systems

bull Relative to LDC Emotional Prosody andbull Stroop Test corpora

ndash Applied each system to both corpora

NICTA Copyright 201068

bull Use closed-set trainingtest

ndash Produced a set of emotion and cognitive load labels for

each corpus

ndash Compared clusterings based on labels

Emotion vs CL Rand IndexEmotion vs CL Rand Index

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 69: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972

bull Adjusted Rand Index

NICTA Copyright 201069

ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical

ndash ARI cong 0 means random labeling

Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 70: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072

bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions

bull Neutral

bull Angerbull Sadness

bull Happiness

NICTA Copyright 201070

bull Stroop Test Datandash 3 CL levels

bull Resultsndash LDC Emotional Prosody

bull ARI = 00087

ndash Stroop Test Corpus

bull ARI = 00253

Conclusion

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 71: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172

bull Data

ndash Plenty of acted experimental databases as reference

ndash Natural data are current challenge

bull Features

ndash Broad vs detailed spectral features implications for modelling

ndash Temporal information needs to be captured

ndash Centroid features (together) are promising

bull Classification

ndash Speaker variability is a very strong effect

ndash Select correct classification paradigm for problem

bull Applications

ndash Emerging many still to be well defined

71

Acknowledgements

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72

Page 72: Emotion Recognition and Cognitive

7242019 Emotion Recognition and Cognitive

httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272

bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo

Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang

bull 11 PhD students (UNSWNICTA)

ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen

ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators

ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia

Tech US) Prof John Sweller (UNSW)

72