Emotional Speech
Overview
- Who cares?
- The idea of emotion
- Difficulties in approaching
- Describing emotion
- Computational models
- Modeling emotion in speech
- An example – Ang ’02
Who Cares?
Practical impact
- Detecting frustration/anger
- Detecting stress/distress (e.g., for help-call prioritizing)
- Tutorials: boredom/confusion/frustration; pacing/positive feedback
- User acceptance: users preferred a talking head using emotional speech (Stallo, in Schröder)
Who Cares?
Esoteric impact
- Is artificial intelligence possible without detecting emotion? Without displaying “emotion”?
- Do we experience someone/something as understanding us if it can’t understand our emotional state/experience?
Who Cares? – Izard ’77
- Emotion & perception
- Emotion & cognition
- Emotion & action
- Emotion & personality development
Understanding a speaker’s emotional state gives us insight into his/her intention, desire, and motivation (Zimring).
The Bad News (Picard ’97)
- Maintaining realistic expectations
- User’s confidence in the information
- Potential to forge affective channels
- Problem solving vs. empathic/observational
- Symmetry of communication
- Privacy issues
Idea of Emotion (Hergenhahn ’01)
Descartes, “Passions”
- Understood emotions as originating from both physiological and cognitive sources
- Pineal gland
Late 1800s – early 1900s: psychology was the study of consciousness
- William James: “The Science of Mental Life”
- Major method was introspection, which relies on a person reporting his/her own mental experience
Idea of Emotion
1930s – 1950s: behaviorist tradition, the study of behavior
- “Objective” (at least measurable and observable)
- Emerged from academia (a lot of rats suffered)
- Explains everything in terms of stimulus/response
- Fails to explain some crucial issues, e.g., language
Idea of Emotion
1950s: the cognitive “revolution”
- Piaget, Miller, Chomsky, et al.
- Miller: “The Science of Mental Life”
John Searle
- Syntax vs. semantics
- Materialism vs. dualism
- What are reasonable expectations?
“No one expects to get wet in a pool filled with ping-pong-ball models of water molecules.” (Searle ’90)
Difficulties in approaching (Cowie)
- Emotion is resistant to capture in symbols
- Speech presents special problems
- Modeling of primary emotions is not so useful
- Consensus
- Display rules (Ekman)
- Mixes (e.g., a “love/hate relationship”)
- Negative response to simulated displays
Difficulties in approaching
Quality of reference data
- Rating believability (Schröder)
  - Forced-choice tests often ignore the issue of appropriateness/believability
  - “How appropriate was utterance to given E[motion]?” (Rank ’98)
  - Iida et al.: rated using scales for preference and for the subjective degree of expressed emotion
Subject generosity
Temporal and contextual relationships
- “[Utterances were] said by two actors in the emotions of happiness, sadness, anger […]” (Pereira)
Describing Emotion
Invariants
Everything it is possible to analyze depends on a clear method of distinguishing the similar from the dissimilar. – Carl Linnaeus
Describing Emotion (Cowie)
- Primary emotions: acceptance, anger, anticipation, disgust, joy, fear, sadness, surprise
- Secondary emotions
- Arousal
- Attitude
An aside: intention may generate all of these.
activity, decisiveness, haughtiness, restrained, adoration, delighted, helplessness, restraint, alarm, dependence, hope, righteousness, alertness, depression, humiliation, rigor, anger, desire, indifference, routine, animosity, despair, inferiority, sadness, annoyance, dimness, initiative, satisfaction, anxiety, disappointment, intensity, satisfied, appetite, disgust, interest, skepticism, approval, disqualification, scorn, artificiality, disregard, involvement, serenity, astonishment, disrespect, joy, servility, at ease, distress, leniency, shame, attraction, droopy, loneliness, sharpness, balanced, embarrassment, longing, shyness, belonging, embitterment, love, simplicity, bitterness, enjoyment, meditative, sincerity, bliss, envy, mirth, sleepy, restlessness, blur, exaggeration, misery, slumber, boldness, excitement, sorrow, boredom, fatigue, naturalness, stability, calmness, fear, nervousness, stubbornness, caution, firmness, pain, suffering, clearness, frankness, panic, superiority, compassion, fondness, passiveness, surprise, complexity, friendly, patience, suspiciousness, concern, frustration, pity, sympathy, conciliated, gaiety, pleasure, tenderness, confidence, generosity, posing, tension, constraint, gloom, pride, tolerance, hate, confusion, grateful, quiescence, tranquility, contempt, greediness, regret, uneasiness, contentment, grievance, relaxed, unstable, courage, guilt, relief, vigilance, yearning, craving, happiness, repulsion, weakness, criticism, haste, respect, worry, curiosity
Data of Emotion (Lang ’87)
- Everyone generally agrees on its existence
- Basic datum is a state of feeling
  - Completely private
  - Includes understanding of antecedents and consequences
- Important to determine how emotion is represented in memory
- Suggests a Turing test (but doesn’t describe one…)
“…emotion is a fact upon which all introspection agrees. [Most emotional states] are states which we have experienced personally.” (Gellhorn & Loofbourrow ’63)
Describing Emotion
One approach: a continuous dimensional model (Cowie/Lang)
- Activation–evaluation space
- Add control as a third dimension
- Curse of dimensionality
- Primary emotions differ on at least 2 dimensions of this space (Pereira)
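The dimensional idea can be made concrete with a small sketch: each emotion is a point in activation–evaluation–control space, and a new observation is labeled by its nearest prototype. The coordinates below are hypothetical placeholders chosen for illustration, not values reported by Cowie, Lang, or Pereira, and nearest-prototype labeling is an assumed rule rather than one the slide prescribes.

```python
import math

# Hypothetical prototype coordinates (activation, evaluation, control), each in [-1, 1].
# Illustrative placeholders only; not measured values from any study.
PROTOTYPES = {
    "anger":   (0.8, -0.6, 0.6),
    "fear":    (0.7, -0.7, -0.6),
    "joy":     (0.6, 0.8, 0.4),
    "sadness": (-0.6, -0.6, -0.4),
    "boredom": (-0.7, -0.2, 0.0),
}


def distance(a, b):
    """Euclidean distance; works for 2-D or 3-D points of equal length."""
    return math.dist(a, b)


def nearest_emotion(point, prototypes=PROTOTYPES):
    """Label a point (activation, evaluation, control) with its closest prototype."""
    return min(prototypes, key=lambda name: distance(point, prototypes[name]))


anger, fear = PROTOTYPES["anger"], PROTOTYPES["fear"]
print(nearest_emotion((0.75, -0.65, 0.5)))   # anger
print(nearest_emotion((0.75, -0.65, -0.5)))  # fear
# Activation-evaluation only (first two coordinates) vs. with control added:
print(round(distance(anger[:2], fear[:2]), 2), round(distance(anger, fear), 2))  # 0.14 1.21
```

With these placeholder values, anger and fear are nearly indistinguishable in the two-dimensional activation–evaluation plane and only separate once control is added, which is one way to read the “Add control” bullet.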
Computational Models (Pfeifer ’87)
- Emotion as process
- Emotion generation
- Influence of emotion
- Goal-oriented nature
- Interaction between subsystems
- Emotions as heuristics
- Representation of emotion
Computational Models (Pfeifer ’87)
Examines models dimensionally:
  A) Symbolic vs. non-symbolic (cognitive vs. AI)
  B) Augmented by emotion vs. focused on emotion
- All approaches deal with emotion as a process
- Unclear whether system state = emotion
- Models must function in a complex, uncontrollable, unpredictable context
- No model for the physiological aspect
- Emotions are tightly coupled to commonsense reasoning
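Several of Pfeifer’s dimensions (emotion as a process, emotion generation, goal orientation, emotion as heuristics) can be illustrated with a toy agent. This is a hypothetical sketch, not Pfeifer’s model or any published architecture: events are appraised against weighted goals, the resulting intensities decay over time, and the dominant emotion biases which action is chosen. The class name `ToyAgent`, the goal name `book_flight`, and the decay and threshold values are arbitrary assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical agent whose emotion state is generated, decays, and biases action."""
    goals: dict                   # goal name -> importance weight in [0, 1]
    decay: float = 0.5            # fraction of each intensity retained per time step
    threshold: float = 0.1        # intensities below this are ignored when acting
    emotions: dict = field(default_factory=dict)  # emotion name -> intensity

    def appraise(self, goal, progress):
        """Emotion generation: an event moves a goal forward (>0) or backward (<0)."""
        if progress == 0:
            return  # no change to the goal, so no emotion is generated
        weight = self.goals.get(goal, 0.0)
        name = "joy" if progress > 0 else "distress"
        self.emotions[name] = self.emotions.get(name, 0.0) + weight * abs(progress)

    def step(self):
        """Emotion as a process: every intensity decays with each time step."""
        self.emotions = {k: v * self.decay for k, v in self.emotions.items()}

    def choose_action(self):
        """Emotion as a heuristic: dominant (above-threshold) distress triggers replanning."""
        active = {k: v for k, v in self.emotions.items() if v >= self.threshold}
        if not active:
            return "continue"
        dominant = max(active, key=active.get)
        return "replan" if dominant == "distress" else "continue"


agent = ToyAgent(goals={"book_flight": 1.0})
agent.appraise("book_flight", -0.8)   # e.g., the system misrecognized the destination
print(agent.choose_action())          # replan
for _ in range(4):
    agent.step()
print(agent.choose_action())          # continue (distress has decayed to 0.05)
```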
Modeling Emotion in Speech
Synthesis: basic issues (Schröder)
- How is a given emotion expressed?
- Which properties of the emotional state are to be expressed?
- What is the relationship between this state and another?
Approaches
- Formant synthesis (Burkhardt)
- Diphone concatenation
- Unit selection
Modeling Emotion in Speech
Formant synthesis (Burkhardt): high degree of control (“emoSyn”)
- Parameters: mean pitch, pitch range, variation, phrase and word contour, flutter, intensity, rate, phonation type, vowel precision, lip spread
Two experiments
- Stimuli systematically varied, then classified
- Prototype generated and varied slightly
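A rule-based system like emoSyn exposes prosodic parameters directly. The sketch below shows how two of them, mean pitch and pitch range, plus speaking rate, might be applied to a neutral utterance. It is an illustrative simplification, not emoSyn’s implementation: the function names are hypothetical, the F0 contour is assumed to be a list of per-frame values in Hz with 0.0 marking unvoiced frames, and the scaling factors are free parameters.

```python
def modify_f0(contour, mean_shift=1.0, range_scale=1.0):
    """Rescale a frame-wise F0 contour (Hz); 0.0 marks unvoiced frames.

    mean_shift:  multiplies the mean of the voiced frames (1.2 = 20% higher).
    range_scale: multiplies each voiced frame's deviation from that mean
                 (>1 broadens the pitch range, <1 narrows it).
    """
    voiced = [f for f in contour if f > 0]
    if not voiced:
        return list(contour)  # nothing voiced, nothing to modify
    mean = sum(voiced) / len(voiced)
    new_mean = mean * mean_shift
    return [new_mean + (f - mean) * range_scale if f > 0 else 0.0 for f in contour]


def scale_durations(durations_ms, rate=1.0):
    """A faster rate (>1) shortens every segment proportionally."""
    return [d / rate for d in durations_ms]


neutral = [100.0, 0.0, 120.0, 140.0]
print(modify_f0(neutral, mean_shift=1.5, range_scale=2.0))  # [140.0, 0.0, 180.0, 220.0]
print(scale_durations([100.0, 50.0], rate=2.0))             # [50.0, 25.0]
```

For example, the fear profile reported later in this section (high pitch, broad range, fast rate) would correspond to `mean_shift > 1`, `range_scale > 1`, and `rate > 1`; the actual magnitudes Burkhardt used are not given here.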
Modeling Emotion in Speech
Formant synthesis (Burkhardt)
- Fear: high pitch, broad range, falsetto voice, fast rate
- Joy: broader pitch range, faster rate, modal or tense phonation, precise articulation
  - Lowest recognition rates (perhaps due to intonation patterns)
- Boredom: lowered mean pitch, narrow range, slow rate, imprecise articulation
Modeling Emotion in Speech
Formant synthesis (Burkhardt)
- Sadness: narrow range, slow rate, breathy articulation
  - Also raised pitch, falsetto
  - Possibly “sadness” was an imprecise term
- Anger: faster rate, tense phonation
General results: recognition rates are comparable to natural speech, especially when the categories from experiment 2 are recombined.
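The qualitative findings for fear, joy, boredom, sadness, and anger can be collected into a lookup table, which makes it easy to see which cues several emotions share (for instance, slow rate and narrow range for both boredom and sadness). The table transcribes only what the slides state; unspecified attributes are simply absent, the phonation types, “falsetto voice”, and “breathy articulation” are grouped under a single assumed `voice` key, and only sadness’s primary realization is encoded. The dictionary layout and the function name `emotions_with` are hypothetical.

```python
# Qualitative cue settings transcribed from the slides (Burkhardt's formant-synthesis
# results). Attributes the slides do not mention are omitted, not assumed neutral.
PROFILES = {
    "fear":    {"pitch": "high", "range": "broad", "voice": "falsetto", "rate": "fast"},
    "joy":     {"range": "broader", "rate": "faster", "voice": "modal or tense",
                "articulation": "precise"},
    "boredom": {"pitch": "lowered", "range": "narrow", "rate": "slow",
                "articulation": "imprecise"},
    "sadness": {"range": "narrow", "rate": "slow", "voice": "breathy"},
    "anger":   {"rate": "faster", "voice": "tense"},
}


def emotions_with(attribute, *values):
    """Emotions whose setting for `attribute` is one of `values`, in table order."""
    return [name for name, cues in PROFILES.items() if cues.get(attribute) in values]


print(emotions_with("rate", "slow"))                      # ['boredom', 'sadness']
print(emotions_with("rate", "fast", "faster"))            # ['fear', 'joy', 'anger']
print(emotions_with("voice", "tense", "modal or tense"))  # ['joy', 'anger']
```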
Modeling Emotion in Speech
Generally, there is a tradeoff between flexibility of modeling and naturalness:
- Rule-based synthesis is less natural
- Selection-based synthesis is less flexible
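Selection-based synthesis is commonly framed as a search: for each target position, pick one recorded unit so that the sum of target costs (mismatch with the desired specification) and join costs (discontinuity between neighbouring units) is minimal, which dynamic programming solves exactly. The sketch below assumes that standard formulation; the cost functions and the toy `(name, pitch)` units are hypothetical. It also shows why selection is less flexible: the output can only ever be assembled from units that exist in the database.

```python
def select_units(candidates, target_cost, join_cost):
    """Viterbi search for the cheapest sequence of units.

    candidates:  one list of candidate units per target position
    target_cost: f(position, unit) -> mismatch with the target specification
    join_cost:   f(previous_unit, unit) -> discontinuity at the concatenation point
    Returns (total_cost, chosen_units).
    """
    # best[i][j] = (cheapest cost of a path ending in candidates[i][j], backpointer)
    best = [[(target_cost(0, u), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            here = target_cost(i, u)
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, u) + here, k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost, back))
        best.append(row)
    # Trace back from the cheapest unit at the last position.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total, path = best[-1][j][0], []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return total, path[::-1]


# Toy units: (name, pitch in Hz); target pitches for two positions.
targets = [100, 120]
units = [[("a1", 100), ("a2", 150)], [("b1", 110), ("b2", 160)]]
total, chosen = select_units(
    units,
    target_cost=lambda i, u: abs(u[1] - targets[i]),
    join_cost=lambda prev, u: abs(prev[1] - u[1]),
)
print(total, [name for name, _ in chosen])  # 20 ['a1', 'b1']
```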
An Example – Ang ’02
Prosody-based detection of annoyance/frustration in human-computer dialog
DARPA Communicator Project travel-planning data (a simulation) (NIST, CU Boulder, CMU)
Considers contributions of prosody, language model, and speaking style
Doesn’t begin with a strong hypothesis
An Example – Ang ’02
- Uses recognizer output (sort of)
- Examines the relationship between emotion and speaking style
- Uses hand-coded style data
  - Hyperarticulation, pauses, raised voice
  - Repeated requests or corrections
- Hand-labeled emotion relative to the speaker
  - Original and consensus labels
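Ang ’02 kept both the original labels and consensus labels. The slide does not say how consensus was reached, so the sketch below substitutes a simple strict-majority vote as an assumption; it also flags utterances on which all labelers originally agreed, the subset for which the results report better performance. The function names and the toy corpus are hypothetical.

```python
from collections import Counter


def majority_label(labels):
    """Strict-majority label for one utterance, or None if no label has > 50%."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


def originally_agreed(labels):
    """True when every labeler assigned the same label."""
    return len(set(labels)) == 1


def agreement_rate(corpus):
    """Fraction of utterances on which all labelers agreed."""
    return sum(originally_agreed(labels) for labels in corpus) / len(corpus)


corpus = [
    ["NEUTRAL", "NEUTRAL"],
    ["ANNOYED", "NEUTRAL"],
    ["FRUSTRATED", "ANNOYED", "FRUSTRATED"],
]
print([majority_label(u) for u in corpus])  # ['NEUTRAL', None, 'FRUSTRATED']
print(agreement_rate(corpus))               # 0.3333333333333333
```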
An Example – Ang ’02
Emotion Class Instances Percent
NEUTRAL 41545 83.84%
ANNOYED 3777 7.62%
FRUSTRATED 358 0.72%
TIRED 328 0.66%
AMUSED 326 0.66%
OTHER 115 0.23%
NOT-APPLICABLE 3104 6.26%
Total 49553 100.0%
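The class distribution is heavily skewed toward NEUTRAL, which matters when judging any detector: a classifier that always answers NEUTRAL is already right most of the time. The short computation below re-derives the percentages from the counts in the table and computes that majority-class baseline, with the NOT-APPLICABLE utterances excluded; the function names are illustrative.

```python
# Counts copied from the Ang '02 emotion-class table.
COUNTS = {
    "NEUTRAL": 41545, "ANNOYED": 3777, "FRUSTRATED": 358, "TIRED": 328,
    "AMUSED": 326, "OTHER": 115, "NOT-APPLICABLE": 3104,
}


def percentages(counts):
    """Share of each class, in percent, rounded to two decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


def majority_baseline(counts, exclude=()):
    """Accuracy of always predicting the most frequent remaining class."""
    kept = {label: n for label, n in counts.items() if label not in exclude}
    return max(kept.values()) / sum(kept.values())


print(sum(COUNTS.values()))                                           # 49553
print(percentages(COUNTS)["NEUTRAL"])                                 # 83.84
print(round(100 * majority_baseline(COUNTS, {"NOT-APPLICABLE"}), 2))  # 89.44
```

A NEUTRAL-only guess scores about 89% on the applicable utterances, so overall accuracy by itself says little about how well annoyance or frustration is detected on this data.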
An Example – Ang ’02
Prosodic features
- Duration and speaking rate
- Pause, pitch, energy, spectral tilt
Non-prosodic features
- Repetitions & corrections
- Position in dialog
Language model features
Discriminated using decision trees
- “Brute force iterative algorithm” to determine useful features
- With and without LM features
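The slide does not spell out Ang ’02’s “brute force iterative algorithm”. A common wrapper approach with a similar flavour is greedy forward selection: repeatedly add whichever feature most improves a classifier’s score and stop when nothing helps. The sketch below assumes that procedure (the paper’s may differ) and takes the scoring function as a parameter, so it could wrap, say, cross-validated decision-tree accuracy; running it once on prosodic features alone and once with LM features added would mirror the “with and without LM features” comparison. The toy score, the `GAIN` table, and the feature names in the example are invented.

```python
def forward_select(features, score, min_gain=0.0):
    """Greedy forward (wrapper) feature selection.

    features: candidate feature names
    score:    f(list_of_feature_names) -> float, higher is better; must accept []
    Returns (selected_features_in_order_added, final_score).
    """
    selected, remaining = [], list(features)
    best = score(selected)
    while remaining:
        cand_score, cand = max((score(selected + [f]), f) for f in remaining)
        if cand_score - best <= min_gain:
            break  # no remaining feature helps enough
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best


# Invented gains for illustration only.
GAIN = {"duration": 0.10, "pitch": 0.06, "repeat": 0.04}


def toy_score(features):
    """Invented score: fixed gain per useful feature, 0.01 penalty per feature."""
    return 0.5 + sum(GAIN.get(f, 0.0) for f in features) - 0.01 * len(features)


chosen, final = forward_select(["pitch", "noise", "repeat", "duration"], toy_score)
print(chosen, round(final, 2))  # ['duration', 'pitch', 'repeat'] 0.67
```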
Ang ’02 – Decision Tree Usage
- Temporal features (28%): longer duration and slow speaking rate correlated with frustration
- Pitch features (26%): generally, high F0 correlated with frustration
- Repeats/corrections (= system error) (26%): correlated with frustration
- Raised voice
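Decision-tree “feature usage” figures like these are often computed by weighting each split by the number of training samples that reach it and normalizing per feature or feature class. The slide does not define the measure, so the sketch below assumes that definition; the nested-dict tree format, the feature names, and the feature-to-class mapping are hypothetical.

```python
def feature_usage(tree, feature_class=None):
    """Share of split decisions per feature class, weighted by samples at each split.

    tree:          nested dicts; internal nodes have "feature", "n", "left", "right",
                   leaves have only "n" (number of training samples reaching the node)
    feature_class: optional dict mapping feature name -> class name (e.g. "temporal")
    """
    feature_class = feature_class or {}
    weights = {}

    def walk(node):
        if "feature" not in node:
            return  # leaf: no decision is made here
        key = feature_class.get(node["feature"], node["feature"])
        weights[key] = weights.get(key, 0) + node["n"]
        walk(node["left"])
        walk(node["right"])

    walk(tree)
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}


# Hypothetical tree: the root splits on utterance duration, children on F0 and rate.
tree = {
    "feature": "duration", "n": 100,
    "left": {"feature": "max_f0", "n": 60, "left": {"n": 30}, "right": {"n": 30}},
    "right": {"feature": "speaking_rate", "n": 40, "left": {"n": 25}, "right": {"n": 15}},
}
classes = {"duration": "temporal", "speaking_rate": "temporal", "max_f0": "pitch"}
print(feature_usage(tree, classes))  # {'temporal': 0.7, 'pitch': 0.3}
```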
Ang ’02 – Results
- Performance better by 5-6% for utterances on which labelers originally agreed
- Use of the repeat/correction feature improves success by 4%
- Frustration vs. else: very little data
- Only a slight difference between labeled and recognized