Page 1: Acoustic Cues to Emotional Speech

Acoustic Cues to Emotional Speech

Julia Hirschberg

(joint work with Jennifer Venditti and Jackson Liscombe)

Columbia University

26 June 2003

Page 2: Acoustic Cues to Emotional Speech

Motivation

• A speaker’s emotional state conveys important and potentially useful information
  – To recognize (e.g. spoken dialogue systems, tutoring systems)
  – To generate (e.g. games)
  – If we know what emotion is and what aspects of production convey different types
• Defining emotion in a multidimensional space
  – Valence: happy vs. sad
  – Activation: sad vs. despairing

Page 3: Acoustic Cues to Emotional Speech

• Features that might convey emotion
  – Acoustic and prosodic
  – Lexical and syntactic
  – Facial and gestural

Page 4: Acoustic Cues to Emotional Speech

Previous Research

• Emotion detection in corpus studies
  – Batliner, Noeth, et al.; Ang et al.: anger/frustration in dialogue systems
  – Lee et al.: pos/neg emotion in call center data
  – Ringel & Hirschberg: voicemail
• … in laboratory studies
  – Forced choice among 10-12 emotion categories
  – Sometimes with confidence rating

Page 5: Acoustic Cues to Emotional Speech

Problems

• Hard to identify emotions reliably
  – Variation in ‘emotional’ utterances: production and perception
  – How can we obtain better training data?
• Easier to detect variation in activation than in valence
  – Variation in ‘emotional’ utterances
  – Large space of potential features
  – Which are necessary and sufficient?

Page 6: Acoustic Cues to Emotional Speech

New methods for eliciting judgments

• Hypothesis: utterances in natural speech may evoke multiple emotions
  – Elicit judgments on multiple scales
  – Tokens from the LDC Emotional Prosody Speech and Transcripts Corpus
    • Professional actors reading 4-syllable dates and numbers
    • disgust, panic, anxiety, hot anger, cold anger, despair, sadness, elation, happiness, interest, boredom, shame, pride, contempt, neutrality

Page 7: Acoustic Cues to Emotional Speech

• Modified category set:
  – Positive: confident, encouraging, friendly, happy, interested
  – Negative: angry, anxious, bored, frustrated, sad
  – Neutral

• For study: 1 token of each from each of 4 voices plus practice tokens

• Subjects participated over the internet

Page 8: Acoustic Cues to Emotional Speech

– 40 native speakers of standard American English with no reported hearing impairment

– 17 female, 23 male, all 18+
– 4 random orders rotated among subjects

Page 9: Acoustic Cues to Emotional Speech

Correlations between Judgments

                 sad    ang    bor    fru    anx    fri    con    hap    int    enc
sad                     .06    .44    .26    .22   -.27   -.32   -.42   -.32   -.33
angry                          .05    .70    .21   -.41    .02    .37   -.09   -.32
bored                                 .14   -.14   -.28   -.17   -.32   -.42   -.27
frustrated                                   .32   -.43   -.09   -.47   -.16   -.39
anxious                                            -.14   -.25   -.17    .07   -.14
friendly                                                   .44    .77    .59    .75
confident                                                         .45    .51    .53
happy                                                                    .58    .73
interested                                                                      .62
encouraging
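
To reproduce this kind of analysis, pairwise correlations between rating scales can be computed with pandas. A minimal sketch, assuming a hypothetical ratings file with one row per judgment and one column per emotion scale (the original study's data layout is not given in the slides):

```python
import pandas as pd

# Hypothetical layout: one row per (subject, token) judgment, one
# column per emotion scale holding the rating on that scale.
ratings = pd.read_csv("emotion_ratings.csv")  # placeholder filename

scales = ["sad", "angry", "bored", "frustrated", "anxious",
          "friendly", "confident", "happy", "interested", "encouraging"]

# Pearson correlation between every pair of rating scales,
# analogous to the matrix above.
print(ratings[scales].corr(method="pearson").round(2))
```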

Page 10: Acoustic Cues to Emotional Speech

What acoustic features correlate with which emotion categories?

– F0: min, max, mean, ‘range’, stdev
– RMS: min, max, mean, range, stdev
– Voiced samples/all samples (VCD)
– Mean syllable length
– TILT: spectral tilt (difference between the 2nd and 1st harmonic amplitudes over a 30 ms window) of the highest-amplitude vowel and of the nuclear stressed vowel
– Type of nuclear accent, contour, phrasal ending
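
As a rough illustration of how the pitch- and energy-based features above can be extracted, here is a sketch using librosa; this is not the original pipeline, the filename is a placeholder, and spectral tilt and syllable length would need harmonic analysis and segmentation, omitted here:

```python
import numpy as np
import librosa

# Placeholder filename; any mono speech recording will do.
y, sr = librosa.load("token.wav", sr=None)

# F0 track via probabilistic YIN; unvoiced frames come back as NaN.
f0, voiced_flag, _ = librosa.pyin(y, fmin=75, fmax=500, sr=sr)
f0 = f0[~np.isnan(f0)]

rms = librosa.feature.rms(y=y)[0]  # frame-level RMS energy

features = {
    "f0_min": f0.min(), "f0_max": f0.max(), "f0_mean": f0.mean(),
    "f0_range": f0.max() - f0.min(), "f0_stdev": f0.std(),
    "rms_min": rms.min(), "rms_max": rms.max(), "rms_mean": rms.mean(),
    "rms_range": rms.max() - rms.min(), "rms_stdev": rms.std(),
    # Voiced frames / all frames, a frame-based proxy for VCD.
    "vcd": float(voiced_flag.mean()),
}
print(features)
```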

Page 11: Acoustic Cues to Emotional Speech

Results

• F0, RMS, and rate distinguish emotion categories by activation (act)
  – +act correlates with higher F0 and RMS, and a faster rate
  – These features do not distinguish valence (val)

• Tilt of the highest-amplitude vowel separates +act emotions of different val into different categories (e.g. friendly, happy, encouraging vs. angry, frustrated)

• Phrase accent/boundary tone also separates +val from -val

Page 12: Acoustic Cues to Emotional Speech

– H-L% positively correlated with -val and negatively with +val

– +val positively correlated with L-L% and -val not

Page 13: Acoustic Cues to Emotional Speech

Predicting Emotion Categories Automatically

• 1760 judgment/token datapoints (90%/10% training/test split)
  – collapse 2-5 ratings to one
• Ripper machine learning algorithm
  – Baseline: choose the most frequent ranking
  – Mean performance over all emotions: 75% (22% improvement over baseline)
  – Individual emotion categories
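
Ripper is a rule-induction learner without a standard scikit-learn implementation; below is a minimal sketch of the same experimental shape (majority-class baseline, 90/10 split), with a shallow decision tree standing in for Ripper and a hypothetical data file and label column:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical file: one row per judgment/token datapoint, acoustic
# feature columns plus a binary label for one emotion (here "happy").
data = pd.read_csv("emotion_datapoints.csv")
X = data.drop(columns=["happy"])
y = data["happy"]

# 90%/10% training/test split, as on this slide.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1,
                                          random_state=0)

# Baseline: always predict the most frequent class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)

# Shallow decision tree as a rule-like stand-in for Ripper.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)

print("baseline accuracy:", baseline.score(X_te, y_te))
print("tree accuracy:   ", tree.score(X_te, y_te))
```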

Page 14: Acoustic Cues to Emotional Speech

– Happy, encouraging, sad, and anxious predicted well

– Confident and interested show little improvement

– Which features best predict which emotion categories?

Page 15: Acoustic Cues to Emotional Speech

Best Performing Features

Emotion      Feature                 Accuracy / baseline
Angry        F0*, RMS*, TILT*, VCD   77.3% / 69.3%
Confident    F0_range, F0_mean       76.1% / 75.0%
Happy        F0_min                  81.3% / 57.4%
Interested   F0_stdev                75.6% / 69.9%
Encouraging  VCD                     73.9% / 52.3%

Page 16: Acoustic Cues to Emotional Speech

Emotion      Feature        Accuracy / baseline
Sad          F0_max         81.3% / 61.9%
Anxious      Tilt_RMS       78.4% / 55.7%
Bored        Tilt_RMS       80.1% / 66.5%
Friendly     Tilt_stress    75.0% / 59.1%
Frustrated   F0_max         75.0% / 59.1%

Page 17: Acoustic Cues to Emotional Speech

Conclusions

• New features to distinguish valence: spectral tilt and prosodic endings

• New understanding of relations among emotion categories
  – Judgments
  – Features

Page 18: Acoustic Cues to Emotional Speech

Current/Future Work

• Use ML to rank rather than classify (RankBoost); see the sketch below
• Eye-tracking task, matching tokens to ‘emotional’ pictures
  – Web survey to ‘norm’ the pictures
  – Layout issues
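
RankBoost learns from pairwise preferences; as a rough stand-in sketch (not the authors' setup), graded ratings can be converted into pairwise preferences and fit with a linear scorer, with synthetic data in place of real features and ratings:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins: feature vectors X and graded ratings r
# (e.g. mean "happy" rating per token) for 100 tokens.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
r = X @ np.array([1.0, -0.5, 0.0, 0.3, 0.0]) + rng.normal(scale=0.1, size=100)

# Pairwise transform: for each pair (i, j), predict whether token i
# outranks token j from the feature difference x_i - x_j.
i, j = np.triu_indices(len(r), k=1)
ranker = LogisticRegression().fit(X[i] - X[j], (r[i] > r[j]).astype(int))

# Scores for ranking new tokens: higher score = ranked higher.
scores = ranker.decision_function(X)
print(np.argsort(-scores)[:5])  # indices of the five top-ranked tokens
```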
