
Emotional Grounding in Spoken Dialog Systems

Jackson Liscombe ([email protected])
Giuseppe Riccardi ([email protected])
Dilek Hakkani-Tür ([email protected])

10.14.04 Jackson Liscombe -- CU / AT&T 2

The Problem: Emotion

In Spoken Dialog Systems, users can …
… start angry.
… get angry.
… end angry.


Outline

Previous Work

Corpus Description

Feature Extraction

Classification Experiments


Past Work

I. Isolated Speech

II. Spoken Dialog Systems


Past Work: Isolated Speech

Acted data. Features: F0/pitch, energy, speaking rate.

Researchers (late 1990s to present): Aubergé, Campbell, Cowie, Douglas-Cowie, Hirschberg, Liscombe, Mozziconacci, Oudeyer, Pereira, Roach, Scherer, Schröder, Tato, Yuan, Zetterholm, …


Past Work: Spoken Dialog Systems (1)

Batliner, Huber, Fischer, Spilker, Nöth (2003)
System: Verbmobil (Wizard-of-Oz scenarios)
Binary classification; features:
  prosodic
  lexical (POS tags, swear words)
  dialog acts (repeat/repair/insult)
0.1% relative improvement using dialog acts


Past Work: Spoken Dialog Systems (2)

Ang, Dhillon, Krupski, Shriberg, Stolcke (2002)
System: DARPA Communicator
Binary classification; features:
  prosodic
  lexical (language model)
  dialog acts (repeats/repairs)
4% relative improvement using dialog acts


Past Work: Spoken Dialog Systems (3)

Lee, Narayanan (2004)
System: Speechworks call-center
Binary classification; features:
  prosodic
  lexical (weighted mutual information)
  dialog acts (repeat/rejection)
3% improvement using dialog acts


Past Work: Summary

Past research has focused on acoustic data
But the field is moving toward grounding emotion in context (dialog acts)
Summer work: extend contextual features for better emotion prediction


Corpus Description

AT&T's "How May I Help You?SM" corpus (0300 Benchmark)
Labeled with "Voice Signature" information:
  user state (emotion)
  gender
  age
  accent type


Corpus Description

Statistic                      Training   Testing
Number of user turns             15,013     5,000
Number of dialogs                 4,259     1,431
Turns per dialog (avg)              3.5       3.5
Words per turn (avg)                9.0       9.9


User Emotion Distribution

[Bar chart: percent of user turns per state label. Positive/Neutral is by far the most frequent; the remaining turns are spread across Somewhat/Very Frustrated, Somewhat/Very Angry, Somewhat/Very Negative, and Other.]


Emotion Labels

Original set:
  Positive/Neutral
  Somewhat Frustrated
  Very Frustrated
  Somewhat Angry
  Very Angry
  Somewhat Negative
  Very Negative
  Other

Reduced set:
  Positive
  Negative
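The binary reduction can be sketched as a one-line mapping (how the "Other" label was folded in is not stated on the slides; mapping it to negative is an assumption here):

```python
def reduce_label(original: str) -> str:
    """Collapse the original 8-way user-state label set to the binary set.
    Positive/Neutral keeps its own class; the frustrated, angry, and
    negative variants collapse to "negative". Mapping "Other" to
    negative is an assumption, not stated in the slides."""
    return "positive" if original == "Positive/Neutral" else "negative"
```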


Corpus Description: Binary User States

Statistic                                       Training   Testing
% of turns that are positive                       88.1%     73.1%
% of dialogs with at least one negative turn       24.8%     44.7%
% of negative dialogs that start negative          43.5%     59.9%
% of negative dialogs that end negative            42.4%     48.7%


Feature Set Space

Features (Prosodic, Lexical, Discourse) are crossed with context: the current turn (turn_i), the previous turn (turn_i-1), the turn before that (turn_i-2), and so on.


Feature Set Space: Context Overview

  turn_i: isolated features
  turn_i-1, turn_i-2, …: differentials, prior statistics


Lexical Features

Language model (n-grams).
Examples of words significantly correlated with negative user state (p < 0.001):
  1st-person pronouns: 'I', 'me'
  requests for a human operator: 'person', 'talk', 'speak', 'human', 'machine'
  billing-related words: 'dollars', 'cents'
  curse words: …
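The slides report words significantly correlated with negative state at p < 0.001 but do not name the test. A minimal sketch of one common choice, a chi-square test on a 2x2 word-by-state contingency table (all counts below are invented):

```python
def chi_square_2x2(word_neg, word_pos, total_neg, total_pos):
    """Chi-square statistic for a 2x2 contingency table:
    rows = turns containing / not containing the word,
    columns = negative / positive user-state turns."""
    table = [
        [word_neg, word_pos],
        [total_neg - word_neg, total_pos - word_pos],
    ]
    total = total_neg + total_pos
    row_sums = [sum(row) for row in table]
    col_sums = [total_neg, total_pos]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2
```

With 1 degree of freedom, a statistic above roughly 10.83 corresponds to p < 0.001, so words scoring above that threshold would count as significantly correlated with negative state.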


Prosodic Features

Praat: open-source tool for speech analysis, synthesis, statistics, manipulation, …
Paul Boersma and David Weenink, University of Amsterdam
www.praat.org


Prosodic Features

Pitch (F0)
  1. overall minimum
  2. overall maximum
  3. overall median
  4. overall standard deviation
  5. mean absolute slope
  6. slope of final vowel
  7. longest vowel mean

Other
  8. local jitter over longest vowel

Energy
  9. overall minimum
  10. overall maximum
  11. overall mean
  12. overall standard deviation
  13. longest vowel mean

Speaking Rate
  14. vowels per second
  15. mean vowel length
  16. ratio of voiced frames to total frames
  17. percent internal silence
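Extraction was done with Praat; purely to make features 1-5 concrete, here is a plain-Python sketch of the overall pitch statistics over a voiced F0 contour (the contour values are invented, and the slope here is per frame rather than Praat's Hz/s):

```python
import statistics

def pitch_stats(f0_hz):
    """Overall pitch statistics for one user turn.
    f0_hz: F0 values in Hz, one per voiced frame (unvoiced frames
    assumed already removed by the pitch tracker)."""
    return {
        "f0_min": min(f0_hz),
        "f0_max": max(f0_hz),
        "f0_median": statistics.median(f0_hz),
        "f0_stdev": statistics.stdev(f0_hz),
        # mean absolute frame-to-frame change; a simplification of
        # Praat's mean absolute slope, which is reported in Hz/s
        "f0_mean_abs_slope": statistics.mean(
            abs(b - a) for a, b in zip(f0_hz, f0_hz[1:])
        ),
    }
```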

Prosodic Features in Action

A View Of Pitch: "That amount is incorrect." (20020508/0508095935fhf2910)
[Pitch contours shown for a Very Frustrated and a Somewhat Frustrated rendition.]


Dialog 20020221/0221080552atf1536

…
Agent: I'm sorry, I didn't understand that. How may I help you?
User:  please an operator please [ .hst ] may I speak   (Positive/Neutral)
Agent: I'm sorry we're having trouble. Can we try one more time? Just briefly tell me how I may help you.
User:  may I speak to an assistant please   (Somewhat Frustrated)
Agent: Are you calling from your home phone?
User:  yes   (Somewhat Angry)

Pitch/Energy/Speaking Rate : Change


[Chart (20020221/0221080552atf1536): z-scores (-2 to 2) of median pitch, mean energy, and speaking rate per utterance, for three consecutive user turns labeled Positive, Frustrated, Angry.]
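A sketch of the z-score normalization behind these plots (the slides do not state the normalization window; normalizing each feature over the caller's own turns is an assumption):

```python
import statistics

def z_scores(values):
    """Normalize one prosodic feature (e.g. median pitch) across a
    caller's turns, so change relative to that caller's own baseline
    is visible regardless of absolute pitch or energy level."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [(v - mean) / stdev for v in values]
```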


Dialog 20020317/0317220739atf9051

Agent: Hello, I am an AT&T automated assistant. You can speak naturally to me. How may I help you?
User:  I need to find out about a number that I don't recognize   (Positive/Neutral)
Agent: Would you like to look up a number you don't recognize on your bill?
User:  yes I would   (Positive/Neutral)
Agent: Are you calling from your home phone?
User:  yes I am   (Positive/Neutral)

Pitch/Energy/Speaking Rate : No Change


[Chart (20020317/0317220739atf9051): z-scores (-2 to 2) of median pitch, mean energy, and speaking rate per utterance, for three consecutive user turns all labeled Positive; the features stay flat.]


Feature Set Space: Baseline

Prosodic and lexical features of the current turn (turn_i) only.


Discourse Features

Dialog acts: greeting, re-prompt, confirmation, specification, acknowledgment, disambiguation, …


Feature Set Space: State-of-the-Art

Baseline plus discourse features (dialog acts) of the current turn.


Contextual Features

Lexical (2): edit distance with previous 2 turns
Discourse (10): turn number; call type repetition with previous 2 turns; dialog act repetition with previous 2 turns
Prosodic (34): 1st- and 2nd-order differentials for each feature
Other (2): user state of previous 2 turns
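The lexical context feature is an edit distance against each of the previous two user turns. A standard Levenshtein sketch over words (whether the original used word- or character-level distance is not stated, so word-level is an assumption):

```python
def edit_distance(a: str, b: str) -> int:
    """Word-level Levenshtein distance between two turn transcriptions."""
    a, b = a.split(), b.split()
    # prev[j] holds the distance between the processed prefix of a
    # and b[:j]; rows are rolled to keep memory at O(len(b))
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete wa
                cur[j - 1] + 1,            # insert wb
                prev[j - 1] + (wa != wb),  # substitute, or match for free
            ))
        prev = cur
    return prev[-1]
```

A low distance to a previous turn signals the user repeating themselves, the repeat/repair cue that the prior-work slides already tie to frustration after recognition errors.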


Feature Set Space: Contextual Features

All feature types, extended with context from turn_i-1 and turn_i-2.


Experimental Design

Training size: 15,013 turns
Testing size: 5,000 turns
Most frequent user state (positive) accounts for 73.1% of testing data

Learning algorithm: BoosTexter (boosting with weak learners)
  continuous and discrete valued features
  2,000 iterations
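BoosTexter itself is AT&T's confidence-rated boosting tool and is not reproduced here; as an illustrative stand-in only, a minimal classic AdaBoost with decision stumps (one-level weak learners) over numeric features, on invented toy data:

```python
import math

def train_stump(X, y, w):
    """Best decision stump (feature, threshold, polarity) under weights w."""
    n, d = len(X), len(X[0])
    best, best_err = None, float("inf")
    for f in range(d):
        for thresh in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                # predict +polarity when x[f] >= thresh, else -polarity
                err = sum(
                    w[i] for i in range(n)
                    if (polarity if X[i][f] >= thresh else -polarity) != y[i]
                )
                if err < best_err:
                    best_err, best = err, (f, thresh, polarity)
    return best, best_err

def adaboost(X, y, rounds=10):
    """AdaBoost over labels in {+1, -1}; returns a weighted stump ensemble."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        (f, t, p), err = train_stump(X, y, w)
        err = max(err, 1e-10)  # avoid division by zero on perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, t, p))
        # re-weight: mistakes gain weight, correct predictions lose it
        for i in range(n):
            pred = p if X[i][f] >= t else -p
            w[i] *= math.exp(-alpha * y[i] * pred)
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * (p if x[f] >= t else -p) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1
```

This only shows the reweight-and-combine loop; the real system additionally handles text (n-gram) predicates and multi-valued discrete features.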


Performance Accuracy Summary

Feature Set        Accuracy   Rel. Improv. over Baseline
Most Freq. State      73.1%   -----
Baseline              76.1%   -----
State-of-the-Art      77.0%   1.2%
Contextual            79.0%   3.8%


Conclusions

Baseline (prosodic and lexical features) improves emotion prediction over chance
State-of-the-Art (baseline plus dialog acts) gives further improvement
Innovative contextual features improve emotion prediction even further
Toward a computational model of emotional grounding

Thank You