ANGRY EMOTION DETECTION FROM REAL-LIFE CONVERSATIONAL SPEECH BY LEVERAGING CONTENT STRUCTURE

Wooil Kim and John H. L. Hansen

Chun-Yu Chen



Outline

• Real conversational speech corpus

• TEO-CB-AUTO-ENV

• Emotional language model score

• Experimental results


Real conversational speech corpus

• Neutral speech
  • digits, alphabets, and other words (First, July, August)
  • specific information

• Angry speech
  • negative words (not, no, can’t, even, how)
  • complaints
  • others (that, this, here)



TEO-CB-AUTO-ENV

• one of the acoustic features for angry speech detection

• designed to represent nonlinear characteristics of the voiced sound production (e.g., vowels)

• The resulting vector of area coefficients has been shown to be large for neutral speech
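As a rough illustration of the idea behind this feature, the sketch below applies the discrete Teager energy operator, Ψ[x(n)] = x(n)² − x(n−1)·x(n+1), to a single frame and then measures the area under the normalized autocorrelation of the resulting TEO profile. It is a simplified single-band sketch, not the full feature: the critical-band filterbank is omitted, the absolute autocorrelation stands in for the envelope, and the function names, sampling rate, and test tone are all assumptions made for illustration.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def autocorr_area(profile):
    """Area under the normalized autocorrelation of a TEO profile.

    Hypothetical helper: the absolute autocorrelation is used as a
    crude stand-in for the autocorrelation envelope.
    """
    profile = np.asarray(profile, dtype=float)
    n = len(profile)
    # Keep non-negative lags only.
    acf = np.correlate(profile, profile, mode="full")[n - 1:]
    if acf[0] == 0:
        return 0.0
    acf = acf / acf[0]                      # lag 0 is normalized to 1
    return float(np.sum(np.abs(acf)) / n)   # area scaled by frame length

# Toy frame: a pure 100 Hz tone sampled at 8 kHz (assumed values).
fs = 8000
t = np.arange(200) / fs
frame = np.sin(2 * np.pi * 100 * t)

psi = teager_energy(frame)
area = autocorr_area(psi)
```

For a pure tone the TEO profile is constant, so the autocorrelation decays smoothly and the area is comparatively large; an irregular profile would give a smaller area.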


Emotional language model score

• Two types of combination methods
  1. Feature combination: the MFCC feature vector is appended to the TEO-CB-Auto-Env feature vector
  2. Classifier combination: the likelihood scores from both classifiers are combined with a scale factor
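The two combination methods can be sketched as follows. This is a minimal illustration under stated assumptions: the slide does not give the exact combination formula, so the weighted sum of log-likelihoods, the function names, the feature dimensions, and the score values are all hypothetical.

```python
import numpy as np

def combine_features(mfcc, teo):
    """Feature combination: append the MFCC vector to the TEO-CB-Auto-Env vector, frame by frame."""
    mfcc = np.asarray(mfcc, dtype=float)
    teo = np.asarray(teo, dtype=float)
    if mfcc.shape[0] != teo.shape[0]:
        raise ValueError("both feature streams need the same number of frames")
    return np.concatenate([teo, mfcc], axis=1)

def combine_scores(loglik_a, loglik_b, scale):
    """Classifier combination: add the second classifier's log-likelihoods, weighted by a scale factor."""
    return np.asarray(loglik_a, dtype=float) + scale * np.asarray(loglik_b, dtype=float)

# Toy data with assumed dimensions: 5 frames, 13 MFCCs, 16 TEO coefficients.
rng = np.random.default_rng(0)
combined = combine_features(rng.normal(size=(5, 13)), rng.normal(size=(5, 16)))

# Hypothetical per-class log-likelihoods, ordered as [neutral, angry].
labels = ["neutral", "angry"]
scores = combine_scores([-10.0, -12.0], [-8.0, -5.0], scale=0.5)
decision = labels[int(np.argmax(scores))]
```

Feature combination yields one longer vector for a single classifier, while classifier combination keeps the two classifiers separate and lets the scale factor balance their influence.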


Emotional language model score

• “Emotional” language models
  • based on an initial language model with a large vocabulary (HUB4)
  • adapted using the transcripts of neutral and angry speech
  • HTK and the CMU-Cambridge SLM toolkit are used to adapt the initial language model
  • formulate a 2-dimensional feature vector as a “lexical” feature
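To make the 2-dimensional lexical feature concrete, the sketch below scores a transcript against a neutral and an angry language model and returns the two scores as a vector. It is only a toy stand-in: it uses add-one-smoothed unigram models instead of HUB4 models adapted with HTK or the CMU-Cambridge toolkit, and the transcripts, function names, and scoring by average log probability are assumptions.

```python
import math
from collections import Counter

def train_unigram(sentences):
    """Count words over a list of transcripts (toy stand-in for an adapted language model)."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    return counts, sum(counts.values())

def avg_log_prob(model, vocab_size, sentence):
    """Average per-word log probability with add-one smoothing."""
    counts, total = model
    words = sentence.lower().split()
    if not words:
        return 0.0
    logp = sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in words)
    return logp / len(words)

# Made-up transcripts, for illustration only.
neutral_text = ["my number is one two three", "the first of july"]
angry_text = ["no this is not right", "i can't even get this fixed"]

neutral_lm = train_unigram(neutral_text)
angry_lm = train_unigram(angry_text)
vocab = set(neutral_lm[0]) | set(angry_lm[0])

def lexical_feature(sentence):
    """2-dimensional lexical feature: [neutral LM score, angry LM score]."""
    v = len(vocab) + 1  # one extra slot for unseen words
    return [avg_log_prob(neutral_lm, v, sentence),
            avg_log_prob(angry_lm, v, sentence)]

feature = lexical_feature("this is not right")
```

A transcript dominated by words from the angry transcripts gets a higher score in the second dimension, which is the information a downstream classifier can combine with the acoustic features.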



Experimental results

• Data collection
  • 15 female and 13 male speakers
  • 136 segments of neutral speech and 124 segments of angry speech
  • each segment is 3-6 sec long


Experimental results

• Two types of test condition
  1. Open-speaker
     • the model is trained on all data except the test speaker’s
  2. Closed-speaker
     • the data is split into two parts
     • the test speaker’s utterances come only from part A
     • the model is trained on part B
     • better performance from including more data
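The open-speaker condition amounts to a leave-one-speaker-out split, which can be sketched as below. The segment records, speaker IDs, and function name are hypothetical, and only the open-speaker case is shown because the slide does not fully specify how the two parts are formed in the other condition.

```python
def open_speaker_splits(segments):
    """Yield (test_speaker, train, test) with the test speaker held out of training."""
    speakers = sorted({seg["speaker"] for seg in segments})
    for spk in speakers:
        train = [s for s in segments if s["speaker"] != spk]
        test = [s for s in segments if s["speaker"] == spk]
        yield spk, train, test

# Hypothetical segment records; speaker IDs and labels are made up.
segments = [
    {"speaker": "f01", "label": "neutral"},
    {"speaker": "f01", "label": "angry"},
    {"speaker": "m01", "label": "neutral"},
    {"speaker": "m02", "label": "angry"},
]

splits = list(open_speaker_splits(segments))
```

Each split trains on every speaker except one and tests on that held-out speaker, so no test speaker's data is seen during training.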


Experimental results

• Without EMLS
  • MFCC-EDZ is the best single feature


Experimental results

• With EMLS