Speech Emotion Recognition and Perception of...


Page 1: Speech Emotion Recognition and Perception of Musicoa.upm.es/9982/1/vortragDiplomarbeit_Mel_part2_handout.pdf · 2014. 9. 22. · Based on Music Emotion Recognition Perceptual Model


Introduction

Basic Features

Musical Features

Simulations and Results

Speech Emotion Recognition and Perception of Music

Mélanie Fernández Pradier

Prof. Dr.-Ing. Bin Yang

Supervisors: Prof. Dr.-Ing. Bin Yang

Dipl.-Ing. Fabian Schmieder

January 27, 2011

Mélanie Fernández Pradier Speech Emotion Recognition and Perception of Music


Motivation: Speech Emotion Recognition and Perception of Music

Emotion Recognition from Speech

Speech ∼ two-channel

linguistic

paralinguistic

Several Applications

support ASR

diagnoses

speech synthesis

entertainment

Music Perception

"language of emotion"

treatment of affective disorders

treatment of speech disorders

same origin of music and speech


Aim of the thesis: Apply Music Theory to Speech Emotion Recognition

Investigate Speech and Music similarities to derive universal features for Emotions

1 What is the link between music and speech?

2 How are emotions transmitted through music?

3 Can we apply musical knowledge to speech processing?


1 Introduction

Motivation

Aim of the thesis

2 Basic Features

General Concepts

Description

3 Musical Features

Interval and Triad Features

Based on Music Emotion Recognition

Perceptual Model of Intonation

4 Simulations and Results


Pattern Recognition


Feature Generation


Basic Features Description

Local Features

ZCR

MFCC

Energy (total + bands)

Pitch

Voiced-unvoiced

VAD

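$\mathrm{ZCR} = \frac{1}{2} \sum_{n=1}^{N} \left| \operatorname{sgn}(x_n) - \operatorname{sgn}(x_{n+1}) \right|$

$\mathrm{Cepstrum} = \left| \mathrm{FFT}\left\{ \log\left( \left| \mathrm{FFT}\{x\} \right|^2 \right) \right\} \right|^2$

$\mathrm{Energy} = \sum_{n=1}^{N} x_n \cdot x_n^{*}$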
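The three local features above can be sketched in a few lines. This is a minimal illustration in plain Python, not the thesis implementation: the naive `dft` helper, the `floor` guard against log(0) and the convention sgn(0) = +1 are assumptions made here, and the ZCR sum runs over the pairs available inside one frame.

```python
import cmath
import math


def sign(value):
    """Return +1 for non-negative samples and -1 otherwise (assumed convention)."""
    return 1 if value >= 0 else -1


def zero_crossing_rate(frame):
    """ZCR = 1/2 * sum |sgn(x_n) - sgn(x_{n+1})| over consecutive samples."""
    return 0.5 * sum(abs(sign(a) - sign(b)) for a, b in zip(frame, frame[1:]))


def energy(frame):
    """Energy = sum x_n * conj(x_n); for real samples this is sum x_n^2."""
    return sum((complex(x) * complex(x).conjugate()).real for x in frame)


def dft(values):
    """Naive O(N^2) discrete Fourier transform, adequate for short frames."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]


def power_cepstrum(frame, floor=1e-12):
    """|FFT{log(|FFT{x}|^2)}|^2; `floor` avoids log(0) and is an assumption."""
    log_power = [math.log(max(abs(c) ** 2, floor)) for c in dft(frame)]
    return [abs(c) ** 2 for c in dft(log_power)]


# Made-up frame of 8 samples, only for illustration.
frame = [1.0, -1.0, 1.0, -1.0, 0.5, 0.5, -0.5, 0.5]
zcr = zero_crossing_rate(frame)
frame_energy = energy(frame)
cepstrum = power_cepstrum(frame)
```

In practice an FFT routine from a numerical library would replace `dft`; the naive version only keeps the sketch self-contained.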

Global Features

Global statistics: min, mean, max,

median, std, iqr...

directly, 1st or 2nd derivative

Energy and pitch plateaux

Combination with logical features
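As a rough sketch of how global features can be derived from a local contour (pitch, energy, ...): the statistics listed above are computed on the contour directly and on its 1st and 2nd differences. The first-order difference used as "derivative", the population standard deviation and the feature names are assumptions made for this example.

```python
import statistics


def derivative(contour):
    """First-order difference as a simple stand-in for the derivative."""
    return [b - a for a, b in zip(contour, contour[1:])]


def global_statistics(contour):
    """Summarize one contour with min, mean, max, median, std and iqr."""
    q1, _, q3 = statistics.quantiles(contour, n=4)
    return {
        "min": min(contour),
        "mean": statistics.fmean(contour),
        "max": max(contour),
        "median": statistics.median(contour),
        "std": statistics.pstdev(contour),
        "iqr": q3 - q1,
    }


def global_features(contour):
    """Statistics of the contour itself and of its 1st and 2nd differences."""
    first = derivative(contour)
    second = derivative(first)
    features = {}
    for name, series in (("d0", contour), ("d1", first), ("d2", second)):
        for stat, value in global_statistics(series).items():
            features[f"{name}_{stat}"] = value
    return features


# Made-up pitch contour in Hz, one value per frame.
pitch_contour = [100.0, 110.0, 125.0, 120.0, 105.0, 95.0]
features = global_features(pitch_contour)
```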


Interval and Triad Features


Interval Features

Autocorrelation of the circular pitch density function

$r_o(s) = \int_0^L p_o\left(\operatorname{mod}_L(s + \lambda)\right) \, p_o(\lambda) \, d\lambda$

Intervallic dissonance

$\mathrm{DIS} = \int_0^L d(s) \, r_o(s) \, ds$

where $d(s) \simeq \sqrt{N(s) \, D(s)}$

[Figure: "Pitch Histogram", number of pitch samples vs. circular frequency in ST scale]

[Figure: "2nd order Autocorrelation" vs. circular frequency in ST scale]
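A discrete sketch of the interval features, under stated assumptions: pitch samples are folded onto a circular semitone scale of period L = 12, the density p_o is a 12-bin histogram, and the integrals become sums. The reference frequency, the floor binning and the dissonance weights `d` passed in are placeholders chosen here, not values from the thesis.

```python
import math


def semitone_class(freq_hz, ref_hz=440.0, period=12.0):
    """Map a frequency to the circular semitone scale [0, period)."""
    return (12.0 * math.log2(freq_hz / ref_hz)) % period


def circular_pitch_density(pitches_hz, bins=12, period=12.0):
    """Normalized histogram of pitch samples on the circular ST scale."""
    counts = [0] * bins
    for f in pitches_hz:
        index = int(semitone_class(f, period=period) / period * bins) % bins
        counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts]


def circular_autocorrelation(density):
    """Discrete analogue of the integral of p_o(mod_L(s + l)) * p_o(l) over l."""
    size = len(density)
    return [
        sum(density[(s + lag) % size] * density[lag] for lag in range(size))
        for s in range(size)
    ]


def interval_dissonance(autocorr, weights):
    """Discrete analogue of DIS = integral of d(s) * r_o(s) over s."""
    return sum(d * r for d, r in zip(weights, autocorr))


# Made-up pitch samples: two frames at A4 and two a fifth above.
pitches = [440.0, 440.0, 660.0, 660.0]
density = circular_pitch_density(pitches)
autocorr = circular_autocorrelation(density)
# Placeholder weights d(s); uniform weights are NOT a dissonance model.
uniform_weights = [1.0] * 12
dis = interval_dissonance(autocorr, uniform_weights)
```

With real data, `weights` would hold one dissonance value per interval, for instance read off a dissonance curve over the intervals m2 to M7.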


Interval Dissonance

[Figure: sensation produced by two tones f1 and f2 as a function of the frequency difference Δf = f2 − f1. Labels: one-tone sensation, beats area, roughness area, smoothness area, 10 Hz, limits of discrimination, critical bandwidth, two-tone sensation]

[Figure: dissonance (scale 0 to 70) for the intervals m2, M2, m3, M3, P4, 4+/5°, P5, m6, M6, m7, M7]


Triad Features

1 Direct computation

2 Extraction of "dominant pitches"

Autocorrelation Triad Features

[Figure: "Gaussian Mixture Model" over the semitone scale]

Gaussian Triad Features


Tension and Modality


Loudness, Timbre and Rhythm

Intensity Features

$I(k) = \sum_{n=0}^{N/2} \left| \mathrm{FFT}_k(n) \right|$

$D_i(k) = \frac{1}{I(k)} \sum_{n=L_i}^{H_i} \left| \mathrm{FFT}_k(n) \right|$

where $k$ refers to the frame
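A minimal sketch of the two intensity features for one frame k, assuming the one-sided magnitude spectrum |FFT_k(n)|, n = 0 .. N/2, is already available as a list. The band edges (L_i, H_i) and the spectrum values below are invented for illustration.

```python
def frame_intensity(magnitudes):
    """I(k): sum of |FFT_k(n)| over the one-sided spectrum n = 0 .. N/2."""
    return sum(magnitudes)


def subband_ratios(magnitudes, bands):
    """D_i(k): share of the frame intensity inside each band [L_i, H_i]."""
    total = frame_intensity(magnitudes)
    if total == 0:
        return [0.0 for _ in bands]
    return [sum(magnitudes[low:high + 1]) / total for low, high in bands]


# One-sided magnitude spectrum of a single frame k (made-up values).
magnitudes = [4.0, 3.0, 2.0, 1.0, 0.5, 0.5, 0.0, 1.0]
# Hypothetical band edges given as inclusive bin indices (L_i, H_i).
bands = [(0, 1), (2, 4), (5, 7)]
intensity = frame_intensity(magnitudes)
ratios = subband_ratios(magnitudes, bands)
```

Because the example bands cover the whole spectrum without overlap, the ratios sum to one; that need not hold for other band layouts.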

Timbre Features

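$\mathrm{FFT}_k \equiv \{x_{k1} \dots x_{kN}\}$

$\mathrm{sorted} \equiv \{x'_{k1} \dots x'_{kN}\}$

$\mathrm{Peak}(k) = \log\left\{ \frac{1}{\alpha N} \sum_{i=1}^{\alpha N} x'_{ki} \right\}$

$\mathrm{Valley}(k) = \log\left\{ \frac{1}{\alpha N} \sum_{i=1}^{\alpha N} x'_{k(N-i+1)} \right\}$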
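The peak and valley features can be sketched as follows, assuming the spectrum values are sorted in descending order (so that the first αN entries are the largest) and that α is a small fraction such as 0.2. Both the value of `alpha` and the `floor` guard against log(0) are assumptions of this example.

```python
import math


def peak_and_valley(magnitudes, alpha=0.2, floor=1e-12):
    """Log-mean of the alpha*N largest (peak) and smallest (valley) values."""
    ordered = sorted(magnitudes, reverse=True)  # x'_k1 >= ... >= x'_kN
    count = max(1, int(alpha * len(ordered)))   # alpha * N, at least one bin
    peak = math.log(max(sum(ordered[:count]) / count, floor))
    valley = math.log(max(sum(ordered[-count:]) / count, floor))
    return peak, valley


# Made-up magnitude spectrum of one frame.
spectrum = [8.0, 1.0, 4.0, 2.0, 1.0, 16.0, 2.0, 1.0, 1.0, 4.0]
peak, valley = peak_and_valley(spectrum, alpha=0.2)
```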


Loudness, Timbre and Rhythm

Rhythm Features

1 Compute FFT

2 Extract amplitude envelope

$A_i(n) = \mathrm{FFT}_i(n) \otimes h_w(n)$

3 Apply Canny operator

$O_i(n) = A_i(n) \otimes C(n)$

$C(n) = \frac{n}{\sigma^2} e^{-\frac{n^2}{2\sigma^2}}$

We obtain the onset sequence $O_i(n)$

[Figure: "Onset Sequence", amplitude vs. number of samples]
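Steps 2 and 3 can be sketched as two filtering passes over the magnitude sequence of one band i. Several choices here are assumptions: h_w is taken to be a decaying half raised-cosine window, the kernel lengths and σ are arbitrary, and the result is half-wave rectified. Because C(n) is odd, a literal convolution responds negatively to rising edges, so this sketch correlates with C(n) instead (equivalent to convolving with −C(n)).

```python
import math


def half_hanning(length):
    """Decaying half of a raised-cosine window (assumed shape of h_w)."""
    return [0.5 + 0.5 * math.cos(math.pi * n / length) for n in range(length)]


def canny_kernel(half_width, sigma):
    """C(n) = n / sigma^2 * exp(-n^2 / (2 sigma^2)), n in [-half_width, half_width]."""
    return [
        n / sigma ** 2 * math.exp(-(n ** 2) / (2 * sigma ** 2))
        for n in range(-half_width, half_width + 1)
    ]


def filter_sequence(signal, kernel, origin, flip=True):
    """Convolve (flip=True) or correlate (flip=False) with zero padding.

    `origin` is the kernel index that corresponds to lag 0.
    """
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for m, weight in enumerate(kernel):
            lag = m - origin
            idx = n - lag if flip else n + lag
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        out.append(acc)
    return out


def onset_sequence(band_magnitudes, smooth_len=4, half_width=4, sigma=2.0):
    """Amplitude envelope A_i(n), then edge detection, then rectification."""
    envelope = filter_sequence(band_magnitudes, half_hanning(smooth_len), origin=0)
    kernel = canny_kernel(half_width, sigma)
    edges = filter_sequence(envelope, kernel, origin=half_width, flip=False)
    return [max(0.0, e) for e in edges]


# Made-up band magnitude: silence followed by a sustained sound.
band = [0.0] * 10 + [1.0] * 10
onsets = onset_sequence(band)
strongest = onsets.index(max(onsets))
```

For this step-like input the onset sequence is zero in the silent part and has its maximum close to the point where the sound starts.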


Loudness, Timbre and Rhythm

Rhythm Features

Strength: average value of the peaks

Regularity: average value of peaks in the autocorrelation

Speed: ratio of number of peaks and time duration

[Figure: "Onset Sequence", amplitude vs. number of samples]
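The three rhythm descriptors can be sketched from an onset sequence as below. The peak picker (plain local maxima), the unnormalized autocorrelation and the example sequence are simplifications assumed for this illustration; the thesis may use a different peak definition or normalization.

```python
def find_peaks(sequence):
    """Indices of local maxima (a deliberately simple peak picker)."""
    return [
        i for i in range(1, len(sequence) - 1)
        if sequence[i - 1] < sequence[i] >= sequence[i + 1]
    ]


def autocorrelation(sequence):
    """Unnormalized autocorrelation for non-negative lags."""
    n = len(sequence)
    return [
        sum(sequence[t] * sequence[t + lag] for t in range(n - lag))
        for lag in range(n)
    ]


def rhythm_features(onsets, duration_s):
    """Strength, regularity and speed as described on the slide."""
    peaks = find_peaks(onsets)
    strength = sum(onsets[i] for i in peaks) / len(peaks) if peaks else 0.0
    acf = autocorrelation(onsets)
    acf_peaks = find_peaks(acf)
    regularity = sum(acf[i] for i in acf_peaks) / len(acf_peaks) if acf_peaks else 0.0
    speed = len(peaks) / duration_s
    return {"strength": strength, "regularity": regularity, "speed": speed}


# Made-up onset sequence with one onset every three samples, lasting 2 s.
onset_values = [0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0]
rhythm = rhythm_features(onset_values, duration_s=2.0)
```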


Perceptual Model of Intonation

Perceptual principles

1 Segmentation Effect

2 Glissando Threshold: minimum amount of frequency change

$g_{th} = 0.16 / T^2$ [ST/s²]

3 Differential Glissando Threshold: minimum difference in slope

$dg_{th} = a_2 - a_1 = 20$ [ST/s]

4 Short-term integration in time

[Figure: F0 estimation with two stylizations (stylization 1, stylization 2), frequency (Hz) vs. time (s)]
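The two thresholds can be turned into simple decision rules for pitch stylization. This sketch assumes that the rate of pitch change over a segment of duration T is compared against g_th, and that the absolute difference between two successive slopes is compared against dg_th; the function names and the example values are hypothetical.

```python
def glissando_threshold(duration_s):
    """g_th = 0.16 / T^2, with T the segment duration in seconds."""
    return 0.16 / duration_s ** 2


def is_audible_glide(pitch_change_st, duration_s):
    """True if the pitch slope of the segment exceeds the glissando threshold."""
    slope = abs(pitch_change_st) / duration_s
    return slope > glissando_threshold(duration_s)


def is_audible_slope_change(slope_1, slope_2, dgth=20.0):
    """True if two successive slopes a1, a2 (in ST/s) differ by more than dgth."""
    return abs(slope_2 - slope_1) > dgth


# Made-up segments: 2 ST and 0.5 ST of pitch change within 100 ms.
glide_large = is_audible_glide(2.0, 0.1)
glide_small = is_audible_glide(0.5, 0.1)
```

A segment that fails the first test would be stylized as a level tone, and two adjacent segments that fail the second test would be merged into one line.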


Database - Labels - Features

Database: emoDB (TUB)

10 speakers

708 files

6 emotions

BASIC SET

duration    16
MFCC        91
ZCR         13
harmony      3
energy      58
pitch       33
Total      214

MUSICAL SET

interval           31
autocorr. triad     4
gaussian triad     10
intensity          63
rhythm             15
Total             123


Strategies for evaluation 9-1 Vs 8-1-1
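The handout does not spell out the two strategies. Assuming they refer to speaker-wise splits of the 10 emoDB speakers (9 for training and 1 for testing, versus 8 for training, 1 for validation and 1 for testing), a rotation over the speakers could look like the sketch below. The speaker IDs and the choice of the validation speaker are hypothetical.

```python
def speaker_splits(speakers, with_validation):
    """Yield (train, validation, test) speaker lists, rotating the test speaker."""
    for i, test in enumerate(speakers):
        rest = speakers[i + 1:] + speakers[:i]
        if with_validation:
            validation, train = [rest[0]], rest[1:]
        else:
            validation, train = [], rest
        yield train, validation, [test]


# Hypothetical speaker IDs 1..10.
speakers = list(range(1, 11))
splits_9_1 = list(speaker_splits(speakers, with_validation=False))
splits_8_1_1 = list(speaker_splits(speakers, with_validation=True))
```

Under this reading, the validation speaker of the 8-1-1 scheme would be used for decisions such as feature selection, so that the test speaker stays unseen.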


Musical Universals

[Figure: mean normalized amplitude (dB) vs. frequency ratio (axis ticks at 0.8, 1, 1.2, 1.25, 1.33, 1.4, 1.5, 1.6, 1.67, 1.75, 2, 2.2); marked intervals: unison, m3, M3, P4, 4+ or 5°, P5, m6, M6, m7, octave]


Plain Bayes classifier - Evaluation 8-1-1

[Figure: training hit rate (%) and generalization hit rate (%) vs. number of features, for the Basic set and the Full set]


Nature of selected features

[Figure: selected features by category: time, MFCC, ZCR, harmony, energy, pitch, interval, auto-correlation triad, gaussian triad, intensity, rhythm]


Comparison plain Vs hierarchical Bayes classifier

[Figure: hierarchical decision tree with high/low splits on Activation, Valence and Potency, leading to the emotions happy, angry, afraid, neutral, bored, sad]

                        plain Bayes   hierarchical Bayes
Basic                   76.12         84.22
Basic+Interval+Triad    80.61         85.04


Multi-dimensional Scaling

[Figure: two panels, BASIC and FULL, 2nd principal component vs. 1st principal component; classes: Happy, Sad, Bored, Angry]


Happy Vs Angry - Evaluation 8-1-1

[Figure: Happy Vs Angry, training hit rate (%) and generalization hit rate (%) vs. number of features, for the Basic set and the Full set]


Final Comparison of Musical Features

[Figure: "Comparison between different musical feature sets", accuracy rate (%) vs. number of features; curves: Musical Set, Basic Set, B+Stylization Set, B+Interval+Triad, B+Intensity, B+Rhythm, Full Set]


Conclusion

Summary

1 Literature review about speech, music and emotions

2 Theoretical background on psychoacoustics

3 Re-implementation of the basic features

4 Implementation of speech processing algorithms

5 Implementation of musical features (music perception, MER and linguistics)

6 Simulations ⇒ Musical features can help to improve emotion recognition in speech


Conclusions

Further research

Environment: natural emotional speech, other languages

Pattern Recognition steps: feature transformation, pitch extraction, classification...

Improvement of musical features

Dissonance model, Perceptual model of intonation, Emotionally meaningful moments

Systematization of feature extraction step

"Even monkeys express strong feelings in different tones - anger and impatience by low, - fear and pain by high notes."

Charles Darwin, Naturalist


Thank you!

Looking forward to your questions. . .
