
Emotional Speech Analysis using Artificial Neural Networks

IMCSIT-AAIA October 18-20, 2010 – Wisla, Poland

Jana Tuckova & Martin Sramka
Department of Circuit Theory, CTU – FEE in Prague
Laboratory of Artificial Neural Network Applications

http://amber.feld.cvut.cz/user/tuckova

Overview

Introduction
Method
- The patterns based on time and frequency characteristics
- The patterns based on musical theory
- Combination of both previous approaches
Experiments and Results
Conclusion and future work

Acknowledgment: This work was supported by the Czech Science Foundation 102/09/0989 grant.

Introduction

Our aim: a classification of speech emotions
- by a description of speech signals formulated by:
  - standard speech processing methods
  - music theory
  - a combination of both methods
- by an ANN approach

Why ANN?

- The robustness of ANN-based solutions to real-world tasks is a great advantage, for example in the processing of noisy signals.

- It is possible to process various types of input data at the same time.


Introduction

Which way? MLNN or KSOM


Introduction

MLNN
– with one hidden layer
– the input layer is given by the key linguistic parameters
– the outputs are the various classes of emotions
– the training algorithm: Scaled Conjugate Gradient with superlinear convergence rate

KSOM – SSOM
– KSOM combines aspects of the VQ (vector quantization) method with the topology-preserving ordering of the quantization vectors
– SSOM: only for well-known input data, for well-known classes of input data

The database for ANN: 216 patterns for training, 72 for validation, 72 for test
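The 216/72/72 division adds up to 360 patterns, i.e. a 60/20/20 split. A minimal sketch of such a split follows. The slides do not say how patterns were assigned to the subsets, so the random shuffle, the seed and the helper name `split_patterns` are assumptions made only for illustration.

```python
import random

def split_patterns(patterns, n_train=216, n_val=72, n_test=72, seed=0):
    """Shuffle the patterns and cut them into train/validation/test subsets.

    Only the subset sizes come from the slides; the random assignment
    and the seed are assumptions.
    """
    if n_train + n_val + n_test != len(patterns):
        raise ValueError("subset sizes must add up to the number of patterns")
    shuffled = list(patterns)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Pattern indices stand in for the real feature vectors.
train, val, test = split_patterns(range(360))
print(len(train), len(val), len(test))
```

A fixed seed keeps the three subsets reproducible between runs, which matters when MLNN and SSOM results are to be compared on the same test patterns.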


Introduction


KSOM- SSOM

Corpus creation

Database of Utterances

Words (in Czech)           Words - translation
Jé.                        Whoah.
Má?                        Got it?
Nevím.                     I don't know.
Vidíš?                     Do you see?
Povídej!                   Tell me!
Poezie.                    Poetry.

Sentences (in Czech)       Sentences - translation
To mi nevadí.              I don't mind.
Neumím si to vysvětlit.    I can't explain it.
To bude světový rekord.    It will be a world record.
Jak se ti to líbí?         How do you like it?
Podívej se na nebe!        Look up at the heavens!
Až přijdeš, uvidíš.        When you come, you'll see.

Corpus creation

Recorded emotional speech was subjectively evaluated by 4 persons.
The final database contained 720 patterns:
- 360 patterns for one-word sentences
- 360 patterns for multiword sentences

Emotions: 1 - anger (H), 2 - boredom (N), 3 - pleasure (R), 4 - sadness (S)

The sentences were read by professional actors (2 female + 1 male).
Speech recording: in a professional recording studio, format "wav", sampling frequency 44.1 kHz, 24 bit

Method: The Patterns Based on Music Theory

The method is based on the idea of the musical interval: the distance between a specific n-th tone and a reference tone, expressed as the ratio of their frequencies.

Example: the fifth (quint) is the frequency of the fifth tone divided by the frequency of the first tone, which gives 1.498.

Interval   Variant   Frequency ratio (FR)
1st        -         1
2nd        minor     1.059
2nd        major     1.122
3rd        minor     1.189
3rd        major     1.260
4th        -         1.335
5th        -         1.498
6th        minor     1.587
6th        major     1.682
7th        minor     1.782
7th        major     1.888
8th        -         2
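The tabulated ratios agree to three decimals with twelve-tone equal temperament, where a step of k semitones corresponds to a ratio of 2^(k/12). The slides do not state this formula, so the sketch below is an observation about the numbers rather than the authors' procedure. The dictionary `SEMITONES` and the function `frequency_ratio` are hypothetical names.

```python
# Semitone steps above the reference tone for each interval in the table.
# The interval-to-semitone mapping is standard music theory.
SEMITONES = {
    "1st": 0,
    "2nd minor": 1, "2nd major": 2,
    "3rd minor": 3, "3rd major": 4,
    "4th": 5,
    "5th": 7,
    "6th minor": 8, "6th major": 9,
    "7th minor": 10, "7th major": 11,
    "8th": 12,
}

def frequency_ratio(semitones):
    """Equal-tempered frequency ratio for a given number of semitones."""
    return 2.0 ** (semitones / 12.0)

ratios = {name: frequency_ratio(k) for name, k in SEMITONES.items()}
for name, fr in ratios.items():
    print(f"{name:10s} {fr:.3f}")
```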


Method: The Patterns Based on Musical Theory.

The reference frequency (F0) is chosen separately in each utterance.

The frequency ratios are compared with the musical intervals.
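One way to picture the comparison is a nearest-interval lookup. This is a minimal sketch under stated assumptions: the slides do not give the comparison rule, so measuring closeness on a logarithmic scale (in cents) is an assumption, as is restricting the ratio to one octave above the reference. `INTERVALS` repeats the values from the interval table; `nearest_interval` is a hypothetical helper.

```python
import math

# Interval table from the slides: (name, frequency ratio).
INTERVALS = [
    ("1st", 1.0),
    ("2nd minor", 1.059), ("2nd major", 1.122),
    ("3rd minor", 1.189), ("3rd major", 1.260),
    ("4th", 1.335),
    ("5th", 1.498),
    ("6th minor", 1.587), ("6th major", 1.682),
    ("7th minor", 1.782), ("7th major", 1.888),
    ("8th", 2.0),
]

def nearest_interval(f, f_ref):
    """Return (interval name, deviation in cents) for the ratio f / f_ref.

    Closeness is measured on a log scale; this rule is an assumption.
    """
    if f <= 0 or f_ref <= 0:
        raise ValueError("frequencies must be positive")
    ratio = f / f_ref
    name, fr = min(INTERVALS, key=lambda item: abs(math.log2(ratio / item[1])))
    cents = 1200.0 * math.log2(ratio / fr)
    return name, cents

# A tone at 150 Hz against a 100 Hz reference lies close to a fifth.
print(nearest_interval(150.0, 100.0))
```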

Fifth circle
- fifth = f3/f2
- geometric series with ratio 1.5 between consecutive terms x_n
- tone affinity: decreases from n=1 to n=7, increases from n=8 to n=13
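The circle of fifths can be sketched as a geometric series with ratio 3/2. The indexing n = 1 to 13, with n = 1 as the reference tone, follows the slide. Folding each term back into one octave is an assumption made so that the tones are comparable, and the tone-affinity measure itself is not modelled here. The function name `circle_of_fifths` is hypothetical.

```python
def circle_of_fifths(n_steps=13, fifth=1.5):
    """Ratios obtained by stacking pure fifths, folded into the range [1, 2).

    Element 0 corresponds to n = 1 (the reference tone, ratio 1);
    every further step multiplies the unfolded ratio by 3/2.
    """
    ratios = []
    value = 1.0
    for _ in range(n_steps):
        folded = value
        while folded >= 2.0:  # bring the tone back into the first octave
            folded /= 2.0
        ratios.append(folded)
        value *= fifth
    return ratios

tones = circle_of_fifths()
for n, ratio in enumerate(tones, start=1):
    print(n, round(ratio, 4))
```

After twelve fifths (n = 13) the folded ratio comes back close to 1 but not exactly to it; the small remainder is the Pythagorean comma.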

Experimental Results

[Figure: U-matrices for one-word sentences and multi-word sentences. Class labels: H - anger, N - boredom, R - pleasure, S - sadness]

Conclusion – for music theory

Comparison with some publications - success classifications:
- 54-64 %: standard classifier
- 81 %: ANN, hight note versus 12 half tones, Korean language

Our results - success classifications:
- MLNN: 74 %
- SSOM, QE / TE: 0.274 / 0.014 (one-word sentences), 0.275 / 0.017 (multiword sentences)
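The slides do not define QE and TE. In common SOM practice they stand for quantization error (mean distance from each input to its best-matching unit) and topographic error (share of inputs whose best and second-best units are not neighbours on the map), and the sketch below assumes that reading. The rectangular lattice, the 4-neighbour adjacency rule and the name `som_errors` are also assumptions.

```python
import math

def som_errors(data, codebook, grid):
    """Quantization error (QE) and topographic error (TE) of a trained SOM.

    data     -- list of input vectors
    codebook -- list of unit weight vectors
    grid     -- list of (row, col) map coordinates, one per unit
    """
    qe_sum = 0.0
    te_count = 0
    for x in data:
        # Rank the units by Euclidean distance to the input vector.
        order = sorted(range(len(codebook)),
                       key=lambda i: math.dist(x, codebook[i]))
        bmu, second = order[0], order[1]
        qe_sum += math.dist(x, codebook[bmu])
        (r1, c1), (r2, c2) = grid[bmu], grid[second]
        # Units count as neighbours when they are one lattice step apart.
        if abs(r1 - r2) + abs(c1 - c2) != 1:
            te_count += 1
    return qe_sum / len(data), te_count / len(data)

# Toy 1 x 3 map whose second and third units are swapped on the lattice.
codebook = [[0.0], [1.0], [5.0]]
grid = [(0, 0), (0, 2), (0, 1)]
qe, te = som_errors([[0.2], [4.0]], codebook, grid)
print(qe, te)
```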


Conclusion – future work

Our effort in future work:

ANN application in prosody modelling: we want to apply results from the described experiments with emotional speech to the improvement of synthetic speech naturalness

ANN application in children's disordered speech analysis (developmental dysphasia)

These different application domains influence the database creation.

Multiword sentences are more acceptable for prosody modelling.

One-word sentences are suitable for the analysis of children's disordered speech. Why?

A speech disorder often manifests itself as an inability to pronounce whole sentences.


Thank you for your attention
