Emotional Speech Analysis using Artificial Neural Networks

IMCSIT-AAIA, October 18-20, 2010, Wisla, Poland

Jana Tuckova & Martin Sramka
Department of Circuit Theory, CTU – FEE in Prague
Laboratory of Artificial Neural Network Applications
tuckova@fel.cvut.cz
http://amber.feld.cvut.cz/user/tuckova


Overview

Acknowledgment: This work was supported by the Czech Science Foundation 102/09/0989 grant.

- Introduction
- Method
  - The patterns based on time and frequency characteristics
  - The patterns based on music theory
  - Combination of both previous approaches
- Experiments and Results
- Conclusion and future work

Introduction

Our aim: a classification of speech emotions.

Why ANN?

- The robustness of ANN solutions to real-world problems is a great advantage, for example in the processing of noisy signals.

- It is possible to process various types of input data concurrently.

Which way?

By a description of speech signals formulated by:
- standard speech processing methods
- music theory
- a combination of both methods

By an ANN approach: MLNN, KSOM

MLNN
- with one hidden layer
- the input layer is given by the key linguistic parameters
- the outputs are the various classes of emotions
- the training algorithm: Scaled Conjugate Gradient, with a superlinear convergence rate
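The MLNN structure described above (one hidden layer, linguistic parameters in, emotion classes out) can be illustrated with a forward pass. This is only a minimal sketch: the layer sizes, the tanh and softmax activations, the random weights, and the example pattern are all assumptions, since the slides do not state them, and the Scaled Conjugate Gradient training itself is not shown.

```python
import math
import random

EMOTIONS = ["anger", "boredom", "pleasure", "sadness"]

def init_layer(n_in, n_out, rng):
    """Return (weights, biases); weights[j] holds the input weights of unit j."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def affine(weights, biases, x):
    """Weighted sum plus bias for every unit of one layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def softmax(z):
    """Turn output activations into class probabilities that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, hidden, output):
    """One hidden layer (tanh, assumed) followed by a softmax output layer (assumed)."""
    h = [math.tanh(v) for v in affine(*hidden, x)]
    return softmax(affine(*output, h))

rng = random.Random(0)
N_INPUTS, N_HIDDEN = 6, 10  # hypothetical sizes, not given in the slides
hidden = init_layer(N_INPUTS, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, len(EMOTIONS), rng)

pattern = [0.2, -0.1, 0.7, 0.0, 0.3, -0.4]  # made-up input parameters
probs = forward(pattern, hidden, output)
predicted = EMOTIONS[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

With untrained random weights the predicted class is meaningless; the sketch only shows how one input pattern maps to one score per emotion class.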

KSOM - SSOM
- KSOM combines aspects of the VQ (vector quantization) method with the topology-preserving ordering of the quantization vectors
- SSOM: only for known input data with known classes

The database for the ANN: 216 patterns for training, 72 for validation, 72 for test
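The database split quoted above adds up to 216 + 72 + 72 = 360 patterns, i.e. 60% / 20% / 20%. A minimal sketch of such a split follows; the random shuffling, the seed, and the function name `split_patterns` are assumptions for illustration, as the slides do not say how the patterns were assigned to the three sets.

```python
import random

def split_patterns(n_patterns=360, n_train=216, n_val=72, seed=0):
    """Shuffle pattern indices and cut them into train / validation / test sets."""
    indices = list(range(n_patterns))
    random.Random(seed).shuffle(indices)  # assumed: random assignment
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

train, val, test = split_patterns()
print(len(train), len(val), len(test))
```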

Corpus creation

Database of Utterances

Words (in Czech)            Translation
Jé.                         Whoah.
Má?                         Got it?
Nevím.                      I don't know.
Vidíš?                      Do you see?
Povídej!                    Tell me!
Poezie.                     Poetry.

Sentences (in Czech)        Translation
To mi nevadí.               I don't mind.
Neumím si to vysvětlit.     I can't explain this.
To bude světový rekord.     It will be a world record.
Jak se ti to líbí?          How do you like it?
Podívej se na nebe!         Look up at the heavens!
Až přijdeš, uvidíš.         When you come, you'll see.

Corpus creation

The sentences were read by professional actors (2 female + 1 male).

Speech recording: in a professional recording studio, format "wav", sampling frequency 44.1 kHz, 24 bit.

Emotions: 1 - anger (H), 2 - boredom (N), 3 - pleasure (R), 4 - sadness (S)

The recorded emotional speech was subjectively evaluated by 4 persons. The final database contained 720 patterns (360 patterns for one-word sentences, 360 patterns for multiword sentences).

Method: The Patterns Based on Music Theory

The method is based on the idea of the musical interval: the relation between the frequency of a specific n-th tone and that of a reference tone, expressed as a frequency ratio.

Example: the quint (fifth) is the frequency of the fifth tone divided by the frequency of the first tone = 1.498

Interval   Variant   Frequency ratio (FR)
1st                  1
2nd        minor     1.059
2nd        major     1.122
3rd        minor     1.189
3rd        major     1.260
4th                  1.335
5th                  1.498
6th        minor     1.587
6th        major     1.682
7th        minor     1.782
7th        major     1.888
8th                  2
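The frequency ratios in the interval table above coincide, to three decimal places, with twelve-tone equal temperament, where an interval of k half tones has the ratio 2^(k/12). The sketch below recomputes them; the half-tone count assigned to each interval is standard music theory rather than something stated on the slide, and the names `SEMITONES` and `frequency_ratio` are chosen here for illustration.

```python
# Half tones above the reference tone for each interval (standard music theory).
SEMITONES = {
    "1st": 0,
    "2nd minor": 1, "2nd major": 2,
    "3rd minor": 3, "3rd major": 4,
    "4th": 5,
    "5th": 7,
    "6th minor": 8, "6th major": 9,
    "7th minor": 10, "7th major": 11,
    "8th": 12,
}

def frequency_ratio(semitones):
    """Equal-tempered frequency ratio for an interval of `semitones` half tones."""
    return 2.0 ** (semitones / 12.0)

for name, k in SEMITONES.items():
    print(f"{name:10s} {frequency_ratio(k):.3f}")
```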

The reference frequency (F0) is chosen for each utterance.

The frequency ratios are compared with the musical intervals.

Circle of fifths:
- fifth = f3/f2
- the fifths form a geometric series with ratio 1.5, indexed by n
- tone affinity: decreases from n=1 to n=7, increases from n=8 to n=13
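The comparison of a measured frequency ratio with the musical intervals can be sketched as a nearest-interval search. This is an illustration under stated assumptions, not the authors' procedure: the equal-tempered interval grid, the folding of the ratio into a single octave, the comparison on a logarithmic (cents) scale, and the function name `nearest_interval` are all choices made here.

```python
import math

# Half tones above the reference tone for each interval (standard music theory).
SEMITONES = {
    "1st": 0, "2nd minor": 1, "2nd major": 2, "3rd minor": 3, "3rd major": 4,
    "4th": 5, "5th": 7, "6th minor": 8, "6th major": 9,
    "7th minor": 10, "7th major": 11, "8th": 12,
}

def nearest_interval(f, f0):
    """Return (interval name, deviation in cents) for frequency f against reference f0.

    Both frequencies must be positive. The ratio is folded into [1, 2) first
    (an assumption), so an exact octave is reported as "1st".
    """
    ratio = f / f0
    while ratio < 1.0:
        ratio *= 2.0
    while ratio >= 2.0:
        ratio /= 2.0
    cents = 1200.0 * math.log2(ratio)  # 100 cents = one equal-tempered half tone
    name, k = min(SEMITONES.items(), key=lambda item: abs(cents - 100.0 * item[1]))
    return name, cents - 100.0 * k

name, deviation = nearest_interval(300.0, 200.0)  # ratio 1.5
print(name, round(deviation, 2))
```

A ratio of exactly 1.5 (f3/f2) lands on the fifth, about 2 cents above its equal-tempered value of 1.498.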

Experimental Results

[Figure: U-matrices for one-word sentences and multi-word sentences. Labels: H - anger, N - boredom, R - pleasure, S - sadness]

Conclusion – for music theory

Comparison to some publications (success of classification):
- 54-64% with a standard classifier
- 81% with an ANN (high note versus 12 half tones, Korean language)

Our results:
- MLNN: 74% success of classification
- SSOM (QE / TE): 0.274 / 0.014 for one-word sentences, 0.275 / 0.017 for multiword sentences
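Assuming that QE and TE denote the quantization error and the topographic error commonly used to judge self-organizing maps, they can be computed as in the sketch below. The rectangular grid, the 4-neighbour adjacency rule, the function name `som_quality`, and the toy codebook are assumptions made here; the slides do not describe the map layout.

```python
import math

def som_quality(data, codebook, grid):
    """Return (QE, TE) for a trained map.

    codebook[i] is the weight vector of unit i and grid[i] its (row, col) position.
    QE: mean distance between each input and its best-matching unit (BMU).
    TE: share of inputs whose best and second-best units are not grid neighbours.
    """
    total_dist = 0.0
    n_errors = 0
    for x in data:
        dists = [math.dist(x, w) for w in codebook]
        order = sorted(range(len(codebook)), key=dists.__getitem__)
        bmu, second = order[0], order[1]
        total_dist += dists[bmu]
        (r1, c1), (r2, c2) = grid[bmu], grid[second]
        if abs(r1 - r2) + abs(c1 - c2) != 1:  # assumed: 4-neighbour adjacency
            n_errors += 1
    return total_dist / len(data), n_errors / len(data)

# Toy 1 x 3 map whose units are ordered along the first input dimension.
grid = [(0, 0), (0, 1), (0, 2)]
codebook = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
data = [[0.1, 0.0], [1.9, 0.0]]
qe, te = som_quality(data, codebook, grid)
print(round(qe, 3), te)
```

On this reading, a low TE means that neighbouring map units represent neighbouring regions of the input space, which is the topology preservation mentioned for KSOM.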

Conclusion – future work

Our effort in future work:
- ANN application in prosody modelling: we want to apply the results of the described experiments with emotional speech to improving the naturalness of synthetic speech
- ANN application in the analysis of children's disordered speech (developmental dysphasia)

These different application domains influence the database creation.

Multiword sentences are more suitable for prosody modelling.

One-word sentences are suitable for the analysis of children's disordered speech. Why? A speech disorder often manifests itself as an inability to pronounce whole sentences.

Thank you for your attention