
ELIS-DSSP, Sint-Pietersnieuwstraat 41, B-9000 Gent
SPACE Symposium - 06/02/09

Development of the SPACE intelligibility assessment method

Catherine Middag, Gwen Van Nuffelen,

Jean-Pierre Martens, Marc De Bodt


Introduction

• Intelligibility = popular measure for pathological speech assessment
• Perceptual assessment is affected by non-speech information:
  – familiarity of the listener with the speaker and the type of disorder
    → hard to eliminate this subjective bias
  – guessing on the basis of linguistic context
    → test material design must eliminate this bias
• Replacing the human listener by an automatic speech recognizer (ASR) can solve both problems, but is the ASR sufficiently reliable?
  – test case: automation of the Dutch Intelligibility Assessment (DIA)


Dutch Intelligibility Assessment (DIA)

• 50 isolated (nonsense) words
• intelligibility = percent phonemes correct

Example words: 1. dop   2. nuis   3. top
Example form entry: 1 .op   ø b d f g h j k l m n p r s t v w z
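The "percent phonemes correct" measure can be illustrated with a minimal sketch. It assumes that each test word contributes one scored phoneme and that the listener's response is available as a phoneme symbol; the function name and the data layout are hypothetical and not taken from the DIA itself.

```python
def percent_phonemes_correct(targets, responses):
    """Return the percentage of target phonemes the listener identified.

    targets and responses are equal-length lists of phoneme symbols,
    one entry per scored phoneme (hypothetical representation).
    """
    if len(targets) != len(responses):
        raise ValueError("targets and responses must have the same length")
    if not targets:
        raise ValueError("at least one phoneme is required")
    correct = sum(t == r for t, r in zip(targets, responses))
    return 100.0 * correct / len(targets)


# Hypothetical responses for the initial consonants of "dop", "nuis", "top":
# the listener hears /t/ instead of /d/ in the first word.
targets = ["d", "n", "t"]
responses = ["t", "n", "t"]
score = percent_phonemes_correct(targets, responses)
```

With two of three phonemes correct, the score in this toy case is about 66.7%.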


How to apply ASR in the DIA?

• Two approaches
  – let ASR recognize the words and count the percentage of correct decisions
  – let ASR check how well on average the acoustics support the phonetic transcription of the target word (= alignment)
• Our experience
  – intelligibility emerging from the first approach is insufficiently reliable
  – therefore we developed a system based on alignment


System architecture : flow chart

acoustic feature sequence Xt + target speech transcription → Speech aligner → speaker features → Intelligibility Prediction Model → objective score

Two systems:
• complex state-of-the-art HMM-based system (ASR-ESAT)
• simple system with a phonological layer (ASR-ELIS), which points more directly to articulatory problems

Acoustic models trained on speech of normal adult speakers


ASR - ESAT

• Acoustic models
  – state-of-the-art Semi-Continuous HMM
  – triphone models trained on normal speech
  – states tied using decision trees + phonological questions
• Output
  – each frame t assigned to state st
  – per frame : st, P(st|Xt)


ASR - ELIS

24 binary phonological features concerning:
• voicing
• manner of articulation
• place of articulation

Processing chain:
Xt → PLF extractor → P(K1|Xt), …, P(K24|Xt) → Probability product model → P(S1|Xt), …, P(Sn|Xt) → Viterbi decoder (also takes the target speech transcription) → st, P(st|Xt), P(K1|Xt)..P(K24|Xt)


System architecture : flow chart

acoustic feature sequence Xt + target speech transcription → Speech aligner → speaker features → Intelligibility Prediction Model → objective score

Three feature sets:
• Phonemic features (patient has trouble pronouncing a certain phoneme)
• Phonological features (patient has problems with voicing, manner or place of articulation)
• NEW : context-dependent features (patient has problems with a desired change of voicing, manner or place of articulation)


Extraction of phonemic features (PMF)

Speech aligner = ASR-ESAT

Frame  Phoneme  P(st|Xt)
1      #        0.7
2      #        0.5
3      /p/      0.4
4      /p/      0.8
5      /o/      0.6
6      /o/      0.8
7      /l/      0.6
8      #        0.3

Phonemic features:
#   : (0.7+0.5+0.3) /3
/p/ : (0.4+0.8) /2
/o/ : (0.6+0.8) /2
/l/ : 0.6
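The averaging shown on this slide can be sketched as follows. This is a minimal illustration, assuming the aligner output is available as a list of (phoneme, posterior) pairs, one per frame; the function name `phonemic_features` is hypothetical. In practice the averages would presumably be taken over all frames of a speaker rather than over one word.

```python
from collections import defaultdict


def phonemic_features(alignment):
    """Average the frame posteriors P(st|Xt) per aligned phoneme."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for phoneme, posterior in alignment:
        sums[phoneme] += posterior
        counts[phoneme] += 1
    return {phoneme: sums[phoneme] / counts[phoneme] for phoneme in sums}


# Frame table from the slide: (aligned phoneme, P(st|Xt)).
alignment = [
    ("#", 0.7), ("#", 0.5),
    ("/p/", 0.4), ("/p/", 0.8),
    ("/o/", 0.6), ("/o/", 0.8),
    ("/l/", 0.6),
    ("#", 0.3),
]
pmf = phonemic_features(alignment)
```

For the slide's example this gives roughly 0.5 for #, 0.6 for /p/, 0.7 for /o/ and 0.6 for /l/.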


Extraction of phonological features (PLF)

Speech aligner = ASR-ELIS

Frame  Phone  voiced P(K1|Xt)  back P(K2|Xt)  burst P(K3|Xt)
1      #      0.1              0.1            0.2
2      #      0.1              0.1            0.1
3      /pcl/  0.2              0.1            0.1
4      /p/    0.2              0.2            0.6
5      /o/    0.8              0.7            0.2
6      /o/    0.6              0.9            0.0
7      /l/    0.5              0.5            0.1
8      #      0.1              0.1            0.0

Phonological features (feature should be present):
Burst  : 0.6
Back   : (0.7+0.9)/2
Voiced : (0.8+0.6+0.5)/3

Phonological features (feature should be absent):
Not burst  : (0.2+0.1+…
Not back   : (0.1+0.1+…
Not voiced : (0.1+0.1+…

Irrelevant features for these phones are marked in the table on the slide.
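The positive and negative averages above can be sketched in a few lines. The frame posteriors are those of the slide; the `CANONICAL` table, which says for each phone whether a feature should be present ("+"), absent ("-") or is irrelevant (None), is a hypothetical assumption, since the transcript does not show which cells were marked as irrelevant. Irrelevant cells are skipped in this sketch.

```python
# Frame posteriors from the slide: (phone, {feature: P(K|Xt)}).
FRAMES = [
    ("#",     {"voiced": 0.1, "back": 0.1, "burst": 0.2}),
    ("#",     {"voiced": 0.1, "back": 0.1, "burst": 0.1}),
    ("/pcl/", {"voiced": 0.2, "back": 0.1, "burst": 0.1}),
    ("/p/",   {"voiced": 0.2, "back": 0.2, "burst": 0.6}),
    ("/o/",   {"voiced": 0.8, "back": 0.7, "burst": 0.2}),
    ("/o/",   {"voiced": 0.6, "back": 0.9, "burst": 0.0}),
    ("/l/",   {"voiced": 0.5, "back": 0.5, "burst": 0.1}),
    ("#",     {"voiced": 0.1, "back": 0.1, "burst": 0.0}),
]

# Hypothetical canonical values: "+" present, "-" absent, None irrelevant.
CANONICAL = {
    "#":     {"voiced": "-", "back": "-",  "burst": "-"},
    "/pcl/": {"voiced": "-", "back": None, "burst": "-"},
    "/p/":   {"voiced": "-", "back": None, "burst": "+"},
    "/o/":   {"voiced": "+", "back": "+",  "burst": "-"},
    "/l/":   {"voiced": "+", "back": None, "burst": "-"},
}


def phonological_features(frames, canonical):
    """Average P(K|Xt) separately over frames where K should be present or absent."""
    sums, counts = {}, {}
    for phone, posteriors in frames:
        for feature, p in posteriors.items():
            target = canonical[phone][feature]
            if target is None:  # irrelevant for this phone: skip the frame
                continue
            key = (feature, target)
            sums[key] = sums.get(key, 0.0) + p
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


plf = phonological_features(FRAMES, CANONICAL)
```

The entries `plf[("voiced", "+")]`, `plf[("back", "+")]` and `plf[("burst", "+")]` reproduce the three positive averages on the slide; the negative entries depend on the assumed canonical table.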


Extraction of context-dependent phonological features (CD-PLF)

• How well is change in PLF realized?
  – use PLF target in preceding/succeeding phone as context
  – binary features → two values for target (present/absent)
  – binary features → restricted number of left & right contexts
• Left or right context can be
  – present, absent, not relevant, silence
• Model selection (preliminary)
  – maximum 4 * 2 * 4 = 32 CD-PLFs per PLF → 768 in total
  – select only those CD-PLFs occurring at least twice in every test → 123 in total
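The counts on this slide follow from enumerating the contexts, which a short sketch makes explicit. The 24 PLFs come from the ASR-ELIS slide; the occurrence filter is an assumed reading of "at least twice in every test", and the names `select_frequent` and `counts_per_test` are hypothetical.

```python
from itertools import product

CONTEXTS = ("present", "absent", "not relevant", "silence")  # left or right
TARGETS = ("present", "absent")                              # binary target
N_PLF = 24                                                   # phonological features

# One CD-PLF per (left context, target value, right context) combination.
per_plf = list(product(CONTEXTS, TARGETS, CONTEXTS))
total = N_PLF * len(per_plf)


def select_frequent(counts_per_test, min_count=2):
    """Keep the CD-PLFs that occur at least min_count times in every test.

    counts_per_test is a list of dicts (one per test) mapping a CD-PLF
    identifier to its number of occurrences in that test.
    """
    if not counts_per_test:
        return set()
    keep = set(counts_per_test[0])
    for counts in counts_per_test:
        keep = {key for key in keep if counts.get(key, 0) >= min_count}
    return keep


# Hypothetical occurrence counts for two tests.
kept = select_frequent([{"a": 3, "b": 1, "c": 2}, {"a": 2, "c": 5}])
```

Here `len(per_plf)` is 32 and `total` is 768, matching the slide; the reduction to 123 depends on the actual test material and is not reproduced by this toy example.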


Extraction of context-dependent phonological features (CD-PLF)

Speech aligner = ASR-ELIS

Segment  Phone  voiced  burst  …
2        #      0.1     0.2
3        /pcl/  0.2     0.2
4        /p/    0.2     0.6
6        /o/    0.6     0.1
7        /s/    0.4     0.3
8        #      0.2     0.1
9        /m/    0.7     0.3
10       /A/    0.8     0.0
11       /l/    0.6     0.1
12       #      0.1     0.1

CD-PLF features (left context, target, right context):

voicing                  burst
Off, on, off : +0.6      Yes, no, no : +0.1
On, on, on   : +0.8      No, no, no  : +0.0
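The voicing column of this example can be reproduced with a minimal sketch. It assumes that each segment carries one mean posterior per feature and that the canonical voicing value of each phone is known; the `VOICED` table below is a hypothetical stand-in for the real phone inventory, silence segments are not used as targets, and the "not relevant" context does not occur in this small example.

```python
# Segment table from the slide, voicing column only: (phone, P(voiced)).
SEGMENTS = [
    ("#", 0.1), ("/pcl/", 0.2), ("/p/", 0.2), ("/o/", 0.6), ("/s/", 0.4),
    ("#", 0.2), ("/m/", 0.7), ("/A/", 0.8), ("/l/", 0.6), ("#", 0.1),
]

# Hypothetical canonical voicing values per phone ("sil" marks silence).
VOICED = {
    "#": "sil", "/pcl/": "off", "/p/": "off", "/o/": "on",
    "/s/": "off", "/m/": "on", "/A/": "on", "/l/": "on",
}


def cd_plf(segments, canonical):
    """Average the posteriors per (left context, target, right context)."""
    sums, counts = {}, {}
    for i in range(1, len(segments) - 1):
        phone, posterior = segments[i]
        target = canonical[phone]
        if target == "sil":  # assumption: silence is a context, never a target
            continue
        key = (canonical[segments[i - 1][0]], target, canonical[segments[i + 1][0]])
        sums[key] = sums.get(key, 0.0) + posterior
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


voicing = cd_plf(SEGMENTS, VOICED)
```

The entry `voicing[("off", "on", "off")]` comes from /o/ between /p/ and /s/ (0.6), and `voicing[("on", "on", "on")]` from /A/ between /m/ and /l/ (0.8), matching the two voicing contributions listed on the slide.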


Intelligibility prediction model (IPM)

• Objective: map speaker features (PMF, PLF, CD-PLF or combinations) to speaker intelligibility score
• Model training
  – train on DIA recordings
  – pathological speakers (+ some normal control speakers)
• Model type and size
  – limited number of pathological speakers
  – high number of features
  → linear regression model
  → feature selection
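A linear regression model with feature selection can be sketched as below. This is only an illustration under stated assumptions: it uses greedy forward selection scored on the training error, whereas the slides select models with cross-validation; the data are synthetic, and all function names are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - rest) / m[r][r]
    return x


def fit(rows, y, cols):
    """Least-squares weights (bias first) using only the feature columns in cols."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    ata = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    aty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(ata, aty)


def rmse(rows, y, cols, w):
    """Root-mean-square difference between predicted and reference scores."""
    errors = [w[0] + sum(wi * row[c] for wi, c in zip(w[1:], cols)) - t
              for row, t in zip(rows, y)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def forward_select(rows, y, max_features):
    """Greedily add the feature that lowers the error most, up to max_features."""
    selected, remaining, best = [], list(range(len(rows[0]))), float("inf")
    while remaining and len(selected) < max_features:
        err, c = min((rmse(rows, y, selected + [c], fit(rows, y, selected + [c])), c)
                     for c in remaining)
        if err >= best - 1e-12:
            break
        selected.append(c)
        remaining.remove(c)
        best = err
    return selected, best


# Synthetic data (not from the study): 40 speakers, 6 features,
# score depends on features 0 and 3 only, plus a little noise.
rng = random.Random(0)
rows = [[rng.random() for _ in range(6)] for _ in range(40)]
y = [60.0 + 30.0 * r[0] - 20.0 * r[3] + rng.gauss(0.0, 1.0) for r in rows]
selected, err = forward_select(rows, y, max_features=2)
```

On this synthetic set the two informative features are picked first, which is the behaviour one hopes for when many candidate features compete for few speakers.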


Reference material (DIA)

• 211 speakers:
  – 51 normals
  – 60 dysarthric
  – 12 clefts (children)
  – 42 hearing impaired
  – 37 with laryngectomy
  – 7 with dysphonia
  – 2 others
• Pathological speakers : mean of 78.7 %
• Normals : mean of 93.3 %
• Few with very low score

[Figure: histogram of the human scores; x-axis: human score (0 to 100), y-axis: number of patients (0 to 60)]


Solving microphone issues

• Two microphones were used.
• The difference can be found in the cepstral means (cepstral mean subtraction was performed):

[Figure: scatter plot of the cepstral means, labelled by microphone (shure, sony)]
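Cepstral mean subtraction itself is a standard normalization and can be sketched as follows. The sketch assumes the mean is computed over all frames of one recording; the slides do not say whether it was applied per recording or per speaker, and the function name is hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of one recording.

    frames is a list of cepstral vectors (lists of floats of equal length).
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    means = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]


# Toy recording with three frames and two cepstral coefficients.
frames = [[1.0, -2.0], [3.0, 0.0], [5.0, 2.0]]
normalized = cepstral_mean_subtraction(frames)
```

After the subtraction every coefficient has zero mean over the recording, which removes a constant channel offset such as the one introduced by a different microphone.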


Training / validation

• Models chosen with five-fold cross validation
• Measure = standard deviation (STD): in case of normality, about 68% of the computed scores lie within one STD of the perceptual score
• More features = more chance of overfitting
• Rule of thumb: take 1 feature for every 10 training examples
  → restrict number of features to maximum 15
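Five-fold cross-validation with the STD measure can be sketched as below. The sketch assumes that STD is the root-mean-square difference between computed and perceptual scores on the held-out folds, and it uses a single-feature linear model on synthetic data to stay short; none of the names come from the original tool.

```python
import math


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists; fold i holds every k-th item from i."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def fit_line(x, y):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_std(x, y, k=5):
    """Root-mean-square held-out error, pooled over the k folds."""
    squared = []
    for train, test in k_fold_indices(len(x), k):
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        squared.extend((slope * x[i] + intercept - y[i]) ** 2 for i in test)
    return math.sqrt(sum(squared) / len(squared))


# Synthetic, exactly linear data: the held-out error should be essentially zero.
x = [i / 10.0 for i in range(20)]
y = [50.0 + 20.0 * value for value in x]
std = cross_validated_std(x, y)
folds = list(k_fold_indices(len(x)))
```

Every speaker ends up in exactly one test fold, so the pooled error is computed on scores the model never saw during fitting.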


Results : individual systems

PMFelis : 9.52    PMFesat : 8.57

[Figure: two scatter plots of computed score versus perceptual score, one per system; both axes run from 20 to 120]


Results : individual systems

PLF (elis) : 9.35    CD-PLF (elis) : 8.48

[Figure: two scatter plots of computed score versus perceptual score, one per system; both axes run from 20 to 120]


Results : all systems

Model                    STD    N (selected features)
PMFesat                  8.57   15
PMFelis                  9.52   15
PLF                      9.35   15
CD-PLF                   8.48   15
PMFelis + PLF            8.20   15
PMFesat + PLF            8.00   13
PMFelis + CD-PLF         7.63   15
PLF + CD-PLF             8.04   15
PMFesat + CD-PLF         7.34   15
PMFelis + PLF + CD-PLF   7.48   15

• New models with CD-PLF outperform old PLF models
• CD-PLFs form best system with one feature set
• PMFesat + CD-PLF best system with combined feature sets
• Using three ELIS feature sets yields next best result and needs only one recognizer (the simplest one)
  → less complex system


Results : combined system

CD-PLF + PMFesat : STD = 7.34

[Figure: scatter plot of computed score versus perceptual score; both axes run from 20 to 120]


Results : pathology-specific IPM

• Instead of creating one general IPM, one can create IPMs for specific pathologies:
  – trained on all speakers (to have enough speakers)
  – model selection based on performance on speakers of that pathology (importance of features depends on type of disorder)


Results : pathology-specific IPM (2)

STD per pathology (DYS = dysarthria, LAR = laryngectomy, HEAR = hearing impaired):

Model                    DYS    LAR    HEAR
PMFesat                  8.44   8.32   7.48
PMFelis                  8.10   5.88   9.73
PLF                      8.27   7.17   8.05
CD-PLF                   6.49   5.70   6.87
PMFelis + PLF            6.97   5.14   6.63
PMFesat + PLF            6.87   6.49   6.20
PMFelis + CD-PLF         6.50   3.54   6.05
PLF + CD-PLF             6.32   5.82   6.17
PMFesat + CD-PLF         6.69   4.86   5.27
PMFelis + PLF + CD-PLF   6.32   3.68   5.73

• Very good match in case CD-PLFs are involved
• New models with CD-PLF outperform old PLF models
• CD-PLFs form best system with one feature set
• Using three ELIS feature sets yields (almost) best result and needs only one recognizer (the simplest one)
  → less complex system


Results : pathology-specific IPM

• Dysarthria : 6.32 (red circles)
• Dispersion of other speakers is increased
• Largest deviations in low intelligibility area:
  – scarce data in that area
  – can be solved by adding more weight to patients with very low intelligibility

[Figure: scatter plot of computed score versus perceptual score; both axes run from 20 to 120; dysarthric speakers shown as red circles]


Conclusions and future work

• PMF, PLF and CD-PLF can predict intelligibility of pathological speech:
  – CD-PLFs seem to play an important role:
    • STD = 7.34 for general model combining CD-PLF and PMFesat
    • STDs of at most 6.32 for pathology-specific models using the 3 ELIS feature sets
    → not the articulation pattern but the change in the articulation pattern matters?
  – More research is needed before adding this feature set to the tool
  – Results on validation set compete with human inter-rater agreements.
• Future work:
  – more profound articulatory assessment, which is directly related to determination of appropriate therapy
  – monitoring of effectiveness of chosen therapy
  – using more natural speech (words, phrases) in tests


• Questions?