The Computerised FDA Application Formulating A System of Acoustic Objective Measures for the...

Background: The paper-based FDA

The type and severity of a given instance of dysarthria (disordered speech arising from impaired articulator control) can be diagnosed with a set of tests known as the Frenchay Dysarthria Assessment (FDA).

Two of the three FDA intelligibility tests are concerned with the measurement of intelligibility… but what exactly is intelligibility, anyway?

“The degree of success in establishing communication between the sender and intended recipient of a message”

Intelligibility, a very variable percept

Are both of these speech samples equally intelligible?

Initially, a listener will find it more difficult to understand a newly encountered accent than a familiar one. Nonetheless, increased exposure to the initially unfamiliar speaking style will usually invoke a subconscious adaptation, a learning effect, making that speech easier to understand. This holds true even for dysarthric speech.

[Figure: Learning Effect from Repeated Exposure to Dysarthric Speech Data - Mean Score Improvement: Round 1 vs. Rounds 2-5 (from ABI Corpus, Birmingham Uni.). Axis: Mean Score Improvement (%), 0 to 50. Series: Judge1 to Judge10. Groups: Naïve Listeners, Expert Listeners.]

Modelling the Naïve Listener

• If the learning effect alters a listener’s perception of a particular individual’s speaking style, is that listener’s judgement still representative of the naïve listener?

• If the learning effect introduces an inevitable bias, can a computer model be built which behaves like an “eternal” naïve listener (i.e. never adapting to an unfamiliar speaking style and therefore always consistent in assessment)?

Possible Solution: Using HMMs to Emulate the Naïve Listener

• A hidden Markov model (HMM) is, essentially, a statistical representation of a speech unit at the phone/word/utterance level. HMMs are “trained” by analysing the acoustic features of multiple utterances representing the specified speech unit.

[Diagram label: Multiple speech samples from multiple speakers]

Goodness of Fit

Once trained, an HMM word model can be used to estimate the likelihood that a given speech sound could actually have been produced by that word model. This likelihood is called a goodness of fit (GOF) and can be expressed as a log likelihood, e.g. a likelihood of 10^-35 is written simply as -35.

Mr. HMM Model, could you’ve been my daddy?

Hmm, with a log likelihood of 10^-55, I’m not so sure…

The more acoustically dissimilar an utterance is from what the intelligibility estimator (IE) has been trained on, the lower the GOF score.
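As a rough illustration of how a GOF score of this kind could be computed, the sketch below runs the standard forward algorithm on a toy discrete HMM and returns a base-10 log likelihood. Everything in it is an assumption made for illustration: the two-state model, the two-symbol "acoustic" alphabet and the names `START`, `TRANS`, `EMIT` and `log10_likelihood` are hypothetical, and a real word model would use continuous acoustic features rather than discrete symbols.

```python
import math

# Hypothetical toy word model: 2 states, 2 observation symbols (0 and 1).
START = [1.0, 0.0]                # always begin in state 0
TRANS = [[0.5, 0.5], [0.0, 1.0]]  # left-to-right: state 0 -> 0 or 1, state 1 -> 1
EMIT = [[0.9, 0.1], [0.1, 0.9]]   # state 0 prefers symbol 0, state 1 prefers symbol 1


def log10_likelihood(obs, start, trans, emit):
    """Forward algorithm: return log10 P(obs | model) for a discrete HMM."""
    n = len(start)
    log_like = 0.0
    alpha = None
    for t, symbol in enumerate(obs):
        if t == 0:
            alpha = [start[s] * emit[s][symbol] for s in range(n)]
        else:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][symbol]
                for s in range(n)
            ]
        # Rescale at every step so long utterances do not underflow;
        # the log of the scale factors sums to the log likelihood.
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_like += math.log10(scale)
        alpha = [a / scale for a in alpha]
    return log_like


typical = log10_likelihood([0, 0, 1, 1], START, TRANS, EMIT)
atypical = log10_likelihood([1, 1, 0, 0], START, TRANS, EMIT)
print(typical, atypical)  # the atypical sequence receives the lower score
```

The sequence that resembles what the toy model expects scores higher than the one that does not, which is the behaviour the GOF score relies on.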

Using Forced-Alignment GOF scoring to measure Intelligibility

Since two of the FDA intelligibility tests require the repetition of words/phrases from a pre-selected vocabulary, HMM utterance models can be built for these words/phrases.

Furthermore, the incoming speech can be matched to the corresponding utterance model to determine the goodness of fit. This matching of a speech sample to a specific utterance model, and only that model, is called forced alignment.

We hypothesise that force-aligning a speech sample with its corresponding “everyman” word model will yield GOF scores which are systematically related to that speech sample’s intelligibility. When HMMs are used in this way, we call them intelligibility estimators (IEs).
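The idea of scoring a sample against one prompted model only can be sketched as follows. This is a simplified illustration, not the system's implementation: it assumes one-dimensional feature frames, a left-to-right chain of states with unit-variance Gaussian emissions, and uniform transition costs, and the names `MODELS` and `forced_align` are hypothetical.

```python
import math

# Hypothetical utterance models: each is a left-to-right list of state means.
MODELS = {"yes": [0.0, 5.0, 10.0], "no": [10.0, 2.0]}


def gauss_logpdf(x, mean, var=1.0):
    """Natural-log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def forced_align(frames, state_means):
    """Viterbi-align frames to ONE model; return (log score, state per frame)."""
    n = len(state_means)
    if len(frames) < n:
        raise ValueError("need at least one frame per state")
    neg_inf = float("-inf")
    # best[s]: best log score of any path that ends in state s at this frame
    best = [neg_inf] * n
    best[0] = gauss_logpdf(frames[0], state_means[0])
    back = []
    for x in frames[1:]:
        new, ptr = [], []
        for s in range(n):
            stay = best[s]
            move = best[s - 1] if s > 0 else neg_inf
            ptr.append(s if stay >= move else s - 1)
            new.append(max(stay, move) + gauss_logpdf(x, state_means[s]))
        best = new
        back.append(ptr)
    # The path must finish in the final state; trace the pointers backwards.
    path = [n - 1]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return best[n - 1], path


# The speaker was prompted with "yes", so only that model is consulted.
clean_score, clean_path = forced_align([0.1, -0.2, 5.1, 4.9, 10.2], MODELS["yes"])
distorted_score, _ = forced_align([2.0, 3.0, 3.5, 6.0, 7.0], MODELS["yes"])
print(clean_path, clean_score > distorted_score)
```

Unlike free recognition, which would pick the best-scoring model from the whole vocabulary, the prompted word is known in advance here, so a poor fit to that one model is itself the measurement.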

…so, how does it work in practice?

IE utterance models are trained on normal speech from a variety of speakers, and a range of GOF scores for normal speech test data is established: typically between -5 and -10.

Ranges have also been established for moderate- and low-intelligibility speech (which, in an FDA diagnostic context, means dysarthric speech): GOF scores are typically between -11 and -20 (moderately intelligible) and below -20 (low intelligibility). These scores are relative to the maximum likelihood utterance (MLU), i.e. the speech file with the highest GOF score in the IE’s training set.
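The normalisation and banding described above can be sketched in a few lines. Two assumptions are made here and should be checked against the actual system: that "relative to the MLU" means subtracting the highest training-set log score, and that the band boundaries fall exactly at -10 and -20 (the quoted ranges leave the gap between -10 and -11 unspecified). The function names are hypothetical.

```python
def normalise_to_mlu(raw_scores, training_scores):
    """Express raw log-likelihood scores relative to the best training utterance."""
    mlu_score = max(training_scores)  # maximum likelihood utterance (MLU)
    return [score - mlu_score for score in raw_scores]


def intelligibility_band(normalised_score):
    """Map a normalised GOF score to a coarse band (boundaries are assumed)."""
    if normalised_score >= -10:
        return "normal"
    if normalised_score >= -20:
        return "moderate"
    return "low"


# Made-up raw scores, purely for illustration.
training = [-35, -38, -41]
test_samples = [-40, -52, -62]
for score in normalise_to_mlu(test_samples, training):
    print(score, intelligibility_band(score))
```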

Sample GOF scores

[Figure: two charts of Normalised GOF Scores (Relative to MLU) for Normal Speakers 1-10 and Dysar. Speakers 1-3. Panel titles: “GOF scores for isolated single words” and “GOF scores for short sentence utterances”.]

Problem: How do we make IEs truly naïve?

“Everyman” HMM utterance models are not really “everyman”: it is not feasible to train them on speech data representing all the world’s anglophone accents. In this experiment, the utterance models have been trained on speech principally from the South Yorkshire region; accents not represented in the HMM training data could therefore receive GOF scores which do not truly reflect that speech sample’s intelligibility as perceived by a naïve listener.

A non-trivial problem: certain anglophone accents, due to their prestige, are more universally intelligible than others (e.g. Estuary English and RP), while others are a lot less intelligible internationally (e.g. the Glaswegian accent). What mix of accents should be used to train an HMM word model to make it truly representative of a “typical” naïve listener?

Objective #2: Overall Diagnosis

After collecting data from all 28 FDA sub-tests, how do we arrive at a dysarthria sub-type diagnosis?

Usually by template matching and symptom categorisation (e.g. “At-rest tasks performed better than in-speech tasks? If so, spastic dysarthria most likely”).

Can these processes be automated? Yes, via a neural network combined with an expert system. The neural network does the basic pattern matching, while the rule-based expert system attempts to disambiguate diagnostic information not directly represented in the FDA letter grades.

[Flowchart: Example of CFDA expert system rule-based data disambiguation. Decision nodes, each with Yes/No branches: “Uncontrollably Rapid Speech Rate?” and “Slow Speech Rate?”. Outcomes: “Hypokinetic Dysarthria most likely of 5 types”, “Extrapyramidal Dysarthria less likely than other 4 types”, “Flaccid Dysarthria most likely of 5 types”.]
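One way to picture a neural network combined with a rule-based expert system is sketched below: hypothetical network scores for the five sub-types are adjusted by rules, re-ranked, and returned with the reasons for each adjustment. The scores, the adjustment factors and the pairing of "slow speech rate" with "extrapyramidal less likely" are assumptions made for illustration; only the at-rest/in-speech rule is quoted directly from the text, and none of this is the CFDA's actual rule base.

```python
# Rules: (finding, sub-type affected, assumed multiplier, explanation).
RULES = [
    ("at_rest_better_than_in_speech", "spastic", 2.0,
     "At-rest tasks performed better than in-speech tasks: spastic more likely"),
    ("slow_speech_rate", "extrapyramidal", 0.5,
     "Slow speech rate: extrapyramidal less likely"),
]


def diagnose(nn_scores, findings):
    """Adjust network scores with rules; return (ranked sub-types, reasons)."""
    scores = dict(nn_scores)
    reasons = []
    for finding, subtype, factor, why in RULES:
        if findings.get(finding, False):
            scores[subtype] *= factor
            reasons.append(why)
    total = sum(scores.values())
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Renormalise so the adjusted scores sum to 1 again.
    return [(s, scores[s] / total) for s in ranked], reasons


# Hypothetical network output in which ataxic and spastic are tied.
nn_scores = {"ataxic": 0.30, "extrapyramidal": 0.25, "flaccid": 0.10,
             "mixed": 0.05, "spastic": 0.30}
findings = {"at_rest_better_than_in_speech": True, "slow_speech_rate": True}

ranked, reasons = diagnose(nn_scores, findings)
print("1st choice:", ranked[0][0], "| 2nd choice:", ranked[1][0])
for reason in reasons:
    print("because:", reason)
```

Keeping the explanation string alongside each rule is what lets such a system report why it reached a decision, and returning the full ranking makes a "1st or 2nd choice" evaluation straightforward.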

Diagnostic Accuracy of Hybrid System

[Figure: bar chart of classification accuracy (%), 0 to 100, by dysarthria sub-type (Ataxic, Extrapyramidal, Flaccid, Mixed, Spastic). Series: FDT Classification Correctness; MLP Classification Correctness; Hybrid System Classification Correctness (1st choice); Hybrid System Classification Correctness (1st or 2nd choice).]

The automated diagnostic system will even tell you why it came to a given decision…

Future Work

Acquisition of HMM technology (for the Intelligibility Estimator) which doesn’t have prohibitively high licence fees.

Collection of dysarthric data to build an FDA-specific dysarthric speech database.

More interviews with experienced speech therapists to increase the diagnostic expert system’s knowledge database.

Results of NHS field trials of the CFDA application.
