Quantifying aberrant phonation using approximate entropy in electrolaryngography

Speech Communication 47 (2005) 312–321

www.elsevier.com/locate/specom


Kathiresan Manickam a,*, Christopher Moore a, Terry Willard b, Nicholas Slevin c

a North Western Medical Physics, HQ at Christie Hospital NHS Trust, Withington, Manchester M20 4BX, UK

b South Manchester University Hospitals, Withington, Manchester M20 2LR, UK

c Clinical Department of Radiation Oncology, Christie Hospital NHS Trust, Manchester M20 4BX, UK

Received 8 July 2004; received in revised form 28 January 2005; accepted 24 February 2005

Abstract

Vocal fold vibration during vowel phonation can be used to characterise voice quality. This vibration can be measured using a laryngograph, which produces a waveform of highly correlated trans-larynx impedance variations, collectively termed the electroglottogram (EGG). Using approximate entropy (ApEn) in the EGG spectral domain, earlier work has been able to explain the meaning of "voice normality" and also to begin quantifying the impact that radiotherapy treatment has on the voicing of larynx cancer patients. In this paper ApEn is used to quantify pathological voicing in radiotherapy patients using the EGG in the time domain. Since ApEn is a viable single figure of merit, it has the potential to make assessment of aberrant voicing both more concise and more objective than the subjective analysis adopted by speech and language therapists (SALTs).

© 2005 Elsevier B.V. All rights reserved.

Keywords: Larynx cancer; Voicing; Electroglottogram; Approximate entropy; Speech and language therapy

1. Introduction

It is common practice for speech and language therapists (SALTs) to assess the quality of a patient's aberrant voice before and after treatment,


doi:10.1016/j.specom.2005.02.008

* Corresponding author. Tel.: +44 161 4463717.

E-mail address: [email protected].

ac.uk (K. Manickam).

to suggest the changes in underlying physiology, and to decide on appropriate rehabilitation. In support of this process a recognised protocol is used, one example of which is the Voice Profile Analysis Scheme (VPAS). The VPAS has multiple parameters that collectively quantify voicing and connected speech, but the final reduction to voice grade or category is an expert decision where the proliferation of parameters can hinder rather than


assist in objectively supporting a final recommendation. However, through auditory perceptual analysis it is by no means straightforward to objectively evaluate important physiological phenomena such as incomplete vocal fold closure, which is simply manifested as breathiness. Similar comments arguably apply to jitter, shimmer and irregularity in larynx opening and closing phases (Titze, 1994). Indeed, there are no well defined reference standards that an expert can deploy to assist in their evaluations. This is reflected in the results from recent studies, which suggest that the expert analytical approach lacks consistency for monitoring the effect of larynx cancer and patient recovery after treatment by radiotherapy (John and Enderby, 2000; John, 2002).

Radiotherapy inevitably affects the voice quality of laryngeal cancer patients because it degrades vocal fold functionality, most obviously by inflammatory response to ionising radiation. The vibration of the vocal folds in patients can be measured non-invasively by exploiting the highly correlated phenomenon of trans-larynx electrical impedance variation, which can be measured using an electrolaryngograph (Fourcin, 1986). The output from this device is a time series in the form of an electroglottogram (EGG) waveform. This EGG is less complicated than the acoustic waveform because it is usually free from vocal tract resonance effects (Fourcin and Ptok, 2003). An earlier study by the authors quantified the normality of voicing in the frequency domain, which was then used to successfully differentiate the healthy population from the specific disease group of larynx cancer patients (Moore et al., 2004). However, although severity of voice degradation was indicated, the study did not suggest physiological causation.

If the electroglottogram were to be more widely available, then it would be expected that the simplicity of the EGG temporal waveform, if not its Fourier processing, would be attractive to the expert SALT. The EGG reflects such features as timbre, through spectral slope, and pitch range, which are conventionally extracted by Fourier analysis. Given the inverse relationship between the spectral and time domains, and the evidence for temporal as well as frequency pattern processing in the auditory neural system (Cheveigne, 2003), the time domain deserves further investigation in the interests of both scientific knowledge and utility in clinical practice.

The main focus of this study is to reduce the reliance on auditory assessment involving multiple parameters and to compute a single figure using approximate entropy to characterise pathological voicing. It builds on our recent investigations that were able to concisely characterise voice normality. Investigating the pathological features from the raw electroglottogram, in the time domain rather than the spectral domain, could enhance voice quantification and potentially be very much quicker. Here the authors observe that, at least in their experience, speech and language specialists are rarely familiar with interpreting the subtleties of the complete spectral domain pattern. Rather, their experience is with pre-selected features, the most common being fundamental frequency. Hence, for the EGG, it is common practice to assess time-series waveforms for small details that betray pathology.

2. Background knowledge

Common statistics such as the median, mean and standard deviation have been widely used in voicing and speech analysis, though they are rarely recognised as simple variability statistics. For linear systems, e.g., periodic signals, mean and standard deviations are sufficient. However, more complicated signals require an adaptive method of quantification. There are a number of candidate statistics suitable for non-linear applications, such as Shannon's entropy, maximum entropy and approximate entropy, ApEn. The latter is certainly the most tractable and has been used to good effect in some earlier studies (Moore et al., 2004). This is an example of the little known and recently developed regularity statistics that have a potential advantage over variability statistics, since they can explicitly account for irregularity seen in sequential segments of data rather than statistics determined from the entire data set without regard to ordering. ApEn reliably discriminates between regularity and irregularity in signals, called complexity, and

Fig. 1. Comparison of an uncomplicated and a complex signal. Ordinate: amplitude (Ω), Abscissa: time (ms).

Fig. 2. (a)–(d) Comparison of different phonation. Left: section of electroglottogram, Ordinate: amplitude (Ω), Abscissa: time (ms); Right: entire approximate entropy, Ordinate: complexity, Abscissa: frame.



practical implementation was pioneered for time series by Pincus (1988). Local pattern seen in the data itself is used to quantify signal complexity, which makes the approach highly adaptable. This clearly suggests that ApEn could be effective in understanding the way in which disease might perturb the voicing pattern in a patient. A low ApEn indicates a less complicated signal while a large ApEn value indicates a more complex signal, as shown in Fig. 1. Phonation consistency and noise, relating for example to breathiness and creakiness, should all be amenable to analysis using ApEn as a single quantifier. Further details of ApEn computation can be found in Appendix A.

The effects of phonation onset, pauses and also termination are of clinical interest. Fig. 2a demonstrates that if the vocal folds vibrate quasi-periodically with a regular pattern, then the complexity is low. However, the majority of voice impaired subjects actually halt momentarily during phonation, which is known clinically as phonation pause, and is mainly a result of insufficient breathing support. Some individuals also tend to slow down or gradually stop if they are able to anticipate the end of the sought-after 4 s of phonation. This is analogous to the runner who reduces pace just before the finishing line. As phonation approaches termination the vocal fold functionality reduces, corresponding EGG features become less complex and ApEn is lowered. The reverse is true for individuals who take some time to pick up momentum for sustained vowel phonation. Hence, slow phonation onset and termination as well as pausing result in a skewed ApEn distribution. Fig. 2b illustrates this phenomenon. Pauses and natural responses in phonation should have an effect on measured distributions of ApEn, perhaps leading to skew. Therefore, somewhat unusually, these have been included in this study.

Most healthy people would be expected to develop steady phonation perhaps well within 0.5 s or so of an attempt at voicing. However, earlier work suggests that up to 30% of the healthy population are less than ideal at phonating and might in fact show compromised phonation, possibly with some evidence of late phonation. In general though, the median and mean ApEn statistics should correspond closely and there should be little skew in the underlying ApEn distributions. Pathological cases, in our case taken from radiotherapy, should differ markedly. Persistent changes in the maximum larynx opening height, vocal folds remaining open for longer than expected, or cycle to cycle variation, e.g., perturbations such as jitter and shimmer, are much in evidence and should act to increase ApEn. Even voice changes from breathy to creaky, or to rough, are reflected as an increase in ApEn. Fig. 2c is a specific example that illustrates the effect of tensed, irregular vibration, perhaps caused by a heavy vibrating mass, which produces a "creaky" voice. In other subjects, the maximum laryngeal height can change. This is demonstrated in Fig. 2d for a patient prior to radiotherapy (in fact patient G in Fig. 4a); it correlates with a large inter-quartile range for ApEn, although the mean and median ApEn hardly differ. The high complexity in this instance relates to defective functionality of the larynx during the opening phase. A closer examination of the EGG, cycle by cycle, clearly reveals a noisy open phase consistent with a breathy voice. These specific examples provide the reader with some feel for the way in which ApEn might be affected by the pathology.

3. Process

A group of 81 healthy male volunteers was recruited through institutional advertising. Another group of 38 male larynx cancer patients from the Christie Hospital was analysed before and after radiotherapy. An EGG was acquired for each subject using sensors attached across the thyroid cartilages and connected to a PC controlled electrolaryngograph under the expert guidance of a speech and language therapist. Each subject was asked to phonate the vowel /i/, e.g., "ee" from the word "heed", for 4 s. The EGG and the acoustic signals were recorded and digitised at a sampling rate of 20 kHz. The data-files were transmitted by network to a Pentium-4 PC system for alphabetically coded and anonymous file naming, storage, visualisation and analysis using software written in the scientific language IDL from Research Systems. The acoustic signals were used for auditory purposes. The EGG was segmented into shorter frames of 1000 points. Approximate entropy was calculated for each data frame of the EGG time series, based on N = 1000, m = 2 and r = 0.2 × σ, where σ is the standard deviation of the EGG time series (see Appendix A). Speech and language therapists (SALTs) categorised subject voice quality, before radiotherapy and one year after therapy, using local perceptual protocols, and placing each patient into one of 7 voice categories. Category zero (CAT 0) represented entirely normal and category seven (CAT 7) severely abnormal.
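The framing step described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the frames are taken to be consecutive and non-overlapping, a short trailing remainder is discarded, and σ is taken per frame as a population standard deviation; the text does not state any of these choices, and the function names are hypothetical.

```python
import statistics

SAMPLE_RATE_HZ = 20_000  # sampling rate stated in the text
FRAME_POINTS = 1000      # N, the number of points per frame
K = 0.2                  # multiplier applied to sigma to obtain r


def split_into_frames(samples, frame_points=FRAME_POINTS):
    """Cut a recording into consecutive, non-overlapping frames.

    Assumption: a trailing remainder shorter than one frame is dropped.
    """
    n_frames = len(samples) // frame_points
    return [samples[i * frame_points:(i + 1) * frame_points]
            for i in range(n_frames)]


def frame_tolerance(frame, k=K):
    """Noise filter r = k * sigma (assumption: sigma computed per frame)."""
    return k * statistics.pstdev(frame)


# Each 1000-point frame at 20 kHz covers 50 ms, so a 4 s phonation
# yields 80 frames.
frame_duration_ms = 1000 * FRAME_POINTS / SAMPLE_RATE_HZ
frames_in_4s = (4 * SAMPLE_RATE_HZ) // FRAME_POINTS
```

With these settings each subject contributes on the order of 80 per-frame ApEn values, which is the sample that the box plots and medians in the following section summarise.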

4. Results and discussion

The median of the ApEn was then computed from the frames, since the distribution of complexities did not conform to a normal distribution.
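As a hedged illustration of the per-subject summary used in the box plots, the sketch below reduces a list of per-frame ApEn values to the quantities named in the figure legends. The function name, the mean-minus-median skew indicator and the quartile interpolation method (the default of Python's `statistics.quantiles`) are assumptions made for this example, not details given in the study.

```python
import statistics

HEALTHY_THRESHOLD = 0.3  # nominal healthy ApEn level quoted in the text


def summarise_frames(frame_apen):
    """Summarise one subject's per-frame ApEn values (hypothetical helper)."""
    lower_q, _, upper_q = statistics.quantiles(frame_apen, n=4)
    median = statistics.median(frame_apen)
    mean = statistics.fmean(frame_apen)
    return {
        "min": min(frame_apen),
        "lower_quartile": lower_q,
        "median": median,
        "mean": mean,
        "upper_quartile": upper_q,
        "max": max(frame_apen),
        "inter_quartile_range": upper_q - lower_q,
        # Crude skew indicator: a mean well above the median suggests a
        # tail of high-complexity frames.
        "mean_minus_median": mean - median,
        "below_healthy_threshold": median < HEALTHY_THRESHOLD,
    }


# Made-up values: four regular frames and one very irregular frame.
example = summarise_frames([0.20, 0.22, 0.21, 0.23, 0.90])
```

In this made-up example the single irregular frame pulls the mean well above the median, while the median itself stays below 0.3, which is why the median is the more robust single figure for skewed distributions.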

4.1. Healthy male population

The box plot in Fig. 3 shows the ApEn distributions for healthy males A–CC. Eighty-three percent of the healthy subjects exhibited phonation complexity below 0.3. Eight subjects showed significant skew, which is the result of delay in achieving phonation onset and also noisy opening phases. The remaining 17% of the population have median ApEn above 0.3, reflecting variations in their maximum laryngeal opening rather than the noisy characteristics that might be expected from a pathological population.

4.2. Patient population

ApEn results for patients A–Z and AA to AL are presented as box plots in Fig. 4a for pre-treatment

Fig. 3. Box plot (two panels, subjects A–AN and AO–CC) representing the complexity values for each frame for the healthy male population. The ordinate refers to the complexity (0–1.6) and the abscissa refers to the subjects. Legend: minimum, lower quartile, median, mean, upper quartile, maximum.


and Fig. 4b for one year after treatment (post-treatment). The abscissa is implicitly arranged in order of deteriorating categorisation (CAT) by the speech and language therapist (SALT), with CAT 0 to the left and CAT 6 to the right. Double character coding is applied to the most severely impaired patients. A patient retained the same alphabetic code for both pre- and post-treatment evaluation. Voice-impaired subjects have ApEn values that contrast strongly with the healthy population. Early

Fig. 4. Box plot representing the complexity values for each frame for the diseased male population. The ordinate refers to the complexity (0–1.6) and the abscissa refers to the subjects: (a) pre-treatment results, (b) post-treatment results. Subjects are arranged along the abscissa in order of the speech therapists' categorisation. Legend: minimum, lower quartile, median, mean, upper quartile, maximum.


categories (CAT 0, CAT 1 and CAT 2) and final categories (CAT 5 and 6) correlate well with the complexity analysis for the pre-treatment cases. They show substantial variation in the median, mean, inter-quartile range, maximum and minimum ApEn. Prior to treatment, just 29% lie below the healthy threshold of ApEn 0.3. This indicates that around a quarter of patients will be expected to present within the better voice categories as determined by ApEn as well as SALT. One year after treatment this doubles to 59% with ApEn below 0.3, indicating improved overall voicing following

Fig. 5. Graph showing median ApEn of pre-treatment vs. post-treatment. Ordinate: post-treatment median ApEn (complexity, 0–1.5), Abscissa: pre-treatment median ApEn (complexity, 0–1.5).


radiotherapy. Note that the double character coded patients, all of whom are in the worst voice categories as defined by ApEn and SALT prior to treatment, clearly shuttle left towards the better voice categories. Collectively, the healthy ApEn threshold of 0.3 appears to be a valuable measure of change to normal or improved voicing in its own right. However, the range and skew of the distributions, the fine detail if you will, are also of interest. The majority of members in the healthy male population have extremely compact distributions, measured in terms of ApEn range, and generally very little skew. Note, however, that the patient population, pre- and post-treatment, does not show this compaction or symmetry, even when the median ApEn reaches the healthy threshold of 0.3. In fact, just two subjects in the pre-treatment stage, H and I, and ten subjects from the post-treatment stage exhibited phonation characteristics that are compact and un-skewed. ApEn variation is, therefore, directly suggestive of pathology, whether due to disease or to the residual effects of therapy.

It is also noteworthy that speech therapists' subjective categorisation for the near normal and most impaired cases correlates well with the complexity analysis for the pre-treatment stage. There were three interesting exceptions in the form of patients X, Z and AF, who were all perceptually classified as CAT 0 one year after treatment but showed ApEn well above the nominal healthy population complexity level of 0.3. Despite CAT 0 classification all three patients were below the average (35,000 Ω) maximum larynx opening height, as defined from the peak to peak electric impedance variation seen in the EGG (X (11,875 Ω), Z (6520 Ω) and AF (32,546 Ω)). All three subjects also exhibited problems in the open phase. A cycle by cycle analysis of the EGG revealed that subject X usually had vocal folds open for less than 35% of a cycle. Hence, the vocal folds remained closed for twice as long as the open phase instead of the 50:50 balance expected for normal voicing. Subject AF did not show any indication of opening and subject Z showed a vibratory open phase, i.e., noisy opening.

Fig. 5 shows the median ApEn values plotted post-treatment (ordinate) against pre-treatment (abscissa). As the pre-treatment complexity increases, 22 patients (58%) still exhibited low complexity (around 0.3) one year after treatment. The ratio of pre- to post-treatment complexity below 0.3 is 1:2 and complexity below 0.5 is 2:3. Seven patients showed an increase in complexity post-treatment. Out of the seven, two patients had extremely large complexity of 1.5. Four patients had high complexity estimates for both treatment stages. Pre- and post-treatment, two of these patients were classified in CAT 2 and CAT 3, and the other two were classified in severely abnormal CAT 5 and CAT 6 by speech therapists. Clearly, the CAT 2 and CAT 3 cases are extreme and pleasingly rare examples of divergence between perceptual and objective ApEn evaluations. On close examination of their EGG, both patients demonstrated a high level of noise during their post-treatment stage. However, the EGG is undeniably noisy in appearance.

The Wilcoxon rank sum test showed that the difference in median ApEn values, pre-treatment to healthy and post-treatment to healthy, allows either of the two patient populations (pre-treatment and one year after radiotherapy) to be distinguished from the normal voicing of the healthy population itself (P � 0.005), as in Fig. 6. In addition the healthy, pre-treatment and post-treatment patients' median ApEn values were analysed to see if they represented distinct populations in their own right using the Kruskal–Wallis rank sum test. This indicated that the complexity for the healthy

Fig. 6. Graph showing the probability density function of median ApEn for healthy, pre-treatment and post-treatment groups. Ordinate: probability density function (normal distribution), Abscissa: median ApEn (0–1.6). Legend: HEALTHY: healthy male population; PRE: patient population before treatment; POST: patient population after treatment.


group is low, with a rank of 56.2; for the pre-treatment group it is high, with a rank of 114.7; and one year after radiotherapy treatment the rank has reduced to 91.9, suggesting a definite improvement to patients' voicing one year after radiotherapy, but not to fully normal voicing levels.
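The ranks quoted above can be read as group mean ranks from pooled ranking, which is the quantity a Kruskal–Wallis test compares. The sketch below, in plain Python, shows how such mean ranks are obtained and checks the quoted figures for internal consistency. It assumes that all 81 healthy subjects and all 38 patients at both stages entered the test, which the text does not state explicitly; the function name is hypothetical.

```python
def mean_ranks(groups):
    """Mean rank of each group after pooling all values.

    `groups` maps a group name to a list of values; tied values share
    the average of the ranks they occupy.
    """
    pooled = sorted((value, name)
                    for name, values in groups.items()
                    for value in values)
    ranks = {name: [] for name in groups}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        shared_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1..j
        for _, name in pooled[i:j]:
            ranks[name].append(shared_rank)
        i = j
    return {name: sum(r) / len(r) for name, r in ranks.items()}


# Consistency check on the reported figures: the size-weighted average of
# the group mean ranks must equal (N + 1) / 2 for N pooled observations.
sizes = {"healthy": 81, "pre": 38, "post": 38}        # assumed group sizes
reported = {"healthy": 56.2, "pre": 114.7, "post": 91.9}
n_total = sum(sizes.values())
weighted_mean_rank = sum(sizes[g] * reported[g] for g in sizes) / n_total
expected_mean_rank = (n_total + 1) / 2
```

Under the assumed group sizes the weighted mean of the reported ranks is 79.0, matching (157 + 1) / 2 = 79, so the three quoted ranks are mutually consistent with a pooled ranking of 157 median ApEn values.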

5. Conclusion

This study has shown that tracking EGG complexity in the form of a single metric, ApEn, can be used to differentiate healthy from radiotherapy patient populations, either pre- or post-treatment, and to make a first level assessment of underlying pathology. The compactness and symmetry of ApEn variations are also key factors. When ApEn is analysed sequentially, cycle by cycle, it also appears to be possible to narrow down the underlying dysfunction of the vocal folds. Temporal EGG ApEn quantification, a single figure of merit, also correlates with speech therapists' categorisation based on multi-parameter subjective evaluation, especially for early (good voicing) and late (poor voicing) categories. The ApEn results also show that one year after radiotherapy the patients' voicing is improved compared to their pre-treatment situation, but the improvement is not generally to fully normal levels.

Acknowledgments

This work was supported in part by EPSRC Grant GR/R04713/01, "Automated Voice Quality Monitoring for Differentiating Cancer Therapy, Recovery Patterns and Rehabilitation".

We would like to thank Susan Jones, Head of Speech and Language Therapists in Withington and Wythenshawe NHS Trusts, for recording the acoustic and electroglottogram signals and assessing the patients' voice quality.

Appendix A

The mechanics of ApEn computation are relatively simple. A short segment of the temporal data is selected and similarly patterned segments are sought and counted, guided by the statistics of the test series. Segments are usually no more than data pairs or triplets. The key parameters for ApEn computation are N, the number of points $a_i$ in the EGG time series; m, the embedding dimension; and the noise filter, r, which makes approximate entropy robust. The value r is calculated as $r = k \times \sigma$, where k is an arbitrary value and $\sigma$ is the standard deviation of the EGG time series under observation. A typical value for the constant k is 0.10–0.25, as suggested by Pincus (Pincus, 1988, 1995, 2001; Pincus et al., 1999). The first two data points from the EGG time series $(a_1, a_2)$ are selected as a test template, which is compared to all subsequent data pairs $(a_x, a_y)$ taken from the same data frame, i.e., $(a_1, a_2), (a_2, a_3), \ldots, (a_{N-m}, a_{N-m+1})$. A match is said to exist when the corresponding elements of both data pairs differ by no more than the noise filter margin ($-r$ to $+r$). The total number of matches is recorded in a variable, count1. Immediately following each paired matching, the test pair is extended from $(a_1, a_2)$ to the triplet $(a_1, a_2, a_3)$ and this is then compared with a similarly extended matching location, which now changes from $(a_x, a_y)$ to $(a_x, a_y, a_z)$. Each time the third elements correlate well with each other, i.e., the third point of the test triplet and the third element following the previously identified matched pair both fall within the same noise margin, another counting variable, count2, is incremented. The logarithm of the conditional probability of finding matching third elements at points with matched pairs is $\ln(\mathrm{count2}/\mathrm{count1})_1$ for the first run. The second test pair $(a_2, a_3)$ is then sequentially compared to $(a_1, a_2), (a_2, a_3), \ldots, (a_{N-m}, a_{N-m+1})$ to identify additional match points. Similarly, the counts for the pair and triplet matches are added and the conditional probability is calculated for the second run. This procedure is repeated for $N - m$ runs producing conditional probabilities $\ln(\mathrm{count2}/\mathrm{count1})_{N-m+1}$. The sum of the conditional probabilities is then divided by the total number of runs, $N - m$, to yield an average value. Since each ratio count2/count1 is at most 1, this average is non-positive, and its negative is the Approximate Entropy, ApEn.
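The template-matching procedure of this appendix can be sketched in plain Python as follows. This is a minimal illustration rather than the software used in the study (which was written in IDL). It assumes that each template is also compared with itself, so that count1 is never zero, that σ is the population standard deviation of the frame, and that the averaged logarithm is negated so that the result is non-negative, as in the conventional definition of ApEn. The function name and the demonstration signals are hypothetical.

```python
import math
import random
import statistics


def approximate_entropy(series, m=2, k=0.2):
    """ApEn of one frame by direct template matching (illustrative sketch)."""
    n = len(series)
    r = k * statistics.pstdev(series)  # noise filter r = k * sigma
    runs = n - m                       # number of test templates
    total = 0.0
    for i in range(runs):              # test template starts at index i
        count1 = 0                     # candidates matching on m points
        count2 = 0                     # ... that also match on point m + 1
        for j in range(runs):          # candidate starts at index j
            if all(abs(series[i + d] - series[j + d]) <= r
                   for d in range(m)):
                count1 += 1
                if abs(series[i + m] - series[j + m]) <= r:
                    count2 += 1
        # The self-match (j == i) guarantees count1 >= 1 and count2 >= 1.
        total += math.log(count2 / count1)
    return -total / runs               # negate so that ApEn >= 0


# Demonstration on synthetic data (not EGG recordings): a strictly
# periodic signal and Gaussian noise, 200 points each to keep it quick.
regular = [math.sin(2 * math.pi * i / 20) for i in range(200)]
rng = random.Random(0)
noisy = [rng.gauss(0.0, 1.0) for _ in range(200)]

apen_regular = approximate_entropy(regular)
apen_noisy = approximate_entropy(noisy)
```

The periodic signal yields the lower value of the two, in line with the low-complexity and high-complexity cases contrasted in Fig. 1. The direct double loop costs on the order of N² comparisons per frame, which is acceptable for N = 1000 but would usually be vectorised in practice.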

References

Cheveigne, A.D., 2003. Time domain auditory processing of speech. Journal of Phonetics 31, 547–561.

Fourcin, A.J., 1986. Electrolaryngographic assessment of vocal fold function. Journal of Phonetics 14, 435–442.

Fourcin, A., Ptok, M., 2003. Closing and Opening Phase Variability in Dysphonia, Proceeding Papers in the Conference, Advances in Quantitative Laryngology. Voice and Speech Research, Hamburg.

John, A., 2002. The ROSA Project (reliability of speech assessment) presented at the Craniofacial Society of Great Britain and Ireland, Annual Scientific Conference, April 10–12, East Grinstead, UK.

John, A., Enderby, P., 2000. Reliability of speech and language therapists using therapy outcome measures. International Journal of Communication Disorder 35 (2), 287–302.

Moore, C.J., Manickam, K., Willard, T., Jones, S., Slevin, N., Shalet, S., 2004. Spectral pattern complexity analysis and the quantification of voice normality in healthy & radiotherapy patient groups. Medicine Engineering Physics 26 (4), 291–301.

Pincus, S.M., 1988. Approximate entropy as a measure of system complexity. Proceedings National Academic Science, USA (6), 2297–2301.

Pincus, S.M., 1995. Approximate entropy (ApEn) as a complexity measure. Journal of Chaos 5, 110–117.

Pincus, S.M., 2001. Assessing serial irregularity and its implications for health. In: Proceedings Demography and Epidemiology: Frontiers in Population Health and Ageing, George Town University Centre for Population and Health Conference.

Pincus, S.M., Hartman, M.L., Roelfsema, F., Thorner, M.O., Veldhuis, J.D., 1999. Hormone Pulsatility Discrimination via Coarse and Short Time Sampling. The American Physiological Society, pp. E948–E957.

Titze, I.R., 1994. Principles of Voice Production. Prentice Hall, New Jersey.
